Metadata

At the core of TransparentMeta lies the concept of transparency metadata. This metadata encapsulates critical information about AI-generated audio content, enabling traceability, compliance with AI transparency laws, and clear attribution. It serves as a digital fingerprint that describes who created the content, how it was generated, and under what conditions.


The metadata object

The Metadata object formalizes the concept of metadata in TransparentMeta and is its primary domain entity. It is implemented as a Pydantic model, which provides strong typing, validation, and easy serialization. The object collects all the information required to comply with AI transparency legislation for generative AI audio.

Each instance of Metadata represents a snapshot of an audio asset’s origin and usage context, providing a standardized format for recording and sharing transparency details.


Fields of the Metadata object

The Metadata model includes the following fields, each capturing a specific aspect of the content generation and origin:

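from datetime import datetime, timezone

# Metadata and AIUsageLevel are provided by TransparentMeta and must be
# imported from the package before running this example.

metadata_dict = {
    "company": "Transparent Audio",
    "model": "v2.1",
    "created_at": datetime.now(timezone.utc),  # timezone-aware UTC timestamp
    "ai_usage_level": AIUsageLevel.AI_ASSISTED,
    "content_id": "12345",
    "user_id": "user_67890",
    "private_key_id": "dummy_private_key_id",  # a reference, not the key itself
    "additional_info": {
        "attribution": {
            "lyrics": "John Doe",
            "composer": "Jane Smith",
            "singer": "HAL 9000",
        }
    },
}

# Pydantic validates every field when the model is constructed.
metadata = Metadata(**metadata_dict)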

Let’s check each field in detail:

company

Type: str
The name of the company responsible for generating the audio content. This identifies the legal or organizational entity that is accountable for the AI system's output and responsible for ensuring compliance with AI transparency laws. In most cases, this will be the company that owns the generative AI audio model.

model

Type: str
The exact name and version of the AI model or system used to generate the audio. This field allows tracking of the generative technology responsible for the content.

created_at

Type: datetime
A timestamp indicating when the audio content was created. It supports auditing and chronological tracing.
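For auditing, it generally helps if the timestamp carries an explicit UTC offset. The snippet below is a small standard-library sketch of creating such a value and round-tripping it through ISO 8601 text. Whether TransparentMeta serializes created_at in exactly this form is an assumption, and is_timezone_aware is a hypothetical helper.

```python
from datetime import datetime, timezone

# A timezone-aware UTC timestamp, built the same way as in the example above.
created_at = datetime.now(timezone.utc)

# ISO 8601 text keeps the UTC offset, so the value survives a round trip.
encoded = created_at.isoformat()
decoded = datetime.fromisoformat(encoded)


def is_timezone_aware(value: datetime) -> bool:
    """Return True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None
```

A naive value such as datetime(2024, 1, 1) fails this check, which is one way to catch ambiguous timestamps before they are recorded.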

ai_usage_level

Type: AIUsageLevel (enum)
Represents the degree of AI involvement in the creation of the audio. This is a critical field for transparency and must be chosen from the following options:

  • AI_GENERATED: Audio fully generated by AI with little or no human input.

  • AI_ASSISTED: Audio created through a collaboration between AI and humans, where humans played a significant creative role.

  • HUMAN_CREATED: Audio entirely created by humans without AI involvement.

This enumeration helps clarify the nature of AI participation and supports regulatory disclosure requirements.
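As a rough illustration, an enumeration with these three members could be declared as shown below. This is a hypothetical stand-in, not TransparentMeta's actual definition: the member names come from the list above, while the string values and the involves_ai helper are assumptions.

```python
from enum import Enum


class AIUsageLevel(str, Enum):
    """Hypothetical stand-in for TransparentMeta's AIUsageLevel enum."""

    # Member names follow the documentation; the string values are assumed.
    AI_GENERATED = "ai_generated"
    AI_ASSISTED = "ai_assisted"
    HUMAN_CREATED = "human_created"


def involves_ai(level: AIUsageLevel) -> bool:
    """Return True when AI took part in creating the audio."""
    return level is not AIUsageLevel.HUMAN_CREATED
```

Restricting the field to an enum means a misspelled or unknown level is rejected when the value is created rather than discovered later during an audit.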

content_id

Type: str
A unique identifier for the specific audio content. This ID enables tracking and referencing individual pieces of audio within larger systems or databases.

user_id

Type: str
Identifies the user or entity that initiated the generation process. This can be used to attribute content creation requests within multi-user platforms or services.

private_key_id

Type: str
The identifier of the private cryptographic key used to sign the metadata. This field is essential for verifying the authenticity and integrity of the metadata. Please note: this is not the private key itself, but rather a reference to it, which should be stored securely in a key management system or similar secure system.
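The indirection between a key identifier and the key itself can be sketched as follows. Everything here is hypothetical: KEY_STORE stands in for a key management system, and HMAC-SHA256 is a symmetric stand-in chosen only to keep the example within the standard library. It is not presented as the signature scheme TransparentMeta uses; an asymmetric scheme, as the term "private key" implies, would take its place.

```python
import hashlib
import hmac
import json

# Hypothetical in-memory key store; a real deployment would query a key
# management system instead of holding secrets in code.
KEY_STORE = {"dummy_private_key_id": b"not-a-real-secret"}


def sign_payload(payload: dict, key_id: str) -> str:
    """Sign a payload with the key that key_id refers to (HMAC stand-in)."""
    key = KEY_STORE[key_id]  # raises KeyError for an unknown key id
    # Sorting keys gives a stable byte representation to sign.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, key_id: str, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(payload, key_id)
    return hmac.compare_digest(expected, signature)


payload = {
    "company": "Transparent Audio",
    "content_id": "12345",
    "private_key_id": "dummy_private_key_id",
}
signature = sign_payload(payload, payload["private_key_id"])
```

Only the identifier travels with the metadata, so the embedded record never exposes key material, and any change to the payload makes verification fail.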

additional_info

Type: Optional[Dict]
An optional dictionary designed to hold extended or custom metadata fields. You can use this field to include any additional information relevant to your use case.

Attribution information

One important use case that can be addressed in the additional_info field is attribution information. In the context of copyright and creative works, attribution refers to the acknowledgment of contributors such as lyricists, composers, performers, or other creative roles.

We suggest using the additional_info field to store attribution details using a structure similar to this:

additional_info = {
  "attribution": {
    "lyrics": "John Doe",
    "composer": "Jane Smith",
    "singer": "HAL 9000"
  }
}
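Because additional_info is optional and free-form, code that consumes it should tolerate a missing or malformed attribution entry. The helpers below are a minimal sketch of that idea; get_attribution and format_credits are hypothetical names, not part of TransparentMeta.

```python
from typing import Any, Dict, Optional


def get_attribution(additional_info: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Return the role-to-name mapping, or an empty dict if it is absent."""
    if not additional_info:
        return {}
    attribution = additional_info.get("attribution")
    if not isinstance(attribution, dict):
        return {}
    # Keep only string entries so callers can format them safely.
    return {role: name for role, name in attribution.items()
            if isinstance(name, str)}


def format_credits(additional_info: Optional[Dict[str, Any]]) -> str:
    """Render the attribution as a single human-readable credit line."""
    credits = get_attribution(additional_info)
    return "; ".join(f"{role}: {name}" for role, name in credits.items())
```

Calling format_credits with None or with a dictionary that lacks the attribution key simply returns an empty string.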

How is Metadata written in audio files?

TransparentMeta uses the mutagen library under the hood to embed metadata into audio files in a reliable and standardized way.

Metadata storage format: ID3v2 tags

For MP3 files, metadata is stored within ID3v2 tags, a widely adopted standard that allows embedding descriptive information directly inside the audio file. ID3v2 tags can hold various data such as artist name, album, track title, and, importantly for TransparentMeta, cryptographically signed transparency metadata.

ID3v2 tags consist of frames, each containing a specific type of metadata. Custom frames can be added to store additional information, which makes ID3v2 tags well suited to embedding custom data. The transparency metadata is embedded within such custom frames.
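To make the frame structure concrete, the sketch below assembles an ID3v2.4 tag containing one user-defined text frame (TXXX) by hand. It illustrates the byte layout only. TransparentMeta delegates this work to mutagen, and both the choice of a TXXX frame and the "transparency" description are assumptions made for the example.

```python
import json


def syncsafe(size: int) -> bytes:
    """Encode an integer as a 4-byte ID3v2 syncsafe value (7 bits per byte)."""
    if not 0 <= size < 2 ** 28:
        raise ValueError("size does not fit in 28 bits")
    return bytes((size >> shift) & 0x7F for shift in (21, 14, 7, 0))


def build_txxx_frame(description: str, value: str) -> bytes:
    """Build an ID3v2.4 TXXX (user-defined text) frame."""
    # Body: encoding byte (3 = UTF-8), description, null terminator, value.
    body = b"\x03" + description.encode("utf-8") + b"\x00" + value.encode("utf-8")
    # Frame header: 4-byte frame ID, syncsafe body size, 2 flag bytes.
    return b"TXXX" + syncsafe(len(body)) + b"\x00\x00" + body


def build_id3v24_tag(frames: bytes) -> bytes:
    """Wrap already-encoded frames in an ID3v2.4 tag header."""
    # Header: "ID3", version 4.0, no flags, syncsafe size of the frames.
    return b"ID3" + b"\x04\x00" + b"\x00" + syncsafe(len(frames)) + frames


# Hypothetical payload: a JSON document stored under an assumed description.
payload = json.dumps({"company": "Transparent Audio", "content_id": "12345"})
tag = build_id3v24_tag(build_txxx_frame("transparency", payload))
```

The first ten bytes of tag form the tag header, and the frame begins immediately after it with its own ten-byte header.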

Handling WAV files

WAV files do not natively support ID3v2 tags. To address this, mutagen stores the tag in a dedicated chunk inside the WAV file's RIFF container, using the same ID3v2 structure. This approach keeps the metadata format consistent across audio formats while respecting each format’s specification.
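A WAV file is a RIFF container made of chunks, so extra data can travel next to the audio. The sketch below walks the top-level chunks of a tiny in-memory file using only the standard library. The "id3 " chunk identifier and its placeholder contents are assumptions for illustration, and iter_riff_chunks and make_chunk are hypothetical helpers rather than mutagen or TransparentMeta functions.

```python
import struct
from typing import Iterator, Tuple


def iter_riff_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, chunk_data) for each top-level chunk of a WAV file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        yield chunk_id, data[offset + 8:offset + 8 + size]
        # Chunks are padded to an even number of bytes.
        offset += 8 + size + (size & 1)


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Encode one RIFF chunk: ID, little-endian size, data, optional pad byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Tiny in-memory file: format chunk, empty audio data, and a tag chunk.
fmt = make_chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
body = b"WAVE" + fmt + make_chunk(b"data", b"") + make_chunk(b"id3 ", b"ID3\x04\x00")
wav_bytes = b"RIFF" + struct.pack("<I", len(body)) + body

chunk_ids = [chunk_id for chunk_id, _ in iter_riff_chunks(wav_bytes)]
```

Players that do not recognize a chunk identifier are expected to skip that chunk, which is why tag data can be added without disturbing the audio itself.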