Digital Audio - A Proposed Meta Format

Meta data is a bit of a mess, and the mess becomes acute when transcoding. First, there is the problem of multiple meta data formats. A serious pain. Second, there is the mapping of values between the tag fields used by the various standards. One uses a year field plus a date field that holds only month and day; another uses a single date field holding day, month and year. Life is too short to handle this pointless stuff. Finally, there is the sheer number of tags available or being proposed - including image data. We think the current efforts miss the point. Audio files are not database records. The meta data carried in the audio file should be the absolute minimum necessary (not the maximum possible) with links to real databases or web pages where users can graze tons of data to their heart's content - including album artwork and pictures of their idols - using systems designed for that purpose.

We got really confused - not the first time - and then pretty annoyed while developing the transcoding routines for our soon-to-be-released Open Source player, because we realized that most of our effort was going into translation and mapping of tag/meta data. We were so annoyed that we even decided to jump into the debate with a proposal of our own. The rest of this page is about our proposal. You may want to leave now.

Meta Format

We like the Ogg Vorbis basic comment/tag format (not the extended/image format, for which we see no reason) over all the other formats we have seen, for four reasons:

Simple: It uses a trivial name=value format. Infinitely extensible. Which is both good news and bad news.
Multi-Valued: Meta tags can be repeated any number of times. Artists want their individual names as well as the ensemble/band name - no problem. Define multiple ARTIST=name comments, one with the ensemble/band name and one for each individual band/ensemble member's name. And so on. End up with a huge file - consequence of your decision.
Language Neutral: All strings are UTF-8. If you want to encode both the name and data parts in any language you choose that is your decision. If the user's reader cannot make sense of the character set, that is the consequence of your decision. Of course we could always sensibly define a single (fixed) CHARSET=data comment tag to indicate encoding. It would have to appear first, or be defaulted to either 8859-1 or the application's default character set, and would apply to all comment tags until another CHARSET was encountered. Want to repeat the information in multiple character sets - do it. Finally, to those who would say that even CHARSET= betrays an English language dominance we would respond by saying that CHARSET is barely English to the vast majority of English speakers.
Human Readable: By simply dumping the comment field in the raw on a screen (in whatever character set was being used) the human can probably make sense of it even if the poor player/reader software cannot. The player/reader could even reliably, irrespective of content, split the field at the = sign to be really, really helpful to us poor humans. Smarter software could let the human edit either, or both, of the name and data values and transcribe them reliably to the output file without any programmatic understanding of the content.
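To make the name=value idea concrete, here is a minimal sketch in Python of how a player/reader might split comments at the = sign and collect repeated tags. The function name parse_comments and the sample tag values are our own inventions for illustration, and the sketch assumes the comments have already been decoded into text strings.

```python
from collections import defaultdict


def parse_comments(comments):
    """Split each 'NAME=value' comment at the first '=' and group repeated names."""
    tags = defaultdict(list)
    for comment in comments:
        name, sep, value = comment.partition("=")
        if not sep:
            # No '=' found: not a name=value comment, so this sketch skips it.
            continue
        # Tag names are compared case-insensitively; values are kept verbatim.
        tags[name.upper()].append(value)
    return dict(tags)


# Hypothetical sample data: ARTIST appears twice, and one value contains '='.
comments = [
    "ARTIST=The Example Quartet",
    "artist=A. Player",
    "TITLE=Some Work",
    "MYRATING=5=stars",
]
tags = parse_comments(comments)
print(tags["ARTIST"])    # both ARTIST values, in file order
print(tags["MYRATING"])  # only the first '=' separates name from value
```

Splitting only at the first = means the data part can itself contain = signs without confusing the reader. A tool that must write tags back untouched would keep the original comment string rather than the upper-cased name used here for lookup.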

So the first part of our proposal is to standardize on Ogg Vorbis comment fields as defined here.

Meta Data Content

Now the thornier question of what meta data should be included in the audio file. We propose the absolute minimum practical. Perhaps the Dublin Core Meta Data work has some relevance here - though the volume of data is pretty daunting and its purpose is, fundamentally, cataloguing and retrieval. However, in the meantime we think the minimum meta data should be organized into 4 sections:

Identification Section: Enough data for the human and application to identify the audio file. Mostly freeform text but should include a single, unambiguous, globally unique reference number capable of programmatic use. Perhaps the ISRC number or a GUID of some form. The IETF/IANA have for years used the enterprise number to fully distribute the allocation of authority for unique numbering (OIDs), which could easily provide the basis of a simple - yet highly effective - system. At zip cost.
Rights Section: Identifies the rights holder for the audio work and any additional licensing information the publisher thinks is important.
Audio Section: This is perhaps, arguably, the least essential part. So much so that we make no explicit proposals for this section. The audio file, of necessity, already contains all the information necessary to play the encapsulated work in the manner that the publisher intended. There is, however, a school of thinking that says perhaps more information should be supplied to allow the user/player to automatically correct for loudness, equalization etc. Smart players should already be able to do this by simply monitoring the output sound quality and automatically applying user preferred relative volume and equalization adjustments. Besides, in extremis, there are the volume and treble/bass controls on even the crummiest player/reader. On balance we do not favour adding any additional audio information to the file. Though we understand where the idea is coming from.
Private Section: This section contains tag name space for private data. You want to rate the song five stars? Use private data. You want to index the work within some private database? Go ahead. The only criterion for tag names in this section is that they should not have the same name (in whatever language is being used) as those in the other sections.

Specific Meta Data Proposal

In detail this is what we think should be in the audio file meta data - much already exists. The tag names are shown in English and upper case (they are in practice always case-insensitive) though, as noted above, we see no reason for this to be the case other than the CHARSET tag:

Note: We use the term publisher below to denote the entity/person who originally made the material available. This would exclude any third party distribution agency, which we see as merely tactical.

Section	Tag Name	Purpose	Notes
Identification	CHARSET=	Defines the character set from the ISO 8859 list.	Could be either full format, for example, 8859-1 or simply 1. Causes the reader/player to apply character set translation to all subsequent tags (name part, = part and data part) until either the end of all comments or until another CHARSET= tag is encountered. If not present the player/reader uses its default locale settings.
Identification	TITLE=	The name of the work	The name by which the publisher wants to refer to the work encapsulated in the audio part. If the work is known by multiple names or has well known aliases, for example, John Brown's Body and The Battle Hymn of the Republic, then a second (or third, fourth etc.) TITLE= tag can be added as required. Alternatively, all possible names could be added within a single TITLE= tag.
Identification	ARTIST=	The name of an entity associated with the work	At the discretion of the publisher, may be an ensemble/band name, performer name, composer name or even all of them using long tags or multiple ARTIST= tags. There is nothing to stop the publisher adding ARTIST=New York Symphony Orchestra and then also adding ARTIST= tags for each of the orchestra members involved in the work. Whether this would be sensible should be entirely a publisher decision.
Identification	COLLECTION=	The name of any collection with which the work is associated	Optional. This replaces, being more generic, the ALBUM= tag used widely today. In many cases it may be an album name, but in future - who knows. Besides, the term Album is already anachronistic.
Identification	ITEM=	Identification of the work within a larger collection	Optional. Generally only relevant if the COLLECTION= tag is present. Identification of the specific work within some COLLECTION. Could be, and in many cases will be, a TRACKNUMBER= within an album, but could also be the K number of a Mozart work if that was more relevant.
Identification	YEAR=	The year (4 digit format) that the publisher made the work available.	While for historical recordings this field is largely irrelevant - the date of recording being far more interesting - it has, perhaps, more immediate relevance with contemporary music. However, even in the context of historic material it may still allow the consumer to gauge the likely quality of the material. A 1982 remastering of a 1903 Caruso work will have a certain level of quality compared with the same work remastered in 2004. We would certainly expect to see source material information, such as date and location of recording, separately made available by the publisher through the INFO= tag below.
Identification	GENRE=	Unlimited classification name of the work	The current use of limited GENRE values originates from the ID3v1 tag, which used a single byte and a number to name conversion (we note MP4 seems to have stumbled into the same blind alley). It was the product of a limitation which has since been removed. Use of a free format value, as opposed to a limited set (players/readers and writers would still be free to suggest values), means the tag is future proofed and removes the need for constant re-classification and ongoing maintenance. More than one GENRE= tag can be used for cross-over music. It is in the interests of publishers to get this tag value right for both tactical and strategic reasons.
Identification	GUID=	A Globally Unique Identifier by which the work can be identified.	While GUID sounds a bit geeky it does describe the idea behind the tag. Whether this is an ISRC number or an EAN or a Publishers Code or GTIN is a bigger question. Ideally the number should allow legitimate derivations of the work (re-mixes, remasterings etc) to be easily tracked from the source material to aid in cataloging and information retrieval - not simply to provide a means to administer punitive measures demanded by certain involved parties.
Information	INFO=	URL	Optional. URL link to a networked resource (HTTP, LDAP etc) which can be used to supply any required information the publisher wishes to make available about the work including, but not limited to, the performers, the composer, the date and location of the recording, the shoe sizes of the performers. The data supplied would normally include the checksum (see below) if provided. The URL provided could be an explicit page reference or could provide a query string interface to perform a GUID lookup.
Rights	COPYRIGHT=	Text used to give notice of any asserted rights to the work	Determined solely by the publisher of the work based on National/International law/treaties etc as well as any commercial requirements.
Rights	RIGHTS=	URL	URL link to a networked resource describing the rights of the user and publisher with respect to the work.
Audio	-	-	No tags proposed
Private	COMMENTS=	Text	A text tag to which the user/purchaser of the work would have unlimited and unfettered access.
Private	??=	??	Any other tag that the user, working in close harmony with their local tag editor/player, wants to use for whatever they want.
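The CHARSET= row in the table above describes a switch that applies to every following comment until the next CHARSET= appears. A rough Python sketch of that rule might look like the following. The function names charset_to_codec and decode_comments are hypothetical, and the sketch assumes that comments arrive as raw bytes, that the CHARSET= tag itself is plain ASCII, and that the caller supplies the default (here 8859-1 stands in for the player's locale).

```python
def charset_to_codec(value):
    """Map '8859-2' or simply '2' to a Python codec name such as 'iso8859_2'."""
    part = value.strip()
    if part.startswith("8859-"):
        part = part[len("8859-"):]
    if not part.isdigit():
        raise ValueError(f"unrecognized CHARSET value: {value!r}")
    return f"iso8859_{int(part)}"


def decode_comments(raw_comments, default_codec="iso8859_1"):
    """Decode raw comment bytes, switching codec at each CHARSET= tag."""
    codec = default_codec
    decoded = []
    for raw in raw_comments:
        # Assumption: the CHARSET= tag itself is always plain ASCII.
        if raw.upper().startswith(b"CHARSET="):
            codec = charset_to_codec(raw[len(b"CHARSET="):].decode("ascii"))
            continue
        # Name part, '=' and data part are all decoded with the current codec.
        decoded.append(raw.decode(codec))
    return decoded


# Hypothetical sample: one tag in ISO 8859-2, then a switch to ISO 8859-1.
raw_comments = [
    b"CHARSET=2",
    "TITLE=Má vlast".encode("iso8859_2"),
    b"CHARSET=8859-1",
    "ARTIST=Café Ensemble".encode("iso8859_1"),
]
decoded = decode_comments(raw_comments)
print(decoded)
```

A CHARSET value that names a part Python has no codec for would surface as a LookupError at decode time; a real player/reader would probably fall back to showing the raw field instead.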

In short, players/readers would, by default, transcode/copy all tags in the Identification, Rights and Audio sections unchanged. What happens to anything in the Private section would be decided by the user, perhaps based on the purpose of the transcode. You want to change the Identification, Audio or Rights section? Then you need to re-publish.
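As a sketch of that default behaviour, the Python fragment below copies every published tag unchanged and lets a user-supplied rule decide the fate of everything else. The names PUBLISHED_TAGS and transcode_tags are hypothetical, the tag list is simply read off the table above, and treating INFO= as a published tag is our assumption for this example.

```python
# Tag names from the table above that the publisher owns (an assumed grouping).
PUBLISHED_TAGS = {
    "CHARSET", "TITLE", "ARTIST", "COLLECTION", "ITEM", "YEAR", "GENRE", "GUID",
    "INFO", "COPYRIGHT", "RIGHTS",
}


def transcode_tags(comments, keep_private=lambda name, value: True):
    """Copy published tags verbatim; ask keep_private about everything else."""
    output = []
    for comment in comments:
        # A comment without '=' ends up with an empty value and counts as private.
        name, _, value = comment.partition("=")
        if name.upper() in PUBLISHED_TAGS:
            output.append(comment)  # copied unchanged, never rewritten
        elif keep_private(name, value):
            output.append(comment)
    return output


# Hypothetical source tags: two published, two private.
source = [
    "TITLE=Some Work",
    "COPYRIGHT=(c) Example Publisher",
    "COMMENTS=bought on holiday",
    "MYRATING=5",
]
everything = transcode_tags(source)
only_comments = transcode_tags(
    source, keep_private=lambda name, value: name.upper() == "COMMENTS"
)
print(only_comments)
```

Because published tags are appended as the original strings, the transcoder never needs to understand their content, which is the point of the proposal.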
