Meta data is a bit of a mess, and the mess becomes acute when transcoding. First, there is the problem of multiple meta data formats. A serious pain. Second, there is the mapping of values between the tag fields used by the various standards: one format puts the year in one tag and only the day and month in a date tag, while another uses a single date tag carrying day, month and year. Life is too short to handle this pointless stuff. Finally, there is the sheer number of tags available or being proposed, including image data. We think the current efforts miss the point. Audio files are not database records. The meta data carried in the audio file should be the absolute minimum necessary (not the maximum possible), with links to real databases or web pages where users can graze tons of data to their heart's content (including album artwork and pictures of their idols) using systems designed for that purpose.
We got really confused (not for the first time) and then pretty annoyed while developing the transcoding routines for our soon-to-be-released Open Source player, when we realized that most of our effort was going into the translation and mapping of tag/meta data. We were so annoyed that we even decided to jump into the debate with a proposal of our own. The rest of this page is about that proposal. You may want to leave now.
We prefer the Ogg Vorbis basic comment/tag format (not the extended/image format, for which we see no reason) over all the other formats we have seen, for four reasons:
Simple: It uses a trivial name=value format. Infinitely extensible. Which is both good news and bad news.
Multi-Valued: Meta tags can be repeated any number of times. Artists want their individual names listed as well as the ensemble/band name? No problem. Define multiple ARTIST=name comments: one with the ensemble/band name and one for each band/ensemble member. And so on. End up with a huge file? That is the consequence of your decision.
Language Neutral: All strings are UTF-8. If you want to encode both the name and data parts in any language you choose, that is your decision. If the user's reader cannot make sense of the character set, that is the consequence of your decision. Of course, we could always sensibly define a single (fixed) CHARSET=data comment tag to indicate the encoding. It would have to appear first (if absent, the encoding would default to either 8859-1 or the application's default character set) and would apply to all comment tags until another CHARSET was encountered. Want to repeat the information in multiple character sets? Do it. Finally, to those who would say that even CHARSET= betrays an English language dominance, we would respond that CHARSET is barely English to the vast majority of English speakers.
Human Readable: By simply dumping the raw comment field on a screen (in whatever character set was being used), a human can probably make sense of it even if the poor player/reader software cannot. The player/reader could even reliably, irrespective of content, split each comment at the first = sign to be really, really helpful to us poor humans. Smarter software could let the human edit either or both of the name and data values and transcribe them reliably to the output file without any programmatic understanding of the content.
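As a rough illustration of the Simple, Multi-Valued and Human Readable points, the sketch below groups already-decoded comment strings by field name, splitting each at the first = sign so that values may themselves contain =. It is a minimal Python sketch, not part of any specification: the function name `parse_comments` is hypothetical, and it assumes comments arrive as a list of decoded strings and that names are compared case-insensitively, as the Vorbis comment specification requires.

```python
from collections import defaultdict


def parse_comments(comments: list[str]) -> dict[str, list[str]]:
    """Group name=value comments by (upper-cased) field name.

    Hypothetical helper: repeated names such as ARTIST= accumulate in
    a list, and only the first '=' separates name from value.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    for comment in comments:
        name, sep, value = comment.partition("=")
        if not sep:
            continue  # no '=' at all: not a name=value comment, skip it
        fields[name.upper()].append(value)
    return dict(fields)


print(parse_comments(["ARTIST=The Band", "artist=Jane Doe", "TITLE=a=b"]))
```

Keeping a list per name is what makes repeated ARTIST= or GENRE= tags cost nothing extra to support.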
So the first part of our proposal is to standardize on Ogg Vorbis comment fields as defined in the Vorbis comment specification.
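For reference, the Vorbis comment structure itself is little more than a list of length-prefixed strings: a vendor string followed by a count of user comments, each comment stored as a 32-bit little-endian length followed by its UTF-8 bytes. The sketch below packs and unpacks that structure. It deliberately omits the surrounding Vorbis packet header (the 0x03 type byte and the "vorbis" signature) and the trailing framing bit, and the function names are hypothetical.

```python
import struct


def pack_comments(vendor: str, comments: list[str]) -> bytes:
    """Serialize vendor string + user comments as length-prefixed UTF-8."""
    out = bytearray()
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(comments))  # user comment list length
    for comment in comments:
        data = comment.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def unpack_comments(data: bytes) -> tuple[str, list[str]]:
    """Inverse of pack_comments; raises on truncated input."""
    offset = 0

    def read_u32() -> int:
        nonlocal offset
        (value,) = struct.unpack_from("<I", data, offset)  # struct.error if short
        offset += 4
        return value

    def read_str() -> str:
        nonlocal offset
        length = read_u32()
        if offset + length > len(data):
            raise ValueError("truncated comment structure")
        text = data[offset:offset + length].decode("utf-8")
        offset += length
        return text

    vendor = read_str()
    count = read_u32()
    return vendor, [read_str() for _ in range(count)]
```

Because every string carries its own length, a reader can skip or copy any comment verbatim without understanding it, which is what makes the format so easy to transcode.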
So now we come to the thornier question of what should be included in the meta data. We propose the absolute practical minimum. Perhaps the Dublin Core meta data work has some relevance here. In the meantime, however, we think the minimum meta data should be organized into four sections:
Identification Section: Enough data for the human and the application to identify the audio file. Mostly text based, but it should include a single, unambiguous, globally unique reference number: perhaps the ISRC number, or a GUID of some form. Goodness, the IETF/IANA have for years used the private enterprise number to fully distribute authority for allocating unique numbers (OIDs), and this could easily provide the basis of a simple, yet highly effective, system. At zip cost.
Rights Section: Identifies the rights holder for the audio work and any additional licensing information the publisher thinks is important.
Audio Section: This is, arguably, the least essential part. So much so that we make no explicit proposals for it. The audio file, of necessity, already contains all the information needed to play the encapsulated work in the manner the publisher intended. There is, however, a school of thought that says more information should be supplied to allow the user/player to automatically correct for loudness, equalization etc. Smart players should already be able to do this by monitoring the output and automatically applying the user's preferred relative volume and equalization adjustments. Besides, in extremis, there are the volume and treble/bass controls on even the crummiest player/reader. On balance we do not favour adding any additional audio information to the file, though we understand where the idea is coming from.
Information Section: A modest section containing space for the user to make whatever comments they want, a checksum field, and a URL. The URL would typically be used by a browser but could also point to an LDAP database or any other valid URL entity from which the publisher/supplier of the track can supply any further information it wants, in whatever format it wants. If the publisher insists on adding the shoe size of each performer, they are entirely free to do it there. Want to rate the song five stars? Use private data. We have no interest, frankly, if you rate the encapsulated work 7 on the Richter scale of enjoyment. Keep your private thoughts private. Perhaps between consenting adults. Don't pollute the public name space with irrelevance.
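To show how cheaply an OID-based identifier could be minted, here is a sketch that builds a GUID= value beneath IANA's private enterprise arc, 1.3.6.1.4.1. It assumes, hypothetically, that a publisher holds enterprise number 99999 (a placeholder, not a real assignment) and numbers its works below that arc. Rendering the value as a `urn:oid:` URN is also our assumption, since the proposal does not fix a textual format.

```python
# IANA private enterprise arc: iso.org.dod.internet.private.enterprise
IANA_ENTERPRISE_ARC = (1, 3, 6, 1, 4, 1)


def oid_guid(enterprise_number: int, *work_arcs: int) -> str:
    """Build a hypothetical GUID= value under a publisher's enterprise OID.

    Uniqueness relies on each publisher never reusing its own sub-arcs;
    IANA guarantees the enterprise number itself is unique.
    """
    arcs = (*IANA_ENTERPRISE_ARC, enterprise_number, *work_arcs)
    if any(not isinstance(a, int) or a < 0 for a in arcs):
        raise ValueError("OID arcs must be non-negative integers")
    return "urn:oid:" + ".".join(str(a) for a in arcs)


# Hypothetical publisher 99999, catalogue 1, work 42:
print("GUID=" + oid_guid(99999, 1, 42))
```

The attraction is that no central registry of works is needed: once a publisher has its enterprise number, every further allocation is its own business.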
In detail, this is what we think should be in the audio file meta data (much of it already exists). The tag names are shown in English and in upper case (in practice they are always case-insensitive), though, as noted above, we see no reason for them to be in English other than the CHARSET tag:
Note: We use the term publisher below to denote the entity/person who originally made the material available. This excludes any third-party distribution agency, which we see as merely tactical.
| Section | Tag Name | Purpose | Notes |
| --- | --- | --- | --- |
| Identification | CHARSET= | Defines the character set from the ISO 8859 list. | Could be either full format, for example, 8859-1 or simply 1. Causes the reader/player to apply character set translation to all subsequent tags (name part, = part and data part) until either the end of all comments or until another CHARSET= tag is encountered. If not present the player/reader uses its default locale settings. |
| Identification | TITLE= | The name of the work | The name by which the publisher wants to refer to the work encapsulated in the audio part. If the work is known by multiple names or has well known aliases, for example, John Brown's Body and The Battle Hymn of the Republic, then a second (or third, fourth etc.) TITLE= tag can be added as required. Alternatively, all possible names could be added within a single TITLE= tag. |
| Identification | ARTIST= | The name of an entity associated with the work | At the discretion of the publisher, may be an ensemble/band name, performer name, composer name or even all of them, using long tags or multiple ARTIST= tags. There is nothing to stop the publisher adding ARTIST=New York Symphony Orchestra and then also adding ARTIST= tags for each of the orchestra members involved in the work. Whether this would be sensible should be entirely the publisher's decision. |
| Identification | COLLECTION= | The name of any collection with which the work is associated | Optional. Being more generic, this replaces the ALBUM= tag widely used today. In many cases it may be an album name, but in future, who knows. Besides, the term Album is already anachronistic. |
| Identification | ITEM= | Identification of the work within a larger collection | Optional. Generally only relevant if the COLLECTION= tag is present. Identification of the specific work within some COLLECTION. Could be, and in many cases will be, a TRACKNUMBER= within an album, but could also be the K number of a Mozart work if that was more relevant. |
| Identification | YEAR= | The year (4-digit format) in which the publisher made the work available. | While for historical recordings this field is largely irrelevant (the date of recording being far more interesting), it has, perhaps, more immediate relevance for contemporary music. However, even for historic material it may still allow the consumer to gauge the likely quality of the material. A 1967 remastering of a 1903 Caruso work will have a different level of quality from the same work remastered in 2004. We would certainly expect source material information, such as the date and location of recording, to be made available separately by the publisher through the MORE= tag below. |
| Identification | GENRE= | Unlimited classification name of the work | The current use of limited GENRE values originates from the ID3v1 tag, which stored the genre as a single byte and relied on a number-to-name conversion table (we note that MP4 seems to have stumbled into the same blind alley). It was the product of a limitation which has since been removed. Using a free-format value rather than a limited set (players/readers and writers would still be free to suggest values) means the tag is future-proofed and removes the need for constant re-classification and ongoing maintenance. More than one GENRE= tag can be used for cross-over music. It is in the interests of publishers to get this tag value right, for both tactical and strategic reasons. |
| Identification | GUID= | A Globally Unique Identifier by which the work can be identified. | While GUID sounds a bit geeky, it does describe the idea behind the tag. Whether this is an ISRC number, an EAN, a publisher's code or a GTIN is a bigger question. Ideally the number should allow legitimate derivations of the work (re-mixes, remasterings etc.) to be easily tracked back to the source material to aid cataloging and information retrieval, not simply to provide a means to administer punitive measures demanded by certain involved parties. |
| Rights | COPYRIGHT= | Text used to give notice of any asserted rights to the work | Determined solely by the publisher of the work based on National/International law/treaties etc as well as any commercial requirements. |
| Rights | RIGHTS= | URL | URL link to a networked resource describing the rights of the user and publisher with respect to the work. |
| Audio | - | - | No tags proposed |
| Information | MORE= | URL | Optional. URL link to a networked resource (web, LDAP etc.) which can be used to supply any information the publisher wishes to make available about the work including, but not limited to, the performers, the composer, the date and location of the recording, and the shoe sizes of the performers. The data supplied would normally include the checksum (see below) if one is provided. |
| Information | CHECKSUM= | Textual checksum/hash value | Covers all fields provided with the originally published work except any COMMENTS= tag(s). Thus, to generate the same checksum as the originally published work, any writing software would have to faithfully copy all tags except any COMMENTS= tag(s). The idea here is not to stop editors/players/readers changing tags (an enterprise doomed to failure) but rather to allow a user/player/reader to verify whether changes have been made since the original. The user can then decide whether or not to use the audio file. There is little to stop a malicious publisher simply recomputing the checksum and providing an appropriate web page to confirm the new checksum, other than the visibility that doing so would entail. Schemes in which the checksum used some or all of an X.509/DNS certificate from the original publisher could eliminate the simplest replacement scenarios, but at the cost of some additional complexity. There are some tactical considerations about extracting the checksum value from, say, a web page, but these are simply tactical, with multiple solutions. Finally, the purpose of this tag is not to provide the music industry with another DRM stick with which to beat the poor consumer, but rather to provide a tool that will help the vast majority of law-abiding citizens to stay law-abiding. |
| Information | COMMENTS= | Text | A text tag to which the user/purchaser of the work would have unlimited and unfettered access. |
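The CHARSET= rule in the table above (switch translation for all subsequent tags until the next CHARSET=, falling back to a default when none is present) can be sketched as a small decoder over the raw comment bytes. This is a hypothetical Python sketch of the proposal, not existing Vorbis behaviour (Vorbis itself mandates UTF-8): the function names are invented, ISO 8859-1 stands in for the player's default locale, and only ISO 8859 parts for which Python ships codecs are accepted (there is, for example, no ISO 8859-12).

```python
import codecs


def charset_to_codec(value: str) -> str:
    """Map a CHARSET= value such as '8859-1', 'ISO-8859-1' or simply '1'
    to a Python codec name (hypothetical helper)."""
    v = value.strip().upper()
    if v.startswith("ISO"):
        v = v[3:].lstrip("-_ ")
    if v.startswith("8859-"):
        v = v[5:]
    name = f"iso8859-{int(v)}"  # ValueError if the part is not a number
    codecs.lookup(name)  # LookupError for parts Python has no codec for
    return name


def decode_comments(raw: list[bytes], default: str = "iso8859-1") -> list[str]:
    """Decode raw comments, switching charset at each CHARSET= tag.

    The CHARSET= tags themselves are consumed and not returned.
    """
    current = default  # assumption: stands in for the locale default
    decoded = []
    for item in raw:
        if item[:8].upper() == b"CHARSET=":
            current = charset_to_codec(item[8:].decode("ascii"))
            continue
        # Undefined bytes become U+FFFD so the human still sees something.
        decoded.append(item.decode(current, errors="replace"))
    return decoded
```

For example, `decode_comments([b"TITLE=Caf\xe9", b"CHARSET=8859-7", b"ARTIST=\xc1"])` reads the title as Latin-1 and the artist as Greek, so byte 0xC1 becomes a capital alpha.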
In short, players/readers would copy all fields unchanged (thus preserving the CHECKSUM= value) but allow the user to freely edit the COMMENTS= tag. It is, IOHO, completely impractical to stop the pirating and changing of published material using this process or indeed any other (otherwise it would have been done many years ago), but by allowing verification the user can determine that piracy may have occurred and can then make their own decisions about what to do next, with full knowledge of any consequences. And sure, a rogue player could always claim that any track satisfied the CHECKSUM= criteria.
Naive? Perhaps. Effective? Possibly. But a lot better than the big fat zero of today. And for relatively trivial effort.
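A CHECKSUM= value that survives faithful copying but ignores COMMENTS= could be computed along the lines below. The proposal names neither the hash nor the exact input, so this sketch makes three assumptions of its own: SHA-256 as the hash; the comments hashed in their published order, each with a 4-byte length prefix so that field boundaries cannot be shifted; and the CHECKSUM= tag itself excluded along with any COMMENTS= tags, since a checksum cannot cover its own value. The function names are hypothetical.

```python
import hashlib

# Assumption: tags excluded from the hash (user-editable, or self-referential).
EXCLUDED = {"COMMENTS", "CHECKSUM"}


def _name(comment: str) -> str:
    return comment.partition("=")[0].upper()


def compute_checksum(comments: list[str]) -> str:
    """SHA-256 over every comment except COMMENTS= and CHECKSUM=."""
    h = hashlib.sha256()
    for comment in comments:
        if _name(comment) in EXCLUDED:
            continue
        data = comment.encode("utf-8")
        h.update(len(data).to_bytes(4, "little"))  # length prefix
        h.update(data)
    return h.hexdigest()


def verify_checksum(comments: list[str]) -> bool:
    """True if exactly one CHECKSUM= tag is present and it still matches."""
    published = [c.partition("=")[2] for c in comments if _name(c) == "CHECKSUM"]
    if len(published) != 1:
        return False
    return published[0].strip().lower() == compute_checksum(comments)
```

A player that verifies this way can tell the user that something other than COMMENTS= has changed, but nothing stops a rogue writer from recomputing the value.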