HTML as media container format

Lucas Gonze

Yahoo! Music

July 23, 2006

A part of the XIPF project


My high-level goal is to allow music files to be accompanied by packaging such as artwork and lyrics, as well as by metadata like song title and composer. This document contains an experiment towards that goal in which I embed an encoded version of a music file within an HTML file. In addition to the music file, the artwork, lyrics and metadata are also embedded in the HTML file, so that the file acts as a self-contained package of related resources.


Metadata and packaging for audio and video files are normally embedded in the files. The MP3 file format, for example, allows segments of the file to be designated as metadata using the ID3 standard; these segments then specify information such as song title and artist name.

The problem with this method is that metadata and packaging must be redesigned and reimplemented for every audio or video file format, of which there is an endless number, and every program which wants to handle audio or video metadata must be extended to parse each one of these formats. The work is multiplicative: each of N possible programs must implement each of M possible file formats, for N times M implementations in all. This is hopeless, and in practice programs put drastic limits on the number of file formats for which they support metadata.

The underlying flaw is in the strategy of mingling data and metadata. Packaging formats such as zip and tar can contain any kind of file because the contained files and container files are in formats which are clearly separated. To support metadata in audio and video files, we should do the same -- separate generic metadata from specialized media, and package them together in generic containers.

Technical approach

There are many potential container formats. In this document I use HTML rather than a more obvious candidate such as zip, tar, or Ogg. HTML excels at metadata, presentation and ease of hacking, while zip, tar and Ogg excel as containers for other files. I have a choice between implementing metadata (etc) in a container format or container functionality in a metadata (etc) format; in this case I'm doing the second.

Using HTML as a container format for media files is made possible by the "data" URI scheme. Such URLs begin with data: rather than http:. This scheme is formally defined in RFC 2397. The data scheme is special in that it allows a link to contain data, rather than point to data elsewhere.

A sample URI looks like this:

data:audio/mpeg;base64,//M4xAAUc(etc...)

The pieces of that URI are as follows:

Piece                 Meaning
data:                 URI scheme
audio/mpeg;           media type of the resource
base64,               encoding method used for converting binary resources into text
//M4xAAUc(etc...)     encoded resource
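
To make the breakdown above concrete, here is a minimal Python sketch that splits a data: URI into those pieces. It is an illustration only: the function name split_data_uri is hypothetical, the payload is a five-byte stand-in rather than the real MP3, and the sketch assumes a base64 payload with no extra media type parameters (RFC 2397 allows more than this).

```python
import base64


def split_data_uri(uri):
    """Split a base64 data: URI into (media_type, encoding, payload_bytes).

    Hypothetical helper: assumes the simple form
    data:<media type>;base64,<encoded resource>
    """
    scheme, _, rest = uri.partition(":")
    if scheme != "data":
        raise ValueError("not a data: URI")
    # Everything before the first comma describes the payload.
    header, _, encoded = rest.partition(",")
    media_type, _, encoding = header.partition(";")
    if encoding != "base64":
        raise ValueError("this sketch only handles base64 payloads")
    return media_type, encoding, base64.b64decode(encoded)


# Stand-in payload: "aGVsbG8=" is the base64 form of the bytes b"hello".
media_type, encoding, payload = split_data_uri("data:audio/mpeg;base64,aGVsbG8=")
print(media_type, encoding, payload)  # audio/mpeg base64 b'hello'
```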

This technique is usually invoked for images, for example to have a small company logo in a page without referring to any external host. There is no need to use it only for images, however, because the media type is not fixed to image types. To embed music files instead, we only have to specify a corresponding media type.
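
As a rough Python sketch of that point (not the code used to build this page), the only thing that changes between embedding an image and embedding audio is the media type string. The helper name data_uri and the byte strings are assumptions made for illustration; the six audio bytes are simply the ones whose base64 form matches the start of the sample URI shown earlier.

```python
import base64


def data_uri(payload: bytes, media_type: str) -> str:
    """Return a base64 data: URI for the payload (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# The usual case: an image (here just the six-byte GIF signature).
print(data_uri(b"GIF89a", "image/gif"))

# The same call with an audio media type.
print(data_uri(b"\xff\xf3\x38\xc4\x00\x14", "audio/mpeg"))
# prints: data:audio/mpeg;base64,//M4xAAU
```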

Experimental HTML

This document itself is the experiment, in that it is an HTML document which contains an audio file and associated packaging.

Embedded audio file

The sample audio file is an MP3 of my own making which contains the spoken text "My pants are on fire." This file has the advantage of being small enough to be easy to work with and of being free of copyright-related problems.

I have encoded the audio file and inserted it in this HTML document. If your browser supports data: URIs, you may be able to click on the following link and get a working audio file:

hyperlink for embedded audio

This hyperlink can be used just like a normal one. For example, an embedded player control can render the audio within the containing HTML page. Murphy's law notwithstanding, you should be able to click here to embed a player control in this page without fetching a remote object.


I would like to be able to enclose album art and lyrics. I can enclose album art using the same technique as with the audio file -- by encoding the art and putting the encoded value in a data: URI. I can enclose lyrics by typing them into the HTML document in some form visible to the reader after the page has been rendered by the browser.

For example:
Album art (just a picture of me, because I am lazy):
[image: album art for 'my pants are on fire' MP3]

Lyrics:
My pants are on fire.
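
The packaging described above could be assembled by a script along the following lines. This is a sketch under stated assumptions, not the way this page was produced: the function names, the image/jpeg media type, and the two-byte stand-in payloads are all hypothetical.

```python
import base64
from html import escape


def data_uri(payload: bytes, media_type: str) -> str:
    """Return a base64 data: URI for the payload (hypothetical helper)."""
    return f"data:{media_type};base64,{base64.b64encode(payload).decode('ascii')}"


def build_package(audio: bytes, art: bytes, title: str, lyrics: str) -> str:
    """Return one HTML document holding the audio, the art, and the lyrics."""
    audio_uri = data_uri(audio, "audio/mpeg")
    art_uri = data_uri(art, "image/jpeg")  # assumes the art is a JPEG
    return (
        "<html><body>\n"
        f'<a href="{audio_uri}">{escape(title)}</a>\n'
        f'<img src="{art_uri}" alt="album art for {escape(title)}">\n'
        # Lyrics are ordinary visible text, escaped so they cannot break the markup.
        f"<p>{escape(lyrics)}</p>\n"
        "</body></html>\n"
    )


# Stand-in bytes; a real package would read the MP3 and the picture from disk.
page = build_package(b"\x00\x01", b"\x02", "My Pants Are On Fire", "My pants are on fire.")
print(page)
```

Nothing in the resulting page refers to a remote host, which is the property that makes the HTML file a self-contained package.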

I would also like to be able to specify metadata for the artist and title of the audio file. These fields are not nearly as easy as album art and lyrics. I could specify the artist and title in the same way as album art and lyrics -- by embedding them in the HTML without any labelling, so that they are visible to the user but not structured to help computer programs find them. I might also use an HTML extension method such as namespaces or microformats to define and then insert metadata fields.

Following the microformats approach, I might take advantage of an existing feature of HTML. For example, anchor (A) elements have a title attribute in HTML 4, which is a good candidate for the title of the audio file. In the case of this particular file, the HTML for the song would then look like this:

<a href="data:audio/mpeg;base64,//M4xAAUc(etc...)"
   title="My Pants Are On Fire">
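
To show why a labelled attribute helps programs, here is a small Python sketch of a reader that recovers the title from such an anchor. It assumes the convention suggested above (the title attribute of an anchor whose href is an audio data: URI); the class name EmbeddedAudioFinder and the shortened href are hypothetical.

```python
from html.parser import HTMLParser


class EmbeddedAudioFinder(HTMLParser):
    """Collect (title, href) pairs for anchors that embed audio in a data: URI."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        if href.startswith("data:audio/"):
            # The title attribute doubles as the song title; it may be absent.
            self.found.append((attributes.get("title"), href))


finder = EmbeddedAudioFinder()
finder.feed('<a href="data:audio/mpeg;base64,AAE=" title="My Pants Are On Fire">play</a>')
print(finder.found[0][0])  # prints: My Pants Are On Fire
```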

I can specify the artist metadata using the Creative Commons method of embedding rights claims in HTML. See the source code of the public domain dedication below for the complete syntax.



Drawbacks

  1. Not supported by Internet Explorer

    This technique can be made to work in Firefox, Safari, and Opera, but not in the versions of Internet Explorer which exist as of this writing in summer 2006. That leaves out the 80% of users who are on Internet Explorer.

  2. No streaming

    HTML documents have to be received in their entirety before they can be rendered, which means they cannot be streamed. If audio files are embedded in HTML documents, they will also have to be received in their entirety before they can be rendered.

  3. Size limits

    In practice, data URIs cannot hold arbitrary amounts of information. A typical multimedia file is far larger than these limits.

  4. Limits to data structure in HTML

    The package data needs to be in a structured format, so that data elements like song title and composer can be recognized for what they are. HTML does not have existing syntax for these elements; it will have to be extended to contain such a syntax, and HTML extension methods are very immature at this point.
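
On the size limits point in the list above, it is worth noting that base64 itself makes the problem worse: every 3 bytes of input become 4 characters of output, so the encoded URI is about a third larger than the file. The Python sketch below works out the length of the resulting URI; the function name and the 4 MiB file size are assumptions chosen for illustration, not measurements from this experiment.

```python
import base64


def data_uri_length(payload_size: int, media_type: str = "audio/mpeg") -> int:
    """Return the character length of a base64 data: URI for a payload of this size."""
    prefix = len(f"data:{media_type};base64,")
    # base64 emits 4 characters per group of 3 input bytes, padding the last group.
    groups = (payload_size + 2) // 3
    return prefix + 4 * groups


# Hypothetical file size: 4 MiB.
size = 4 * 1024 * 1024
length = data_uri_length(size)
print(f"{size} bytes of audio become a URI of {length} characters")
print(f"growth factor: {length / size:.2f}")
```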


Advantages

  1. No new standards needed.
  2. Tools which can render this format are mature and widely available.
  3. Powerful.
  4. Extremely customizable presentation.


I have shown that it is possible to create packages for audio files using HTML as a container format. Using this technique I was able to play the MP3 embedded in this document, to view an embedded image corresponding to album art for the MP3, to enclose lyrics, to specify title metadata, and to specify artist metadata.

The technique I used has significant drawbacks, however.


License for sample MP3 file

This work is hereby released into the Public Domain. To view a copy of the public domain dedication, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

This work is licensed under a Creative Commons Public Domain License.

(That license is for the MP3, not for this page. I reserve all rights on this page.)

Source code

The encoded URI was created using PHP source code from the Wikipedia entry on Data: URLs. Source is as follows:

function data_url($file, $mime) {
  // Read the whole file as raw bytes; file_get_contents is binary-safe.
  $contents = file_get_contents($file);
  // Base64-encode the bytes so they can sit inside a text URI.
  $base64 = base64_encode($contents);
  return "data:$mime;base64,$base64";
}

That can be invoked to generate an HTML anchor element using syntax like this:

<a href="<?= data_url('pants.mp3', 'audio/mpeg') ?>">hyperlink for embedded audio</a>