There is no XML format for playlists that can measure up to the standards of the formats for web pages (HTML), weblogs (RSS), and web graphs (RDF/XML). It is evident that there is a need, because XML is the preferred data description language of the moment and as a result the tools and skills to use it are ubiquitous.
It is also evident that existing playlist formats fall short. ASX (for Windows Media Player) and the iTunes library format are proprietary. ASX resembles XML in that it uses angle brackets, but is not XML by any means. M3U, RAM, and M4U are flat files; QuickTime is binary; Pls is in the Windows .ini format; Gnomoradio RDF is RDF, not XML. SMIL is too hard to implement. The timing model of RSS doesn't fit audio and video. Forcing timing models into HTML, as HTML+Time does, creates an unintelligible feature set. Few of these formats are well documented. None of these formats make simple features easy to code and hard features possible. Only one is an open standard. Not a single one offers interoperability across major vendors. Not a single one attempts to solve these problems.
The pressing question for software developers is: why should I support this XML playlist format instead of, or in addition to, ASX, SMIL, Gnomoradio RDF, iTunes XML, or RSS? Why does the world need yet another XML playlist format? The answer is that XSPF is by far the most carefully crafted XML playlist format.
<?xml version="1.0" encoding="UTF-8"?>
<playlist version="0" xmlns="http://xspf.org/ns/0/">
  <trackList>
    <track><location>file:///mp3s/Yo%20La%20Tengo/1_Nuclear%20War%20Version%201.mp3</location></track>
    <track><location>file:///mp3s/Yo%20La%20Tengo/2_Nuclear%20War%20Version%202.mp3</location></track>
    <track><location>file:///mp3s/Yo%20La%20Tengo/3_Nuclear%20War%20Version%203.mp3</location></track>
    <track><location>file:///mp3s/Yo%20La%20Tengo/4_Nuclear%20War%20Version%204</location></track>
    <track><location>file:///mp3s/Yo%20La%20Tengo/5_Nuclear%20War%20Version%204%20(Mike%20Ladd%20Remix).mp3</location></track>
  </trackList>
</playlist>
or this:
<?xml version="1.0" encoding="UTF-8"?>
<playlist version="0" xmlns="http://xspf.org/ns/0/">
  <trackList>
    <track><location>http://yolatengo.com/1_Nuclear%20War%20Version%201.mp3</location></track>
    <track><location>http://yolatengo.com/2_Nuclear%20War%20Version%202.mp3</location></track>
    <track><location>http://yolatengo.com/3_Nuclear%20War%20Version%203.mp3</location></track>
    <track><location>http://yolatengo.com/4_Nuclear%20War%20Version%204</location></track>
    <track><location>http://yolatengo.com/5_Nuclear%20War%20Version%204%20(Mike%20Ladd%20Remix).mp3</location></track>
  </trackList>
</playlist>
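As a rough illustration of how little code a consumer of such a document needs, the following Python sketch lists the location URIs in playlist order. It assumes the standard library's xml.etree.ElementTree; the function name track_locations and the shortened example.com playlist are hypothetical and are not part of the format.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example documents above.
XSPF_NS = {"x": "http://xspf.org/ns/0/"}


def track_locations(xspf_text):
    """Return every track location URI in document order."""
    root = ET.fromstring(xspf_text)
    found = root.findall("x:trackList/x:track/x:location", XSPF_NS)
    return [loc.text.strip() for loc in found if loc.text]


# Hypothetical two-track playlist used only for illustration.
SAMPLE = """<playlist version="0" xmlns="http://xspf.org/ns/0/">
  <trackList>
    <track><location>http://example.com/1.mp3</location></track>
    <track><location>http://example.com/2.mp3</location></track>
  </trackList>
</playlist>"""

if __name__ == "__main__":
    for uri in track_locations(SAMPLE):
        print(uri)
```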
Version 0 is frozen and complete. Anybody writing code according to this specification can be very confident that it will not change.
The home of our working group is http://xspf.org. Members include Dave Brown and Ian Rogers of Yahoo; Dan Brickley from the W3C; Kevin Marks of Technorati; Matthias Friedrich and Robert Kaye of MusicBrainz; Ryan Shaw of UC Berkeley; and myself, from Webjay.
The author of this document is Lucas Gonze. Our group started work in the winter of 2004. The first public draft of this document was published on May 9, 2004; the most recent on January 6, 2005.
In a few places in this document I will use the term MUST in all caps, which will remind some readers of formal standards. In this document the term should be interpreted to mean that something shouted is important. XSPF is not a standard; it is an ad-hoc project by a group of individuals.
An XSPF playlist describes a sequence of objects to be rendered. Objects might be audio, video, text, playlists, or any other media type. The function of a playlist is to identify the objects and communicate their order.
The function of a playlist is not to communicate metadata about the composer, song title, etc. Metadata is hard and there are many providers already. We decided that we couldn't compete, and that there was no need for us to try. Moreover, good metadata does not travel well -- every user has to recreate it. Metadata should come from external sources and namespaces like MusicBrainz or Gracenote; this is what the XSPF link and meta elements are for.
The function of a playlist is not to store derived information about objects that a user has a copy of. A playlist is not a catalog. A catalog is computed across hard data like files; it stores information like filesystem paths and the contents of ID3 tags. This data has no value on any machine but the one on which it originated. Sharing this data would be a privacy and security violation. Software which needs access to this data has no reason to maintain it in a standard format, because it has no reason to allow access to it. Standardizing this data would be fruitless, because there are an endless number of measurements that software might take and store. Derived information belongs in a catalog.
A playlist, then, is neither a metadata format nor a catalog. We took care to enable these features, but also to avoid duplicating their functionality poorly.
If there is no reason for a playlist to be shared, there is no need for a new format. Even a buggy format does no damage if it is created and consumed by the same software on the same machine. The need for a new format only comes up when a playlist travels from one machine to another, for example when it is published on the internet.
One type of shareability is between different pieces of software on the same machine. It is common for playlists created with one application to not be usable by another application on the same machine because of different or conflicting interpretations of the playlist format. M3U suffers from this very badly, because M3U playlists often reference files according to a base path which changes from application to application. The XSPF group aimed to fix this by providing unambiguous definitions.
The other type of shareability is between different machines. For playlists to be meaningful on different machines, they must be able to identify network resources. Audio and video objects are often abstractions like "movie X by director Y" rather than computer-friendly objects like "whatever file can be gotten from the URL http://foo/x/y". To handle this problem, we have provided support for media objects to be found via queries; XSPF identifiers are fuzzy names.
An ordered list of URIs. The purpose is to satisfy licenses allowing modification but requiring attribution. If you modify such a playlist, move its //playlist/location element or //playlist/identifier to the top of the items in the //playlist/attribution element. xspf:playlist elements MAY contain exactly one xspf:attribution element.
<attribution>
  <location>http://snafu.com/modified_version_of_modified_version_of_original_playlist.xspf</location>
  <location>http://bar.com/modified_version_of_original_playlist.xspf</location>
  <location>http://foo.com/original_playlist.xspf</location>
</attribution>
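The attribution rule can be sketched in Python as follows. This is only one possible reading of the rule, assuming xml.etree.ElementTree; the helper names q and push_attribution are hypothetical, and the sketch appends a missing attribution element at the end of the playlist without checking any ordering a schema might require.

```python
import xml.etree.ElementTree as ET

NS = "http://xspf.org/ns/0/"


def q(name):
    """Qualify a local element name with the XSPF namespace."""
    return f"{{{NS}}}{name}"


def push_attribution(playlist):
    """Move the playlist's own location (or identifier) to the top of attribution."""
    current = playlist.find(q("location"))
    if current is None:
        current = playlist.find(q("identifier"))
    if current is None:
        return playlist  # nothing to attribute
    attribution = playlist.find(q("attribution"))
    if attribution is None:
        # Assumption: appending at the end is acceptable for this sketch.
        attribution = ET.SubElement(playlist, q("attribution"))
    playlist.remove(current)
    attribution.insert(0, current)  # newest ancestor goes first
    return playlist


# A playlist that was itself derived from the original at foo.com.
MODIFIED = """<playlist version="0" xmlns="http://xspf.org/ns/0/">
  <location>http://bar.com/modified_version_of_original_playlist.xspf</location>
  <attribution>
    <location>http://foo.com/original_playlist.xspf</location>
  </attribution>
  <trackList/>
</playlist>"""

if __name__ == "__main__":
    root = push_attribution(ET.fromstring(MODIFIED))
    for entry in root.find(q("attribution")):
        print(entry.text)
```

After the call, the modifying application would write its own new location into the playlist.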
The link element allows non-XSPF web resources to be included in XSPF documents without breaking XSPF validation. xspf:playlist elements MAY contain zero or more link elements.
<link rel="http://foaf.org/namespace/version1">http://socialnetwork.org/foaf/mary.rdfs</link>
URI of a resource.
The meta element allows non-XSPF metadata to be included in XSPF documents without breaking XSPF validation. xspf:playlist elements MAY contain zero or more meta elements.
<meta rel="http://example.org/key">value</meta>
Value of the metadata element. MUST be valid text/plain, not XML.
Ordered list of xspf:track elements to be rendered. The sequence is a hint, not a requirement; renderers are advised to play tracks from top to bottom unless there is an indication otherwise.
If an xspf:track element cannot be rendered, a user-agent MUST skip to the next xspf:track element and MUST NOT interrupt the sequence.
xspf:playlist elements MUST contain one and only one trackList element.
The link element allows non-XSPF web resources to be included in xspf:track elements without breaking XSPF validation.
<link rel="http://foaf.org/namespace/version1">http://socialnetwork.org/foaf/mary.rdfs</link>
URI of a resource.
The meta element allows non-XSPF metadata to be included in xspf:track elements without breaking XSPF validation.
<meta rel="http://example.org/key">value</meta>
Value of the metadata element. MUST be valid text/plain, not XML.
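A consumer that wants to hand link and meta content to some external vocabulary might collect it as (rel, content) pairs. The following Python sketch assumes xml.etree.ElementTree; the function name annotations and the sample track are hypothetical, and the same function would work on a playlist element as well as a track element.

```python
import xml.etree.ElementTree as ET

XSPF_NS = {"x": "http://xspf.org/ns/0/"}


def annotations(element):
    """Return (links, metas): lists of (rel, content) pairs for direct children."""
    def pairs(tag):
        return [
            (child.get("rel"), (child.text or "").strip())
            for child in element.findall(f"x:{tag}", XSPF_NS)
        ]
    return pairs("link"), pairs("meta")


# Hypothetical track reusing the link and meta examples from the text.
TRACK = """<track xmlns="http://xspf.org/ns/0/">
  <link rel="http://foaf.org/namespace/version1">http://socialnetwork.org/foaf/mary.rdfs</link>
  <meta rel="http://example.org/key">value</meta>
</track>"""

if __name__ == "__main__":
    links, metas = annotations(ET.fromstring(TRACK))
    print(links)
    print(metas)
```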
On a surface level you can use XSPF like any other playlist format. Drop a bunch of filenames into an XSPF document, prepend "file://" to each, and you're ready to go. Under the surface there is much more.
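The surface-level use can be sketched in a few lines of Python. This is a minimal illustration, assuming absolute POSIX-style paths; the function name build_playlist is hypothetical. Besides prepending the file scheme, it percent-encodes characters such as spaces, as the example documents do.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

NS = "http://xspf.org/ns/0/"
# Serialize XSPF elements with a default namespace instead of a prefix.
ET.register_namespace("", NS)


def build_playlist(paths):
    """Wrap absolute file paths in a minimal XSPF version 0 document."""
    playlist = ET.Element(f"{{{NS}}}playlist", {"version": "0"})
    track_list = ET.SubElement(playlist, f"{{{NS}}}trackList")
    for path in paths:
        track = ET.SubElement(track_list, f"{{{NS}}}track")
        location = ET.SubElement(track, f"{{{NS}}}location")
        # as_uri() adds the file:// scheme and percent-encodes the path;
        # it raises ValueError for relative paths.
        location.text = PurePosixPath(path).as_uri()
    return ET.tostring(playlist, encoding="unicode")


if __name__ == "__main__":
    print(build_playlist(["/mp3s/Yo La Tengo/1_Nuclear War Version 1.mp3"]))
```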
The guiding design principle was to separate the functionality of a catalog of files from the functionality of a list of songs. Most MP3 players have some sort of cache for file information. This cache stores a list, or catalog, of available files and metadata from ID3 tags and other sources. XSPF is not a catalog format. XSPF exists only to say which songs to play. Almost everything in XSPF is for the purpose of answering the question which resource, rather than the question what is this resource.
If XSPF is not a catalog format, what is it? XSPF is an intermediate format. We expected a new kind of software called a content resolver to do the job of converting XSPF to a plain old list of files or URLs. A content resolver would be smart enough to keep your playlists from breaking when you move your MP3s from /mp3s to /music/mp3. It would be able to figure out that a playlist entry by the artist "Hank Williams" with the title "Your Cheating Heart" could be satisfied by the file /mp3s/hankwilliams/yourcheatingheart.mp3. It might even know how to query the iTunes music store or another online provider to locate and download a missing song.
The content resolver maintains the catalog of your songs in whatever format it prefers. It might use a flatfile, a file in the Berkeley DB format, or a SQL database. It might use only ID3 metadata, but it might also know how to query MusicBrainz or another metadata service.
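To make the idea concrete, here is a deliberately tiny Python sketch of a content resolver. Everything in it is an assumption made for illustration: the class name ContentResolver, the in-memory dictionary catalog, and the crude normalization of artist and title. A real resolver could be arbitrarily smarter.

```python
import re


def _key(artist, title):
    """Normalize artist and title so small spelling differences still match."""
    def norm(text):
        return re.sub(r"[^a-z0-9]", "", text.lower())
    return norm(artist), norm(title)


class ContentResolver:
    """Hypothetical resolver backed by an in-memory catalog."""

    def __init__(self):
        self._catalog = {}

    def add(self, artist, title, path):
        """Record (or update) where a song currently lives."""
        self._catalog[_key(artist, title)] = path

    def resolve(self, artist, title):
        """Return the current path for a song, or None if it is unknown."""
        return self._catalog.get(_key(artist, title))


if __name__ == "__main__":
    resolver = ContentResolver()
    resolver.add("Hank Williams", "Your Cheating Heart",
                 "/mp3s/hankwilliams/yourcheatingheart.mp3")
    print(resolver.resolve("hank williams", "Your Cheating Heart"))
```

Because the playlist names the song rather than the file, moving the file only requires updating the catalog entry; the playlist itself does not change.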
If a media player is unable to render a resource, the show MUST go on. Playlists exist in time; a player that stops processing when it encounters an error is considered broken; it is not conformant with the standard; it must be shunned by the community and made an outcast. Players will frequently encounter resources that they cannot render -- this is not a fatal error unless the player stops processing the playlist.
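In code, this requirement amounts to treating a rendering failure as a reason to move on rather than a reason to stop. A minimal Python sketch, in which play_all and the render callback are hypothetical names standing in for whatever a real player does:

```python
def play_all(locations, render):
    """Render each location in order, skipping any that fail.

    `render` is any callable that raises an exception when it cannot
    play a resource. Returns (played, skipped) for inspection.
    """
    played, skipped = [], []
    for location in locations:
        try:
            render(location)
        except Exception:
            # Not fatal: note the failure and keep going.
            skipped.append(location)
            continue
        played.append(location)
    return played, skipped


def fake_render(location):
    """Stand-in renderer that only understands .mp3 resources."""
    if not location.endswith(".mp3"):
        raise ValueError(f"cannot render {location}")


if __name__ == "__main__":
    print(play_all(["a.mp3", "b.unknown", "c.mp3"], fake_render))
```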
Relative paths MUST be resolved according to the XML Base specification or IETF RFC 2396:
The rules for determining the base URI can be summarized as follows (highest priority to lowest):
- The base URI is embedded in the document's content.
- The base URI is that of the encapsulating entity (message, document, or none).
- The base URI is the URI used to retrieve the entity.
- The base URI is defined by the context of the application.
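The priority order above can be sketched in Python using the standard library's urllib.parse.urljoin. This is a simplification under stated assumptions: the function name resolve_location and its parameter names are hypothetical, each candidate base is assumed to be an absolute URI already, and nested or relative xml:base values are not handled.

```python
from urllib.parse import urljoin


def resolve_location(location, embedded_base=None, encapsulating_base=None,
                     retrieval_uri=None, application_base=None):
    """Resolve a possibly relative location against the best available base.

    Candidates are tried from highest to lowest priority, mirroring
    the list above; the first one that is set wins.
    """
    for base in (embedded_base, encapsulating_base,
                 retrieval_uri, application_base):
        if base:
            return urljoin(base, location)
    return location  # no base known; leave the reference untouched


if __name__ == "__main__":
    # Only the retrieval URI is known.
    print(resolve_location("songs/1.mp3",
                           retrieval_uri="http://example.com/lists/a.xspf"))
    # A base embedded in the document overrides the retrieval URI.
    print(resolve_location("songs/1.mp3",
                           embedded_base="http://media.example.org/",
                           retrieval_uri="http://example.com/lists/a.xspf"))
```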
Scenario: A user clicks on a link to an audio or video object in their browser. The browser needs to hand the object off to a helper application like an MP3 player. If there is an intermediate playlist object between the browser and helper application, and the browser needs to ensure that the right helper is launched, the playlist needs to be of a type which is mapped to the same helper application.
Typical solution: Use a dedicated playlist format for almost every media subtype. For Real audio there is RAM; for MP4 video there is M4U; for MP3 there is M3U; even though RAM, M4U and M3U are almost identical in syntax. The QuickTime format is able to avoid this problem only because the container format and media format are integrated -- a QuickTime file is both a playlist and a media object.
XSPF's solution: The XSPF format does not yet have a solution to this problem, because the working group has not yet tackled it. (Though I can speculate that a content resolver in between the browser and helper application would have the means to do it.)
Scenario: A user clicks on an audio or video link. Before handing off control to the helper application, the browser must download whatever the link points to. For streaming media this makes no sense; either the download will never finish or waiting for a complete download defeats the purpose.
Typical solution: rather than linking to an audio or video document, link to a playlist containing a URL of an audio or video document. Playlists used for this purpose often contain only a single URL. The Pls format, which is used for MP3-based webcasting, and which contains a single URL of a never-ending stream, takes this approach.
XSPF's solution: Any reasonably compact playlist format supports this equally well. This rules out the iTunes library format and sometimes QuickTime, but allows XSPF along with M3U, Pls and other relatively terse formats.
Scenario: There is a very large object like a DVD rip. The likelihood of downloading the entire object in one shot is low, so the object has been split into pieces. The object then needs to be reassembled on the client side.
Typical solution: Create a zip file or tarball, which uses checksums to ensure the integrity of the download; start by sending a playlist which acts as a file manifest and allows a user agent to download sub-objects in digestible chunks. However, because a manifest has to express paths to related objects according to a filesystem which does not exist on the client, there has to be agreement between the client and server on how to interpret relative paths in a playlist. The problem is that few playlist formats -- only SMIL, to my knowledge -- define the meaning of relative paths in a playlist.
XSPF's solution: XSPF clearly defines the meaning of relative paths according to the rule that a client must interpret relative paths in a playlist according to the XML Base specification or IETF RFC 2396.
Scenario: There is a renderer which is capable of rendering one form of a media object but not another. The server is able to deliver the object in either format, but it needs to communicate URLs for both. Though HTTP content negotiation can be used for instances where the renderer contacts the server directly, it doesn't support protocol negotiation, and it can't be used in non-HTTP protocols.
Typical solution: This is particularly a problem for Real, which has a large installed base of obsolete software to be babied. The solution is to deliver alternate URLs within the same playlist and allow the client to choose. The RAM format allows both a pnm: and a rtsp: URI within the same playlist, separated by a line containing the keyword "--stop--".
XSPF's solution: An XSPF track object can contain multiple identifiers or locations for the same media object.
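A client might choose among a track's alternate locations roughly as follows. This Python sketch assumes xml.etree.ElementTree; the function name pick_location, the can_render callback, and the example.com URIs are hypothetical. It simply takes the first location the client claims it can render.

```python
import xml.etree.ElementTree as ET

XSPF_NS = {"x": "http://xspf.org/ns/0/"}


def pick_location(track, can_render):
    """Return the first location of a track that the client can render."""
    for loc in track.findall("x:location", XSPF_NS):
        uri = (loc.text or "").strip()
        if uri and can_render(uri):
            return uri
    return None  # no usable alternative; the caller should skip the track


# Hypothetical track offering the same object over two protocols.
TRACK = """<track xmlns="http://xspf.org/ns/0/">
  <location>rtsp://example.com/song.rm</location>
  <location>http://example.com/song.mp3</location>
</track>"""

if __name__ == "__main__":
    http_only = lambda uri: uri.startswith("http://")
    print(pick_location(ET.fromstring(TRACK), http_only))
```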
Scenario: An MP3 player needs to access information about media objects which is too expensive to compute in real time. For a large number of files, a user can't wait for the player to re-read ID3 tags, compute SHA1 hashes, or perform a Fourier transform for each.
Typical solution: An MP3 player computes the information once, the first time it encounters an object, then caches the data. The iTunes library format stores computed information like ID3 data in the global catalog and playlist.
XSPF's solution: XSPF defers this information to an external module called the content resolver, and mandates that the information not be included in shared playlists.
Scenario: A user needs information about high level concepts like artist and song title rather than machine-level concepts like file name and bit rate. How should artist and song title be communicated, and how should they be stored?
Typical solution: Derive the metadata according to an application-defined process like extracting ID3 tags, then store a copy of the metadata in any playlists that reference a media object. The EXTINF property of the extended M3U format is used in this way.
XSPF's solution: XSPF defers this functionality to other sources. Metadata is hard; there are already many projects to deal with it, some of which are very good. Metadata is attached to an XSPF track according to whatever syntax an imported vocabulary defines. XML namespaces may be used, but the preferred syntax is the XSPF link and meta elements. (These elements allow us to validate metadata from external sources, while namespaces don't.)
Scenario: A businessperson wants to make a batch of videos of related talks from a conference because watching them in a shared context gives a deeper understanding of the subject as a whole.
Typical solution: A user compiles copies of the videos and puts them in the same location, maybe in the same directory on a web server, maybe in the same directory on a hard drive. The user then puts the locations, whether paths or URIs, into a file in the M3U format.
XSPF's solution: The XSPF trackList element contains a sequence of track elements, each of which points to one of the objects.
See the XML Base specification or IETF RFC 2396:
The rules for determining the base URI can be summarized as follows (highest priority to lowest):
- The base URI is embedded in the document's content.
- The base URI is that of the encapsulating entity (message, document, or none).
- The base URI is the URI used to retrieve the entity.
- The base URI is defined by the context of the application.
Robert Kaye has created a Relax NG schema for XSPF draft 8 at http://mayhem-chaos.net/stuff/xspf-draft8.rng. You can use Jing to invoke it.
For users of Emacs nxml-mode, Ryan Shaw has posted a .rnc version of Robert's schema at http://lists.musicbrainz.org/pipermail/playlist/2004-October/000429.html. This is just a matter of putting the .rnc file in the schema/ subdirectory of your nxml-mode installation. nxml-mode will find it automatically and add it to the list of available schemas; if you begin authoring an XSPF playlist, nxml-mode will choose the correct schema by examining the root element name.