November 22, 2008, Saturday, 326

Haudio-case-study

From DBWiki

Author: Manu Sporny

A Brief Introduction to Representing Music on the Web

The gold standard for identifying musical works stored by computers has been the ID3 tagging format used in MP3 audio files. This standard has stood the test of time for over a decade now, which is an eternity in the computing world. As we move our data off of our computers and onto the web, new methods of interacting with that data are being discovered. This essay focuses on new ways in which we can use the music data that we have on our computers and how we can express that data on the World Wide Web.

Our company, Digital Bazaar, runs a service called Bitmunk. The purpose of the service is to connect artists with fans and then help the fans distribute the artist's digital creations. More specifically, we provide an MP3 song catalog of over 50,000 independent musicians and over 850,000 songs. Once a fan buys an artist's album, they can re-distribute it on the artist's behalf via the Web. This means that the fan can legally re-sell the artist's album through the fan's website or blog. The artist is paid a royalty and the fan is also paid a royalty for helping to distribute the work on behalf of the artist.

One of the biggest problems that we have is helping the fans find independent musicians that they like. Recommending new bands is difficult because the music selection process is chaotic and almost never happens on a single website. Friends, advertising, radio stations, blog loyalty, and many other factors play into what a fan likes and where they go to get it.

"Wouldn't it be neat if we could create something that could recommend music to somebody browsing the Web, without the requirement to visit a specific website?" we thought. "Wouldn't it be cool if your web browser could tell you more about an artist, their albums, and recommend places on the net that had the lowest price on those albums without having to leave the music blog that you're reading?" Basically, we wanted to make the web browser your personal music recommendation jockey - with extra smarts built in that web browsers don't have, but that exist somewhere on the web.

We needed several things to make this happen:

  • A way of representing music on a web page so that web browsers could recognize music albums and artists.
  • A method of creating a community standard for representing music.
  • Broad web-browser integration.
  • A music recommendation service.

What follows is the story of the path we took, the mistakes we made, the successes that we have enjoyed and, hopefully, some guidance for others trying to do the same thing in their industry.

The Semantic Web

There are many different ideas of what the Semantic Web is and what it can accomplish for humankind in the next 20 years. Some believe that there will be a Semantic Web that runs alongside the World Wide Web. Where the World Wide Web is for people to read and understand, the Semantic Web will be for computers to read and understand. Both Webs will talk about the same thing, but in a way that their respective readers can understand - much like most Westerners read newspapers written in English and most Japanese read newspapers written in Japanese. Others believe that the current World Wide Web will morph into the Semantic Web by inserting machine-readable bits and pieces into the web pages that we are accustomed to today.

While there are many theories and scenarios that academics have postulated, the only question that we were interested in answering was this:

"How can we teach a computer how to recognize a song and an artist on a web page, and how do we do it in a way that regular web publishers can understand?"

A Loose Collection of Possibilities

When we started to look at technologies and projects that could help us achieve our goals, we had several requirements in mind:

  • Simplicity - The end solution had to be simple to understand, implement and author. This would help rapid adoption of the technology.
  • Standardization - The mark-up of music metadata had to be a standardized format because we wanted wide adoption of the standard, aside from it being The Right Thing To Do.
  • Distributed Innovation - The implementation mechanism had to be distributed in nature as we wanted to ensure innovation and survival of the standard outside of our company.
  • Re-use - We would re-use grammars, data, technology and concepts that were created before us when possible, as we wanted to be focused on the end-result and not theoretical debates on what the future might hold. We had a specific problem that needed to be solved and would choose to focus on that and not the latest technology bandwagon.

With those ideas in mind the following technologies and projects were identified as interesting:

Finding a Community Standards Process that Works

The first thing that we wanted to focus on was the problem of creating a standard way of representing music on the Web. There are several Web standards organizations that we considered. Among them were the Internet Engineering Task Force (IETF), the World Wide Web Consortium (W3C), OASIS, and a new community-based standards creation process called Microformats. Our primary goal was to avoid any sort of red-tape and member-fees. We also didn't quite know what the standard was going to look like after it was created, and we wanted the ability to make rapid changes in the first year or two because we knew we were going to make mistakes. Most importantly, we wanted input and adoption by a large number of web publishers.

The IETF, W3C and OASIS didn't make the cut due to the amount of red-tape and politics that we had heard about in those organizations. Granted, we had no experience with those organizations, but we knew enough to know that you can't create a standard with those bodies in less than a year. In other words - our choice was very pragmatic - we wanted a quick solution, created by a community that cares about website publishers first and foremost.

We had heard about a fairly new community forming around a concept called Microformats. This community stated that they put publishers first, have a community-based standards creation process, and enable the semantic web through a very simple mechanism. We decided to explore the Microformats community and process in a bit more depth.

Working with the Microformats Community

After more research, we discovered that the Microformats process solved several of our problems:

  • Implementations of Microformats are very simple and publisher-friendly.
  • There was a process in place for creating new Microformats.
  • The community was open to all.
  • It would allow us to rapidly tune the music Microformat for the first year or two if we made a mistake.
  • Microformats focus on actual problems and a scientific process for creating new standards.
  • No politics, red-tape or anything else to keep good people with good ideas from participating.

We were elated to find that such a community existed and that they had already started an initiative called media-info to semantically mark up video, music and images on the Web. After some discussion on the New Microformats mailing list, we decided to split off and take the initiative on creating the music part of the media-info project.

This was called the hAudio Microformat and we began in earnest following the Microformats process to come up with a grammar that would solve many of the woes experienced by website publishers.

Creating the hAudio Grammar

The Microformats process is a scientific process that requires one to gather examples, analyze those examples, publish trends and patterns and finally formalize a grammar that is currently being used for the type of data being published via the Web.

For the music grammar, we gathered over 100 examples and analyzed each website, looking for properties such as "artist", or "album name", or "publisher" in each page. All analysis was recorded by the community through the audio-info-examples page on the Microformats wiki. We also looked at all the music metadata file formats that existed in early 2007. From all of this data, we found several trends and patterns and were able to demonstrate a need for a grammar based on hard publishing data on the Web.

This approach is an important one to recognize because it leads to very few theoretical debates on what is and what isn't useful. In short, if you can't find examples of data being published on the Web, there is no reason to create a tag for that data because there is nothing to mark up. For example, we went into the process thinking that a rating tag would be needed, but after analyzing all of the websites, we found that only around 10% of them publish ratings for songs or albums, which was too little publishing data to justify a rating property.

This process helped us create a very minimal grammar, backed up by hard data, called the hAudio Microformat. We also provided a collection Microformat to represent music albums, called the hAlbum Microformat. Most of the work for hAudio was completed in 3 months, at the end of which we had a preliminary draft - which is impressive for any standards body.

Problems Encountered with Microformats

Following the Microformats process was not without its problems and frustrations. Like any new community, some kinks are still being worked out. Some of those kinks include:

  • The Process - Mixed interpretations of the Microformats process lead to arguments about the process. These arguments distract new Microformat authors from their work. For example, not everybody agrees on what constitutes "enough examples", or "proper interpretation of W3C standards". The process is constantly being refined, and in some cases re-written, which is frustrating to those of us trying to follow that process.
  • Politics and The Cabal - Several people involved in the community are bitter about how the community goes about making decisions. Some of the founders of Microformats, sometimes called The Cabal, quite blatantly go against some of their own rules, and at times it seems as if rules are being applied to The Cabal differently than to the rest of the community.

While the above are being worked out and will probably be resolved in the long term, there were two things that concerned us and will probably never be resolved due to the philosophy employed by the Microformats community:

  • Microformats Implementation - The Microformats implementation is very simple, which is a double-edged sword. Doing advanced mixing and matching of various Microformats is very difficult because Microformats are scope-less and have no namespaces. This means that if you were to list an album with an artist and a track inside that album which is sung by another artist, the Microformat won't be able to differentiate which artist did the album and which one did the song. You also can't link an artist at the top of a web page with an album at the bottom of a web page if there is any other album listed on the web page. While these might seem like corner cases, they are far more common than they might seem at first.
  • Being a Visionary - Microformats do not allow you to add anything to them that isn't backed up by hard publishing data. The side effect of this is that there will always be a group of people that need to publish higher fidelity data in a standards-compliant way than is supported by the Microformat.

It is important to note that while the two issues above are problems for us, the Microformats community solved one very important problem for us: it allowed us to create an audio Microformat in about three months that got us 80% of the way towards solving the web's music publishing problem, and that is an impressive feat for any community.

RDFa To The Rescue

Our biggest problem with the hAudio Microformat implementation was that it was nearly impossible to mix and match audio, video and image Microformats. This was partly due to the lack of namespacing and scoping mentioned earlier, and partly because grammar elements are re-used quite heavily between Microformats. For real-world examples, see the Appendix at the end of this article.

While the hAudio Microformat would solve the music mark-up problem for most of the website publishers out there, we also wanted to ensure that others would be able to extend the hAudio format. Microformats can lead to a catch-22 situation. If there are not enough people publishing a property, such as rating, that property will not make it into the Microformat. If there is no way to publish the rating property, publishers will probably continue to not publish that property.

Our primary problems with Microformats were that the technical implementation was a bit shoddy and that the resulting Microformat was only good for the masses and not for the visionaries. Luckily, through our colleagues in Creative Commons and our colleagues working on Firefox, we found out about an initiative called RDFa that solved both of these problems.

RDFa is a solid, light-weight technology solution for semantic data markup. It supports scoping and namespacing without adding much complexity. It would allow us to map the hAudio Microformat to RDFa and implement web pages in a very clean way. It would also allow us to improve the hAudio Microformat without the need for hard data to back up the need. If two or more websites wanted to add one or two properties to hAudio, they could do so with hAudio RDFa.

In the end, we decided to use the Microformats process to create a basic grammar for music called hAudio and hAlbum. We would implement hAudio/hAlbum using Microformats and using RDFa, preferring RDFa as the final implementation solution. We would then build on top of the hAudio/hAlbum RDFa implementation to include other important, but less frequently published properties, such as ISRC number.

Building an Open Music Recommendation Web Service

Believe it or not, the easiest part of the entire process was creating a basic music recommendation Web Service. We implemented this on our primary music website, Bitmunk. Anybody who wants to use the Bitmunk music recommendation web service POSTs an XML document, optionally marked up using either FOAF or hCard, to a recommendation web service URL. The result is another XML document containing hAlbum items whose artist is similar to the one that was posted to the recommendation web service URL.

Here is an outline of what happens:

  1. Your web browser detects an audio album and artist on a web page and notifies you.
  2. You instruct your web browser that you would like some albums recommended to you that sound similar to the one you're reading about.
  3. The hAudio/hAlbum Microformat/RDFa is lifted from the web page and sent to the Bitmunk music recommendation web service.
  4. The Bitmunk music recommendation web service returns a number of hAlbum items that are similar to the one your web browser posted.
  5. The browser performs some action, such as looking up the current prices of those albums on your favorite music purchasing website.
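Steps 3 and 4 of the outline above can be sketched roughly as follows. This is only an illustration: the XML element names (halbum, contributor, album-title, recommendations) and the helper names are hypothetical, not the actual Bitmunk request or response format, and the HTTP POST is replaced by a stand-in function so the sketch runs offline.

```python
import xml.etree.ElementTree as ET


def build_request(artist, album):
    """Step 3: wrap the data lifted from the page in an XML document."""
    root = ET.Element("halbum")                    # hypothetical element names
    ET.SubElement(root, "contributor").text = artist
    ET.SubElement(root, "album-title").text = album
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text):
    """Step 4: turn the returned XML into a list of album records."""
    root = ET.fromstring(xml_text)
    return [
        {"artist": item.findtext("contributor"),
         "album": item.findtext("album-title")}
        for item in root.iter("halbum")
    ]


def recommend(artist, album, post):
    """Steps 3-4; `post` sends the body to the service and returns its reply."""
    return parse_response(post(build_request(artist, album)))


def fake_post(body):
    """Offline stand-in for the HTTP POST; a real client would use the network."""
    assert ET.fromstring(body).tag == "halbum"
    return (
        "<recommendations>"
        "<halbum><contributor>Example Artist</contributor>"
        "<album-title>Example Album</album-title></halbum>"
        "</recommendations>"
    )


print(recommend("Some Artist", "Some Album", fake_post))
```

Passing the transport in as a function keeps the sketch independent of any particular service URL, which the outline above does not specify.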

Working with Firefox and Songbird to Integrate hAudio

We were fortunate enough to have cultivated a good working relationship with grade-A nice guys and primary Microformats implementors for Firefox 3, Mike Kaply and Alex Faaborg. We had also been chatting with the wonderful folks over at Pioneers of the Inevitable, creators of the media browser Songbird, and the cool cats at the Participatory Culture Foundation, creators of the Miro video player. They had all shown a strong interest in coming up with semantic media formats for Firefox, Songbird and Miro. With a great deal of help from Ben Adida of Creative Commons and the rest of the RDFa Task Force at the W3C, we were able to create a preliminary implementation of the hAudio/hAlbum Microformat and hAudio/hAlbum RDFa in Firefox 2 using Operator.

Future

THIS SECTION STILL NEEDS TO BE COMPLETED.

This section hasn't been completed because we're still working on the hAudio Microformat and hAudio RDFa. Namely:

  • Polishing up hAudio.
  • Moving hAudio to an official Microformats draft.
  • Refining hAudio RDFa.

We're working towards a unified semantic data pipeline for Firefox 3. The data pipeline will be able to parse eRDF, RDFa and Microformats, and hand the results to "Action" plugins that perform different actions based on the semantic data found on a web page. The entire implementation will be in JavaScript, allowing any ECMA-compliant web browser to use the semantic data pipeline.
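To make the shape of such a pipeline a little more concrete, here is a minimal sketch of the parser-to-action hand-off. It is written in Python purely for illustration (the planned implementation is in the browser), and every name in it (SemanticPipeline, the stub parsers, suggest_similar) is hypothetical. The stub parsers read pre-extracted data instead of real markup.

```python
class SemanticPipeline:
    """Run every registered parser, then hand each item to matching actions."""

    def __init__(self):
        self.parsers = []   # callables: page -> list of {"type": ..., ...} items
        self.actions = {}   # item type -> list of callables: item -> result

    def add_parser(self, parser):
        self.parsers.append(parser)

    def add_action(self, item_type, action):
        self.actions.setdefault(item_type, []).append(action)

    def run(self, page):
        results = []
        for parser in self.parsers:
            for item in parser(page):
                for action in self.actions.get(item["type"], []):
                    results.append(action(item))
        return results


# Stub parsers: a real one would read markup; these read pre-extracted data.
def microformat_parser(page):
    return page.get("microformats", [])


def rdfa_parser(page):
    return page.get("rdfa", [])


def suggest_similar(item):
    """Example action: a real one might call a recommendation service."""
    return "find albums similar to " + item["album"]


pipeline = SemanticPipeline()
pipeline.add_parser(microformat_parser)
pipeline.add_parser(rdfa_parser)
pipeline.add_action("halbum", suggest_similar)

PAGE = {
    "microformats": [{"type": "halbum", "album": "Example Album"}],
    "rdfa": [{"type": "hcard", "fn": "Example Person"}],
}
print(pipeline.run(PAGE))
```

The point of the design is that actions only depend on the item type, not on which markup syntax the item came from.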

Acknowledgements

The author would like to thank Ben Adida (RDFa, Creative Commons and W3C), Mike Kaply (Operator/Firefox), David Lehn (hAudio implementation), Steve Krulewitz (Songbird), Mig (Songbird), Rob Lord (Songbird), Ian McKellar (Songbird), and Nick Nassar (Miro). A special thanks to the folks over at Microformats.org for helping to create the hAudio Microformat. Namely, in order of contribution, Mary Hodder, Martin McEvoy, Alexandre Van De Sande, Michael Johnson, Dave Longley, Brian Suda, Ben Wiley Sittler, Scott Reynen, Frances Berriman, James Craig, David Janes, Andy Mabbett, Danny Ayers, Rudy Desjardins, Edward O'Connor, Ryan King, Chris Griego, Brad Hafichuk, Tantek Çelik, Colin Barrett, Joe Andrieu and David Lehn.

Appendix

Appendix A - Microformat Scoping Issue

A good real-world example of the Microformats Scoping issue is outlined in the following section. Let's take the following text and try to mark it up using the hCard Microformat:

Janet Seymour and Robert Tripton are available via phone.
Rob is available at his home number: 555-555-5555.
Janet can be reached at her work number: 777-777-7777.

A publisher would want to mark up Janet and Robert's telephone contact information. Not knowing that Microformats are scope-less, they might take the following approach:

<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>

Unfortunately, the Microformat parser would generate the following two Microformat outputs:

hCard
  fn  -> Janet Seymour
  tel -> 555-555-5555
  tel -> 777-777-7777
hCard
  fn  -> Robert Tripton
  tel -> 555-555-5555

This happens because the two hCards overlap. Microformats do not have an established mechanism for identifying which properties go with which VCARD. When the parser processes Janet Seymour's hCard, the first telephone number it finds is Robert Tripton's, not Janet Seymour's. Since the parser doesn't know which person 555-555-5555 belongs to, it mistakenly adds it to Janet Seymour's list of phone numbers. This problem happens whenever you have more than one Microformat of the same type that overlap one another.
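The behavior described above can be reproduced with a toy parser. The sketch below is not the code of any real Microformats parser. It simply assumes that every open vcard element claims every property nested inside it, that fn keeps only its first value, and that tel accumulates values. All class and function names are made up for the illustration.

```python
from html.parser import HTMLParser

SAMPLE = """
<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>
"""


class ScopelessHCardParser(HTMLParser):
    """Toy parser: every open vcard claims every property nested inside it."""

    def __init__(self):
        super().__init__()
        self.cards = []      # hCards in document order
        self.stack = []      # one entry per open element: its hCard, or None
        self.pending = None  # property named by the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        card = None
        if "vcard" in classes:
            card = {"fn": None, "tel": []}
            self.cards.append(card)
        self.stack.append(card)
        self.pending = next((n for n in ("fn", "tel") if n in classes), None)

    def handle_endtag(self, tag):
        self.stack.pop()
        self.pending = None

    def handle_data(self, data):
        if self.pending is None:
            return
        for card in self.stack:          # no scoping: all open hCards match
            if card is None:
                continue
            if self.pending == "tel":
                card["tel"].append(data.strip())
            elif card["fn"] is None:     # assumption: fn keeps its first value
                card["fn"] = data.strip()


def parse_hcards(html):
    parser = ScopelessHCardParser()
    parser.feed(html)
    return parser.cards


for card in parse_hcards(SAMPLE):
    print(card)
```

Run against the markup from this appendix, the sketch gives Janet Seymour both telephone numbers and Robert Tripton only his own, matching the output listed earlier.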

Keep in mind that the Microformat authors are aware of this limitation, and that the limitation is fine for what the Microformats community is attempting to accomplish - provide a simple mechanism for semantic data markup. Simplicity has its benefits and its drawbacks. RDFa allows you to specify which property goes with which VCARD and solves this issue, at the added cost of a small amount of extra syntax.
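To show what scoping buys you, here is a small sketch in which each property is attached only to the innermost enclosing vcard. This is not RDFa syntax and does not implement the RDFa processing rules; it is only a loosely analogous illustration, with hypothetical names, applied to the same markup used earlier in this appendix.

```python
from html.parser import HTMLParser

SAMPLE = """
<div class="vcard">
  <span class="fn">Janet Seymour</span> and
  <div class="vcard">
    <span class="fn">Robert Tripton</span> are available via phone.
    Rob is available at his home number: <span class="tel">555-555-5555</span>.
  </div>
  Janet can be reached at her work number: <span class="tel">777-777-7777</span>.
</div>
"""


class ScopedHCardParser(HTMLParser):
    """Toy parser: a property belongs to the nearest enclosing vcard only."""

    def __init__(self):
        super().__init__()
        self.cards = []      # hCards in document order
        self.stack = []      # one entry per open element: its hCard, or None
        self.pending = None  # property named by the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        card = None
        if "vcard" in classes:
            card = {"fn": None, "tel": []}
            self.cards.append(card)
        self.stack.append(card)
        self.pending = next((n for n in ("fn", "tel") if n in classes), None)

    def handle_endtag(self, tag):
        self.stack.pop()
        self.pending = None

    def handle_data(self, data):
        if self.pending is None:
            return
        # Scoping rule: pick the innermost open vcard and ignore outer ones.
        owner = next((c for c in reversed(self.stack) if c is not None), None)
        if owner is None:
            return
        if self.pending == "tel":
            owner["tel"].append(data.strip())
        elif owner["fn"] is None:
            owner["fn"] = data.strip()


def parse_scoped(html):
    parser = ScopedHCardParser()
    parser.feed(html)
    return parser.cards


for card in parse_scoped(SAMPLE):
    print(card)
```

With this rule, 555-555-5555 is attached only to Robert Tripton and 777-777-7777 only to Janet Seymour.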

Appendix B - Microformat Namespacing Issue

The following is a theoretical example, based on work performed for hAudio, that shows how not having namespaces can be detrimental to scalability. There was a discussion on the New Microformats mailing list regarding the re-use of the 'title' property from hCard. Re-use of property names in Microformats is heavily encouraged, but it creates several problems:

  1. If somebody narrowly defines the meaning of a property, re-using that property is difficult if not impossible.
  2. The more you re-use a property, the more risk there is of property name clashes when Microformats overlap on a web page.

The question of whether or not we should be using title when referring to the title of a song, or the title of an album, was raised several times while discussing hAudio. The definition for the word 'title' in this sense is:

5 a) the distinguishing name of a written, printed, or filmed production 
  b) a similar distinguishing name of a musical composition or a work of art.

Unfortunately, the hCard authors narrowly defined title. They re-used the VCARD definition. The VCARD specification defines title as:

"To specify the job title, functional position or function of the object the vCard represents".

This means that any Microformat that desires to use title must use the definition used by VCARD, since that specification was created first and changing it might confuse people who have adopted VCARD. Having a namespace would fix this problem as "hCard:title" could mean something subtly different from "hAudio:title" - however, since there are no namespaces in Microformats, this distinction cannot be made.

Let us assume for a moment that "title" was defined in such a way that would have allowed hAudio to re-use the property. There is another problem that goes back to the example given in Appendix A. How do you differentiate between two Microformats that overlap with the same property name? Take this sentence for example:

Freddie Mercury, known for a song called Bohemian Rhapsody, was the lead singer for Queen.

The HTML markup would look like this:

<div class="vcard">
  <span class="fn">Freddie Mercury</span>, 
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>

A Microformat parser would generate the following two outputs from the markup above:

hCard
  fn    -> Freddie Mercury
  title -> Bohemian Rhapsody
  org   -> Queen
hAudio
  title -> Bohemian Rhapsody

Even though title is the correct word to use, due to the scoping issue explained in Appendix A, and because Microformats do not have namespaces, we were forced to use a different property name. hAudio currently uses the property names audio-title and album-title in an effort to avoid namespace and scoping issues. Ironically, the use of the dash character in "audio-title" and "album-title" is a very simple form of namespacing, which is something that Microformats were attempting to avoid.
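The clash and the renaming workaround can be illustrated with a toy parser. As before, this is only a sketch under stated assumptions, not real Microformats parser code: every open root element (vcard or haudio) claims any nested property whose name is in its own vocabulary, and the first value seen for a property wins. The names FlatParser, VOCAB_TITLE and VOCAB_PREFIXED are hypothetical.

```python
from html.parser import HTMLParser

CLASH = """
<div class="vcard">
  <span class="fn">Freddie Mercury</span>,
  known for a song called
  <div class="haudio">
    <span class="title">Bohemian Rhapsody</span>,
  </div>
  was the <span class="title">lead singer</span>
  for <span class="org">Queen</span>.
</div>
"""


class FlatParser(HTMLParser):
    """Toy parser with no scoping and no namespaces."""

    def __init__(self, vocab):
        super().__init__()
        self.vocab = vocab   # root class -> property names it understands
        self.items = []      # (root class, properties) in document order
        self.stack = []      # one entry per open element: its item, or None
        self.pending = []    # class names on the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        item = None
        for root in self.vocab:
            if root in classes:
                item = (root, {})
                self.items.append(item)
        self.stack.append(item)
        self.pending = classes

    def handle_endtag(self, tag):
        self.stack.pop()
        self.pending = []

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        for item in self.stack:
            if item is None:
                continue
            root, props = item
            for name in self.pending:
                if name in self.vocab[root]:
                    props.setdefault(name, text)   # first value wins


def parse(html, vocab):
    parser = FlatParser(vocab)
    parser.feed(html)
    return parser.items


# Shared property name: hCard picks up the song title as a job title.
VOCAB_TITLE = {"vcard": {"fn", "title", "org"}, "haudio": {"title"}}
print(parse(CLASH, VOCAB_TITLE))

# Prefixed property name, as hAudio does with audio-title: no clash.
VOCAB_PREFIXED = {"vcard": {"fn", "title", "org"}, "haudio": {"audio-title"}}
RENAMED = CLASH.replace('class="title">Bohemian', 'class="audio-title">Bohemian')
print(parse(RENAMED, VOCAB_PREFIXED))
```

In the first run the hCard ends up with "Bohemian Rhapsody" as its title. In the second run the hCard title is "lead singer" and the song title stays with hAudio.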

This limitation is also well known to the Microformats authors and is currently viewed as an acceptable price to pay for an easy-to-implement markup mechanism for publishers. RDFa has namespacing and thus solves this issue, at the added cost of a small amount of extra syntax.