In Search of Cultural Identifiers
A Rainbow of Books by Dawn Endico. Some rights reserved.
Late last year we got quite excited about Open Library. Using the open word always seems to tick our boxes. We chatted about the prospect of a comprehensive, coherent BBC books site heavily interlinked with BBC programmes. Every dramatisation of a novel, every poetry reading, every author interview and profile, every play linked to / from programmes. The prospect of new user journeys from programme episode to book to author to poem and back to episode still seems enticing. We started to wonder if we could use Open Library as the backbone of this new service in the same way we use MusicBrainz open data as the backbone of /music.
Unfortunately when we looked more closely an obvious problem came to light.
Open Library is based on Amazon book data, and Amazon is based on products. And the BBC isn't all that interested in products. Neither are users.
If I tell someone that I'm reading Crash they generally don't care if I'm reading this version or this version or this version. What's interesting isn't the product but the cultural artifact. It's the same story with programmes. Radio 7's David Copperfield isn't a dramatisation of this or this or this, it's a dramatisation of this - the abstract cultural artifact or work.
The problem is probably so obvious it hardly warrants a blog post but now I've started... Lots of websites exist to shift products. So when they're created the developers model products not looser cultural artifacts. And because the cultural artifact isn't modelled it doesn't have a URL, isn't aggregatable and can't be pointed at. As Tom Coates pointed out people use links to explain, disambiguate and clarify meaning. If something isn't given a URL it doesn't exist in the vocabulary of the web.
The problem is compounded by Amazon encouraging users to annotate its products with comments, tags and ratings. Why is the Penguin version of A Passage to India rated 5 stars whilst the Penguin Classics version is rated 3 stars? They're essentially the same thing, just differently packaged. Are users really judging the books by their covers? Either way, it all leads to conversations which should be about cultural artifacts fragmenting into conversations about products. It also leads to a dilution of Google juice / PageRank as user attention gets split across these products.
I'm no library science expert, but speaking to more library-minded friends and colleagues it seems they use three levels of identification:
- The Dewey Decimal system is used for general categorisation and classification.
- The ISBN is used to identify a specific publication.
- The bar code they scan when you take out a book is used to identify the individual physical item.
So there's something missing between the general classification schemes and the individual publication. Like Amazon, libraries have no means of identifying the abstract cultural artifact or work - only instantiations of that work in the form of publications. These publications map almost exactly to Amazon products, and since Open Library is built on Amazon data, that's why we see 45 different Wide Sargasso Seas in Open Library.
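The missing level can be sketched in a few lines of code. This is a minimal, hypothetical model - the class names, ISBNs and publisher labels are illustrative, not anything Open Library or any library system actually uses - but it shows the distinction: a work is the thing people talk about and link to, while each publication is just one packaging of it.

```python
from dataclasses import dataclass, field

@dataclass
class Publication:
    """A specific publication - what an ISBN (and an Amazon product) identifies."""
    isbn: str
    publisher: str

@dataclass
class Work:
    """The abstract cultural artifact - one page, one URL, one conversation."""
    title: str
    author: str
    publications: list = field(default_factory=list)

# One work, several products (placeholder ISBNs / publishers):
crash = Work(title="Crash", author="J. G. Ballard")
crash.publications.append(Publication(isbn="isbn-a", publisher="Publisher A"))
crash.publications.append(Publication(isbn="isbn-b", publisher="Publisher B"))

# Comments, ratings and inbound links attach to the work,
# rather than fragmenting across its publications.
print(len(crash.publications))  # 2 products, 1 cultural artifact
```

With 45 Wide Sargasso Seas, the same structure would give you 45 publications hanging off a single work with a single canonical URL.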
So whilst Open Library's strapline is 'a page per book' (which feels strangely familiar) in reality it's a page per publication / product. It would be interesting to know if Open Library have any plans to allow users to group these publications into cultural artifacts. If they do then we'd really end up with one page per book and one canonical URL to identify it. At which point the prospect of links to and from BBC programmes (and Wikipedia / DBpedia) gets really interesting.
So we've written in the past about using MusicBrainz as the data backbone for the new /music site. MusicBrainz models 3 main things (artists, releases and tracks) and provides web-scale identifiers for each. So why have we chosen to only expose artist pages? Why not a page per release or a page per track?
The problem is the same one as Amazon / Open Library. In the case of releases MusicBrainz models individual publications of a release. So instead of being able to identify and point to a single Rubber Soul you can point to this one or this one or this one. And in the case of tracks MusicBrainz really models audio signals. So this Wonderwall is different to this Wonderwall is different to this Wonderwall with no means of identifying them as the same song - for all we know they might have as much in common as Statement by Extreme Noise Terror and I Should Be So Lucky by Kylie. Which isn't a problem except that we want to say this programme played this song by this performer - which performance / mix is much less interesting. Same with reviews - most of the time we're not reviewing a publication but a cultural artifact.
So how do we get round this? We're currently working with MusicBrainz to implement the first part of its Next Generation Schema. This will allow users to group individual release publications into what we're calling cultural releases. So we'll have one Rubber Soul to point at. After that it's on to works and parts and acts and scenes and acts of composition etc with a single page and a single URL for each.
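The grouping step itself is simple to illustrate. Below is a toy sketch, not the actual Next Generation Schema: it collapses publication-level releases into one "cultural release" keyed on artist and title (the real schema uses proper identifiers and editorial judgement rather than string matching).

```python
from collections import defaultdict

# Publication-level releases, as MusicBrainz currently models them
# (fields and values are illustrative):
releases = [
    {"artist": "The Beatles", "title": "Rubber Soul", "country": "GB"},
    {"artist": "The Beatles", "title": "Rubber Soul", "country": "US"},
    {"artist": "The Beatles", "title": "Rubber Soul", "country": "JP"},
]

# Group publications under one cultural release per (artist, title):
cultural_releases = defaultdict(list)
for release in releases:
    cultural_releases[(release["artist"], release["title"])].append(release)

# One Rubber Soul to point at, with three publications beneath it.
print(len(cultural_releases))  # 1
```

The point isn't the grouping mechanics but the outcome: each cultural release gets one identifier and one page, and the individual publications become detail on that page rather than competing pages.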
Again the problem resurfaces in the world of programmes. Most of our internal production systems deal with media assets and these assets aren't always grouped into cultural artifacts. But people outside the BBC aren't really interested in assets. If your friend recommends an episode of Horizon you're unlikely to care if they mean the slightly edited version, the version with sign language or the version with subtitles. Most of the time people talk about the abstract, platonic ideal of the programme.
Way back in time when /programmes was still known as PIPs a design decision was made to model both cultural artifacts and instantiations. If you look at the /programmes schema you'll see both programmes (brands, series and episodes - the cultural artifact) and versions (the specific instantiation). When we talk about one page per programme what we're really talking about is one page per episode. What we're definitely not talking about is one page per version or one page per broadcast.
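The same artifact / instantiation split can be sketched as a data model. This is a hypothetical mirror of the /programmes idea, not the actual PIPs schema - class and field names are illustrative - but it captures the rule: episodes are cultural artifacts and get pages; versions are instantiations and don't.

```python
from dataclasses import dataclass, field

@dataclass
class Version:
    """A specific instantiation of an episode - no page of its own."""
    kind: str  # e.g. "original", "signed", "subtitled"

@dataclass
class Episode:
    """The cultural artifact - one page per episode."""
    title: str
    versions: list = field(default_factory=list)

# One Horizon episode, three instantiations sharing a single page:
episode = Episode(title="A Horizon episode")
episode.versions += [Version("original"), Version("signed"), Version("subtitled")]
print(len(episode.versions))  # 3
```

Broadcasts would hang off versions in the same way: many broadcasts per version, many versions per episode, but still only one URL - the episode's.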
Getting this stuff right is really the first job in any web project: identify the objects you want to talk about and model the relations between those objects. The key point is to ensure the things you model map to users' mental models of the world. User-centric design starts here, and if you choose to model and expose things that users can't easily comprehend, no amount of product requirements or personas or storyboards will help you out.
For want of a better label we A&M types often refer to this work as 'cultural identifiers'. One identifier, one URL, one page per cultural artifact, all interlinked. It's something that Wikipedia does better than anyone: one page per concept, one concept per page. bbc.co.uk could be a much more pleasant place to be if we could build something similar for the BBC.