Digital Public Space: Turning a big idea into a big thing
The remains of the Metroon. Photo by bokur.net, used under licence.
The story begins about 2,500 years ago, in Athens.
Around 500 BC in the Ancient Greek city-state of Athens the state archive was housed in a building called the Metroon, or ‘mother building’. This temple, dedicated to the Mother of the Gods, was filled with papers relating to the day-to-day civic, legal, commercial and cultural life of its citizens.
The Metroon was open to every citizen, and all were entitled both to read and to make copies of anything held there, giving them a level of access to the building blocks of their society that is unrivalled in the modern age despite our Freedom of Information laws and open data initiatives.
Today there are simply too many Metroons to visit, even if we had permission to enter all of them. The vast majority of current archives remain undigitised and available only by visiting a physical building.
But the bigger challenge is that public archives are run by organisations and institutions that have collected, crafted and labelled their archives to meet their operational needs rather than general access. Gathering a comprehensive or authoritative set of materials from different archives becomes an Olympian task: there is simply so much stuff in so many places described and recorded in different ways using different systems.
Digitisation does not solve this problem, since the different databases are not necessarily compatible. Even though we are entering an age of mass digitisation, the challenges in achieving Athenian levels of access to digitised material are many, massive and varied.
Fortunately, the emergence of the semantic web and linked data practices and standards means that a modern Metroon – a digital public space – is, at last, a possibility.
Digital Public Space
I'm Jake Berger, the Programme Manager for the Digital Public Space project. Essentially this means that I have to work out and describe the scope and challenges of the overall vision, then break the big challenges of the wider Digital Public Space project down into smaller ones and work with my BBC colleagues and external partners (like those mentioned by Bill earlier in the week) to get these smaller projects off the ground. I also try to make sure that our project’s thinking aligns with other thinking around the BBC and beyond.
In summary, our ambition is to create an online space in which much of the UK’s publicly-held cultural and heritage media assets and data could be found - connected together, searchable, machine-readable, open, accessible, visible and usable in a way that allows individuals, institutions and machines to add additional material, meaning and context to each other’s media, indexed and tagged to the highest level of detail.
A couple of weeks ago, Bill Thompson talked to Jemima Kiss about the Digital Public Space project in the Guardian’s TechWeekly podcast. When Jemima Kiss described DPS as a big library, Bill explained how the shared metadata would work:
The stuff’s not in the library. You have the best catalogue ever, and when you want something there’s an instant delivery service. Those organisations that want to keep their material in the library can do so. Those that want to keep it to themselves because they’re worried about rights issues or whatever can keep it to themselves and only make it available when they’re asked for it to people they’re sure will look after it.
Bill wrote earlier this week about how the BBC was working with partner organisations, and you can see a visualisation of how the partners, the catalogue, the assets, and the products and services all fit together on Bill’s post.
Data model and reference implementation
As Mo wrote, an ‘umbrella’ data model is being developed. This brings together a number of catalogues - data sets describing the holdings of a range of partners - classifying their contents in a consistent way, identifying themes and types, and mapping out connections and associations across diverse data sets.
The data model is lossless - it does not attempt to truncate or simplify any of the extensive detail within partners’ catalogues. There may be elements or fields within each catalogue that do not currently have an obvious connection to any other field in any other catalogue, but the ‘lossless’ approach ensures that any such connections can be found and mapped in the future.
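To make the ‘lossless’ idea concrete, here is a minimal sketch in Python. It is not the project's actual data model: the partner names, field names and the `FIELD_MAP` table are all hypothetical, invented purely for illustration. The point it shows is that each record keeps its complete original alongside the fields that have been mapped to shared names, so anything unmapped today is still there to be connected later.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical mapping from each partner's own field names to shared ones.
# Real catalogues would have far more fields; these are illustrative only.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "partner_a": {"programme_title": "title", "tx_date": "date"},
    "partner_b": {"item_name": "title", "shelfmark": "identifier"},
}


@dataclass
class UmbrellaRecord:
    source: str
    # Fields that currently map onto the shared vocabulary.
    shared: Dict[str, Any] = field(default_factory=dict)
    # The complete, untouched partner record: nothing is truncated.
    original: Dict[str, Any] = field(default_factory=dict)


def to_umbrella(source: str, record: Dict[str, Any]) -> UmbrellaRecord:
    """Map known fields to shared names while keeping the whole original."""
    mapping = FIELD_MAP.get(source, {})
    shared = {mapping[key]: value for key, value in record.items() if key in mapping}
    return UmbrellaRecord(source=source, shared=shared, original=dict(record))


def unmapped_fields(rec: UmbrellaRecord) -> List[str]:
    """List original fields with no shared equivalent yet."""
    mapping = FIELD_MAP.get(rec.source, {})
    return sorted(key for key in rec.original if key not in mapping)
```

In this sketch, adding a new entry to `FIELD_MAP` later is enough to surface a connection that was not obvious when the record was first ingested, because the original field was never thrown away.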
So far the data model can encompass catalogue information from the BBC, British Library, The National Library Of Scotland, The Royal Opera House, Royal Botanic Gardens at Kew, The National Archives, The Science Museum and The Arts Council’s Arts On Film collection.
Early versions of this data model indicate that - as hoped - there will be many, varied and often unexpected journeys that can be made through these catalogues and the material they describe. For example, a user starting out by watching a film of a production of Macbeth from the Royal Opera House might then look at a scan of a rare musical manuscript from The National Archives, then browse similar manuscript scans held at the British Library, watch a clip from a BBC documentary about how paper was produced in Shakespeare's era, before ending up learning about the plants used to make the paper using information from The Royal Botanic Gardens At Kew. In a DPS, all of this could happen in the same online space.
Clearly, a ‘critical mass’ of data and material needs to be brought together before we see such innovative journeys emerge. So we are now beginning to assemble this critical mass.
Screen grab from a Metabroadcast prototype to help folk navigate a large dataset.
Dealing with Complexity
The project throws up a lot of complex questions and we have already started projects to address some of them.
Bill recently mentioned a collaboration between JISC and BBC Northern Ireland, which combined digitised video with tools to search and tag it for research or teaching.
There is a whole series of challenges around the user experience and navigation through vast and diverse data sets. We have worked with Metabroadcast to try out a couple of approaches to this.
Mo blogged about the development of a web browser-based user interface, which navigates through these catalogues using the concepts of “people”, “places”, “events”, “things” and “collections”. I hope soon to share some other user interfaces that we’ve developed within the Archive Development team.
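As a rough illustration of how navigation by these five concepts might work, the following Python sketch builds a simple index from concept values to record IDs and uses it to find related records. This is an assumption about one possible approach, not a description of the interface Mo wrote about; the record layout, the `build_index` and `related` functions and the sample values are all hypothetical.

```python
from collections import defaultdict

# The five navigation concepts named in the text.
CONCEPTS = ("people", "places", "events", "things", "collections")


def build_index(records):
    """Index record IDs by each concept value they mention."""
    index = {concept: defaultdict(set) for concept in CONCEPTS}
    for rec in records:
        for concept in CONCEPTS:
            for value in rec.get(concept, []):
                index[concept][value].add(rec["id"])
    return index


def related(index, records_by_id, record_id):
    """Return IDs of other records sharing any concept value with this one."""
    rec = records_by_id[record_id]
    found = set()
    for concept in CONCEPTS:
        for value in rec.get(concept, []):
            found |= index[concept][value]
    found.discard(record_id)  # a record is not related to itself
    return sorted(found)


# Hypothetical sample records from different catalogues.
records = [
    {"id": "a1", "people": ["William Shakespeare"], "things": ["manuscript"]},
    {"id": "b1", "people": ["William Shakespeare"], "places": ["London"]},
    {"id": "c1", "things": ["paper"]},
]
index = build_index(records)
by_id = {rec["id"]: rec for rec in records}
```

Calling `related(index, by_id, "a1")` here would return the one other record that shares a person with it, which is the kind of hop between catalogues that the journeys described earlier rely on.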
Of course these are only the beginning; soon we will launch projects with other collaborating institutions which will explore issues around rights, identity, access, privacy, provenance, persistence, user-generated content and data, augmentation and amplification.
To give an insight into these issues: administering huge numbers of rights holders will be a large task when the current rights situation is so complex. Earlier this year a study by our Rights and Business Affairs Department, submitted to the Hargreaves Review, revealed an average of 85 separate rights-clearance transactions per episode of Doctor Who.
Another question is how to track contributions, usage and amendments whilst preserving the privacy of contributors. This leads to the question of provenance and how we can preserve this information over time.
When more projects to address these complex questions are off the ground, I will blog about them too.
Rome wasn’t built in a day. Neither was Athens, and nor will the Digital Public Space be. But I hope that the blueprint we are beginning to develop, the plan that will deliver it and the Digital Public Space itself will be as valuable to every modern-day citizen as the Metroon was to the citizens of Athens.
I look forward to blogging again when these projects have delivered, as part of consulting our partners and audience over the next steps.
Jake Berger is the Programme Manager for Digital Public Space in BBC Archive Development.