
BBC Digital Public Space project


Mo McRoberts | 10:00 UK time, Tuesday, 19 April 2011

(Editor's note: It's a delight to welcome Mo to the blog with this, his first "official" posting).

Yesterday, the BBC's Director of Future Media, Ralph Rivera, gave a speech at the newly-opened offices of the World Wide Web Consortium (W3C) in Oxford.

His speech emphasised the BBC's support for the organisation and its philosophies in the context of the BBC's work on a 'new broadcasting system' that can reach everyone, is free at the point of use and makes BBC programmes available to all those who can benefit from them. It also discussed the ways in which the BBC is seeking to get the maximum value from its archive, asking the audience 'what good is it to retain this archive if it can't be shared?', before describing the 'digital public space' in which the BBC now sees itself operating as it delivers its services online.

As Ralph noted, the digital public space can mean different things to different people. To some it's a philosophical ideal: the belief that UK citizens have the right to access and interact with the country's social and cultural assets online. To me, in my role as Data Analyst within the BBC's small Archive Development team, it's something very specific.

A couple of colleagues and I work on the Digital Public Space project. This is a partnership between the BBC and other cultural institutions in the UK, including museums, archives, libraries, galleries and educational bodies, all of whom share a vision of not simply using Internet technology as a distribution channel, but of being part of that digital environment as it evolves: being part of the Web, rather than just on it.

It aims to be an access point for all of the UK's cultural archives, marrying the rich information that experts in their respective fields have carefully collated, checked and double-checked over the years with more immediately accessible, higher-level information and audio-visual material, both from the partners and from around the Web.

The first step towards this is a prototype, currently in development, that brings together the archives and catalogues of some of the partnering institutions (including the BBC's) within an 'Umbrella' data model, and creates a platform on which applications and interfaces for navigating, annotating and curating them can be built. Eventually, you would be able to access and add to this information through an online gateway, but there could also be specialist entry points.

For example, there might be an iPhone or Android app for exploring the history of your local area, or a YouView interface focussed on "British Ballet". Part of what makes the project so exciting is that we really don't know what kinds of interfaces and applications will end up being developed for the platform.

The Semantic Web lies at the very heart of this. It provides the toolkit for describing real-world things in a machine-readable way, just as ordinary web pages describe those things in a human-readable way. Like the "Web of documents" we are generally used to, the Semantic Web is built on the fundamental principle that anybody can publish anything about anything else, without having to go through layers of bureaucracy and paperwork. Even RDF, the language used to describe these things, relies on vocabularies which are often developed independently of one another; a vocabulary comes into existence simply by being published somewhere on the Web and having RDF documents begin to use it. There is no central "ontology authority" that decides what does and doesn't form part of the Semantic Web's vocabulary: if no existing ontology can describe the things you need to describe, there's little beyond time and effort standing in the way of you creating one.
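To make the idea of machine-readable description a little more concrete, here is a minimal Python sketch of RDF's data model: every statement is a subject, a predicate and an object, and the predicates are themselves URIs taken from whichever vocabulary the publisher chooses. The Dublin Core terms (`http://purl.org/dc/terms/`) and FOAF (`http://xmlns.com/foaf/0.1/`) namespaces are real, widely used vocabularies; the `example.org` theatre vocabulary and every `example.org` identifier are hypothetical, standing in for a vocabulary that somebody has simply published. This only models triples in memory; a real system would normally use an RDF library and a standard serialisation.

```python
from typing import List, NamedTuple, Union


class URI(str):
    """A URI naming a thing, a property or a class."""


class Literal(str):
    """A plain value such as a title or a name."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


def namespace(base: str):
    """Return a helper that mints URIs for terms within one vocabulary."""
    return lambda term: URI(base + term)


# Two independently published, real vocabularies: Dublin Core terms and FOAF...
DCTERMS = namespace("http://purl.org/dc/terms/")
FOAF = namespace("http://xmlns.com/foaf/0.1/")
# ...plus a HYPOTHETICAL vocabulary, "published" just by choosing a URI for it.
EX = namespace("http://example.org/vocab/theatre#")

# Hypothetical identifiers for a production and a person.
production = URI("http://example.org/id/production/1")
director = URI("http://example.org/id/person/1")

statements: List[Triple] = [
    Triple(production, DCTERMS("title"), Literal("Hamlet")),
    Triple(production, EX("director"), director),  # mixes vocabularies freely
    Triple(director, FOAF("name"), Literal("Jane Doe")),
]


def objects(triples: List[Triple], subject: URI, predicate: URI) -> List[Union[URI, Literal]]:
    """Return every value stated for `predicate` on `subject`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]
```

Because the director value is itself a URI, software can follow it to find further statements, such as the person's name: the same "follow the link" idea that makes the Web of documents navigable.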

Within the digital public space prototype, RDF gives us a common language that institutions can use to describe their catalogues in their own terms. The prototype aggregates these catalogues, finding areas of overlap, and presenting the things described by them in a unified manner, not organised in terms of the catalogue entries that are best suited to archivists, but instead in terms of the people, places, events, things and collections which those entries describe.
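The aggregation step could be sketched along the following lines: statements from several catalogues are regrouped around the things they describe rather than the catalogue records they came from, and the aggregator reports where catalogues overlap. For simplicity this assumes overlap is only detected when two catalogues already use the same URI for a thing; reconciling different identifiers for the same person or place (for example via `owl:sameAs` links) is beyond this sketch. The catalogue names and `example.org` URIs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def aggregate(catalogues: Dict[str, List[Triple]]):
    """Merge catalogues into one view keyed by the thing described.

    Returns (view, sources): `view` maps each subject URI to the set of
    (predicate, object) pairs stated about it by any catalogue; `sources`
    maps each subject URI to the names of the catalogues describing it.
    """
    view: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    sources: Dict[str, Set[str]] = defaultdict(set)
    for name, triples in catalogues.items():
        for subject, predicate, obj in triples:
            view[subject].add((predicate, obj))
            sources[subject].add(name)
    return view, sources


def overlaps(sources: Dict[str, Set[str]]) -> List[str]:
    """Subjects described by more than one catalogue, in sorted order."""
    return sorted(s for s, names in sources.items() if len(names) > 1)


# Hypothetical identifiers; the property URIs are real Dublin Core / FOAF terms.
PERSON = "http://example.org/id/person/1"
OBJECT = "http://example.org/id/object/42"
PROGRAMME = "http://example.org/id/programme/7"
NAME = "http://xmlns.com/foaf/0.1/name"
MADE = "http://xmlns.com/foaf/0.1/made"
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

catalogues = {
    "museum": [(PERSON, NAME, "Jane Doe"), (OBJECT, CREATOR, PERSON)],
    "bbc": [(PERSON, MADE, PROGRAMME), (PROGRAMME, TITLE, "An Interview")],
}
```

Calling `aggregate(catalogues)` gathers the museum's and the BBC's statements about the same person under a single entry, which is the "people, places, events, things" view described above rather than two separate catalogue records.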

First and foremost, the aggregated information is itself published as RDF. Being intended for consumption by software, RDF isn't terribly exciting for most people to look at, so as part of the prototype we're also developing a number of user interfaces to explore different ways in which the catalogues can be navigated.
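Publishing the aggregated information as RDF means choosing a serialisation. The sketch below writes statements out as N-Triples, the simplest line-based RDF format, with one `<subject> <predicate> object .` statement per line. The `Literal` marker class is an assumption of this sketch, used to tell literal values apart from URIs; in practice an RDF library would normally handle serialisation. It escapes only backslashes, quotes and line breaks, and does not handle language tags, datatypes or blank nodes.

```python
from typing import Iterable, Tuple


class Literal(str):
    """Marks a string as a literal value (a title, a name) rather than a URI."""


def term(value: str) -> str:
    """Render one RDF term in N-Triples syntax."""
    if isinstance(value, Literal):
        # Escape the characters that would otherwise break a quoted literal;
        # backslashes first so the later escapes are not doubled.
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return f'"{escaped}"'
    # Anything else is assumed to be an absolute URI.
    return f"<{value}>"


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise triples, one statement per line, each terminated by ' .'."""
    return "".join(f"{term(s)} {term(p)} {term(o)} ." + "\n" for s, p, o in triples)
```

For example, a single statement giving a hypothetical person's FOAF name serialises to `<http://example.org/id/person/1> <http://xmlns.com/foaf/0.1/name> "Jane Doe" .` followed by a newline.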

The aggregation engine doesn't have any special knowledge about the partnering catalogues, though. As far as it's concerned, there's no fundamental difference between an expert institution and anybody else. There's a language for making statements about things (RDF), a way of identifying the things in the catalogues (URIs -- of which what we know as "Web addresses" are a subset), and a way to publish those things (the Web).
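Because the things in the catalogues are identified by URIs, software can ask the Web itself for a description of any of them. A common Linked Data convention, assumed here rather than taken from the project, is HTTP content negotiation: the client sends an `Accept` header listing RDF media types, and the server responds with RDF instead of an HTML page. The sketch below only builds the request using Python's standard library; the URI is hypothetical, and passing the result to `urllib.request.urlopen` would actually fetch it.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# RDF media types in order of preference (q-values are relative weights).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/n-triples;q=0.8"


def rdf_request(thing_uri: str) -> Request:
    """Build, but do not send, a GET asking for an RDF description of a thing.

    A "hash URI" such as http://example.org/id/person/1#id names a thing
    rather than a document; the fragment is never sent over HTTP, so the
    request is made for the document part of the URI.
    """
    document_uri, _fragment = urldefrag(thing_uri)
    return Request(document_uri, headers={"Accept": RDF_ACCEPT}, method="GET")
```

Nothing here is specific to any particular institution, which mirrors the point above: an expert archive and an individual publishing on their own website are reached in exactly the same way.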

There are some practical hurdles to be overcome, however.

With institutions, it's quite easy to mandate that the software feeding catalogue information to the aggregator must push RDF documents to a RESTful Web service, using a digital certificate that provides a strong identity so that the information can be attributed to them. For individuals, things get a little more complicated. We know that user interfaces can be built to do the heavy lifting of generating RDF and pushing it to the aggregator, but that still leaves the problem of certificates: most people don't use public-key cryptography on a day-to-day basis, so we need to settle upon an approach to identity that everybody can get to grips with.
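The institutional route described above can be sketched with Python's standard library: the catalogue software presents an X.509 client certificate during the TLS handshake, giving the aggregator a strong identity to attribute the statements to, and then uploads an N-Triples document over HTTP. The endpoint URL, the choice of `PUT` over `POST`, and the certificate file names are all assumptions for illustration, not the project's actual interface.

```python
import http.client
import ssl
from typing import Dict, Tuple
from urllib.parse import urlsplit


def build_push(endpoint: str, ntriples: str) -> Tuple[str, str, str, Dict[str, str], bytes]:
    """Prepare an upload of an N-Triples document to a HYPOTHETICAL endpoint.

    Returns (host, method, path, headers, body) without touching the network.
    """
    parts = urlsplit(endpoint)
    if parts.scheme != "https":
        # A client certificate only proves identity over TLS.
        raise ValueError("the aggregator must be reached over HTTPS")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    body = ntriples.encode("utf-8")  # N-Triples documents are UTF-8
    headers = {
        "Content-Type": "application/n-triples",
        "Content-Length": str(len(body)),
    }
    # Assumed design: PUT replaces the institution's whole document.
    return parts.netloc, "PUT", path, headers, body


def push(endpoint: str, ntriples: str, certfile: str, keyfile: str) -> Tuple[int, bytes]:
    """Upload the document, identifying the sender with a client certificate."""
    host, method, path, headers, body = build_push(endpoint, ntriples)
    context = ssl.create_default_context()  # also verifies the server's certificate
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    connection = http.client.HTTPSConnection(host, context=context)
    try:
        connection.request(method, path, body=body, headers=headers)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()
```

Using `PUT` means an institution can resend its complete catalogue export and the aggregator simply replaces the previous copy, so repeating an upload after a failure is harmless; `POST` would suit a service that appends statements instead.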

Beyond that, some aspects of RDF haven't been finalised yet, such as attaching digital signatures to different parts of an RDF document and specifying the source of a set of statements ("named graphs"). On all of these issues, we're looking forward to working with the Web community to find solutions.
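Named graphs tackle the attribution problem by giving each set of statements a name of its own, effectively turning triples into quads. The sketch below keeps each source's statements in a separate graph so that the aggregator can say who asserted what. It also computes a digest of a graph's contents as a hint of where a signature might attach; this is a naive, order-independent SHA-256 over a JSON encoding, not a standard RDF canonicalisation (blank nodes would break it) and not a signature. Graph names and statements are hypothetical.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, Set, Tuple

# (subject, predicate, object); this sketch keeps URIs and literals as plain strings.
Triple = Tuple[str, str, str]


class NamedGraphStore:
    """Holds each source's statements in its own named graph."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph: str, triple: Triple) -> None:
        self._graphs[graph].add(triple)

    def statements_in(self, graph: str) -> Set[Triple]:
        return set(self._graphs.get(graph, set()))

    def sources_of(self, triple: Triple) -> Set[str]:
        """Names of the graphs (and so the sources) asserting this statement."""
        return {name for name, triples in self._graphs.items() if triple in triples}

    def digest(self, graph: str) -> str:
        """Order-independent SHA-256 over a graph's statements (naive, see above)."""
        payload = json.dumps(sorted(self._graphs.get(graph, set())), ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because each statement remembers which graph it arrived in, two institutions asserting the same fact remain distinguishable, and one source's statements can be withdrawn without disturbing anyone else's.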

You're probably wondering when you'll get to experience the digital public space, and in particular this prototype. The answer is "it depends". This phase of the project is due to end in June, at which point we will have something tangible that can be shared amongst select individuals in the partnering organisations, to act as a proof of concept. While the details have yet to be finalised, we hope that the next stage after that will be to make it available to everybody in each of those organisations on a permanent basis. If that's successful, then we are looking to open it up to the many schools, colleges and universities in the UK.

As you can imagine, the legal and rights issues surrounding both the catalogue information and the associated digital media are complex and varied; navigating them means working closely with rightsholders and industry bodies, and will take some time. However, the BBC remains committed to the aim set out in Putting Quality First ("Opening up the BBC's library of programmes") of providing permanent access to the UK's cultural archives in a digital environment that's available to everybody, a vision shared by all of the project partners.

We know that the digital public space can only become a reality if we build on open technologies and standards as championed by the W3C. The digital environment in which we're creating this already exists, and so co-operation and partnership are absolutely key to the success of the project.

Mo McRoberts is Data Analyst, BBC


  • Comment number 1.

    Mr. McRoberts,

    I wish you well, but envy you not in your pursuit of this Grail. To the extent that I am aware, each attempt at machine comprehension outside of a limited domain has been unsuccessful. My reading of the Semantic Web is that such comprehension is assumed. Perhaps I remain stuck on the exceptions of meaning rather than a greater body of success, but then how are exception conditions resolved?

  • Comment number 2.

    Hi Chryses,

    You're correct that broad-scale machine comprehension isn't exactly at our fingertips at the moment, and that the Semantic Web has it as the eventual “big picture”: that one day, machines will be able to understand the things described in RDF just as we can understand narratives written in natural language on Web pages.

    However, machine comprehension is by no means a prerequisite for all kinds of useful applications to be built on top of the technologies before that goal is attained (if it ever is). The tools — HTTP, URIs and RDF primarily — provide the means to allow navigation and description of things, and they can be used right now to feed data into interfaces which have a certain degree of prior knowledge about the way that those things are described.

    For example, you might build an interface which can interpret and present the Dublin Core Metadata Element Set along with FRBR, the Theatre Ontology and FOAF so that you can explore theatrical productions and the people involved in them.

    Some of the user interfaces being developed as part of the prototype will undoubtedly take this “narrow, but deep” approach, focussed on making specific areas of the aggregated catalogues accessible in a domain-optimised manner. Others might take a “wide, but shallow” approach, where you can explore everything, but not at a particularly intricate level of domain-specific detail.

    So, while it would be lovely if Semantic Web browsers with inference capabilities were as prevalent as the ordinary Web browser is today, and the Digital Public Space prototype (at the very least) would work well with them if that was the case, it's not required for the project to succeed.

  • Comment number 3.

    Building user interfaces to deal with all that content is in itself an incredibly intriguing task. Nice musings.

  • Comment number 4.

    "that can reach everyone, is free at the point of use and makes BBC programmes available to all those who can benefit from them."

    This is a slightly different way of putting the role of the BBC as enshrined in its "Constitution", but usually it adds "in the UK" after "all".

    How is "free at the point of use" to be funded for users NOT 'in the UK'?

  • Comment number 5.

    @cping500 (#4): at the moment there are no firm plans for when this might be available to everyone in the UK, let alone those outside of it (even the academic access scenario mentioned in the post is something we'd like and are hoping to do, but is only the earliest of stages of discussions right now).

    In short, we'd have to cross that bridge when we got to it, if we got to it.

  • Comment number 6.

    Mr. McRoberts,

    Thanks for the links; interesting ideas and ideals, if slow reading. In re the “narrow, but deep” approach, my wife is a librarian at a medical library, so the interoperability between the various medical databases in response to medical inquiries springs readily to my mind. These are both very deep and their domains are both limited and distinct. These seem to me to be a repository of knowledge that should require only modest amounts of metadata to interconnect. Proprietary issues might be a bit of a problem though.

  • Comment number 7.

    @Chryses (#6): Indeed, that's exactly the kind of thing which would make the whole project immensely valuable. At this point we're focussed on cultural archives, but there's no reason why the technologies won't be applicable to other pursuits given time — and, just as we have to navigate the rights issues involved in archive catalogues and content, there's nothing to say that the approaches taken as a result wouldn't also be applicable elsewhere.

  • Comment number 8.

    Hi Mo,

    Sounds fascinating and challenging! I guess it's early days, but do you know if there are any plans to open up access to the data for the wider developer community to create applications/interfaces on a non-commercial basis? I would love to do something with that lovely content ;-)



BBC © 2014 The BBC is not responsible for the content of external sites.
