Linked data: Connecting together the BBC's online content

Oliver Bartlett

Product Manager

Hi, I’m Oli Bartlett, product manager for the BBC's Linked Data Platform.

The Linked Data Platform is one of the legacies of the BBC Sport 2012 Olympics website. You may have read my blog post on the work we did for the Olympic Data Service.

One aspect of the service delivered the semantic framework for the 10,000 athlete pages and a page per event, discipline, country and venue.

This framework provides the semantic graph of data (the linked data containing the athletes, events and venues and their associations with each other) and the APIs on this data.

It was all built on the Dynamic Semantic Publishing (DSP) platform, which facilitates the publication of automated, metadata-driven web pages and was originally developed for the football World Cup in 2010.

The Linked Data Platform

The Linked Data Platform is the natural evolution of DSP. It builds on the idea of applying semantic tags to News and Sport articles by allowing tagging of any BBC content.

It also provides the processes and tools needed to store and query linked data on any subject that may be of interest to the BBC, from the original sport data (football and Olympics) to the UK schools’ curricula, politics, nature, music and more.

I'm getting a little ahead of myself. It will do these things but we're not quite there yet. Let's wind back a bit.

Every day the BBC creates thousands of creative works for its website including news articles, television and radio programmes, blog posts, recipes and learning resources to name a few.

Each of these content types is stored in a content management system (CMS) along with its own set of metadata appropriate for how the content is used.

For example, we store the contributors to TV programmes in our programmes database, PIPS; the published date of News articles in our News Content Production System; and the cuisine for a recipe in the recipes database.

The evolution of a CMS tends to be driven by the audience-facing products it powers, so new metadata or features will be added as needed by its partner product(s).

This isn't unusual for a web platform and works perfectly well for our products. It does, however, mean that a CMS doesn’t tend to be optimised for making the content more accessible to other products and services in the BBC or to the wider web.

The Linked Data Platform stores a generic metadata model for each creative work published along with its semantic tags.

This model contains useful attributes that are common across all types of content, which makes it much easier to build products that combine content from different systems.

This in turn allows producers to focus on the stories they’re trying to tell by using the most appropriate content from across the BBC.
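As a rough illustration of the idea, the sketch below maps records from two different systems onto one shared model. It is not the platform's actual schema: the `CreativeWork` fields, the two converter functions and the field names of the incoming records are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CreativeWork:
    """Hypothetical generic model: only attributes common to all content."""
    uri: str
    title: str
    content_type: str
    published: datetime
    tags: frozenset  # identifiers of the 'things' the work is about


def from_news_article(record: dict) -> CreativeWork:
    # Assumed shape of a News CMS record; the field names are invented.
    return CreativeWork(
        uri=record["url"],
        title=record["headline"],
        content_type="news-article",
        published=datetime.fromisoformat(record["published_date"]),
        tags=frozenset(record["tags"]),
    )


def from_recipe(record: dict) -> CreativeWork:
    # Assumed shape of a recipes database record. System-specific
    # metadata such as 'cuisine' stays behind in the source system.
    return CreativeWork(
        uri=record["link"],
        title=record["name"],
        content_type="recipe",
        published=datetime.fromisoformat(record["added_on"]),
        tags=frozenset(record["things"]),
    )


def combined_feed(works: list) -> list:
    # Because every work shares one model, mixed content sorts together.
    return sorted(works, key=lambda w: w.published, reverse=True)
```

Once both records are expressed as `CreativeWork` objects, a product can sort, filter or display them together without knowing which CMS each one came from.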

As we create more and more content, it becomes harder and harder to keep track of everything that's there, for our audience and producers alike.

Individual CMSs are pretty good at keeping tabs on the content they create, but if you wanted to get hold of the 20 most recent pieces of content from across the BBC (and hence across CMSs) on Burkina Faso, Jarvis Cocker or global warming, it would be very tricky.

Free text searching (like a Google search) would get you so far but would probably return false positives too (it can’t disambiguate between two different things that are both called Shakespeare).

It would be even harder (or maybe impossible) to do something more complex like find the latest 20 pieces of content about environmental issues or people who have been in northern Britpop bands. Here text searching starts to become significantly less powerful.
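To make the contrast concrete, here is a minimal sketch comparing a free text match with a lookup by semantic tag. The sample works, their titles and the `thing:` identifiers are invented for illustration, and `latest_about` is a hypothetical helper rather than a real platform call.

```python
from datetime import datetime

# Invented sample data: two different 'things' share the name Shakespeare.
WORKS = [
    {"title": "Shakespeare season announced",
     "published": datetime(2013, 2, 1),
     "about": {"thing:shakespeare-a"}},
    {"title": "Shakespeare in session",
     "published": datetime(2013, 2, 3),
     "about": {"thing:shakespeare-b"}},
    {"title": "A new staging of Hamlet",
     "published": datetime(2013, 2, 5),
     "about": {"thing:shakespeare-a"}},
]


def free_text_search(works, term):
    # Matches on wording only, so it cannot tell the two things apart.
    return [w for w in works if term.lower() in w["title"].lower()]


def latest_about(works, thing_id, limit=20):
    # Matches on the tagged identifier, newest first, capped at `limit`.
    matching = [w for w in works if thing_id in w["about"]]
    matching.sort(key=lambda w: w["published"], reverse=True)
    return matching[:limit]


text_hits = free_text_search(WORKS, "shakespeare")
tag_hits = latest_about(WORKS, "thing:shakespeare-a")
```

In this sample the text search returns a work about the wrong Shakespeare and misses the piece that never mentions the name, while the tag lookup returns exactly the two works tagged with the intended thing.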

The primary goal of the Linked Data Platform is to make sense of all the BBC's creative works and provide an API to allow the retrieval of any creative work about any 'thing', with the added benefit that we hold a semantic graph of data behind the 'things'.

This means the platform doesn't just know that tomorrow's episode of the Culture Show features Jarvis Cocker. It also knows that Jarvis is from Sheffield, was the lead singer in Pulp, that Pulp were a Britpop band, that they had a single called Common People, and that Common People was played on 6 Music this morning.
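The facts in that example can be sketched as a small set of subject, predicate, object triples. This is a toy in-memory version, not how the platform stores its data; the identifiers and predicate names (`features`, `lead_singer_of`, `genre` and so on) are assumptions chosen for readability.

```python
# Toy graph built from the example above; all names are illustrative.
TRIPLES = {
    ("culture-show-episode", "features", "jarvis-cocker"),
    ("jarvis-cocker", "from", "sheffield"),
    ("jarvis-cocker", "lead_singer_of", "pulp"),
    ("pulp", "genre", "britpop"),
    ("pulp", "released", "common-people"),
    ("6-music-broadcast", "played", "common-people"),
}


def subjects(predicate, obj):
    # Everything that points at `obj` through `predicate`.
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def works_featuring_singers_of(genre):
    # Walk genre -> bands -> lead singers -> works that feature them.
    works = set()
    for band in subjects("genre", genre):
        for person in subjects("lead_singer_of", band):
            works |= subjects("features", person)
    return works


def broadcasts_of_songs_by(band):
    # Walk band -> singles -> broadcasts that played them.
    played = set()
    for song in {o for s, p, o in TRIPLES if s == band and p == "released"}:
        played |= subjects("played", song)
    return played
```

Neither query mentions Jarvis Cocker by name: `works_featuring_singers_of("britpop")` reaches the Culture Show episode purely by following the links between band, singer and programme.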

We can suddenly start to make connections across our products and content which couldn't previously have been made without a lot of manual effort.

We're only starting to scratch the surface of the new stories we'll be able to tell as we build up the content and data in the platform. Matt Shearer has written about how News are looking into new ways of curating content and providing journeys through linked data.

Currently we're working on getting data and content around music, politics, learning and sport (beyond football and the Olympics) into the platform. We're also investigating ways in which to get archived content into the platform as there are huge opportunities for publishing this in new ways using linked data.

If you have any questions or thoughts that you’d like to feed back, please leave a comment below.

Oliver Bartlett is product manager, BBC Linked Data Platform.
