Wednesday 30 April 2014, 10:05
Hello, I’m Sofia Angeletou and I’m the Data Architect for the Linked Data Platform (LDP), which builds the BBC’s services for creating and publishing linked data.
I’m going to talk to you about our new /ontologies site, which we released last week and where you can find the ontologies that the BBC uses to support BBC Sport, Education, news prototypes and, soon, BBC Music and Radio programmes.
What is it and why are we doing it?
Oli Bartlett, the owner of the Linked Data Platform, has explained how we have expanded the reach of linked data within the BBC to more audience-facing products and presented our ambition to use linked data as the glue for the plethora of content the BBC produces. As a direct result of this, more models are being built to support additional functionality and cover new and diverse domains of interest.
bbc.co.uk/ontologies is a human-friendly view of the data models in the Linked Data Platform and is meant to give a comprehensive understanding of which ontologies the BBC uses, why and how. It is provided for members of the public and anyone who wants to get a better understanding of the BBC's Linked Data.
Those of you who have visited bbc.co.uk/ontologies before will immediately notice the different look, but the main changes lie beneath the presentation.
The previous /ontologies did not reflect the BBC’s work with Linked Data, and was updated in an ad-hoc manner. It had grown organically as a result of various projects and hosted the schemas used by applications to publish RDF, such as /programmes and /nature. Yet these applications were not built on a semantic stack and did not use a triplestore or SPARQL for data manipulation (the exception being the Sport ontology, which has been used in the Linked Data Platform for bbc.co.uk/sport since 2010).
The Linked Data Platform team is now responsible for /ontologies as part of its wider goal to support the appropriate usage and management of ontologies. We want the BBC to continue being part of the Linked Open Data (LOD) ecosystem. To this end we have decided that /ontologies should reflect our work with Linked Data, be more open and transparent, and make the first step towards opening up the BBC’s data, as explained by the BBC's Director of Strategy and Digital, James Purnell, in this press release.
What does it contain and how does it work?
The models published in /ontologies live in our triplestore and are the basis for the Linked Data services offered to the clients of the LDP. They fall into three main categories: models about the content, the reference data and the applications.
The content model is the Creative Work ontology, which is used for content metadata such as an ID from the originating CMS and the date the content was published. It is also used to associate creative works with the things they are about and to link to the human-readable view of the creative work.
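To make this concrete, here is a minimal sketch of the kind of record the content model describes, written out as subject-predicate-object triples. The `cwork:` property names, the `ex:` identifiers and the example URL are all hypothetical and chosen for illustration; the actual terms in the Creative Work ontology may well differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical prefix and property names, for illustration only.
CWORK = "cwork:"


@dataclass
class CreativeWork:
    uri: str              # identifier of the creative work itself
    cms_id: str           # ID from the originating CMS
    date_published: date  # when the content was published
    locator: str          # URL of the human-readable view
    about: List[str] = field(default_factory=list)  # things the work is about

    def to_triples(self) -> List[Triple]:
        """Flatten the record into (subject, predicate, object) triples."""
        triples = [
            (self.uri, CWORK + "cmsId", self.cms_id),
            (self.uri, CWORK + "datePublished", self.date_published.isoformat()),
            (self.uri, CWORK + "locator", self.locator),
        ]
        # One "about" triple per thing the creative work is associated with.
        triples += [(self.uri, CWORK + "about", thing) for thing in self.about]
        return triples


article = CreativeWork(
    uri="ex:work/1",
    cms_id="cms-12345",
    date_published=date(2014, 4, 30),
    locator="http://example.org/articles/1",
    about=["ex:thing/football-team-a", "ex:thing/place-b"],
)

for triple in article.to_triples():
    print(triple)
```

The point of the sketch is that the content model stays small: a handful of metadata fields plus links out to reference data, which is described by the separate domain ontologies.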
The domain ontologies are used to describe the reference data: the things about which the BBC creates content. These ontologies include the Core Concepts ontology, which describes the main things the BBC talks about (People, Places, Events, Organisations and generic Topics); the Sport ontology, which supports BBC Sport; the Curriculum ontology, which supports BBC Education; and the Storyline ontology, built in collaboration with other media organisations, which supports the News connected stories pilot propositions. These ontologies tend to be owned by product teams (e.g. News or Sport) and not the LDP team.
Last, but not least, there are application ontologies which encode application logic, such as the CMS ontology that supports the LDP interaction with content management systems, the Provenance ontology that helps us manage and audit data, and the BBC ontology, which describes the products, web documents and platforms for which the BBC produces content.
The new /ontologies is hosted on Amazon AWS using the BBC’s cloud tools and uses the LDP APIs to obtain the ontologies from the live triplestore. This means that the ontologies you see are the exact same models used to support the BBC’s live websites. Aside from a minimal amount of manual documentation, all ontologies are self-documenting: the term documentation changes dynamically as the ontologies change in the store.
Another important feature is that all ontology changes will also be public. Each new version contains a change reason which justifies its existence. All previous versions are available and one can easily find the delta of the models.
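As an illustration of what a delta between two versions can mean, here is a minimal sketch that treats each ontology version as a set of triples and computes what was added and what was removed. The `ex:` terms and the `ontology_delta` helper are hypothetical, and this is not necessarily how the site computes its deltas; a real RDF diff would also need to handle blank nodes, which this sketch ignores.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    added: FrozenSet[Triple]    # triples present only in the newer version
    removed: FrozenSet[Triple]  # triples present only in the older version


def ontology_delta(old: Iterable[Triple], new: Iterable[Triple]) -> Delta:
    """Compare two versions of a model as plain sets of triples."""
    old_set, new_set = frozenset(old), frozenset(new)
    return Delta(added=new_set - old_set, removed=old_set - new_set)


# Two hypothetical versions of a tiny model.
v1 = {
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:nickname", "rdf:type", "owl:DatatypeProperty"),
}
v2 = {
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:Place", "rdf:type", "owl:Class"),
}

delta = ontology_delta(v1, v2)
print("added:", sorted(delta.added))
print("removed:", sorted(delta.removed))
```

Pairing a delta like this with the change reason recorded for each version gives a reader both what changed and why.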
The reason we did this is twofold. On the one hand, we work in an agile manner and our ontologies change very frequently; on the other hand, we want to have an up-to-date public-facing view of our ontologies. So we chose to publish all changes in order to avoid the maintenance overhead and the risk of having internal and external versions of the same ontology. The changes are usually raised by our clients to support additional functionality in their products, or by the LDP team to improve our services.
We have often been asked why we built our own models given that there’s already an enormous number of existing vocabularies out there. The main reason is that small and controlled models are easier to change. Our approach to ontology building is iterative and incremental, reflecting the same agile approach we take to building our products. Although at times it’s tempting to model the world, we try to keep the ontologies as minimal as possible, modelling only what's needed to meet particular requirements. We embrace fast failure over perfect first time, and the impact this has on our ontologies is that we will actively deprecate and ultimately remove terms that don't work for us. Frequently we have a long backlog of change requests that are often dependencies for functionality that needs to be rolled out immediately. Engaging with a large community to clarify the semantics of a widely adopted model is often not viable when speed is of the essence. In addition, the nuances of each use case might not be appropriately represented in an open vocabulary. Our primary goal is to deliver functionality for the live site quickly; alignment with the LOD world is a secondary goal, which we address by providing mappings from our models to the popular vocabularies in the LOD cloud.
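A minimal sketch of what such a mapping layer could look like follows. It keeps the in-house terms and their external counterparts in a separate table and emits one mapping triple per term. The `ex:` terms, the particular pairings and the choice of `rdfs:subClassOf` as the mapping predicate are assumptions made for illustration, not a description of the mappings we actually publish.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-house terms mapped to well-known external vocabulary terms.
# Both the pairings and the mapping predicate are illustrative assumptions.
MAPPINGS: Dict[str, Tuple[str, str]] = {
    "ex:Person": ("rdfs:subClassOf", "foaf:Person"),
    "ex:Place": ("rdfs:subClassOf", "schema:Place"),
    "ex:Organisation": ("rdfs:subClassOf", "foaf:Organization"),
}


def mapping_triples(mappings: Dict[str, Tuple[str, str]]) -> List[Triple]:
    """Turn the mapping table into triples, ordered by in-house term."""
    return [
        (term, predicate, target)
        for term, (predicate, target) in sorted(mappings.items())
    ]


for triple in mapping_triples(MAPPINGS):
    print(triple)
```

Keeping the mappings separate from the model itself means an in-house term can be changed or deprecated quickly, with only its mapping entry needing to be revisited afterwards.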
As I previously mentioned, we view opening up our ontologies as the first step to opening up our data. The immediate next steps include providing more data about how the ontologies are used, for example through links to instance data: which people, places, football teams and more does the BBC create content about? What would you like to see next?
Sofia Angeletou is the Data Architect for the Linked Data Platform, BBC Future Media