Hi, I work with BBC News Future Media (FM) as a data architect, looking at ways we can use linked data to help users discover relevant news stories quickly and easily on the BBC website.

My colleague Matt Shearer has blogged about some of the prototyping that we have been doing and Oli Bartlett has explained the underlying technology platform that we will be using in the production environment.

In this post I want to talk about how we are starting to embed machine-readable metadata (called RDFa) in the HTML source code of pages on the BBC News website.

[Image: Core News Ontology v1]

This will help users to find news content about the stories they want to know about and ultimately help to open up references to the data contained in those stories.

The metadata is derived from information added by journalists at content creation time. Although this additional code does not directly affect the user's experience of the page, it makes a big difference to machines, helping them to ‘understand’ that (for example) this page:

  • Is a news article
  • Has the headline Results for Anglesey
  • Was published at 10:30 on April 29 2013
  • Was published by the BBC

This set of statements can be derived automatically from the content management system that journalists use to create articles, but we can go further.
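As a rough illustration of how these four statements might be generated from content management system fields, here is a minimal Python sketch. It uses schema.org terms (NewsArticle, headline, datePublished, publisher) and is not the exact markup served on BBC News pages; the function name `render_article_rdfa` and the HTML layout are assumptions made for illustration only.

```python
from datetime import datetime
from html import escape


def render_article_rdfa(headline: str, published: datetime, publisher: str) -> str:
    """Render a hypothetical RDFa fragment describing a news article."""
    return (
        # vocab sets the default vocabulary; typeof says what kind of thing this is
        '<div vocab="http://schema.org/" typeof="NewsArticle">\n'
        f'  <h1 property="headline">{escape(headline)}</h1>\n'
        # the machine-readable timestamp lives in the datetime attribute
        f'  <time property="datePublished" datetime="{published.isoformat()}">'
        f'{published:%d %B %Y %H:%M}</time>\n'
        '  <span property="publisher" typeof="Organization">\n'
        f'    <span property="name">{escape(publisher)}</span>\n'
        '  </span>\n'
        '</div>'
    )


fragment = render_article_rdfa(
    "Results for Anglesey", datetime(2013, 4, 29, 10, 30), "BBC"
)
print(fragment)
```

Nothing here changes what a reader sees beyond ordinary headline and date text; the `typeof` and `property` attributes carry the statements for machines.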

When journalists annotate articles with statements about the real-world entities that the article mentions or is about, and use Uniform Resource Identifiers (URIs) to refer to those entities in additional RDFa statements, we can publish Linked Open Data with every page of the BBC News website.

Using the above example again, we state:

  • The article is about <a BBC unique identifier for Anglesey County Council>
  • <this BBC unique identifier for Anglesey County Council> is the same as <dbpedia.org/resource/Isle_of_Anglesey_County_Council>
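These two statements can be written down as RDF triples. The sketch below serialises them as N-Triples in Python. The article URL, the bbc.co.uk/things identifier (a placeholder GUID) and the choice of schema.org `about` as the predicate are assumptions for illustration; `owl:sameAs` is the standard OWL property for saying that two URIs identify the same thing.

```python
SCHEMA_ABOUT = "http://schema.org/about"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs, not the real identifiers used on the BBC site.
ARTICLE = "http://www.bbc.co.uk/news/example-article"
BBC_COUNCIL = "http://www.bbc.co.uk/things/00000000-0000-0000-0000-000000000000"
# DBpedia resource named in the post, written with its http:// scheme.
DBPEDIA_COUNCIL = "http://dbpedia.org/resource/Isle_of_Anglesey_County_Council"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one triple of URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


statements = [
    triple(ARTICLE, SCHEMA_ABOUT, BBC_COUNCIL),        # the article is about the council
    triple(BBC_COUNCIL, OWL_SAME_AS, DBPEDIA_COUNCIL),  # BBC id equals the DBpedia resource
]
print("\n".join(statements))
```

The second triple is what links the BBC's own identifier to the wider web of data: any consumer that already knows the DBpedia resource can connect it to BBC content.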

To make this metadata available we have used a vocabulary published by the International Press Telecommunications Council (IPTC), a news industry body that works on standards for improved news exchange.

The model is called rNews and provides a simple way of expressing these and other facts about news content in RDFa.

Additionally, the rNews vocabulary has been mapped against the widely used schema.org metadata vocabulary proposed by Google, Bing and others.


Initially we have added the automatically generated RDFa statements to all BBC News pages with the intention of assisting search engines and social media sites in their display of BBC News content.

An immediate benefit of this work is that search engines will be better able to display links to BBC News stories, helping our users to find relevant stories and to judge from a search engine's listings how relevant an article is to the story they are searching for.

The types of things that an article can be annotated with are defined in the Core News Ontology, a model that allows us to make simple annotations (content that mentions a person) or more complex ones (content about events that took place at a particular location and time).

The additional annotations that describe the real-world entities that the content mentions or is about will be rolled out over the coming months. Currently you can see examples of these ‘about’ annotations on the results pages for each local council in the May 2013 local elections.

 Jeremy Tarling is senior data architect for BBC News Future Media.


This entry is now closed for comments.

  • Comment number 4. Posted by jeremytarling

    on 5 Jul 2013 12:43

    Hey @lucas42, thanks for the comment. Sorry about the messed-up redirects, I'll sort them out.

    Regarding your comment about re-inventing the wheel, I tend to agree with your point of view on this. My earlier version of this model made reference to the foaf, event and story ontologies: http://topdrawersausage.net/2013/01/07/bbc-news-data-model-v0-1/ but in the process of implementation on the BBC's linked data platform I removed references to the upper ontologies and used local specialisations instead. Dave Rogers is the tech lead on the platform, and he's blogged his thinking about this here: http://daverog.wordpress.com/2013/05/30/ontologies-in-software-a-conflict-of-interest/

    Regarding bbc.co.uk/ontologies/coreconcepts, I believe that will be up soon; it's certainly not secret, just a work in progress. The idea is that bbc.co.uk/things/GUIDs will be dereferenceable and return data about what the BBC knows about that thing, plus sameAs statements pointing to equivalent public resources.


  • Comment number 3. Posted by lucas42

    on 2 Jul 2013 15:23

    It's great to see the BBC pushing ahead with its use of linked data, and not resting on its laurels. Though I do have one question about this particular ontology:
    When creating corenews, why did you decide to reinvent the wheel for most of these classes? There are existing ontologies which have already defined Person (foaf), Place (geo) and Organisation (foaf). In fact, there's already an Event Ontology (http://motools.sourceforge.net/event/event.html#) which links out to relevant ontologies for the classes it uses. Granted, it isn't as comprehensive as corenews, and I know it's always tempting to start from a clean slate, but wouldn't the resulting data be more "linked" if you built on top of what's already out there?

    P.S. I'm getting 404s for some of the ontology pages: http://www.bbc.co.uk/ontologies/news/2013-05-01.rdf (which is where I get redirected if I request the ontology with an RDF Accept header) and http://www.bbc.co.uk/ontologies/coreconcepts/Thing (Is this going to be a secret ontology until http://www.bbc.co.uk/things/ is launched?)

  • Comment number 2. Posted by jeremytarling

    on 6 Jun 2013 11:50

    Hi @Sermonus, thanks for your comment.

    We do have a taxonomic structure of sorts in the form of the News website sections (business, politics, etc.), some of which are divided into subsections. In my experience there can be problems with these sorts of classification schemes, specifically:

    * they tend to second-guess the user's mental model, often incorrectly. Navigating by real world concepts (people, places, organisations, etc.) allows users to make serendipitous onward journeys of discovery without forcing them to choose a section

    * organising content by section tends to become unmanageable, typically resulting in confusing polyhierarchies that users may get lost in. This is particularly true in open-world domains such as news, where a given story may be relevant to multiple categories.

    There's an article by Clay Shirky on this topic that is worth a read, particularly the section "There is no shelf" http://www.shirky.com/writings/ontology_overrated.html


  • Comment number 1. Posted by Sermonus

    on 5 Jun 2013 14:06

    Dear Sir,

    I find it very interesting to use linked data to help users find what they are looking for. I work as a documentalist for a German publishing house. Our method is to use a kind of taxonomy to keep a good overview of the range of themes and topics. We try to avoid overlaps or even contradictions. I think a well-organized taxonomy (maybe even built on the basis of an ontology that helps to create or extend it) is the best way to help the user (in my case, mostly editors) to get relevant and complete information. Of course, it is necessary that the user understands the taxonomy terms. The best way to enter a "system" of articles and metadata is a normal text search followed by an appropriate presentation of the metadata terms, which gives a good overview of the list of articles (hit list). A further step could be to refine the search terms (relevance feedback, repeatedly) to get a better list of articles of interest.

