BBC News Linked Data Ontology

Monday 3 June 2013, 08:00

Jeremy Tarling, Senior Data Architect

Hi, I work with BBC News FM (Future Media) as a data architect, looking at ways that we can use linked data to help users discover relevant News stories quickly and easily on the BBC website.

My colleague Matt Shearer has blogged about some of the prototyping that we have been doing and Oli Bartlett has explained the underlying technology platform that we will be using in the production environment.

In this post I want to talk about how we are starting to embed machine-readable metadata (using a markup standard called RDFa) in the HTML source code of pages on the BBC News website.

[Image: Core News Ontology v1]

Please follow this link for a more detailed view of the above image.


This will help users to find news content about the stories they want to know about and ultimately help to open up references to the data contained in those stories.

The metadata is derived from information added by journalists at content creation time. Although this additional code does not directly affect the user's experience of the page, it makes a big difference to machines, helping them to ‘understand’ that (for example) this page:

  • Is a news article
  • Has the headline "Results for Anglesey"
  • Was published at 10:30 on April 29 2013
  • Was published by the BBC
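
As a rough illustration of the four statements above, here is a minimal Python sketch that renders them as an RDFa Lite fragment. It uses schema.org terms (NewsArticle, headline, datePublished, publisher), on the assumption that they stand in for the equivalent rNews terms; the function name render_rdfa, the HTML layout and the timestamp format (no time zone) are illustrative only, and the markup on real BBC News pages may well differ.

```python
from html import escape


def render_rdfa(headline: str, published_iso: str, publisher: str) -> str:
    """Return an RDFa Lite fragment describing a news article.

    Assumption: schema.org terms are used for illustration; this is not
    the markup the BBC actually emits.
    """
    return (
        '<div vocab="http://schema.org/" typeof="NewsArticle">\n'
        f'  <h1 property="headline">{escape(headline)}</h1>\n'
        f'  <meta property="datePublished" content="{escape(published_iso)}">\n'
        '  <span property="publisher" typeof="Organization">\n'
        f'    <span property="name">{escape(publisher)}</span>\n'
        '  </span>\n'
        '</div>'
    )


# The example article from the list above.
fragment = render_rdfa("Results for Anglesey", "2013-04-29T10:30:00", "BBC")
print(fragment)
```

Everything passed to render_rdfa here is the kind of information a content management system already holds, which is why statements like these can be generated without extra work from journalists.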


This set of statements can be derived automatically from the content management system that journalists use to create articles, but we can go further.

When journalists annotate articles with statements about the real-world entities that an article mentions or is about, and use Uniform Resource Identifiers (URIs) to refer to those entities in additional RDFa statements, we can publish Linked Open Data with every page of the BBC News website.

Using the above example again we state:

  • The article is about <a BBC unique identifier for Anglesey County Council>
  • <this BBC unique identifier for Anglesey County Council> is the same as <dbpedia.org/resource/Isle_of_Anglesey_County_Council>
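
The two statements above can be sketched as subject/predicate/object triples. In the Python sketch below the article URI and the BBC identifier for the council are hypothetical placeholders, and schema.org's about property is assumed as the 'is about' predicate; only the DBpedia URI and the owl:sameAs predicate are real identifiers.

```python
SCHEMA_ABOUT = "http://schema.org/about"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical placeholders: not real BBC identifiers.
ARTICLE = "http://www.bbc.co.uk/news/example-article"
COUNCIL = "http://www.bbc.co.uk/things/example-guid#id"

# The public equivalent named in the post.
DBPEDIA_COUNCIL = "http://dbpedia.org/resource/Isle_of_Anglesey_County_Council"

triples = [
    (ARTICLE, SCHEMA_ABOUT, COUNCIL),
    (COUNCIL, OWL_SAME_AS, DBPEDIA_COUNCIL),
]


def to_ntriples(statements):
    """Serialise URI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def about_with_equivalents(article, statements):
    """Return what the article is about, plus one hop of sameAs links."""
    direct = {o for s, p, o in statements if s == article and p == SCHEMA_ABOUT}
    linked = {o for s, p, o in statements if p == OWL_SAME_AS and s in direct}
    return direct | linked


print(to_ntriples(triples))
print(sorted(about_with_equivalents(ARTICLE, triples)))
```

The second function shows why the sameAs statement matters: a consumer that only knows the DBpedia identifier can still discover that the article is about that council.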

To make this metadata available we have used a vocabulary published by the International Press Telecommunications Council (IPTC), a news industry body that works on standards for improved news exchange.

The model is called rNews and provides a simple way of expressing these and other facts about news content in RDFa.

Additionally, the rNews vocabulary has been mapped against the widely used schema.org metadata vocabulary proposed by Google, Bing and others.

Initially we have added the automatically generated RDFa statements to all BBC News pages with the intention of assisting search engines and social media sites in their display of BBC News content.

An immediate benefit of this work is that search engines will be better able to display links to BBC News stories, helping our users to find relevant stories and to judge from a search engine's listings how relevant an article is to the story they are searching for.
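
To give a feel for how a consuming machine might read such markup, here is a minimal sketch using Python's standard html.parser module. It is not how any particular search engine works: the PropertyCollector class and the sample HTML are assumptions for illustration, and the parser only handles flat markup (it does not follow nested properties, vocab prefixes or the full RDFa processing rules).

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open_property = None  # property whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # The value is carried in the attribute, e.g. on <meta>.
            self.pairs.append((prop, attributes["content"]))
        else:
            # The value is the element's text; read it until the next end tag.
            self._open_property = prop
            self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_property is not None:
            value = "".join(self._text).strip()
            self.pairs.append((self._open_property, value))
            self._open_property = None


# Hypothetical sample markup, not copied from a real BBC page.
SAMPLE = """
<div vocab="http://schema.org/" typeof="NewsArticle">
  <h1 property="headline">Results for Anglesey</h1>
  <meta property="datePublished" content="2013-04-29T10:30:00">
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
found = dict(collector.pairs)
print(found)
```

A real consumer would use a full RDFa processor, but even this simplified reader can recover the headline and publication time without guessing from the page layout.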

The types of things that an article can be annotated with are defined in the Core News Ontology, a model that allows us to make simple annotations (content that mentions a person) or more complex ones (content about events that took place at a particular location and time).
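
The difference between a simple and a more complex annotation can be sketched with a few Python dataclasses. The class and field names below are hypothetical and are not taken from the published ontology; the URIs, labels and date are placeholders chosen only to make the example run.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Thing:
    uri: str
    label: str


@dataclass(frozen=True)
class Person(Thing):
    pass


@dataclass(frozen=True)
class Place(Thing):
    pass


@dataclass(frozen=True)
class Event(Thing):
    # An event can carry its own location and time.
    place: Optional[Place] = None
    time: Optional[datetime] = None


@dataclass
class Article:
    uri: str
    mentions: List[Thing] = field(default_factory=list)
    about: List[Thing] = field(default_factory=list)


# Placeholder data for illustration only.
person = Person("http://example.org/things/person-1#id", "Example Person")
place = Place("http://example.org/things/place-1#id", "Example Place")
event = Event(
    "http://example.org/things/event-1#id",
    "Example Event",
    place=place,
    time=datetime(2013, 5, 1, 12, 0),
)

article = Article("http://example.org/news/article-1")
article.mentions.append(person)  # simple annotation: mentions a person
article.about.append(event)      # complex annotation: about an event at a place and time

print(article.about[0].place.label, article.about[0].time.isoformat())
```

The simple annotation is a single link from the article to a thing, while the complex one links to an event that in turn points at a place and a time.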

The additional annotations that describe the real-world entities that the content mentions or is about will be rolled out over the coming months. Currently you can see examples of these ‘about’ annotations on the results pages for each local council in the May 2013 local elections.

 Jeremy Tarling is senior data architect for BBC News Future Media.

Comments

  • Comment number 1.

    Dear Sir,

    I find it very interesting to use linked data to help users find what they are looking for. I work as a documentalist for a German publishing house. Our method is to use a kind of taxonomy to keep a good overview of the range of themes and topics. We try to avoid overlaps and contradictions. I think a well-organized taxonomy (maybe even one based on an ontology that helps to create or extend it) is the best way to help the user (in my case, editors) to get relevant and complete information. Of course, it is necessary that the user understands the taxonomy terms. The best way to enter a "system" of articles and metadata is a normal text search followed by an appropriate presentation of the metadata terms, which gives a good overview of the list of articles (hit list). A further step could be to refine the search terms (repeated relevance feedback) to get a better list of articles of interest.

  • Comment number 2.

    Hi @Sermonous, thanks for your comment.

    We do have a taxonomic structure of sorts in the form of the News website sections (business, politics, etc.), some of which are divided into subsections. In my experience there can be problems with these sorts of classification schemes, specifically:

    * they tend to second-guess the user's mental model, often incorrectly. Navigating by real world concepts (people, places, organisations, etc.) allows users to make serendipitous onward journeys of discovery without forcing them to choose a section

    * organising content by section tends to become unmanageable, typically resulting in confusing polyhierarchies that users may get lost in. This is particularly true in open-world domains such as news, where a given story may be relevant to multiple categories.

    There's an article by Clay Shirky on this topic that is worth a read, particularly the section "There is no shelf" http://www.shirky.com/writings/ontology_overrated.html

    JT

  • Comment number 3.

    It's great to see the BBC pushing ahead with its use of linked data, and not resting on its laurels. Though I do have one question about this particular ontology:
    When creating corenews, why did you decide to reinvent the wheel for most of these classes? There are existing ontologies which have already defined Person (foaf), Place (geo) and Organisation (foaf). In fact, there's already an Event Ontology (http://motools.sourceforge.net/event/event.html#) which links out to relevant ontologies for the classes it uses. Granted, it isn't as comprehensive as corenews, and I know it's always tempting to start from a clean slate, but wouldn't the resulting data be more "linked" if you built on top of what's already out there?

    P.S. I'm getting 404s for some of the ontology pages: http://www.bbc.co.uk/ontologies/news/2013-05-01.rdf (which is where I get redirected if I request the ontology with an RDF Accept header) and http://www.bbc.co.uk/ontologies/coreconcepts/Thing (Is this going to be a secret ontology until http://www.bbc.co.uk/things/ is launched?)

  • Comment number 4.

    Hey @lucas42, thanks for the comment. Sorry about the messed-up redirects, I'll sort them out.

    Regarding your comment about re-inventing the wheel, I tend to agree with your point of view on this. My earlier version of this model made reference to the foaf, event and story ontologies: http://topdrawersausage.net/2013/01/07/bbc-news-data-model-v0-1/ but in the process of implementation on the BBC's linked data platform I removed the references to the upper ontologies and used local specialisations instead. Dave Rogers is the tech lead on the platform, and he has blogged his thinking about this here: http://daverog.wordpress.com/2013/05/30/ontologies-in-software-a-conflict-of-interest/

    Regarding bbc.co.uk/ontologies/coreconcepts, I believe that will be up soon; it's certainly not secret, just a work in progress. The idea is that bbc.co.uk/things/GUIDs will be dereferenceable, returning data about what the BBC knows about that thing along with sameAs statements to equivalent public resources.

    JT

 
 
