Hi, I work with BBC News Future Media (FM) as a data architect, looking at ways we can use linked data to help users discover relevant news stories quickly and easily on the BBC website.

My colleague Matt Shearer has blogged about some of the prototyping that we have been doing and Oli Bartlett has explained the underlying technology platform that we will be using in the production environment.

In this post I want to talk about how we are starting to embed machine-readable metadata (called RDFa) in the HTML source code of pages on the BBC News website.

Core News Ontology v1

Please follow this link for a more detailed view of the above image.


This will help users to find news content about the stories they want to know about and ultimately help to open up references to the data contained in those stories.

The metadata is derived from information added by journalists at content creation time. Although this additional code does not directly impact the user's experience of the page, it makes a big difference to machines, helping them to ‘understand’ that (for example) this page

  • Is a news article
  • Has the headline Results for Anglesey
  • Was published at 10:30 on April 29 2013
  • Was published by the BBC
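
As a rough illustration of what these four statements could look like in a page, here is a minimal Python sketch that builds an RDFa fragment. It assumes schema.org terms (`NewsArticle`, `headline`, `datePublished`, `publisher`) rather than the exact markup used on BBC News pages, and the function name `article_rdfa` is purely hypothetical.

```python
from html import escape

# Assumption: schema.org terms are used here for illustration only;
# the markup actually emitted on BBC News pages may differ.
SCHEMA_ORG = "http://schema.org/"


def article_rdfa(headline: str, published: str, publisher: str) -> str:
    """Return an HTML fragment carrying the four statements as RDFa attributes."""
    return (
        f'<div vocab="{SCHEMA_ORG}" typeof="NewsArticle">\n'
        f'  <h1 property="headline">{escape(headline)}</h1>\n'
        f'  <time property="datePublished" datetime="{escape(published)}">'
        f'{escape(published)}</time>\n'
        f'  <span property="publisher" typeof="Organization">\n'
        f'    <span property="name">{escape(publisher)}</span>\n'
        f'  </span>\n'
        f'</div>'
    )


# The example article from the list above (no time zone is given in the post,
# so none is encoded here).
fragment = article_rdfa("Results for Anglesey", "2013-04-29T10:30", "BBC")
print(fragment)
```

A human reader sees only the headline, date and publisher text; the `typeof` and `property` attributes are what tell a machine which is which.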


This set of statements can be derived automatically from the content management system that journalists use to create articles, but we can go further.

When journalists annotate articles with statements about the real-world entities that an article mentions or is about, and we use Uniform Resource Identifiers (URIs) to refer to those entities in additional RDFa statements, we can publish Linked Open Data with every page of the BBC News website.

Using the above example again, we can state:

  • The article is about &lta BBC unique identifier for Anglesey County Council&gt
  • &ltthis BBC unique identifier for Anglesey County Council&gt is the same as &ltdbpedia.org/resource/Isle_of_Anglesey_County_Council&gt
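
These two statements can be sketched as subject-predicate-object triples. The Python below is only an illustration: the article and council URIs are hypothetical placeholders (the real BBC identifiers are not given here), and the use of `schema.org/about` and `owl:sameAs` as the predicates for "is about" and "is the same as" is an assumption.

```python
# Predicates (assumed for illustration).
SCHEMA_ABOUT = "http://schema.org/about"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical placeholder URIs; the real BBC identifiers are not shown in this post.
ARTICLE = "http://example.org/news/results-for-anglesey"
COUNCIL = "http://example.org/things/anglesey-county-council#id"

# The DBpedia resource named above, written as a full URI.
DBPEDIA_COUNCIL = "http://dbpedia.org/resource/Isle_of_Anglesey_County_Council"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as an N-Triples line with three URI terms."""
    return f"<{subject}> <{predicate}> <{obj}> ."


statements = [
    triple(ARTICLE, SCHEMA_ABOUT, COUNCIL),         # the article is about the council
    triple(COUNCIL, OWL_SAME_AS, DBPEDIA_COUNCIL),  # the council is the same as the DBpedia entity
]

for line in statements:
    print(line)
```

The second statement is what links a BBC identifier to the wider web of data: anything DBpedia says about the council can then be associated with the article.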

To make this metadata available we have used a vocabulary published by the International Press Telecommunications Council (IPTC), a news industry body that works on standards for improved news exchange.

The model is called rNews and provides a simple way of expressing these and other facts about news content in RDFa.

Additionally, the rNews vocabulary has been mapped against the widely used schema.org metadata vocabulary proposed by Google, Bing and others.

 

Initially we have added the automatically generated RDFa statements to all BBC News pages with the intention of assisting search engines and social media sites in their display of BBC News content.

An immediate benefit of this work is that search engines will be better able to display links to BBC News stories, helping our users to find relevant stories and to judge from a search engine's listings how relevant an article is to the story they are searching for.
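
To give a feel for how a consuming site might read this metadata, here is a deliberately simplified Python sketch using only the standard library. It is not a conforming RDFa processor (it ignores prefixes, nesting and `typeof` scoping), the class name `PropertyCollector` is hypothetical, and the sample markup assumes schema.org terms rather than the exact markup on BBC News pages.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute.

    Simplification: the value is taken from a `content` or `datetime`
    attribute if present, otherwise from the first text inside the element.
    """

    def __init__(self):
        super().__init__()
        self.found = []       # list of (property, value) pairs, in document order
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        literal = attributes.get("content") or attributes.get("datetime")
        if literal is not None:
            self.found.append((prop, literal))
            self._pending = None
        else:
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.found.append((self._pending, text))
            self._pending = None


# Sample markup (assumed, for illustration only).
page = (
    '<div vocab="http://schema.org/" typeof="NewsArticle">'
    '<h1 property="headline">Results for Anglesey</h1>'
    '<meta property="datePublished" content="2013-04-29T10:30">'
    '</div>'
)

collector = PropertyCollector()
collector.feed(page)
print(collector.found)
```

With the headline and publication date available as labelled values, a search listing no longer has to guess them from the page layout.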

The types of things that an article can be annotated with are defined in the Core News Ontology, a model that allows us to make simple annotations (content that mentions a person) or more complex ones (content about events that took place at a particular location and time).

The additional annotations that describe the real-world entities that the content mentions or is about will be rolled out over the coming months. Currently you can see examples of these ‘about’ annotations on the results pages for each local council in the May 2013 local elections.

 Jeremy Tarling is senior data architect for BBC News Future Media.
