BBC News Labs is part of BBC Connected Studio’s pan-BBC innovation and collaboration programme.

I'd like to introduce you to a new project called #newsVANE.


The BBC is approaching a new challenge: in the coming months and years, we will have a fast-growing abundance of content items tagged and interconnected by semantic web technology.

To give a sense of the scale involved, the BBC is broadcasting hundreds of hours of News every day, globally, in thirty languages, in TV, Radio, and web page formats. For every hour of material there are hundreds of topics mentioned, all of which can be connected with semantic technology.

Add to this the many thousands of hours of BBC News Archive we plan to publish.

While this is an amazing situation to be in, and while it will help the BBC tell powerful and interconnected stories even more effectively, it gives rise to a new challenge.

The Challenge

For any given linked data topic or Storyline, there is going to be a profusion of associated pieces of content, from the BBC and elsewhere. So how do we select the best, the strongest, the most appealing, emotive, and relevant content from this array of connected items? How do we avoid a confusing myriad of options, and steer our users toward the best experience?

The accompanying presentation can be found on the BBC News Labs Slideshare account.

Tools: the Data

We have at our disposal a plethora of data sources from BBC R&D, internal BBC News systems, other private BBC systems, publicly open data providers (Gov, ONS et al), and internet feeds.

These data sources have various properties that might be used to help us signal, select and prioritise, and perhaps even “leap across the connections” to the next, most resonant and contextually relevant content items and news “Storylines”.

One especially interesting data source is The Juicer, in which we have over 650,000 News articles tagged with 150,000+ semantically connected topics. This is a platform we have previously been using to support prototyping, and we now want to use this repository of Journalism as a data source in itself. Among other things, The Juicer allows us to measure the frequency of topic occurrence within various slices of the News, and also the patterns in co-occurrences of topics with one another.
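
As a rough illustration of those two measurements, here is a minimal Python sketch. It does not use The Juicer's actual API or data model: the sample articles, the topic names and the function names are all hypothetical, and each article is assumed to be reduced to a simple set of topic tags.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each article is represented only by its topic tags.
articles = [
    {"Syria", "United Nations", "Refugees"},
    {"Syria", "United Nations"},
    {"Economy", "Bank of England"},
    {"Syria", "Refugees"},
]


def topic_frequencies(articles):
    """Count how many articles mention each topic."""
    counts = Counter()
    for topics in articles:
        # set() guards against a topic being listed twice on one article.
        counts.update(set(topics))
    return counts


def topic_cooccurrences(articles):
    """Count how many articles mention each unordered pair of topics."""
    pairs = Counter()
    for topics in articles:
        # Sorting gives every pair a single canonical ordering.
        pairs.update(combinations(sorted(set(topics)), 2))
    return pairs


frequencies = topic_frequencies(articles)
cooccurrences = topic_cooccurrences(articles)

print(frequencies.most_common(3))
print(cooccurrences.most_common(3))
```

Restricting the `articles` list to a particular date range, region or desk before counting would give the "slices of the News" view, and pairs that co-occur unusually often are candidates for the connections worth promoting.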

The Approach: #newsVANE

#newsVANE is the umbrella project for a set of prototypes to explore the possibilities during 2014. The question we are asking in the #newsVANE project is:

How might we use combinations of our data sources to generate scalable relevance tools, so that we can promote the best connections across millions of content items & topics?

Dashboard & Infographic Prototypes

We are prototyping with combinations of these data sources to produce dashboards and infographics as a way to test emergent ideas with Journalists and with ad-hoc user testing.

Some early ideas include compiling various data sources in a single dashboard: feeds, a timeline of the emergence and development of news, trending topics and stories on social media and in our content… This is the kind of tool that journalists might be keen to fire up first thing in the morning.

We also focus our attention on the exploration of semantic referencing of news, in the hope that some patterns or correlations will emerge.

If you are interested in finding out more, you can follow us and receive updates via these platforms:

News Labs on Twitter

News Labs on Google Plus

News Labs on Slideshare

News Labs on YouTube

Matt Shearer is Innovation Manager, BBC News Labs