BBC News Labs is part of BBC Connected Studio’s pan-BBC innovation and collaboration programme.
I'd like to introduce you to a new project called #newsVANE.
Background
The BBC is approaching a new challenge: In coming months and years, we will have a fast-growing abundance of content items tagged and interconnected by semantic web technology.
To give a sense of the scale involved, the BBC is broadcasting hundreds of hours of News every day, globally, in thirty languages, in TV, Radio, and web page formats. For every hour of material there are hundreds of topics mentioned, all of which can be connected with semantic technology.
Add to this the many thousands of hours of BBC News Archive we plan to publish.
While this is an amazing situation to be in, and while it will help the BBC tell powerful and interconnected stories even more effectively, it gives rise to a new challenge.
The Challenge
For any given linked data topic or Storyline, there is going to be a profusion of associated pieces of content, from the BBC and elsewhere. So how do we select the best, the strongest, the most appealing, emotive, and relevant content from this array of connected items? How do we avoid a confusing myriad of options, and steer our users toward the best experience?

This presentation can be found at the BBC News Labs Slideshare account
Tools: the Data
We have at our disposal a plethora of data sources from BBC R&D, internal BBC News systems, other private BBC systems, publicly open data providers (Gov, ONS et al), and internet feeds.
These data sources have various properties that might be used to help us signal, select and prioritise, and perhaps even “leap across the connections” to the next, most resonant and contextually relevant content items and news “Storylines”.
One especially interesting data source is The Juicer, in which we have over 650,000 News articles tagged with 150,000+ semantically connected topics. This is a platform we have previously been using to support prototyping, and we now want to use this repository of Journalism as a data source in itself. Among other things, The Juicer allows us to measure the frequency of topic occurrence within various slices of the News, and also the patterns in co-occurrences of topics with one another.
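To make the idea of topic frequency and co-occurrence concrete, here is a minimal sketch in Python. It is not The Juicer's API or implementation: the sample articles, the topic names, and the two helper functions are all hypothetical, and the only assumption is that each article can be reduced to the set of topics it is tagged with.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each article is represented only by the set of
# topics it has been tagged with. Real Juicer records are assumed to be richer.
articles = [
    {"Argentina", "Falkland Islands", "David Cameron"},
    {"Broadband", "Cities"},
    {"Argentina", "Falkland Islands"},
    {"David Cameron", "Marriage tax"},
]


def topic_frequency(tagged_articles):
    """Count how many articles mention each topic."""
    counts = Counter()
    for topics in tagged_articles:
        counts.update(topics)
    return counts


def topic_cooccurrence(tagged_articles):
    """Count how many articles mention each unordered pair of topics."""
    pairs = Counter()
    for topics in tagged_articles:
        # Sorting gives every pair one canonical (alphabetical) ordering.
        pairs.update(combinations(sorted(topics), 2))
    return pairs


freq = topic_frequency(articles)
cooc = topic_cooccurrence(articles)

print(freq.most_common(3))
print(cooc.most_common(3))
```

Running the same counts over different slices of the News (a date range, a region, a single programme) would give the kind of comparison described above. Raw counts are only a starting point for judging relevance.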
The Approach: #newsVANE
#newsVANE is the umbrella project for a set of prototypes to explore the possibilities during 2014. The question we are asking in the #newsVANE project is:
How might we use combinations of our data sources to generate scalable relevance tools, so that we can promote the best connections across millions of content items & topics?
Dashboard & Infographic Prototypes
We are prototyping with combinations of these data sources to produce dashboards and infographics as a way to test emergent ideas with Journalists and with ad-hoc user testing.
Some early ideas include the compilation in a dashboard of various data sources: feeds, a timeline of the emergence and development of news, and trending topics and stories on social media and in our content. This is the kind of tool that journalists could be keen on firing up first thing in the morning.
We also focus our attention on the exploration of semantic referencing of news, in the hope that some patterns or correlations will emerge.
If you are interested in finding out more, you can follow us and receive updates via these platforms:-
Matt Shearer is Innovation Manager, BBC News Labs

Comments
Comment number 7. Posted by JohnP
on 27 Feb 2014 13:20
I think there are 2 issues here:
Firstly how many clicks does it take for an item to get on the list, and are they being deliberately manipulated (Why 2 items on Argentina?? Does someone have an agenda here??)
Secondly: once on the list they stay there because there is no way to know that it's old news WITHOUT CLICKING ON IT. Surely the best way to improve that is by putting the date by the headline - then old news will quickly drop out again.
Comment number 6. Posted by Uu
on 27 Feb 2014 01:12
@5 DBOne
I quite agree, clicks do push items up the list, but what I don't understand is that with all the current news items (hundreds of news items per day collected and collated by the BBC) why is the number one item about cities getting broadband cash from September 2012 and number seven Argentina senate condemns Falklands statement by the PM from January of the same year. Item eight has the Home Secretary under fire over deportation cat claim, October 2011, next is an item from Stephanie Flanders (former economics editor) about the Office for Budget Responsibility from December 2012.
Oh there seems to be a new number one it seems that the PM has announced a marriage tax break worth £200 from 2015 in an article dated September 2013.
What I am trying to point out DBOne is that while I understand that items that are selected will rise to the top of the heap - how is it that 30 to 40% of the top ten Most Popular Read items are from yesteryear, and if this is what the BBC are displaying online today what hopes are there for a future 'tagged and interconnected technology' like #newsVANE.
I will leave you with today's latest No. 1 - Autumn Statement 5 December 2012 - all on the top ten list within the last hour.
Comment number 5. Posted by DBOne
on 26 Feb 2014 12:26
@4 Why can't they be the most popular even if they are old - a simple link from a more recent popular story will mean more people will visit that page, for example. How - more people have clicked. For the why, you would need to look at the source of the click.
Comment number 4. Posted by Uu
on 26 Feb 2014 11:47
BBC News
http://www.bbc.co.uk/news/
Top ten Most Popular Read items
7 items are dated 26-2-2014 (today)
1 is from yesterday (still relevant)
Then at No. 10 - Why the Mitchell story matters - Phil from Eastenders? - no it's Andrew Mitchell's 'plebgate' story dated 19 December 2012.
And at No. 6 - The demise of free parking in city centres - an article about evening and weekend parking charges dated 30 July 2011.
Why/How are these items most popular today?
You sure have a challenge if this is an example of your data baseline.
Comment number 3. Posted by Uu
on 25 Feb 2014 01:46
This all sounds too good to be true - all this linking and tagging - a good place to start trials is the BBC News Most Popular Now page
http://news.bbc.co.uk/1/shared/bsp/hi/live_stats/html/map.stm
This page provides the top ten BBC news items as viewed around the world.
February 24 2014 at about 23:00 hrs the top UK item is that Harold Ramis, of Ghostbusters fame, has died. Selecting this item from the list opens a small window containing a covering sentence and a link to the
Full Story (you will leave this page).
Select the Full story link
http://news.bbc.co.uk/1/hi/26327020.stm?lsm
and you get a 404 Error - Page not found.
(The actual link is http://www.bbc.co.uk/news/26327020)
Now by coincidence I tried the North American area on the most popular page and found that a Ghostbusters game is being developed and the original cast (including Harold) have signed up to lend their voices and faces to the game. It seems that Harold will also write the story for the game!!!!
Then I noticed the date - 15 November 2007.
I do hope that semantic technology will work better in the future at ‘powerful and interconnected’ stories at the BBC.
Just had a quick look at http://now.pilots.bbcconnectedstudio.co.uk/ - a vision of connected studios homepage (the clock is nice) pity about the date as it's now Tuesday 25th. I think the date catches up about breakfast time. The last blogged update was May last year, and the page copyright is dated 2013 as well, so there may still be time for a bit of innovation: -
You know I used to use a brilliant homepage a few years ago - you could customise it - at a glance you could be up to date with your interests, and it provided links for further information - it was very useful - ah the good old days.
Comment number 2. Posted by Jon
on 24 Feb 2014 17:09
It's great to see these kinds of tools being built and documented - I hope they act as further showcasing of what can be done on a semantic web framework 'in real life', and look forward to reading about everything later in the year! Andrew - assuming your comment is a serious one, this sort of thing lends itself nicely to a number of applications beyond just 'finding the best stories' - accessibility, geographic relevance, and differing audience demographics spring to mind...
Comment number 1. Posted by Andrew Oakley
on 24 Feb 2014 14:40
Isn't the correct solution to this problem "make the BBC smaller", and not "create a computer system to try to find an audience for the huge amounts of BBC output that nobody reads, watches or listens to?"