IRFS Weeknotes #139
These are weekly notes from the Internet Research & Future Services team in BBC R&D where we share what we do. We work in the open, using technology and design to make new things on the internet. You can follow us on Twitter at @bbcirfs.
As Barbara mentioned in weeknotes last week, we recently held a workshop about what we wanted to do in FI-Content 2, which is the second part of a large EU-funded project we're involved in on the future of the internet. We were thinking quite generally at this stage, because the first part of the project itself is about choosing and designing exactly what we will work on.
A number of themes came out of that discussion, and Theo, Chris G and I are still working through them, talking with Tristan about how to prioritise them, and also meeting with Lancaster University (one of our partners in FI2) to discuss how we can use their amazing testing facilities within the project.
But most of the work this week fits quite nicely under one theme in particular: making the best of the data we have.
The BBC produces a great deal of data, and in IRFS we're lucky enough to have access to a bunch of it in order to see what uses it can be put to. A great example of this is the use of subtitles and metadata in Snippets to help people in production very quickly find the part of a TV programme they were thinking of.
VistaTV is taking a different chunk of data - anonymised logs from iPlayer - and working with various bits of the BBC to see how it might be useful to them and to audiences. This work is much more exploratory than Snippets. We don't exactly know what it'll be useful for yet. So Dan's been working with Chris L to generalise how we store the data. Dan says:
"We think the live data about which on-line services are popular will be useful for future projects, so we're building an elasticsearch index to store the data as it arrives from the API. This will allow us to query the data and find interesting viewing trends. This also allows the web application to simply query elasticsearch for the data, rather than gathering its own statistics."
Chris Newell has been working on an earlier part of the data processing pipeline, exploring how we can use GeoIP lookup to see whether some programmes are more popular in some parts of the country than in others, and pushing that through to the elasticsearch part. Andrew N and Thomas have been working on the frontend, generalising the current code from Radio to TV and potentially other uses.
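As a rough illustration of how those pieces could fit together, here is a minimal Python sketch. It is not the VistaTV code: the index name, the field names, the region table and the sample events are all invented for the example. It maps an IP address to a coarse region, builds the newline-delimited body that elasticsearch's bulk API accepts, and builds a terms-aggregation query (as found in current elasticsearch versions) that a web application could send instead of gathering its own statistics.

```python
import ipaddress
import json

INDEX_NAME = "viewing-stats"  # hypothetical index name

# Stand-in for a real GeoIP database: documentation-only address
# ranges mapped to made-up regions.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "North West",
    ipaddress.ip_network("198.51.100.0/24"): "London",
}


def lookup_region(ip):
    """Return the region for an IP address, or "unknown" if no range matches."""
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return "unknown"


def to_document(event):
    """Build the document to store, keeping the region but not the IP address."""
    return {
        "timestamp": event["timestamp"],
        "service": event["service"],
        "programme": event["programme"],
        "region": lookup_region(event["ip"]),
    }


def to_bulk_body(documents):
    """Build a bulk request body: one action line, then one source line, per document."""
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {"_index": INDEX_NAME}}))
        lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"


def popularity_by_region_query(since="now-1h", top_n=5):
    """Build a query for the most-viewed programmes per region since a given time.

    Assumes "region" and "programme" are indexed as exact-match (keyword) fields.
    """
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the matching documents
        "query": {"range": {"timestamp": {"gte": since}}},
        "aggs": {
            "regions": {
                "terms": {"field": "region"},
                "aggs": {
                    "top_programmes": {
                        "terms": {"field": "programme", "size": top_n}
                    }
                },
            }
        },
    }


# Invented sample events, standing in for data arriving from the API.
events = [
    {"timestamp": "2013-03-25T20:00:00Z", "service": "tv", "programme": "Programme A", "ip": "192.0.2.10"},
    {"timestamp": "2013-03-25T20:00:05Z", "service": "radio", "programme": "Programme B", "ip": "198.51.100.7"},
]

documents = [to_document(event) for event in events]
bulk_body = to_bulk_body(documents)
query = popularity_by_region_query()
```

The sketch only builds the request bodies. Sending them (for example, POSTing bulk_body to the bulk endpoint and query to the index's search endpoint) is left out so that the example runs without an elasticsearch server.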
The World Service work and Mood Metadata frameworks fit under this category too. In the World Service project (ABC-IP), 'machine listening' is used to discover information currently inaccessible to machines and make it available for searching and classification. This week Yves kicked off the last quarter of ABC-IP with Metabroadcast and has also been working on a small message-queue based framework for speaker identification. He released a new version of the ruby-lsh library on GitHub, much more memory- and disk-efficient, and also presented on the World Service work at the British Library Labs event. Michael S has been meeting with the digital humanities team from King's College about art interfaces for archives.
Our Mood Metadata work is about finding new information that might be implicit in the data we have. Jana has been generalising the Matlab / Octave feature extraction and machine learning framework she built for Mood Metadata classification so that other projects can use it, and Denise has been trying it out with pre-computed audio and video features from our time-line user trials.
Making best use of the data we have includes ensuring we have a strong, linked-up, public-facing way for people to explore and understand what we do and why. In an organisation as large as the BBC, information that goes outside and back in again can be as important as information channelled within the organisation. Olivier has been continuing IRFS's work on the new BBC R&D website, working with the Matts (our own Matt P and Matt G from Kite) to fix the minor security issues found by the penetration testing, and more generally getting all systems and paperwork ready to launch the site. He says "I've also started moonlighting as a traveling salesman - going from team to team to demo the site and show how things work, and get each of our sections to think about how they will use this new/improved tool to promote their work". Chris Needham has been testing the system by writing a blog post on the TV authentication project: "In writing I've found some issues with the R&D website CMS."
And finally, one thing that doesn't fit with my theme of the week: Olivier, Chris L and Matt P are participating in the W3C Audio Working Group face-to-face meeting this week, where Olivier and Chris are chairs. This seems like a good point to remind you of our prototype: Recreating the sounds of the BBC Radiophonic Workshop using the Web Audio API, and to end with lots of links:
Six Degrees of Francis Bacon is a digital reconstruction of the early modern social network in England
What do we mean when we talk about ‘TV’? Definitions!
Good overview and recommendations for crowd-sourcing from the National Library of Australia
Cubelets, snap them together to make a robot
Let us pay for this service so it won’t go down - the quote "You must own any data that’s irreplaceable to you" never felt so true.
A survey on open data - it will be interesting to see the results.
I (and others) have been looking with interest at a number of "quantified self" tracking sensors. This one - Amiigo - looks deliciously geeky.
and from me: