Posted by Tristan Ferne
These are the weeknotes from Internet Research and Future Services, a team in BBC R&D. This week: the basics of machine learning, TV watching and some hacks
Something Something - Fuzzy genres for choosing TV programmes
At our team meeting this week Jana gave an introduction to machine learning. What is it, what should you use it for and where do you start? We have Jana and a few other machine learning specialists in the team but we felt it would be good for us all to have a basic understanding of where it can be applied and how. Our team's work encompasses face recognition, object recognition, scene classification, entity extraction, sentiment detection, recommenders, speech-to-text, speaker identification, music/speech discrimination and music genre/mood analysis. Phew. Jana pointed out that as we generally do applied research, one of our biggest challenges is getting enough accurate, real-world training data, i.e. sets of examples where we know what the correct answer or classification is.
At last week's meeting Henry, Tim and Libby showed us some interaction sketches and prototypes that explore different ways of choosing what to watch with friends and family. From Cards Against TV to micro-genres to Tindervision to fuzzy lists (see photo above). More soon.
The Atomised News team have been powering through the final MVP features. This week we got infosec sign-off, added caching and sharing, and reached full test coverage. Almost there.
Chris has been at a meeting of the W3C Second Screen Working Group, who are developing the Presentation API and Remote Playback API specifications for displaying web content on second screens. Henry kicked off some work around understanding spoken interfaces: devices and services like Alexa, Google Home and Siri, but also prototyping our own things. We had a good brainstorm to generate ideas and discuss the challenges we see in making this kind of software, and we’ve been talking to others around the BBC who have an interest in the area. I made Henry write a New Project Proposal™.
The Editorial Algorithms team has been very busy working with, or talking to, colleagues across the BBC: Katie organised a small trial using our system to help cover the Hay literary festival, Chris helped a team in Marketing and Audiences analyse some complex survey results with sentiment analysis tools, Olivier had meetings and demos in London and Salford, and Kate did some user research with journalists in BBC Monitoring and tested some UI changes that Chrissy built.
Meanwhile, Manish and Michel got a little closer to peace of mind, solving issues with pipelines and cloud infrastructure. They were joined by Thomas in testing Artifactory as a replacement for our Docker registry. Frankie made progress on a system to manage feed aggregation and analysis, while David and Fionntán, among other things, started thinking about and playing with clustering algorithms as a way to present streams of content. Finally, Georgios put some finishing touches on an algorithm to detect the “main protagonist” in an article, comparing its results with the relevance score in our in-house semantic tagging tool. Oh, and we published the second blog post on editorial algorithms, this time about web metadata and its discontent.
Ben and Jana have been putting together a report to take back to a BBC archive team to show them how CODAM can be used to make sense of collections of videos and clips. This has thrown up some challenges in analyzing and presenting videos in the many different formats found in the wild. They have also been looking at further ways to optimize the search algorithm.
Matt has been looking at GloVe, an algorithm which allows you to obtain vector representations for words, and has been trying to implement something in Go. This is his first foray into Go and so far, he's enjoying it. Tom P has finished his placement with us and has been summarizing the results of his speech-music discrimination algorithm, and investigating how it can improve the word error rates in Kaldi. And conversely, Qiong has just started her internship on speech synthesis. She's started looking at the possible sources of data to build some models.
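To give a feel for what word vectors are good for, here is a minimal sketch in Go of comparing words by cosine similarity and finding a word's nearest neighbour. It is not Matt's implementation and it does not train anything: the three-dimensional vectors are made up for illustration (real GloVe vectors typically have tens to hundreds of dimensions), and the function names `cosineSimilarity` and `nearest` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two vectors.
// It returns 0 if the lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// nearest returns the word whose vector is most similar to the query word's
// vector, excluding the query itself. It returns "" if the query is unknown
// or there is no other word to compare against.
func nearest(query string, vectors map[string][]float64) (string, float64) {
	q, ok := vectors[query]
	if !ok {
		return "", 0
	}
	best, bestSim := "", math.Inf(-1)
	for word, v := range vectors {
		if word == query {
			continue
		}
		if sim := cosineSimilarity(q, v); sim > bestSim {
			best, bestSim = word, sim
		}
	}
	if best == "" {
		return "", 0
	}
	return best, bestSim
}

func main() {
	// Toy vectors invented for this example, not output from GloVe.
	vectors := map[string][]float64{
		"radio":      {0.9, 0.1, 0.2},
		"television": {0.8, 0.2, 0.3},
		"banana":     {0.1, 0.9, 0.1},
	}
	word, sim := nearest("radio", vectors)
	fmt.Printf("nearest to radio: %s (similarity %.3f)\n", word, sim)
}
```

With these toy values "television" comes out as the nearest neighbour of "radio", because their vectors point in almost the same direction while "banana" points elsewhere.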
Elsewhere. We sent a team to the Radio & Music team's hackday, along with our new secret system which was used by most of the hacks; Olivier wants to buy some standing desks - though to me they look like cardboard boxes to put on your sitting desk; there was a lunchtime game of Avalon or Dominion, I'm not sure which; our wifi kept kicking us off; Katie shared some room booking hacks; some people wrote Slack bots; and we wondered whether we're a scalable centralized innovation unit or whether we do applied futuring.
Things we've been reading recently:
Georgios shared this visualisation of music taste over time
Mozilla FlyWeb is a web technology for electronic devices to find each other and communicate. Jonas Sicking presented this at the W3C Second Screen WG meeting (via Chris N)
The UX of VR (via Thomas)
How technology is designed to hijack your mind and feed your addictions
A fantastic film of a dystopian augmented reality
And super-powered auto-complete for writing sci-fi