Posted by Ben Clark
Since I'm writing weeknotes, I'll explain a bit more about the work we're doing on face recognition. Face detection and recognition have improved hugely in the last few years, to the point that some people claim computers are now as good at recognising people as humans are. With the arrival of cloud-based services which offer face detection and recognition on demand with an API call, we wanted to see whether this technology could help us make TV programmes better and faster.
Television programmes, particularly news and factual, often use footage of people as part of a "package" of audio and video. Finding the right clip takes time, and we think that face recognition could really help to reduce the amount of time researchers and journalists spend searching for footage of a particular person.
We've been testing commercial APIs and open-source software against a "ground truth" which we made by annotating footage from BBC programmes. Every second, a human being drew boxes around the faces of people on screen and labelled the known people in the frame. By running the same footage through different implementations, we can compare how many faces are found by each API.
There are a lot of different ways to compare face recognition systems. We're primarily interested in three: how many faces are found, how many are recognised correctly, and how many are misidentified as someone else. This is because we want to search a large archive of videos and return a list of clips for a given person. Interestingly, we've found quite a bit of difference between the APIs currently available.
Some are very conservative, identifying a small number of faces with high accuracy, but not attempting to identify other faces. Other APIs get lots of faces correct, but also misidentify a large number of faces as well.
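The comparison described above can be sketched in a few lines of Python. This is an illustrative example only, not the team's actual evaluation code: the `score_api` function and the frame-to-names data layout are assumptions made for the sake of the sketch.

```python
# Given human ground-truth labels for each sampled frame and the labels an
# API returned for the same frames, count how many faces were found, how
# many were identified correctly, and how many were misidentified.

def score_api(ground_truth, predictions):
    """Both arguments map a frame id to a set of person names."""
    found = correct = misidentified = 0
    for frame, true_people in ground_truth.items():
        predicted = predictions.get(frame, set())
        found += len(predicted)
        correct += len(predicted & true_people)        # names in both sets
        misidentified += len(predicted - true_people)  # names the API invented
    return {"found": found, "correct": correct, "misidentified": misidentified}

# A "conservative" API names few faces but gets them right; an "aggressive"
# one names more faces correctly but also misidentifies some.
truth = {0: {"Alice"}, 1: {"Alice", "Bob"}}
conservative = {0: {"Alice"}, 1: {"Alice"}}
aggressive = {0: {"Alice", "Carol"}, 1: {"Alice", "Bob", "Carol"}}

print(score_api(truth, conservative))  # {'found': 2, 'correct': 2, 'misidentified': 0}
print(score_api(truth, aggressive))    # {'found': 5, 'correct': 3, 'misidentified': 2}
```

In information-retrieval terms these counts correspond to precision (correct over found) and recall (correct over the number of faces in the ground truth), which is why a conservative API and an aggressive API can both look "good" depending on which number you care about.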
We didn't see any correlation between accuracy and price. Given that we would want to process many thousands of frames of video, the cost of using many of the APIs is a significant factor for a large video archive. Overall, I think we've learned how important it is to compare machine learning APIs, as there are big differences between what is on offer.
In the rest of the Data Team, Denise has been working on the launch of the Data Science Research Partnership which took place in Broadcasting House last week.
Ian Forrester tries a mind-controlled computer interface at the Data Science Research Partnership Launch
Matt has been working on a new release of our speech-to-text system: fixing bugs and making slight improvements in its performance.
Chris has continued to work on Citron, our Quote Extraction and Attribution system. All that remains is to implement coreference resolution. For this Chris has been exploring alternatives to neuralcoref which he found to be too slow. David has focused on the UX of quote attribution this sprint. He has talked to journalists to understand the sourcing and usage of quotes and this has helped form some early use cases.
Tim’s been to Salford to talk to more folks around the BBC about how they might be able to make use of our editorial algorithms work, and to catch up with our FXT colleagues about their ‘Human-Data Interaction’ work and how it can inform our work on the Public Service Internet. He’s also been digging into the last release of RAJAR radio audience data, trying to pull out usage data to inform our personalised radio work, and spent some time interviewing the scheduler of Radio 2, discovering how music is chosen for our radio services.
Joanne, Libby and Kate have started the research work for Better Radio Experiences. They're hoping to run a diary study, interviews and a workshop with 18-21 year olds interested in audio - in Bristol in November and December.
This was the first "prototyping" sprint in the New News project. The intention was to throw away what was built in the design sprint, but to learn from it. The prototypes should combine media and are aimed at people under 25. The team developed 4 ideas and prototypes:
- FastForward – an interaction prototype that uses the scrolling of text captions to move around a video.
- DrawingIn – use atmospheric news footage to draw someone into a story, leading them to interviews and, finally, facts. Designed as a contrast to the usual fact-first approach.
- Eyewitness – using mobile phone eyewitness footage and information about the eyewitnesses to make something authentic.
- Atmosphere – using scene-setting immersive audio to enhance a story.
They produced one HTML prototype, one storyboard and two Keynote prototypes, and tested these with 4 audience members over two days. Following this they reviewed each prototype against the brief and evaluation criteria.
On the Tellybox project, Alicia has been adding controls (volume, pause, go back) for the video player for 5 - 2 - 1 and is currently making the prototype work properly on the Pi. Libby had a big fight with pulseaudio and bluetooth and lost, so the voice part of Something Something is still a work in progress.
For Talking With Machines Andrew and Henry have worked closely with our illustrator Rob Turpin - we now have the final artwork for the Inspection Chamber. Andrew has created promotional material using the artwork and graphics.
For our Standards strand of work, Chris is continuing to get ready for the W3C's TPAC meetings, organising the agenda for the Media and Entertainment Interest Group, and preparing presentations and discussion topics on HbbTV and next steps for the TV control work.
AI Scheduling Hackday
Rob, Jana, Libby, Tim, Matt and George travelled to Salford to take part in a second hackday around AI-driven scheduling of programmes with our colleagues in the North Lab. They took the best ideas from the first hackday and worked them up further over the course of the day. Jasmine, Libby and Mike looked into robotising a human on top of video. Two groups, Tim and Rhia, and Jana, Matt and Tom, both looked at automatically choosing series of "related" clips using AI.
This post is part of the Internet Research and Future Services section