Posted by Stephen Jolly on , last updated
The AI in Production team at BBC R&D is looking at some of the ways that Artificial Intelligence (AI) and machine learning could transform the business of producing media. These are new forms of automation, and we want to understand how they could be used to significantly increase the range of programmes that broadcasters like the BBC are able to offer. Could we build a system that would allow us to cover the hundreds of stages at the Edinburgh Festival, for example, or broadcast every music festival in the UK?
We started our research with a project aimed squarely at broadening coverage in this way, opening up access to events that would be impractical or unaffordable to cover using conventional techniques. In our prototype system, which we have named “Ed”, a human operator sets up fixed, high-resolution cameras pointing at the area in which the action will take place, and then the automated system takes over. It attempts to create pleasingly framed shots by cropping the raw camera views, and then switches between those “virtual cameras” to try to follow the action. In many ways, this project is a successor to our earlier work on automated production: the basic concept of covering live events by cutting between virtual cameras was previously explored in our Primer and SOMA projects.
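To make the idea of a “virtual camera” concrete, here is a minimal sketch of how a shot might be cut out of a high-resolution raw frame. It is not Ed's actual code: the `Box` type, the `virtual_camera_crop` function and its `zoom` parameter are hypothetical, and the sketch simply centres a fixed-aspect crop on a subject and slides it back inside the frame if it overhangs an edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in raw-camera pixel coordinates (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def virtual_camera_crop(subject: Box, frame_w: float, frame_h: float,
                        aspect: float = 16 / 9, zoom: float = 3.0) -> Box:
    """Crop the raw frame around `subject` to make a 'virtual camera' shot.

    `zoom` (hypothetical) sets the crop height as a multiple of the
    subject's height. The crop keeps the output aspect ratio and is
    shifted, never squashed, so that it stays inside the raw frame.
    """
    # The crop can be no larger than the raw frame allows at this aspect ratio.
    crop_h = min(subject.h * zoom, frame_h, frame_w / aspect)
    crop_w = crop_h * aspect
    # Start by centring the crop on the subject...
    cx = subject.x + subject.w / 2
    cy = subject.y + subject.h / 2
    x = cx - crop_w / 2
    y = cy - crop_h / 2
    # ...then slide it back inside the frame if it overhangs an edge.
    x = max(0.0, min(x, frame_w - crop_w))
    y = max(0.0, min(y, frame_h - crop_h))
    return Box(x, y, crop_w, crop_h)
```

For a face detected at `Box(1800, 900, 200, 240)` in a 3840×2160 frame, this yields a 1280×720 crop whose top-left corner is at (1260, 660). Because the crop is only ever moved, a subject near the edge of the raw frame ends up off-centre rather than distorted.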
One of the things that working with AI technologies really highlights is how differently even “intelligent” computer systems view the world compared with people. If we think about the “unscripted” genres of television, such as sport, comedy and talk shows, most people would have little difficulty in identifying what they want to see on screen – usually the action around the ball in a game of football, for example, or the people who are talking in a televised conversation. AI systems, by contrast, have no idea what we humans are going to find interesting, and no easy way of finding out. We therefore decided to keep things simple: this first iteration of “Ed” looks for human faces, and then tries to show the viewer the face of whoever is talking at any given moment. These relatively simple rules are a reasonably good match for any genre that consists of people sitting down and talking, comedy panel shows in particular, which is therefore the genre we have been targeting.
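The “show whoever is talking” rule can be sketched as a small selection function. This assumes, hypothetically, that some upstream detector supplies each visible face with a per-person score estimating how likely that person is to be speaking; the post does not say how Ed detects speech, and `Face`, `speaking_prob` and `choose_subject` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    person_id: str
    speaking_prob: float  # hypothetical active-speaker score in [0, 1]


def choose_subject(faces: Sequence[Face], threshold: float = 0.5) -> Optional[str]:
    """Return the id of the person who appears to be talking, or None.

    None means nobody is confidently speaking, in which case a wide
    shot is a natural fallback. `threshold` is a hypothetical value.
    """
    speakers = [f for f in faces if f.speaking_prob >= threshold]
    if not speakers:
        return None
    # If several people seem to be talking, favour the most confident one.
    return max(speakers, key=lambda f: f.speaking_prob).person_id
```

With faces for panellists "a" (score 0.2) and "b" (score 0.9), the function picks "b"; with no face above the threshold it returns `None`.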
Our first version of Ed is driven entirely by rules like these, which we derived by asking BBC editorial staff how they carry out these tasks in real productions. To frame its shots, Ed rigidly applies the kinds of guidelines that film-school students are taught: the “rule of thirds”, “looking room”, and so forth. Deciding which shots to show, and when to change them, is similarly rule-based. Ed tries to show close-ups when people are speaking and wide shots when they aren’t. It tries not to use the same shot twice in quick succession. It changes shots every few seconds, and it avoids cutting to or from a speaker shortly after they start speaking or shortly before they stop.
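The shot-switching rules above can be expressed as a small state machine that is asked, at each moment, which shot should be on air. This is a minimal sketch, not Ed's implementation: the class `RuleBasedSwitcher`, the shot names such as `"close:a"` and `"wide"`, and all the timing constants are hypothetical, since the post says only “every few seconds” and “shortly”. Guarding against cuts shortly *before* someone stops talking also assumes the system can see slightly ahead, for example by running with a short delay.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

# Hypothetical values in seconds; the post gives no actual figures.
MIN_HOLD = 2.0      # never cut away from a shot sooner than this
MAX_HOLD = 6.0      # after this long, look for a different suitable shot
SPEECH_GUARD = 1.0  # no cuts this close to someone starting or stopping talking
REUSE_GAP = 10.0    # don't return to a shot that left air this recently


@dataclass
class RuleBasedSwitcher:
    current: str = "wide"
    shot_start: float = 0.0
    left_air_at: Dict[str, float] = field(default_factory=dict)

    def _recently_used(self, shot: str, t: float) -> bool:
        return t - self.left_air_at.get(shot, float("-inf")) < REUSE_GAP

    def update(self, t: float, speaker: Optional[str],
               speech_boundaries: Sequence[float],
               wide_shots: Sequence[str] = ("wide",)) -> str:
        """Return the shot to show at time t.

        speech_boundaries: times at which someone starts or stops talking,
        including upcoming ones if the system runs with a delay.
        """
        held = t - self.shot_start
        if held < MIN_HOLD:
            return self.current  # we cut too recently
        if any(abs(t - b) < SPEECH_GUARD for b in speech_boundaries):
            return self.current  # too close to a speech start or stop
        # Close-up of whoever is talking; otherwise one of the wide shots.
        options: List[str] = [f"close:{speaker}"] if speaker else list(wide_shots)
        if self.current in options and held < MAX_HOLD:
            return self.current  # current shot is suitable and still fresh
        # Candidate shots that are not on air now and weren't used recently.
        fresh = [s for s in options
                 if s != self.current and not self._recently_used(s, t)]
        if fresh:
            self.left_air_at[self.current] = t
            self.current, self.shot_start = fresh[0], t
        return self.current
```

In use, the switcher holds the opening wide shot until it has been on air for `MIN_HOLD` seconds and the nearest speech boundary is at least `SPEECH_GUARD` seconds away, then cuts to a close-up of the speaker. When that speaker stops, it will not go straight back to a wide shot it showed a few seconds earlier, but will use an alternative wide framing if one is available. One consequence of rules this rigid is that, when no fresh alternative exists, Ed simply stays where it is.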
Having built a working system, we needed to test it. We’re proponents of “user-centred” approaches, and we believe that ultimately the only test of our system that matters is what real audience members think of it. We want to compare both our system’s decision-making and the quality of the resulting viewing experience with those of human programme-makers. We have a series of formal studies planned to evaluate and improve Ed, and we started with an evaluation of shot framing.
To compare Ed’s shot framing with that of human professionals, we asked four directors and camera operators to frame some shots for us, using footage from a “panel show” of our own that we had created as test material. We asked Ed to do the same. We then mixed all the shots up and arranged them into pairs, each consisting of two framings of the same shot: either both framed by humans, or one by a human and one by Ed. We showed these pairs to 24 members of the public and asked which framing in each pair they preferred. We sometimes asked participants to think aloud as they decided, and we interviewed them afterwards to get a better understanding of their preferences.
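Paired-preference data like this lends itself to a simple tally. The sketch below is not the analysis used in the study, which the post says is still to be written up; it shows one common approach, an exact two-sided binomial test of whether Ed wins its human-versus-Ed pairings more or less often than chance (50%). The record format and function names are hypothetical, and no real results are used.

```python
from math import comb
from typing import Iterable, Tuple


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-12)))


def ed_preference(choices: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return Ed's win rate in mixed pairs and the test's p-value.

    Each choice is (pair_type, preferred), where pair_type is
    'ed-vs-human' or 'human-vs-human' and preferred is 'ed' or 'human'.
    Human-vs-human pairs are ignored here; they could serve as a baseline.
    """
    mixed = [pref for kind, pref in choices if kind == "ed-vs-human"]
    wins = sum(1 for pref in mixed if pref == "ed")
    n = len(mixed)
    rate = wins / n if n else float("nan")
    return rate, binomial_two_sided_p(wins, n)
```

A win rate near 0.5 with a large p-value would suggest viewers could not reliably tell Ed’s framing from a professional’s; a full analysis would also need to account for repeated judgements by the same participant.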
We’ve already learned a lot from analysing the results of this study. We plan to write it up in full as a conference or journal paper, but simply reviewing what participants told us has helped us identify a number of additional rules that would improve Ed’s ability to frame shots attractively. For example, people disliked having objects and people framed half-in and half-out of the shot, and having unnecessary empty space within the frame. We hope to draw even more insights from the data once it is fully analysed, and we plan to run further studies to evaluate Ed’s ability to select and sequence shots.
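Both of the dislikes mentioned above translate naturally into checks that a rule-based framer could apply to a candidate crop. The sketch below assumes, hypothetically, that bounding boxes for people and objects in the scene are available; the function names and the `lo`/`hi` thresholds are illustrative, not taken from Ed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def inside_fraction(obj: Box, crop: Box) -> float:
    """Fraction of obj's area that lies inside crop (0.0 to 1.0)."""
    ix = max(0.0, min(obj.x + obj.w, crop.x + crop.w) - max(obj.x, crop.x))
    iy = max(0.0, min(obj.y + obj.h, crop.y + crop.h) - max(obj.y, crop.y))
    area = obj.w * obj.h
    return ix * iy / area if area > 0 else 0.0


def cuts_through(objects: Sequence[Box], crop: Box,
                 lo: float = 0.1, hi: float = 0.9) -> bool:
    """True if any object is framed half-in, half-out of the crop.

    Objects almost entirely outside (< lo) or inside (> hi) the crop
    are tolerated; both thresholds are hypothetical.
    """
    return any(lo < inside_fraction(o, crop) < hi for o in objects)


def empty_fraction(objects: Sequence[Box], crop: Box) -> float:
    """Approximate share of the crop not covered by any detected object.

    Overlapping objects are double-counted, so this can underestimate
    the empty space when objects overlap.
    """
    covered = sum(inside_fraction(o, crop) * o.w * o.h for o in objects)
    return max(0.0, 1.0 - covered / (crop.w * crop.h))
```

A framer could reject candidate crops for which `cuts_through` is true, and prefer, among the remainder, those with a lower `empty_fraction`.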
What’s next? Well, we intend to improve Ed, both by implementing the findings of our studies, and by replacing some of our rules with machine learning approaches, using the BBC archives as a source of training data. In addition, there are many aspects of a production that Ed does not currently attempt to address: lighting and sound, for example. Most importantly, we need to think about other genres – in particular, productions that require creative decision-making that can’t be approximated by simple rules, or by today’s machine learning techniques, which think very much “inside the box” defined by their training data. Shows for which a simple narrative must be assembled by whittling down a large set of potential material, for example, or which start off with a vision for a story and need to work out how best to tell it, will need humans and AIs to work together, posing new challenges. We also want to explore a “bottom-up” approach, working with real-world productions to identify tedious and time-consuming aspects of their work that would be good candidates for less ambitious but more immediately useful forms of AI automation.
We’ll be talking about Ed at this year’s IBC conference. (The conference organisers have been kind enough to give our work their “Best Paper” award.) If you want to learn more about the Ed system and our initial study, you can now read the published paper 'AI in Production: Video Analysis and Machine Learning for Expanded Live Events Coverage'.
This post is part of the Future Experience Technologies section