Scaling responsible machine learning at the BBC
Head of Data Science and Architecture, BBC D&E
Machine learning is a set of techniques that allows computers to ‘solve’ problems without being explicitly programmed with every step, within parameters set and controlled by data scientists working in partnership with editorial colleagues.
The BBC currently uses machine learning in a range of ways – for example, to provide users with personalised content recommendations, to help us understand what is in our vast archive, and to help transcribe the many hours of content we produce. And in the future, we expect machine learning to become an ever more important tool in helping the BBC create great audience experiences.
The BBC was founded in 1922 in order to inform, educate and entertain the public. And we take that purpose very seriously. We are governed by our Royal Charter and public service is at the heart of everything we do. This means that we act on behalf of our audience by giving them agency and that our organisation exists in order to serve individuals and society as a whole rather than a small set of stakeholders.
With machine learning becoming a more prevalent part of everyday life, our commitment to audience agency extends to this area as well. And so in 2017, we submitted a written commitment to the House of Lords Select Committee on Artificial Intelligence in which we promised to lead the way in the responsible use of all AI technologies, including machine learning.
But what does this mean in practice?
For the last couple of months, we have been bringing together colleagues from editorial, operational privacy, policy, research and development, legal and data science teams in order to discuss what guidance and governance is necessary to ensure our machine learning work is in line with that commitment.
Together, we agreed that the BBC’s machine learning engines will support public service outcomes (i.e. to inform, educate and entertain) and empower our audiences.
This statement then led to a set of BBC Machine Learning Principles:
The BBC’s Values
1. The BBC’s ML engines will reflect the values of our organisation; upholding trust, putting audiences at the heart of everything we do, celebrating diversity, delivering quality and value for money and boosting creativity.
2. Our audiences create the data which fuels some of the BBC’s ML engines, alongside BBC data. We hold audience-created data on their behalf, and use it to improve their experiences with the BBC.
3. Audiences have a right to know what we are doing with their data. We will explain, in plain English, what data we collect and how this is being used, for example in personalisation and recommendations.
Responsible Development of Technology
4. The BBC takes full responsibility for the functioning of our ML engines (in house and third party). Through regular documentation, monitoring and review, we will ensure that data is handled securely and that our algorithms serve our audiences equally and fairly, so that the full breadth of the BBC is available to everyone.
5. Where ML engines surface content, outcomes will be compliant with the BBC’s editorial values (and, where relevant, with our editorial guidelines). We will also seek to broaden, rather than narrow, our audience’s horizons.
6. ML is an evolving set of technologies, where the BBC continues to innovate and experiment. Algorithms form only part of the content discovery process for our audiences, and sit alongside (human) editorial curation.
These principles are supported by a checklist that gives practitioners concrete questions to ask themselves throughout a machine learning project. These questions are not formulated as a governance framework to be ticked off, but instead aim to help teams building machine learning engines to really think about the consequences of their work. Teams can reflect on the purpose of their algorithms; the sources of their data; our editorial values; how they trained and tested the model; how the models will be monitored throughout their lifecycle; and their approaches to security, privacy and other legal questions.
While we expect our six principles to remain pretty consistent, the checklist will have to evolve as the BBC develops its machine learning capabilities over time.
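To illustrate how a team might keep such a checklist close to their project code, here is a minimal sketch in Python. The field names below are hypothetical examples based on the themes described above, not the BBC's actual checklist.

```python
from dataclasses import dataclass, field

# Hypothetical project checklist record. The fields mirror the themes
# described above (purpose, data sources, editorial review, testing,
# monitoring, privacy/legal) -- they are illustrative, not official.
@dataclass
class MLProjectChecklist:
    purpose: str = ""                        # Why does this algorithm exist?
    data_sources: list = field(default_factory=list)  # Where does the data come from?
    editorial_review_done: bool = False      # Reviewed against editorial values?
    training_and_testing_notes: str = ""     # How was the model trained and tested?
    monitoring_plan: str = ""                # How will it be monitored over its lifecycle?
    privacy_and_legal_signoff: bool = False  # Security, privacy and legal questions addressed?

    def open_questions(self) -> list:
        """Return the checklist themes that still need attention."""
        gaps = []
        if not self.purpose:
            gaps.append("purpose")
        if not self.data_sources:
            gaps.append("data_sources")
        if not self.editorial_review_done:
            gaps.append("editorial_review")
        if not self.training_and_testing_notes:
            gaps.append("training_and_testing")
        if not self.monitoring_plan:
            gaps.append("monitoring_plan")
        if not self.privacy_and_legal_signoff:
            gaps.append("privacy_and_legal")
        return gaps


# A project part-way through: the purpose is stated, everything else is open.
checklist = MLProjectChecklist(purpose="Recommend relevant audio content")
print(checklist.open_questions())
```

The point of a structure like this is exactly what the paragraph above describes: the open items are prompts for reflection and discussion, not boxes to tick before shipping.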
The Datalab team is currently testing this approach as they build the BBC’s first in-house recommender systems, which will offer a more personalised experience for BBC Sport and BBC Sounds. We also hope to improve the recommendations for other products and content areas in the future. We know that this framework will only be impactful if it is easy to use and can fit into the workflows of the teams building machine learning products.
The BBC believes there are huge benefits to being transparent about how we’re using machine learning technologies. We want to communicate to our audiences how we’re using their data and why. We want to demystify machine learning. And we want to lead the way on a responsible approach. These factors are essential not only in building quality ML systems, but also in retaining the trust of our audiences.
This is only the beginning. As a public service, we are ultimately accountable to the public and so are keen to hear what you think of the above.