Datalab representing ML in the BBC - the experiment
Business Analyst, BBC+
One of the key objectives for the BBC for the coming years is to focus on younger audiences. Machine Learning recommendation capabilities can help us achieve that.
In a recent speech to BBC staff, Tony Hall described our shared ambition “to grow our weekly online reach with younger audiences from 55% to 90% within four years”. We expect that providing personalised content via the Datalab platform to the new BBC+ app, together with what we learn throughout the experiment, will begin to contribute to this growth. Product Manager James Metcalfe describes the approach and objectives of BBC+ in his blog post.
Datalab is the first BBC platform working with Google Cloud Platform (GCP) in production. Pioneering this integration has meant refining our approach to information security and the data privacy approval process, as well as establishing a new infrastructure.
As if this wasn’t challenging enough, we brought more excitement to our engineering team by including gRPC integration, Elasticsearch, Kubernetes and Spinnaker for container-based deployments, and Drone as part of our stack. As a team, we decided to implement these technologies even though not everyone was experienced with them. As a result, we have adopted some more than others, but again, one of the key points is that we learn along the way and gain skills that will be valuable later in the programme.
As a platform, we had to connect to the existing BBC data stores, including our User Activity Store (UAS) database, as well as serving the media content from a different AWS database.
This infrastructure allowed us to provide the groundwork for data scientists to start exploring various methods to satisfy the needs of BBC products and master the personalised experience for BBC users, starting with the BBC+ app.
For example, new users, who have no previous history with the BBC, are given a cold start recommendation so that they can begin their BBC+ journey. The first focus for the Datalab team was to create these recommendations to serve relevant content. The recommender then progressively incorporates a user’s history, refines their content and helps them discover interesting BBC suggestions.
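To make the cold start idea concrete, here is a minimal sketch in Python. It assumes a popularity-based fallback, which is only one possible approach and not necessarily what Datalab uses; the function name `recommend` and the `all_plays` structure are hypothetical.

```python
from collections import Counter


def recommend(user_id, all_plays, k=3):
    """Return up to k clip ids for a user.

    Assumption (not from the original post): recommendations are the
    most-played clips overall, minus anything the user has already seen.
    A brand-new user has no history, so they simply get the top-k clips.
    """
    seen = set(all_plays.get(user_id, []))
    counts = Counter(clip for plays in all_plays.values() for clip in plays)
    # Sort by play count (descending), then clip id, so ties are deterministic.
    ranked = sorted(counts, key=lambda clip: (-counts[clip], clip))
    return [clip for clip in ranked if clip not in seen][:k]


# Hypothetical play histories keyed by user id.
plays = {
    "user_a": ["clip_1", "clip_2"],
    "user_b": ["clip_1", "clip_3"],
    "user_c": ["clip_1", "clip_2"],
}
new_user_recs = recommend("new_user", plays, k=2)
returning_user_recs = recommend("user_b", plays, k=2)
```

In this toy version, a user's history only removes clips they have already watched. A real system would use that history to personalise the ranking itself.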
Finding out what data we had to work with was crucial. We quickly discovered that the metadata for some BBC content is inconsistent. This led us to conversations with our editorial colleagues on ways of tagging and creating content metadata in a more uniform manner, so that we can surface their output to the audience in a more personalised way.
The first content type to be ingested is video clips. In the future the aim is to include audio and articles. Currently in our Elasticsearch DB we have 1,137,598 clips. A set of filters was applied to provide only relevant clips, with complete metadata and editorial risks mitigated:
- Unique clips
- Only English content (for now)
- Filter out audio and weather (for now)
- Filter out clips older than 2013 (for now)
- Editorial risk filtering
- 128 BBC brands are not surfaced in the BBC+ app
- 8 master brands were filtered out (mainly to help serve only English content for now)
That leaves us with 131,626 clips available for BBC+ users.
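The filter list above can be pictured as a simple pipeline. The Python sketch below is illustrative only: the `Clip` fields, the `filter_clips` function and the sample brand names are assumptions, and "older than 2013" is read here as "published before 2013". The real filters run against our Elasticsearch data and are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    # All field names are hypothetical; real BBC metadata will differ.
    clip_id: str
    language: str
    media_type: str  # e.g. "video" or "audio"
    genre: str
    year: int
    brand: str
    master_brand: str
    editorial_risk: bool


def filter_clips(clips, blocked_brands, blocked_master_brands, min_year=2013):
    """Apply the filters from the list above, in order, and return kept clips."""
    seen_ids = set()
    kept = []
    for clip in clips:
        if clip.clip_id in seen_ids:  # unique clips only
            continue
        seen_ids.add(clip.clip_id)
        if clip.language != "en":  # only English content (for now)
            continue
        if clip.media_type == "audio" or clip.genre == "weather":
            continue
        if clip.year < min_year:  # assumed reading of "older than 2013"
            continue
        if clip.editorial_risk:  # editorial risk filtering
            continue
        if clip.brand in blocked_brands:
            continue
        if clip.master_brand in blocked_master_brands:
            continue
        kept.append(clip)
    return kept


sample = [
    Clip("c1", "en", "video", "news", 2017, "brand_x", "mb_1", False),
    Clip("c1", "en", "video", "news", 2017, "brand_x", "mb_1", False),  # duplicate
    Clip("c2", "cy", "video", "news", 2017, "brand_x", "mb_1", False),  # not English
    Clip("c3", "en", "audio", "music", 2017, "brand_x", "mb_1", False),  # audio
    Clip("c4", "en", "video", "weather", 2017, "brand_x", "mb_1", False),  # weather
    Clip("c5", "en", "video", "news", 2010, "brand_x", "mb_1", False),  # too old
    Clip("c6", "en", "video", "news", 2017, "brand_x", "mb_1", True),  # risk
    Clip("c7", "en", "video", "news", 2017, "brand_y", "mb_1", False),  # blocked brand
    Clip("c8", "en", "video", "news", 2017, "brand_x", "mb_2", False),  # blocked master brand
    Clip("c9", "en", "video", "sport", 2013, "brand_x", "mb_1", False),
]
kept_ids = [c.clip_id for c in filter_clips(sample, {"brand_y"}, {"mb_2"})]
```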
The following techniques are being used by our data science team, to experiment, score and create “better” recommendations:
- Model-based collaborative filtering: we’re using embeddings for our content using word2vec models, making the content items the “words” and our playlists the “sentences”
- Offline scoring: to measure how a particular recommender system is performing, we’re using different metrics, like recency, popularity, Normalized Discounted Cumulative Gain (nDCG) and hit-rate. This helps us to select which versions of our models should go for online scoring
- Online scoring: we have a system in place to score the recommenders with the live user data, using A/B or hit-rate testing, whenever necessary
- Combining ML with editorial guidelines: we can prioritise one genre or brand against the others, to fit the editorial needs and better match our audience expectation
- To help share what we know we created a course, to establish a good understanding of what we are building
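Two of the offline metrics mentioned above, nDCG and hit-rate, are easy to sketch in a few lines of Python. This is a minimal illustration assuming binary relevance (a clip is either relevant or not) and one held-out clip per user for hit-rate. The function names and sample data are hypothetical, and our actual scoring code may differ.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is discounted by log2(rank + 1)."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains[:k]))


def ndcg_at_k(recommended, relevant, k):
    """nDCG@k with binary relevance: 1.0 if a recommended item is relevant."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended]
    # The ideal ranking puts every relevant item first.
    ideal_gains = [1.0] * min(len(relevant), k)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def hit_rate_at_k(recs_by_user, held_out_by_user, k):
    """Share of users whose held-out item appears in their top-k recommendations."""
    users = list(held_out_by_user)
    if not users:
        return 0.0
    hits = sum(
        1 for user in users
        if held_out_by_user[user] in recs_by_user.get(user, [])[:k]
    )
    return hits / len(users)


perfect = ndcg_at_k(["a", "b", "c"], {"a"}, k=3)
second_place = ndcg_at_k(["b", "a", "c"], {"a"}, k=3)
hit_rate = hit_rate_at_k(
    {"u1": ["a", "b"], "u2": ["c", "d"]},
    {"u1": "b", "u2": "x"},
    k=2,
)
```

nDCG rewards placing relevant clips near the top of the list, while hit-rate only asks whether the held-out clip appears in the top k at all, so the two can rank models differently.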
We came out of this iterative process full of valuable insights, and building a new ML platform was just one of the things we discovered. Datalab established a positive team culture, with a great deal of multi-disciplinary learning. Despite the bumpy road, we now have a clear vision of our engineering and data science responsibilities, including A/B testing our process, trying various agile methodologies, and bringing teams in Salford and London together for closer collaboration.
We are excited about the next steps, and there is a lot to do on all aspects of our platform: infrastructure, engineering, devops, data science and ML. We will be inviting a new BBC team into the Datalab world: beginning with an exploratory session with the Voice Team over Christmas. We will also be proposing collaborations with other product groups, including R&D and News, early in 2019 to continue to experiment, push the boundaries of our exploration further and innovate ML in the BBC.
If you would like to know more about Datalab, our journey, or you would like to share your own experience, feel free to contact us at firstname.lastname@example.org.
And in other important news: we are still recruiting! https://findouthow.datalab.rocks/
BBC+ is available on Android: https://play.google.com/store/apps/details?id=uk.co.bbc.bbc_plus