Digital Distribution: How on-demand content reaches audiences
In his last blog post, Kiran Patel discussed recent changes to how on-demand videos are created and published. Now Product Manager Federico Benedetto explains how on-demand content is delivered from data centres to your homes across the public internet, and describes the work done by the Media Distribution team to make this possible.
The edge: Content Delivery Networks
One of the biggest technical challenges we have to solve as a content provider is how to deliver video and audio streams to a geographically spread-out audience. At peak times, many hundreds of gigabits per second of data have to be transferred across the internet. Done inefficiently, this could congest the network infrastructure and would degrade the quality of service for our users.
For this reason, we use multiple third-party Content Delivery Networks (CDNs) to distribute media content. A CDN will usually have agreements with the different Internet Service Providers across the UK and globally, so that it can always try to serve a piece of content from a server that is well connected to the end user’s location. It does this while ensuring that the network infrastructure in between is not overloaded, offering low latency and fast buffering times.
We always try to present a choice of at least two CDNs on each client platform. At any moment in time, the choice of CDN is based on capacity, performance and available features. Clients can therefore switch between CDNs if their internet connection to one of them is particularly congested, giving them a better chance of a good streaming experience.
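The client-side logic can be pictured as a weighted choice with a fallback. This is a minimal sketch, not the actual client code — the CDN names, weights and failure-tracking are illustrative assumptions:

```python
import random

# Hypothetical CDN list with serving weights -- names and numbers are
# illustrative only, not the real configuration.
CDNS = [
    {"name": "cdn-a", "weight": 0.6},
    {"name": "cdn-b", "weight": 0.4},
]

def pick_cdn(cdns, failed=()):
    """Pick a CDN by weight, skipping any the client has marked as failed."""
    candidates = [c for c in cdns if c["name"] not in failed]
    if not candidates:
        raise RuntimeError("no CDN available")
    total = sum(c["weight"] for c in candidates)
    r = random.uniform(0, total)
    for c in candidates:
        r -= c["weight"]
        if r <= 0:
            return c["name"]
    return candidates[-1]["name"]
```

In this sketch, a client that sees stalled buffering on one CDN would add it to `failed` and re-pick, landing on the remaining CDN.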
The CDNs act as a geographically distributed web cache: the first time a piece of content is requested, it has to be retrieved from the BBC, stored on the CDN servers and then delivered to the user. After that initial fetch from the BBC origin, every new request is served directly from the CDN servers, without contacting the BBC again.
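This pull-through behaviour can be sketched in a few lines. The class below is a toy in-memory model of the mechanism described above, assuming a hypothetical `origin_fetch` callable standing in for a request back to the BBC origin:

```python
class EdgeCache:
    """Minimal pull-through cache: fetch from origin once, then serve locally."""

    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch  # callable: path -> bytes (stand-in for the origin)
        self.store = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.store:        # cache miss: go back to the origin
            self.store[path] = self.origin_fetch(path)
            self.origin_fetches += 1
        return self.store[path]           # every later request is a local hit
```

However many users request the same asset, the origin is contacted only once per edge cache — which is exactly why the traffic offload works.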
This mechanism is fundamental as it offloads the majority of our traffic to the CDNs, but unfortunately it also means that we lose direct visibility of our users’ requests and journeys. To counter this loss of information, we periodically collect access logs from the CDNs and analyse them to evaluate access patterns and quality of service, and to look for possible improvements. We also collect metrics directly from the clients, which send regular heartbeats and diagnostic information back to us while playing a media stream. A future blog post from the Media Analytics team will shed some light on their work in this area.
The origin: Radix
Most iPlayer content is available for 30 days and needs to be offered in multiple formats and at different levels of definition. This means that at any point in time we need tens of terabytes of unique data available for our users to consume. Each CDN might also request the same media asset multiple times from our origin servers, depending on how many edge servers are in use (that is, how geographically spread out the end users requesting the content are). So despite being shielded by the CDNs, at peak times we still have to serve a large number of gigabits per second from our origin servers to make our content available in a timely manner.
For this reason we have built Radix, the root of all the BBC’s on demand content.
Radix originates all BBC on-demand content which is distributed by the CDNs
Each Radix server can store more than 8 terabytes of unique data, constantly deleting old content and fetching new items from its canonical origins. In terms of throughput, each server can provide up to 10 Gbps of peak capacity and handle hundreds of requests per second, with thousands of concurrent connections.
Radix offers a unified entry point for every CDN, as well as providing authentication (only authorised CDNs can fetch content from Radix) and content routing. Depending on which service is accessed, Radix will choose a different canonical origin – or backend – to fetch the original content from: for example, iPlayer download assets are stored as static files in the cloud, while HDS streams for iPlayer are generated on demand by a cluster of servers. This flexibility lets us modify our canonical origins while always presenting the same interface to the CDNs in front of them.
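Conceptually, that content routing is a mapping from request path to backend. The sketch below assumes hypothetical path prefixes and backend hosts — the real Radix routing rules are not public:

```python
# Hypothetical routing table: request path prefix -> canonical origin.
# Prefixes and hosts are illustrative assumptions, not the real Radix config.
ROUTES = [
    ("/iplayer/download/", "https://cloud-store.example"),   # static files in the cloud
    ("/iplayer/hds/",      "https://hds-packager.example"),  # streams generated on demand
]
DEFAULT_ORIGIN = "https://default-origin.example"

def route(path):
    """Return the canonical origin that should serve this request path."""
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            return backend
    return DEFAULT_ORIGIN
```

Because the CDNs only ever see the Radix interface, entries in a table like this can be repointed at new backends without any CDN-side change.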
Similarly to a CDN, Radix itself is a web cache. The same piece of content is requested multiple times by each CDN, and adding a further layer of caching in front of the canonical origins considerably reduces the amount of data transferred out of them. We pay for every gigabyte we move between Radix and our canonical origins, so anything we can do to reduce this transfer results in direct savings.
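A back-of-envelope calculation shows why this matters. Assuming (purely for illustration) that without the cache layer every CDN fill request would reach the canonical origins, while with it each unique asset ideally leaves them only once:

```python
def canonical_egress_tb(fill_requests, unique_assets, asset_size_gb):
    """Rough egress from the canonical origins, in terabytes.

    fill_requests: total requests arriving from all CDN edges
    unique_assets: distinct media assets being requested
    asset_size_gb: illustrative average asset size in gigabytes
    """
    without_cache = fill_requests * asset_size_gb / 1000  # every fill hits the origins
    with_cache = unique_assets * asset_size_gb / 1000     # ideally one fetch per asset
    return without_cache, with_cache
```

With made-up but plausible-shaped numbers — 100,000 fill requests for 10,000 unique 2 GB assets — the billed origin egress drops by an order of magnitude, which is the direct saving the paragraph above refers to.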
The Radix platform as a whole consists of several pools of servers distributed across two physical locations, so that we always have enough capacity to provide a highly available service even in the rare event that a whole data centre becomes unreachable: we can’t simply turn iPlayer off just because we need to do some maintenance in one of our data centres.
The Radix software stack is a mix of open-source software and custom-built extensions and applications. To select the right components and tune the system to meet our performance targets, we have run a large number of load tests, making sure the service can cope with realistic traffic peaks.
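The shape of such a load test — fire many concurrent requests, then check error counts and throughput against a target — can be sketched as follows. The stub handler stands in for a real HTTP request to the service under test; our actual tooling is not shown here:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_handler(request_id):
    """Stand-in for an HTTP request; a real load test would hit the service."""
    return 200  # pretend every request succeeds

def load_test(total_requests, concurrency):
    """Issue requests concurrently; return (successes, requests per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fake_handler, range(total_requests)))
    elapsed = time.perf_counter() - start
    return statuses.count(200), total_requests / elapsed
```

A run would then compare the measured requests per second and error count against the peak-traffic targets the platform has to meet.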
Despite using well-known and highly regarded open-source components, such as Apache httpd, Varnish and Nginx, during these load tests we uncovered and worked around a few issues that only become problematic at the very high throughput levels we were testing at. We will cover our tests and technical choices in detail in a series of future blog posts.
Migrating all origin traffic from our previous platform to Radix has been a large piece of work over the past few months, involving the coordinated effort of several teams within BBC Digital. Guaranteeing an uninterrupted service while transparently switching major systems under the hood has been a big challenge for us.
We consider it a success when a change such as this one is entirely transparent to the audience: our goal is to provide a stable and highly available platform for iPlayer and the BBC’s other online products, so that the public conversation can focus entirely on the content that we distribute, rather than on the technologies that make it possible.
We are hiring!
Radix is just one of the projects that we are working on in the online Media Distribution team at the BBC. If you like the idea of pushing HTTP performance to the extremes and want to play an important part in the future of online video distribution, you can apply here.