In the most recent phase of our Cloud-Fit Production Architecture work, we've been investigating how to bridge between live production environments (where streaming technologies are commonplace) and services running in the cloud (which use APIs and distributed storage techniques instead).
An important aspect of our R&D Cloud-Fit Production Architecture project is understanding how to match application requirements with the characteristics of different cloud service providers.
As part of this work, we have recently been investigating how we can use BBC R&D's on-premise OpenStack cloud to support real-time, uncompressed video ingest. Using an on-premise cloud provides us with high bandwidth edge connectivity: it's possible to get large quantities of video data into the cloud in real time at zero cost, something that is hard to achieve over general-purpose internet links to public-cloud providers.
We used R&D's IP Studio technology to capture video and implement the ingest process. To receive video in the cloud, we modified our Squirrel Media Store service to use object storage on our OpenStack, whilst continuing to run the Media Store's API and Flow database on Amazon Web Services (AWS): an approach often termed 'Hybrid Cloud'. Object storage access is the most bandwidth intensive aspect of Squirrel, so it made sense to move that on-premise first.
In this post we'd like to share some details about what we've built, some of the challenges we've encountered, and some information about what we plan to do next.
Getting Video into our Cloud...
An overview of the system we've created is shown below. The upper block shows BBC R&D's research network, which allows us to send video to an IP Studio Node via an SDI link or SMPTE ST 2110 RTP. The IP Studio Node uploads video frames as objects to our OpenStack cloud, via an S3-compatible API (provided by a component called RADOS Gateway) which is backed by an object storage technology called Ceph.
Flows and their associated objects are also registered into a database (bottom left of the diagram), so that we can index our media easily via our Squirrel Media Store API. You can read more about how Squirrel works in one of our previous blog posts.
We tested this setup using uncompressed High Definition (1920x1080) video stored in a typical professional format (10-bit with v210 packing) at 25 Hz and 50 Hz. This corresponds to a base data rate of around 2.2 Gbit/s per camera for 50 Hz video!
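That data rate follows directly from the frame geometry and the v210 packing, which stores 6 pixels of 10-bit 4:2:2 video in 16 bytes. As a quick sanity check:

```python
# v210 packs 6 pixels of 10-bit 4:2:2 video into four 32-bit words (16 bytes).
BYTES_PER_PIXEL = 16 / 6

width, height, fps = 1920, 1080, 50
bytes_per_frame = width * height * BYTES_PER_PIXEL  # ~5.53 MB per frame
bits_per_second = bytes_per_frame * fps * 8

print(f"{bits_per_second / 1e9:.2f} Gbit/s")  # → 2.21 Gbit/s
```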
Turning Video Frames into Objects
So how do we actually turn video streams into objects? Our ingest Node runs a series of real-time software components that capture and process video frames. The last component in the chain is the 'SquirrelStore' which we built as part of this work.
The SquirrelStore component receives video frames in realtime (wrapped up with timing information in an NMOS-style Grain) and assembles them into objects ready to upload to our cloud-hosted Media Object Store. Typically we write one frame into each object, which helps to keep the latency of the process low.
What are Flows, Sources and Grains?
You'll often see these terms popping up when we talk about our IP media work. They all come from the Networked Media Open Specification (NMOS) model, which is being adopted by a wide range of broadcast manufacturers.
Flows: Sequences of video, audio or data that could be sent down a wire, such as the pictures from your computer to your screen.
Sources: A group of Flows that are, in principle, the same content. For example, all the different quality settings of an episode of Doctor Who on iPlayer.
Grains: Single elements within a flow, such as video frames.
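To make the relationship concrete, here's an illustrative sketch of a video Grain as a data structure (this is our own simplification for this post, not the actual NMOS schema):

```python
from dataclasses import dataclass
from uuid import UUID

@dataclass
class Grain:
    """One element of a Flow, e.g. a single video frame (illustrative only)."""
    source_id: UUID        # identifies the content, shared by all renditions
    flow_id: UUID          # identifies this particular rendition/stream
    origin_timestamp: str  # capture time, e.g. "1520946625:120000000" (sec:nsec)
    payload: bytes         # the raw v210 frame data
```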
Inside SquirrelStore, a pool of worker threads is fed objects to upload via a queue. Running several worker threads in parallel lets us adjust the number of concurrent uploads to match the required data rate.
We also use a zero-copy approach to move video data in SquirrelStore, which helps to reduce CPU load, enabling support for higher data rates on less powerful machines.
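In outline, the worker-pool pattern looks like the sketch below (a simplified Python rendering of the idea; the real component is native code with zero-copy buffers, and the `store` client here is a stand-in):

```python
import queue
import threading

# Bounded queue: if uploads fall behind, the capture side finds out quickly.
upload_queue = queue.Queue(maxsize=8)

def upload_worker(store):
    """Pull assembled objects off the queue and upload them until told to stop."""
    while True:
        item = upload_queue.get()
        if item is None:            # sentinel value: shut this worker down
            upload_queue.task_done()
            return
        key, payload = item
        store.put_object(key, payload)
        upload_queue.task_done()

def start_workers(store, n_workers=3):
    # More workers means a higher sustained data rate at the cost of more
    # in-flight requests; n_workers is tuned to the required frame rate.
    threads = [threading.Thread(target=upload_worker, args=(store,), daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads
```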
Finally, we found the AWS SDK for C++ useful to interface with the S3 API on our OpenStack cloud.
Data Rate and Latency: Parallelism is Key!
Our cloud-based object store is designed to support horizontal scaling, which allows higher data rates (or more users) to be supported by adding more network and storage capacity.
However, one of the characteristics of a horizontal scaling approach is that clients must tolerate higher and more variable latency than they would experience when using dedicated high performance (vertically scaled) storage. Latency is particularly relevant for live or near-live productions where processes need fast access to the media as it arrives in the store.
We've written previously about how parallelisation can help us achieve high data rates, and we needed to pull the same trick again here to make the most of this storage architecture.
The diagram below illustrates how the parallel upload threads in the SquirrelStore component are used to achieve the data rates needed to support 50 Hz video. Video frames are captured every 20 ms, and in this example the upload of a single video frame contained in an object takes around 43 ms. A single upload thread is clearly too slow to keep up with real-time video; however, by running three upload threads in parallel we can achieve the necessary data rate, at the cost of around 43 ms of latency.
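The number of parallel uploads needed falls out of a simple ratio between the upload time and the frame interval:

```python
import math

frame_interval_ms = 20   # 50 Hz video: a new frame arrives every 20 ms
upload_time_ms = 43      # observed time to upload one frame-sized object

# Each worker completes one upload every 43 ms, so to keep pace with a frame
# every 20 ms we need ceil(43 / 20) = 3 uploads in flight at once.
workers_needed = math.ceil(upload_time_ms / frame_interval_ms)
print(workers_needed)  # → 3
```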
Once everything was built and working, we ran our ingest system whilst monitoring a range of parameters in order to understand how well it was performing.
The main conclusion from our testing was that the system was able to support uncompressed 50 Hz HD video, at reasonable latencies (around 100 ms), without dropping frames.
An interesting discovery we made early on was the need to add a credentials caching strategy to RADOS Gateway to reduce the number of calls to OpenStack's Keystone authentication service. The result was a 35% reduction in upload times in our tests! We're contributing our changes back to the upstream open-source project.
We did observe some occasional spikes in upload latency, which we were eventually able to attribute to loading a single object bucket very heavily: RADOS Gateway maintains an index of all the objects in each bucket, and seeks to split this into multiple 'shards' to preserve performance as the bucket grows. Amazon write about similar issues in their S3 object store in this blog post, which advises users to maintain a high object keyspace entropy.
What Comes Next?
We're currently improving our Squirrel architecture to distribute storage load across a number of object buckets. We hope this will help to alleviate the latency spikes we experienced during our performance testing, because we will no longer be loading one bucket (and index) as heavily.
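One simple way to spread that load is to map each Flow to a bucket deterministically by hashing its identifier. This is a sketch of the general technique rather than our actual implementation, and the bucket count and naming are illustrative:

```python
import hashlib

N_BUCKETS = 16  # illustrative; the real bucket count is a deployment choice

def bucket_for(flow_id: str) -> str:
    """Deterministically map a Flow to one of N object buckets, spreading
    writes (and each bucket's index) evenly rather than loading one bucket."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return f"media-store-{digest[0] % N_BUCKETS:02d}"
```

Because the mapping is deterministic, readers can locate a Flow's objects without any extra lookup, while writes for different Flows land in different bucket indexes.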
After that, our plan is to create new Cloud-Fit Production capabilities which provide stream input and output 'as a service': purely in software, without relying on dedicated, physical IP Studio hardware to provide gateway services. In doing this, we're hoping to find out more about how a current-generation IP production facility, which uses streaming technologies such as SMPTE ST 2110, can interact with our Cloud-Fit Production Architecture at scale.