Research & Development

Posted by Robert Wadge

Object-based broadcasting is currently a hot topic in our labs, with several teams looking from different perspectives into how these ideas might be used to provide our audiences with more tailored experiences. There are parallels between object-based broadcasting and what has become known as responsive design.

As part of our work on the advantages of using commodity IT and packet-based networks in TV production, BBC R&D's IP Studio project has been thinking for some time about how the flexibility and power available in a software-based production environment could be employed to author object-based formats in a consistent, scalable way. We're developing a set of interconnected components to test and investigate these concepts, which will help the BBC understand the benefits and the challenges of the object-based approach for different formats and genres.

In the world of object-based broadcasting, a programme is like a multi-dimensional jigsaw puzzle that is sent in pieces and can be reconstructed on the fly in a variety of ways just before the programme is presented to the viewer. The solutions to the puzzle are provided by maps that tell the system where the pieces belong and how to combine them. Default versions of these maps are sent along with the jigsaw pieces. In some cases the map may be modified by the viewer to create a personalised experience. The map may also be modified by a system of sensors that perceives certain aspects of the viewer's relationship to their viewing/listening environment.

Breaking it down

The object-based approach to programme production works best if media is conceived as objects from the start.  To this end we split audio and video elements into separate media flows at the point of entry into the IP Studio.  Where audio and video tracks belong together this relationship is noted, but the audio and video are moved around, processed and stored as separate entities.  This affords us maximum flexibility in how and where in the production/broadcast chain we choose to combine them.


Media objects don't just exist in space (or for that matter in computer memory).  They also have a time dimension.  When we want to present a media object to the audience we must map its time dimension into the time domain of the presentation.  If we want to layer two or more media objects we must also synchronise their presentation.  This is made possible by embedding accurate information about timing relationships into the objects at the point of creation.
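As a rough illustration of mapping an object's time dimension into the presentation's time domain, the sketch below shifts creation timestamps onto a presentation timeline using one shared offset, so that frames created at the same instant are presented together. The names `TimedObject`, `to_presentation_time` and `schedule` are hypothetical and are not taken from IP Studio; a real system would also have to deal with clock drift, buffering and latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedObject:
    """A media object whose frames carry creation timestamps (hypothetical type)."""
    name: str
    origin_timestamps_ns: tuple  # creation time of each frame, in nanoseconds


def to_presentation_time(origin_ns, capture_start_ns, presentation_start_ns):
    """Shift a creation timestamp into the presentation timeline."""
    return origin_ns - capture_start_ns + presentation_start_ns


def schedule(objects, capture_start_ns, presentation_start_ns):
    """Map every frame of every object with the same offset.

    Using one offset for all objects preserves their relative timing,
    which is what keeps layered objects synchronised.
    """
    return {
        obj.name: [
            to_presentation_time(t, capture_start_ns, presentation_start_ns)
            for t in obj.origin_timestamps_ns
        ]
        for obj in objects
    }


# Illustrative timestamps only: video frames every 40 ms, audio frames every 20 ms.
video = TimedObject("video", (1_000_000_000, 1_040_000_000, 1_080_000_000))
audio = TimedObject("audio", (1_000_000_000, 1_020_000_000, 1_040_000_000))
plan = schedule([video, audio], capture_start_ns=1_000_000_000, presentation_start_ns=0)
```

Here the second video frame and the third audio frame were created at the same instant, so both land at the same point (40 ms) on the presentation timeline.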

Data as Media

Data in an object-based broadcast system is as important as the media itself.  It gives meaning to the media objects and determines their behaviour. Data objects may also have a time dimension.  Imagine an actor walking in from one side of the screen to the centre, delivering her lines.  The dialogue is represented by a mono audio object.  The data object that tracks the actor's position varies over time and is used to place the audio object in the soundstage presented to the viewer.  While video and audio objects are composed from frames, we construct data objects from sequences of Events.
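The actor example can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `PositionEvent` type, the 0.0 to 1.0 horizontal coordinate, linear interpolation between Events and the simple linear pan law are all choices made for illustration, not a description of how IP Studio represents or renders position data.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionEvent:
    """One Event in a data object tracking an actor (hypothetical type)."""
    timestamp_ns: int
    x: float  # assumed convention: 0.0 = left edge of screen, 1.0 = right edge


def position_at(events, t_ns):
    """Estimate the actor's position at t_ns from Events sorted by timestamp.

    Interpolates linearly between the surrounding Events and clamps to the
    first or last Event outside the covered time range.
    """
    times = [e.timestamp_ns for e in events]
    i = bisect_right(times, t_ns)
    if i == 0:
        return events[0].x
    if i == len(events):
        return events[-1].x
    before, after = events[i - 1], events[i]
    fraction = (t_ns - before.timestamp_ns) / (after.timestamp_ns - before.timestamp_ns)
    return before.x + fraction * (after.x - before.x)


def pan_gains(x):
    """Return (left, right) gains for a mono audio object using a linear pan."""
    return (1.0 - x, x)


# The actor walks from the left edge to the centre over two seconds.
walk = [PositionEvent(0, 0.0), PositionEvent(2_000_000_000, 0.5)]
left, right = pan_gains(position_at(walk, 1_000_000_000))
```

One second into the walk the interpolated position is 0.25, so the dialogue is placed mostly in the left channel.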

All Objects are Created Equal

Video, audio and data objects have equal importance, so they should be treated in a consistent way. To achieve this we've introduced a container called a Grain that wraps a frame of video or audio, or a data Event. Grains follow one after the other to form Flows. Each Grain has a nanosecond-resolution timestamp recording the real-world time at which it was created, and each Flow has a globally unique identifier in the form of a UUID. The Flow's ID is stamped onto each of its Grains. You might think of Flows as synonymous with the objects we've just been talking about.
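To make the Grain and Flow relationship concrete, here is a minimal sketch. The class layout, field names and the `append` method are assumptions made for illustration and do not reflect the actual IP Studio data structures; only the ideas (a nanosecond timestamp per Grain, a UUID per Flow, the Flow ID stamped onto each Grain) come from the description above.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grain:
    """Wraps one video frame, audio frame or data Event (hypothetical layout)."""
    flow_id: uuid.UUID        # ID of the Flow this Grain belongs to
    origin_timestamp_ns: int  # real-world creation time, in nanoseconds
    payload: bytes            # the frame or Event itself, in any encoding


@dataclass
class Flow:
    """An ordered sequence of Grains sharing one globally unique ID."""
    flow_id: uuid.UUID = field(default_factory=uuid.uuid4)
    grains: list = field(default_factory=list)

    def append(self, payload, origin_timestamp_ns=None):
        """Wrap a payload in a Grain stamped with this Flow's ID."""
        if origin_timestamp_ns is None:
            origin_timestamp_ns = time.time_ns()  # wall-clock time in nanoseconds
        grain = Grain(self.flow_id, origin_timestamp_ns, payload)
        self.grains.append(grain)
        return grain


# The same container is used whether the payload is video, audio or data.
video_flow = Flow()
first = video_flow.append(b"<frame 0>")
second = video_flow.append(b"<frame 1>")
```

Because the payload is opaque bytes, video, audio and data Flows are all handled by the same code, which is the consistency the Grain is meant to provide.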

A Question of Identity

Each piece in the jigsaw has to have a unique identity so there is no opportunity to select the wrong one. The identity of objects is provided by a combination of the Flow ID and the origination timestamp. This gives us not just a unique identity for each object, but a unique identity for each Grain within an object. Note that neither timing nor identity is tied to a specific storage container, storage location or format, so both apply equally to streams and to files: we can repackage, reformat and move media around at will, and its identity, down to the Grain level, stays the same. This is similar in concept to the Unique Material Identifier (UMID) standardised as SMPTE ST 330, but applied to every Grain in a Flow.
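The point that identity survives repackaging can be sketched as follows. `GrainId`, `GrainStore` and `repackage` are hypothetical names; the in-memory dictionary stands in for whatever file, stream or database actually holds the media.

```python
import uuid
from typing import NamedTuple


class GrainId(NamedTuple):
    """Identity of a single Grain: Flow ID plus origination timestamp."""
    flow_id: uuid.UUID
    origin_timestamp_ns: int


class GrainStore:
    """Any container of Grains, keyed only by Grain identity (illustrative)."""

    def __init__(self):
        self._payloads = {}

    def put(self, grain_id, payload):
        self._payloads[grain_id] = payload

    def get(self, grain_id):
        return self._payloads[grain_id]


def repackage(source, destination, grain_ids):
    """Move Grains between containers; their identities are left untouched."""
    for grain_id in grain_ids:
        destination.put(grain_id, source.get(grain_id))


flow_id = uuid.uuid4()
ids = [GrainId(flow_id, 0), GrainId(flow_id, 40_000_000)]

live_stream = GrainStore()
for grain_id in ids:
    live_stream.put(grain_id, b"payload")

archive_file = GrainStore()
repackage(live_stream, archive_file, ids)
```

The same `GrainId` retrieves the same Grain from either store, because nothing about the key depends on where or how the Grain is held.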

Things get more complicated when we consider the myriad encoding schemes routinely used to compress video and audio frames. If we want our object-based media broadcast system to be responsive to device type we need to be able to transcode or select between Flows with different encodings, since different devices have different decoding capabilities. Beyond this there is plenty of scope for more advanced ways of being responsive, like reframing the video stream to optimise the experience for screen size and expected viewing distance.

We have defined an entity called a Source that represents the common source from which video, audio or data emanates in different renditions.  Grains that are different renditions of the same frame of media are stamped with the same Source UUID. In the language of object-oriented design, Source IDs are like base class pointers to the media objects. 


This means that when the client device is not yet known (i.e. at the point of production), we can use maps that refer to the objects and Grains in abstract terms by using the combination of Source UUID and origination timestamp.  Once the client device is known, Source IDs can be resolved into the Flow IDs most appropriate for the device.
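A minimal sketch of this late resolution step is shown below. The `Rendition` record, the codec labels and the preference-ordered matching rule are assumptions made for illustration; the post does not say how IP Studio chooses the most appropriate Flow for a device.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One Flow that is a particular encoding of a Source (hypothetical record)."""
    flow_id: uuid.UUID
    source_id: uuid.UUID
    codec: str  # illustrative label such as "h264" or "hevc"


def resolve(source_id, renditions, device_codecs):
    """Pick a Flow ID for source_id that the device can decode.

    device_codecs is assumed to be ordered from most to least preferred.
    """
    for codec in device_codecs:
        for rendition in renditions:
            if rendition.source_id == source_id and rendition.codec == codec:
                return rendition.flow_id
    raise LookupError(f"no rendition of {source_id} matches {device_codecs}")


def resolve_map(abstract_map, renditions, device_codecs):
    """Turn (Source ID, timestamp) references into (Flow ID, timestamp) references."""
    return [
        (resolve(source_id, renditions, device_codecs), timestamp_ns)
        for source_id, timestamp_ns in abstract_map
    ]


camera = uuid.uuid4()  # Source ID shared by every rendition of the same pictures
renditions = [
    Rendition(uuid.uuid4(), camera, "h264"),
    Rendition(uuid.uuid4(), camera, "hevc"),
]

# Authored at production time, before the client device is known.
abstract_map = [(camera, 0), (camera, 40_000_000)]

# Resolved once the device reports what it can decode.
concrete_map = resolve_map(abstract_map, renditions, device_codecs=["hevc", "h264"])
```

The origination timestamps pass through unchanged, so a map authored against Sources still points at the same instants once it has been resolved to Flows.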

We're excited to be laying the foundations for a technology with so many possibilities for the future.  Work on this complex, multi-faceted area is at an early stage; through collaborations with our colleagues in R&D and other BBC creative and technical teams we're looking forward to further developing these ideas and exploring ways they can be used to create new, dynamic media experiences for our audiences.