CPS Vivo: a new content management system for BBC streams
Principal Software Engineer
I am a Principal Software Engineer in the CPS team, part of BBC Future Media’s Platform department.
This post is the first in a short series about a new content management system that we are building.
CPS is the BBC’s online journalism content management system and has been in existence, in some form or other, since 1997. It gives editorial staff the ability to create and promote bespoke content for many BBC sites, most notably BBC News and BBC Sport.
It is a very sophisticated tool and compares well with similar systems in its field. However, like any system of its age, it has limitations that we needed to address as requirements and editorial workflows evolved.
OS tie-in:
As a Microsoft .NET WPF desktop client application, CPS requires users to run Windows within the BBC domain. This is very limiting for journalists out in the field who want to file copy from an ever wider range of devices and operating systems.
Bureau connectivity:
The BBC has many bureaus around the world. However, a single Oracle database in the UK means that all network traffic is subject to the latency and bandwidth constraints of data flowing in and out of London.
Manual content discovery:
The BBC has an excellent Linked Data Platform that CPS uses to tag its content, but CPS does not use it to help editorial staff discover the wealth of content around the BBC that may be relevant to a story being written.
Changing editorial requirements have prompted a move away from the traditional long-form article towards streams of tag-driven content, for example BBC Live pages, election pages, mobile app topic pages, storylines, live blogs and listicles. Although CPS supports the creation of such content, it is very much an add-on to its traditional functionality, so the workflow is far from ideal.
All of these factors combined led us to develop a new application to sit alongside CPS, codenamed Vivo, which is the subject of this series of posts.
Introducing CPS Vivo
Beta screenshot of CPS Vivo, a new content management system for the BBC
The principle behind Vivo is to build a lightweight, responsive, single-page web application, hosted in the cloud, that supports three key editorial workflows for stream management, using the following definitions:
"Stream": a time-ordered list of editorial content about a topic.
"Content creation": the ability to create, edit and publish shareable, short-form, tagged content for use in a stream.
"Content discovery": surfacing pan-BBC content tagged in the Linked Data Platform for potential inclusion in a stream.
"Stream curation": the ability to manually select content from multiple sources for inclusion in a stream.
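To make the definition of a stream concrete, here is a minimal Python sketch of a stream as a time-ordered list. The class and field names (`StreamItem`, `Stream`, `published`, `tags`) are hypothetical, and ordering newest first is an assumption based on how live pages usually read; this is not Vivo's actual content model, which the next post in the series covers.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class StreamItem:
    # Hypothetical fields: an identifier, a publication time and topic tags.
    item_id: str
    published: datetime
    tags: frozenset = frozenset()


@dataclass
class Stream:
    # A stream is a time-ordered list of editorial content about a topic.
    topic: str
    items: List[StreamItem] = field(default_factory=list)

    def add(self, item: StreamItem) -> None:
        # Append, then re-sort so the newest item is always first
        # (newest-first ordering is an assumption, as on a live page).
        self.items.append(item)
        self.items.sort(key=lambda i: i.published, reverse=True)


stream = Stream("election")
stream.add(StreamItem("a", datetime(2015, 5, 7, 9, 0)))
stream.add(StreamItem("b", datetime(2015, 5, 7, 10, 0)))
print([i.item_id for i in stream.items])
```

Items can arrive in any order (a late correction, say) and the stream still reads chronologically.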
Vivo has been in development for several months, and we have now released a minimum viable product (MVP) to live. User trials have so far been very successful, and we plan to start using it live on BBC Local Live pages and for sport commentaries in March, with the aim of having it fully rolled out in time for the general election in May.
CPS Vivo Technical Design
CPS Vivo architecture diagram. Components built by the CPS team are shown in blue.
Above is a high-level architecture diagram showing the various components that make up the full Vivo solution. Many of these components already exist within BBC Future Media but are included in the diagram to provide context.
At first glance the architecture may look complicated, but because it follows microservice principles it is actually very flexible: it lets us build independent services that each do a single thing well, rather than a single bloated service that attempts to do everything.
Here’s what they all do:
Hosting a web client in the cloud solves the issue of OS tie-in and provides the distributed architecture needed to alleviate bureau connectivity issues.
The Vivo client, once validated against our authentication service, interacts exclusively with a newly created RESTful Vivo API component for all operations.
The authentication service, written in Scala using Scalatra, is a JSON API that fronts ADFS (Active Directory Federation Services). It holds sessions as state and synchronises them between nodes using Akka clustering. This service allows us to provide access to BBC staff, including those in the field.
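The session handling described above can be sketched roughly as follows. This is a minimal illustration, not the BBC's implementation: the ADFS round trip is replaced by a hypothetical `verify` callable, the HTTP/JSON layer is omitted, and the Akka clustering that synchronises sessions between nodes is stood in for by naive copying to a list of peer nodes. `AuthNode`, `login` and `validate` are hypothetical names, and the one-hour session lifetime is an assumed default.

```python
import secrets
import time
from typing import Callable, Dict, List, Optional, Tuple


class AuthNode:
    """One node of a hypothetical session service fronting an identity provider."""

    def __init__(self, verify: Callable[[str, str], bool], ttl_seconds: int = 3600):
        self.verify = verify  # stand-in for the ADFS credential check
        self.ttl = ttl_seconds
        self.sessions: Dict[str, Tuple[str, float]] = {}  # token -> (user, expiry)
        self.peers: List["AuthNode"] = []

    def login(self, user: str, password: str) -> Optional[str]:
        if not self.verify(user, password):
            return None
        token = secrets.token_urlsafe(32)
        expires = time.time() + self.ttl
        self._store(token, user, expires)
        # Naive replication so any node can validate the token. The real
        # service uses Akka clustering for this, which is not modelled here.
        for peer in self.peers:
            peer._store(token, user, expires)
        return token

    def _store(self, token: str, user: str, expires: float) -> None:
        self.sessions[token] = (user, expires)

    def validate(self, token: str) -> Optional[str]:
        # Return the user for a live session, or None if unknown or expired.
        entry = self.sessions.get(token)
        if entry is None or entry[1] < time.time():
            return None
        return entry[0]
```

A token issued by one node can then be validated by any other, which is the property clustering provides in the real service. A production design would also need logout, expiry clean-up and handling of nodes joining or leaving the cluster.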
Also written in Scala, using the Play framework, the Vivo API is where the main business logic for Vivo content creation resides.
Vivo API performs CRUD operations against a MongoDB document store for draft content (known as posts), saves images to a private S3 bucket, proxies calls to the Linked Data API for tag lookup, and initiates publication via the Content Publisher.
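A minimal sketch of the draft-post CRUD operations follows, with an in-memory dictionary standing in for the MongoDB collection and posts held as JSON-like dicts. The class and method names are hypothetical, and the version check on update (optimistic concurrency, so that two editors cannot silently overwrite each other's changes) is an assumption rather than something described here. In a RESTful API these methods would typically sit behind routes such as `POST /posts`, `GET /posts/{id}`, `PUT /posts/{id}` and `DELETE /posts/{id}`; those paths are illustrative too.

```python
import copy
import uuid
from typing import Dict, Optional


class PostRepository:
    """In-memory stand-in for a MongoDB collection of draft posts."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def create(self, post: dict) -> str:
        post_id = str(uuid.uuid4())
        self._docs[post_id] = {**copy.deepcopy(post), "id": post_id, "version": 1}
        return post_id

    def read(self, post_id: str) -> Optional[dict]:
        # Return a copy so callers cannot mutate stored state directly.
        doc = self._docs.get(post_id)
        return copy.deepcopy(doc) if doc is not None else None

    def update(self, post_id: str, changes: dict, expected_version: int) -> bool:
        doc = self._docs.get(post_id)
        # Optimistic concurrency (an assumption): reject stale writes.
        if doc is None or doc["version"] != expected_version:
            return False
        protected = ("id", "version")
        doc.update({k: v for k, v in copy.deepcopy(changes).items() if k not in protected})
        doc["version"] += 1
        return True

    def delete(self, post_id: str) -> bool:
        return self._docs.pop(post_id, None) is not None
```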
The Content Publisher, again written in Scala, uses the shared Content Writer service to transform the stored Vivo JSON representation of the post into the unified BBC Content Store XML format (known as CandyXML). It also copies images from the private S3 bucket to its public equivalent, applies tags to the post using the Linked Data Writer API, and publishes an SNS message to notify SQS subscribers of the change.
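The publication steps can be sketched as a single function. Everything here is a stand-in under stated assumptions: plain dictionaries play the roles of the Content Store and the private and public S3 buckets, a hypothetical `apply_tags` callable replaces the Linked Data Writer API, lists of messages stand in for SQS queues, and the XML element names are invented for illustration. They are not the CandyXML schema, and the real transformation is done by the shared Content Writer service.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def post_to_xml(post: dict) -> str:
    # Element names are illustrative only; this is NOT the CandyXML schema.
    root = ET.Element("post", id=post["id"])
    ET.SubElement(root, "title").text = post["title"]
    ET.SubElement(root, "body").text = post["body"]
    images = ET.SubElement(root, "images")
    for key in post.get("images", []):
        ET.SubElement(images, "image", src=key)
    return ET.tostring(root, encoding="unicode")


def publish(post: dict,
            content_store: Dict[str, str],
            private_bucket: Dict[str, bytes],
            public_bucket: Dict[str, bytes],
            apply_tags: Callable[[str, List[str]], None],
            subscriber_queues: List[List[str]]) -> None:
    # 1. Transform the stored JSON post into XML and write it to the store.
    content_store[post["id"]] = post_to_xml(post)
    # 2. Copy each referenced image from the private to the public bucket.
    for key in post.get("images", []):
        public_bucket[key] = private_bucket[key]
    # 3. Record the post's tags with the (stubbed) linked data writer.
    apply_tags(post["id"], post.get("tags", []))
    # 4. Fan out a change notification to every subscribed queue,
    #    in the way an SNS topic delivers to its SQS subscribers.
    message = json.dumps({"event": "published", "id": post["id"]})
    for queue in subscriber_queues:
        queue.append(message)
```

Because the notification is delivered to every subscribed queue, each downstream consumer can react to a publication independently without the publisher knowing who they are.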
Once the post is in the Content Store, it is available to be dynamically rendered alongside other content in any BBC product using the Content API.
Content discovery and stream curation are considerably more complicated, and I will discuss those complexities in a future post. In short, these functions are the domain of the Curation API.
Built in Scala using Spray, with Redis as a cache of merged stream data, the Curation API facilitates discovery by identifying content related to a given tag or Linked Data Platform query. It also enables manual curation of those items into streams of content, which are then served to BBC products.
In the next post, I will describe how the content is modelled and talk through the complex issues around the Curation API and how we solved them.
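As a rough sketch of discovery and curation, the following merges tagged content from several sources into one newest-first list using `heapq.merge`, caches the merged result per tag in a dictionary standing in for Redis, and lets an editor pick items from the results to form a stream. The source names, the `(id, timestamp, tags)` item shape and the class and method names are all hypothetical; how the real Curation API works is left for a future post.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Item = Tuple[str, float, frozenset]  # hypothetical shape: (item_id, published, tags)


class CurationService:
    def __init__(self, sources: Dict[str, List[Item]]):
        # Each source list is assumed to be sorted newest first already.
        self.sources = sources
        self.cache: Dict[str, List[Item]] = {}   # stand-in for Redis
        self.streams: Dict[str, List[str]] = {}  # stream name -> curated ids

    def discover(self, tag: str) -> List[Item]:
        if tag in self.cache:
            return self.cache[tag]
        # Filter each source by tag, then merge the already-sorted lists.
        per_source = [[i for i in items if tag in i[2]] for items in self.sources.values()]
        merged = list(heapq.merge(*per_source, key=lambda i: i[1], reverse=True))
        self.cache[tag] = merged
        return merged

    def curate(self, stream: str, tag: str, chosen_ids: Iterable[str]) -> List[str]:
        chosen = set(chosen_ids)
        # Keep the editor's picks in the same time order as the discovery results.
        selected = [i[0] for i in self.discover(tag) if i[0] in chosen]
        self.streams[stream] = selected
        return selected

    def invalidate(self, tag: str) -> None:
        # Drop a cached merge, e.g. when a source reports new content.
        self.cache.pop(tag, None)
```

`heapq.merge` relies on each input already being sorted in the same order, so merging n items from k sources costs roughly O(n log k) rather than a full re-sort, and caching the merged list avoids repeating even that work on every request.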