BBC News website's content management and publishing systems
As part of the BBC News site refresh we have been making substantial changes to the underlying systems that manage and publish the content.
The BBC has one of the oldest and largest websites on the internet and one of the goals of the update to the News site was to also update some of the core systems that manage content for all our interactive services.
In this post I'll highlight a few areas where we have made some important changes.
The CPS is the system that manages content production for BBC News, BBC Sport and over 100 other websites across the BBC. It also produces the content for multi-platform journalism, such as the BBC Mobile services and Interactive TV/Red Button services. Even content for the mighty Old Skool Ceefax is born in the CPS.
If the BBC website is one of the largest and oldest on the internet, then the CPS has been around nearly as long. As a rule of thumb, if you remember Bagpuss you are older than the CPS. If you grew up watching Teletubbies, you probably are not.
Let's not confuse old with legacy though. The CPS has been constantly evolving. That said, when looking at the requirements for the new News site and other services, we did consider whether we should take a trip to the Content Management System (CMS) Showroom and see what shiny new wheels we could get.
However there is an interesting thing about the CPS - most of our users (of which there are over 1,200) think it does a pretty good job [checks inbox for complaints]. Now I'm not saying they have a picture of it next to their kids on the mantelpiece at home, but compared to my experience with many organisations and their CMS, that is something to value highly.
The latest version of the CPS - version 6 - underpins the new News site and has made substantial changes to systems and workflow, but it is still focused on the task of managing content which fits into a general journalistic pattern. It does not try to be all things to all people, and this no doubt plays some part in its success.
There have been a number of requests from people asking to see more of the CPS, but as there is a lot of detail to go into, I'll just focus on a few headline points for now. We will be doing a more in-depth blog post on it soon.
Moving to a more structured approach
Some of the major changes in approach are in the client, which is a .NET 3.5 application taking full advantage of WPF (Windows Presentation Foundation). The screenshot below shows an example image from the CPS which illustrates some new features.
This shows a snapshot of a story editing window. Around this are site navigation and other tools (like Search).
As you can see, there is a component-based structure to the story content, with a Video, Introduction and Quote shown. These components are predefined and can be dragged in and added to the story, which shows that the CPS is not primarily a WYSIWYG editor. The CPS focuses on content structure because, in a world where you are publishing to many platforms that have hugely different rendering possibilities, WYSIWYG becomes a pointless feature. There are, however, previews showing the output.
Previously, users could add HTML and custom CPS tags directly into the story body to control the content presentation and the components, similar to the way you would insert code into your content on wikis and blogs. This caused a lot of problems for quality and content structure though, so now these things are managed as components where the user can change the content and behaviour of the component in a controlled manner. We will come on to the importance of that next.
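To make the contrast with free-form HTML concrete, here is a minimal sketch in Python of what "components edited in a controlled manner" could look like. It is not the CPS implementation: the component types, field names and output markup are all hypothetical, chosen only to mirror the Video, Introduction and Quote components mentioned above.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical whitelist: the fields each component type is allowed to carry.
ALLOWED_FIELDS = {
    "introduction": {"text"},
    "quote": {"text", "attribution"},
    "video": {"media_id", "caption"},
}


@dataclass
class Component:
    """A typed block of story content (illustrative, not the CPS model)."""
    kind: str
    fields: dict


def validate(component):
    """Reject unknown component types and fields outside the whitelist."""
    allowed = ALLOWED_FIELDS.get(component.kind)
    if allowed is None:
        raise ValueError(f"unknown component type: {component.kind}")
    extra = set(component.fields) - allowed
    if extra:
        raise ValueError(f"unexpected fields for {component.kind}: {sorted(extra)}")


def render_html(component):
    """Render one component; user text is escaped, so markup typed into a
    field cannot leak into the published page."""
    validate(component)
    f = {key: escape(value) for key, value in component.fields.items()}
    if component.kind == "introduction":
        return f'<p class="introduction">{f["text"]}</p>'
    if component.kind == "quote":
        cite = f.get("attribution", "")
        return f'<blockquote><p>{f["text"]}</p><cite>{cite}</cite></blockquote>'
    # Only "video" remains after validation.
    return f'<div class="video" id="{f["media_id"]}">{f.get("caption", "")}</div>'
```

Because the story is a list of typed components rather than a blob of markup, the same data could be handed to a different renderer for mobile or Red Button output without touching the editor.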
Another part of the CPS that changed considerably was the way content is published. Requirements over the years have caused features to be added organically to the way content is published, leaving it a bit messy, with a lot of layout based on HTML tables. A key goal here was to improve the technical quality of the content produced and to support standards as we move from <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> to <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">.
For example, we are aiming for fully valid pages to be published based on the W3C Validation Checker.
If you look at some of the older pages published you will see they don't pass this test, and some pages, such as http://news.bbc.co.uk/sport2/hi/olympic_games/default.stm, produce a lot of errors:
This is especially tricky to fix where the CPS is pulling in content from other systems or services which don't comply with these standards. There is still some work to do here, but generally we should be down to zero or very few errors now.
We will also no longer be using tables to lay out the content. Instead we will be rendering the pages using CSS layout and only using tables for data.
There are lots of reasons to do this, including making the content more efficient, more standards compliant and faster to render. It also allows us to publish semantic XHTML, which means that content blocks are better marked up to describe what they are. This has benefits like creating a better header structure to help screen readers.
Better structure also means you will see a more consistent presentation of stories in Google and other search engines with, for example, story dates and author information showing more clearly.
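One way to see why header structure matters to assistive tools is to extract the outline a screen reader could navigate by. The sketch below uses Python's standard `html.parser` module; the `heading_outline` helper is hypothetical and is not part of the CPS or of any BBC tooling.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1..h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_outline(markup):
    """Return the heading outline of a fragment of (X)HTML."""
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline
```

Fed a fragment such as `<h1>News</h1><h2>Top stories</h2>`, this returns a navigable outline. A page that lays out the same text with table cells and bold tags yields an empty list, which is roughly what a screen reader user gets when asking for a list of headings.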
This reflects a new content model, which is now largely based around a simple and generic data model of assets and groups of assets. These are typed, meaning we don't just manage blocks of content, we use metadata to describe what is in the blocks of content. Publishing is done through templates and services based around Velocity.
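As a rough illustration of typed assets, groups of assets and template-driven publishing, here is a small Python sketch. Python's `string.Template` stands in for Velocity purely because both substitute `$name` placeholders. The class names, fields, markup and the `/news/` link pattern are assumptions for the example, not the CPS data model or its templates.

```python
from dataclasses import dataclass, field
from html import escape
from string import Template


@dataclass
class Asset:
    asset_type: str   # metadata describing what the content is, e.g. "story"
    asset_id: str
    title: str


@dataclass
class AssetGroup:
    group_type: str   # e.g. "top-stories"
    title: str
    assets: list = field(default_factory=list)


# Hypothetical templates; a real system would load these from files.
ITEM = Template('<li class="$asset_type"><a href="/news/$asset_id">$title</a></li>')
GROUP = Template('<div class="$group_type"><h2>$title</h2><ul>$items</ul></div>')


def render_group(group):
    """Render a group of assets; the type metadata drives the markup."""
    items = "".join(
        ITEM.substitute(
            asset_type=escape(a.asset_type),
            asset_id=escape(a.asset_id),
            title=escape(a.title),
        )
        for a in group.assets
    )
    return GROUP.substitute(
        group_type=escape(group.group_type),
        title=escape(group.title),
        items=items,
    )
```

The point of the design is that the templates only ever see typed data, so swapping the template set changes the output format without changing how the content is stored.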
Take this example showing how a component is put together.
Previously the HTML would have looked something like this:
But now it is much more structured and would look something like this with headers clearly marking out the sections of content:
Using CSS for layout also makes a big difference to our HTML and makes for a better separation of layout and content. This rather messy layout...
The table elements used in the first example are gone and the layout relies on CSS to manage the positioning of content.
Finally, a quick note on our URL structure, where you may have noticed a couple of significant changes. These are the tip of an iceberg of substantial changes we have made to our networks and infrastructure as part of this relaunch.
The first is that our News URLs have moved from http://news.bbc.co.uk to http://www.bbc.co.uk/news/ in order to consolidate our domains. As part of the News site changes, this involved significant updates to our networking infrastructure to allow better sharing of content across our domains. Moving all our URLs onto the http://www.bbc.co.uk/ domain also removes some differences between domains which exist largely for reasons that no longer apply.
All URLs should redirect to the appropriate place, but if you do find any broken URLs please let us know.
We also wanted to simplify our URL structure, removing much of the baggage the previous structure carried for managing different types of content and editions of the website.
Now the structure is basically:
http://www.bbc.co.uk/ [SITE] / [SECTION] - [SUBSECTION] - [STORY-ID]
This has made URLs shorter and simpler.
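The pattern above can be sketched as a pair of Python helpers that build and parse such a URL. The function names are hypothetical, and the parser assumes things the post does not state: that the site and section contain no hyphens, that the story ID is numeric, and that everything between the section and the ID is the subsection.

```python
import re

BASE = "http://www.bbc.co.uk"

# Assumed shape: /[SITE]/[SECTION]-[SUBSECTION]-[STORY-ID]
STORY_URL = re.compile(
    r"^http://www\.bbc\.co\.uk/(?P<site>[a-z]+)/"
    r"(?P<section>[a-z_]+)-(?P<subsection>[a-z_-]+)-(?P<story_id>\d+)$"
)


def build_story_url(site, section, subsection, story_id):
    """Assemble a story URL following the structure described above."""
    return f"{BASE}/{site}/{section}-{subsection}-{story_id}"


def parse_story_url(url):
    """Split a story URL into its parts, or return None if it does not fit."""
    match = STORY_URL.match(url)
    return match.groupdict() if match else None
```

For example, `build_story_url("news", "world", "europe", "10250603")` gives `http://www.bbc.co.uk/news/world-europe-10250603`, an illustrative address rather than a known story, and `parse_story_url` recovers the four parts from it.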
We considered making even shorter URLs - you will have seen some stories were published this way while we transitioned the site to the new design, such as:
The changes we have made will allow us to make URLs more flexible, and there is more work to do yet on how we might use even shorter URLs (such as http://www.bbc.co.uk/10250603) and longer, more descriptive ones (such as http://www.bbc.co.uk/story-about-something-interesting).
If you would like to know more about any of this, then let me know by leaving a comment.
Thanks for reading.
John O'Donovan is Chief Technical Architect, Journalism and Knowledge, BBC Future Media & Technology.