Building the Knowledge & Learning Beta
Hi, I'm Rob Lee, senior technical architect for Knowledge & Learning (K&L) at the BBC.
I'm part of the team building the new Knowledge & Learning product. In this post I'm going to describe the technical work we've undertaken so far, some of which you can see in the new Beta release that my colleague Chris Sizemore recently blogged about.
Video content on the Knowledge & Learning Beta
The Knowledge & Learning product aims to bring together the variety of factual and learning content we have on the BBC into a coherent proposition. In the formal learning space we currently have a number of sites including Bitesize, Skillswise and Class Clips amongst others.
There are tens of thousands of content items across these sites with each site having different mechanisms for publishing, discovering and describing the content it serves.
Rather than approach the development of the new product as a 'big-bang' exercise that tries to incorporate everything at once, we've adopted an incremental approach to adding content and building out new functionality, using short form audio and video clips as the first content type to surface in the Beta.
This meant the team had a well-defined problem to solve but still had to consider some key challenges inherent in building the new product:
- Multi-device consumption: Many of the existing sites were built before smartphones and tablets became commonplace so a method of delivering content in a multi-device world was required.
- Content management and migration: Existing sites use a collection of content production systems to author content. We needed to rationalise these and define a content migration process.
- Information architecture and content modelling: Each of the existing sites has a similar but different way of describing and navigating to its content. We needed a consistent approach to describe content in the new product.
In each of these areas we tried to build on existing successes and key systems at the BBC, allowing the team to focus on delivering new product features.
The overview diagram above shows the components involved in the initial Beta release. The BBC has a standard platform for dynamic content rendering that follows a multi-tier approach, keeping a clear separation of concerns.
The rendering layer is built on a standard PHP framework (Zend) with BBC extensions for common capabilities and components. Access to the data layer is via Java service components with rendering components never communicating directly with the data layer.
Caching is a key feature of the stack with reverse proxy caches (Varnish) proxying requests through to the PHP page assembly (rendering) layer using standard HTTP caching mechanisms to allow caching of pages.
The service layer has a similar caching arrangement and both rendering and service layers have access to memcached which provides a distributed memory object caching system to applications.
Ensuring the Java service components perform well is key, as their responses are aggregated in the page assembly layer. The effect of one slow service component can therefore be amplified when a series of requests is made from the page assembly layer.
Memcached is useful in this scenario. However, we use caching carefully, applying it where it will have the biggest impact, because it adds another layer of indirection that makes failure modes more difficult to diagnose.
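To make that trade-off concrete, here is a minimal sketch of a cache-aside lookup with expiry. It is an illustration only, not the platform's implementation: a plain in-process dictionary stands in for memcached, and the names `TTLCache` and `cached_call`, the key format and the time-to-live are all assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached client (assumption for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # An expired entry is treated exactly like a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def cached_call(cache: TTLCache, key: str, ttl_seconds: float,
                fetch: Callable[[], str]) -> str:
    """Return the cached value, calling the (slow) service only on a miss."""
    value = cache.get(key)
    if value is not None:
        return value
    value = fetch()  # hypothetical call through to a service component
    cache.set(key, value, ttl_seconds)
    return value
```

The extra indirection is visible even here: a caller cannot tell from the return value whether it came from the service or from the cache, which is part of why failures behind a cache are harder to diagnose.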
We've tried to keep the architecture simple in line with our initial goal for the Beta but with the longer term aims still in mind. I'll talk about some of the components involved in the diagram in the sections below.
Serving content in a multi-device world
The Beta site across different devices
One of the emerging patterns in delivering content to the variety of devices people now use is Responsive Design, a way to build a single fluid experience that works across multiple devices rather than a series of separate disconnected designs.
This required a mobile first approach to design and implementation which was a new way of working for the team. One of the benefits that Responsive Design delivers is that when we build a new feature in the product it's available on release to all devices at the same time rather than having to add a feature for each device class.
We followed the responsive BBC News model of delivering a core experience to HTML4 browsers and an enhanced experience to HTML5 capable browsers that 'cut the mustard'. We use CSS3 media queries to support responsive styling for different viewports (screen sizes) and device orientations which, for example, allows font size on a large-screen tablet in landscape mode to be increased when compared to that on a smaller mobile device.
This has given us a framework for delivering all new Knowledge & Learning content in a responsive way which we can continue to improve and build on. This is still a work in progress and we're looking at ways we can improve our initial content footprint and the way we deliver assets.
We are one of the first collections of sites at the BBC to use the new responsive Barlesque (essentially the header and footer used for every BBC page), called ORB. Using ORB made it easy to deliver a single fluid experience as it follows the same responsive principles as our application.
The responsive approach also makes it easier to use standard platform caching technologies to effectively manage increased site hits during busy times, such as exam revision times, as the product can deliver small cacheable HTML pages to clients with appropriate cache control information. This means the load on platform infrastructure can be kept to a minimum.
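As a rough illustration of what "appropriate cache control information" can look like, the sketch below builds standard HTTP caching headers for a page and checks a conditional request against them. The function names and the 300-second lifetime are assumptions for the example, not the values the platform uses.

```python
import hashlib
from typing import Dict


def cache_headers(body: bytes, max_age_seconds: int) -> Dict[str, str]:
    """Build Cache-Control and ETag headers for a cacheable page."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    # A content hash gives a validator that changes whenever the page changes.
    etag = hashlib.sha256(body).hexdigest()[:16]
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "ETag": f'"{etag}"',
    }


def is_not_modified(request_headers: Dict[str, str],
                    response_headers: Dict[str, str]) -> bool:
    """True when the client's validator matches, so a 304 could be returned.

    Simplification: real If-None-Match values may list several ETags or "*".
    """
    return request_headers.get("If-None-Match") == response_headers["ETag"]


page = b"<html><body>KS3 Physics: Energy</body></html>"
headers = cache_headers(page, max_age_seconds=300)
```

With headers like these, a reverse proxy can serve repeat requests itself for the lifetime of the page, and revalidation needs only a header comparison rather than a full render.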
Content management and migration
Diagram showing the content management systems and how they interact with the site
Where appropriate we've used standard content production systems within the BBC allowing the team to concentrate on building out new functionality in the product.
For short form audio and video clips we use iBroadcast2 which provides a way to publish short form clips in a standard way so the content is available on as wide a selection of devices as possible when used in combination with the BBC media player (known as emp).
For general content management we use iSite, a standard content management system within the BBC that is also used to manage the BBC Internet Blog (see "Welcome to the new look Internet blog" for more information).
With these core systems in place we set about planning the migration of the 10,000+ short form audio and video clips from the existing Class Clips site. There were a number of challenges we had to address in order to migrate the video and associated content:
- Clip format: The clips were originally transcoded using systems primarily intended for content consumed on the desktop. This meant finding a way to re-encode the clips in the same formats produced by iBroadcast2.
- Clip content: The existing clips content (e.g. Classroom Ideas) was held in a legacy publishing system and needed to be extracted and placed alongside the clips published in iBroadcast2.
- Clip mappings: The curriculum information held against the existing clips (e.g. their subject or topic) needed to be mapped in the curriculum data model.
A simplified process flow for the clip migration is shown in the figure below:
Simplified clip migration process flow
The migration has three key steps:
- Transcode clips and push into the media playout platform.
- Generate and validate learning metadata and content for clips.
- Transform learning metadata and content inputs and push into the content management system.
The clip transcoding was carried out by the Media Publishing System (MPS) team who ingested the source clips, bulk processed them and added them to the same systems that iBroadcast2 publishes to, with clips transcoded to the same formats. This means in the future all clips (both migrated and newly added) can be editorially managed using iBroadcast2.
The K&L editorial teams then produced a series of controlled vocabularies which were mapped to each clip. The teams also validated the associated content for each clip, amending and rewriting where needed to ensure the content was suitable for the new product.
Finally, a transformation component was produced that took these inputs and generated XML documents ready for addition to iSite, so they could be rendered in the new Beta site.
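The final transformation step can be pictured with a small sketch. The element and attribute names below (`clip`, `classroomIdeas`, `topic`, `ref`) and the `ClipRecord` fields are hypothetical, since the actual iSite document format is not described here; the sketch only shows the general shape of turning validated metadata and content into one XML document per clip.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClipRecord:
    """Validated inputs for one clip (field names are assumptions)."""
    clip_id: str
    title: str
    classroom_ideas: str
    topic_ids: List[str] = field(default_factory=list)


def clip_to_xml(clip: ClipRecord) -> str:
    """Serialise one clip record as an XML document string."""
    root = ET.Element("clip", attrib={"id": clip.clip_id})
    ET.SubElement(root, "title").text = clip.title
    ET.SubElement(root, "classroomIdeas").text = clip.classroom_ideas
    topics = ET.SubElement(root, "topics")
    for topic_id in clip.topic_ids:
        # Each mapping points at a term from the controlled vocabularies.
        ET.SubElement(topics, "topic", attrib={"ref": topic_id})
    # ElementTree escapes characters such as "&" in text and attributes.
    return ET.tostring(root, encoding="unicode")


example = ClipRecord(
    clip_id="clip-1",
    title="Energy & forces",
    classroom_ideas="Ask pupils to list energy sources.",
    topic_ids=["ks3-physics-energy", "ks3-geography-energy"],
)
document = clip_to_xml(example)
```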
The curriculum ontology
Following on from the work on Dynamic Semantic Publishing we've adopted a similar approach, moving away from a relational publishing model to one that separates semantics from content and allows for easier re-use.
The existing learning sites have no single model to describe content that could be reused in the new product. The team therefore produced a model of the UK national curricula that allows description of all learning (and other) content in context. This model will contain over 2000 curriculum topics across the various key stages and levels.
The key classes in the model are:
- Level: A stage of education e.g. KS3.
- Field of Study: A subject discipline of a curriculum e.g. Science.
- Topic: A more specific description of a learning resource in the context of a field of study e.g. Energy.
- Programme of Study: A combination of an educational Level and the Field of Study e.g. KS3 Science.
- Topic of Study: A Topic in the context of a Programme of Study e.g. KS3 Physics Energy.
As demonstrated in the diagram above, the model allows for modelling of complex relationships where both KS3 Physics and KS3 Geography share a common Topic 'Energy'.
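The relationships between these classes can also be sketched in code. This is an illustrative data structure only, not the published ontology: the Python class and attribute names are assumptions, and the example values are taken from the descriptions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    name: str


@dataclass(frozen=True)
class FieldOfStudy:
    name: str


@dataclass(frozen=True)
class Topic:
    name: str


@dataclass(frozen=True)
class ProgrammeOfStudy:
    """A Level combined with a Field of Study, e.g. KS3 Physics."""
    level: Level
    field_of_study: FieldOfStudy

    @property
    def label(self) -> str:
        return f"{self.level.name} {self.field_of_study.name}"


@dataclass(frozen=True)
class TopicOfStudy:
    """A Topic placed in the context of a Programme of Study."""
    programme: ProgrammeOfStudy
    topic: Topic

    @property
    def label(self) -> str:
        return f"{self.programme.label} {self.topic.name}"


def programmes_sharing(topic: Topic, topics_of_study: List[TopicOfStudy]) -> List[str]:
    """List the programmes of study in which a given topic appears."""
    return [t.programme.label for t in topics_of_study if t.topic == topic]


ks3 = Level("KS3")
energy = Topic("Energy")
physics_energy = TopicOfStudy(ProgrammeOfStudy(ks3, FieldOfStudy("Physics")), energy)
geography_energy = TopicOfStudy(ProgrammeOfStudy(ks3, FieldOfStudy("Geography")), energy)
```

Because the Topic is a separate object rather than a string embedded in each programme, both topics of study point at the same 'Energy' and it can be found from either side.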
We've released this model for others to use; you can find it on the curriculum ontology page. The model is licensed under an open license, meaning others can reuse and adapt it for their own purposes.
We'd be interested in any feedback or comments from anyone who adopts this model.
This curriculum data is held in the BBC's Linked Data Platform (LDP) expressed as Resource Description Framework (RDF) which will allow a learning context to be applied to many content items produced by the BBC in the future.
We've created an LDP-powered domain-specific API for the curriculum domain model that powers each of the curriculum pages (e.g. the KS3 Geography Environment page) and the curriculum navigation on the site.
We're also aiming to expose this domain model as structured data in the content. In our next release we'll be adding in-page mark-up from the educational vocabulary in Schema.org, which was contributed by the LRMI project.
Schema.org is a "collection of schemas, i.e. HTML tags, that webmasters can use to mark-up their pages in ways recognized by major search providers". LRMI is an education-specific vocabulary, providing a way to consistently describe educational content.
Practically this means that machines (such as search engines) will be able to index and understand more about the content from the structure, aiding its discovery. We've created mappings between Schema.org and the curriculum ontology, allowing us to expose the data in the curriculum ontology in a way that search engines (and others) can easily consume.
This mapping is still a work in progress and will evolve as Schema.org and LRMI evolve, but we believe it's a useful way to begin exposing the semantics for learning content at the BBC.
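To give a flavour of what such a mapping might produce, the sketch below describes a clip using the Schema.org `educationalAlignment` property and `AlignmentObject` type, which come from the LRMI vocabulary. It is a hypothetical mapping, not the one the team built: the choice of JSON-LD as the serialisation, the framework name, the clip title and the way Level, Field of Study and Topic are assigned to `alignmentType` values are all assumptions.

```python
import json
from typing import Any, Dict

# Assumed label for the framework; the real mapping may use a different value.
FRAMEWORK = "UK national curricula"


def alignment(alignment_type: str, target_name: str) -> Dict[str, str]:
    """Build one Schema.org AlignmentObject."""
    return {
        "@type": "AlignmentObject",
        "alignmentType": alignment_type,
        "educationalFramework": FRAMEWORK,
        "targetName": target_name,
    }


def clip_markup(name: str, level: str, field_of_study: str, topic: str) -> Dict[str, Any]:
    """Describe a clip and its curriculum context as Schema.org structured data."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "educationalAlignment": [
            alignment("educationalLevel", level),
            alignment("educationalSubject", field_of_study),
            alignment("teaches", topic),
        ],
    }


def to_json_ld(markup: Dict[str, Any]) -> str:
    """Serialise the description for embedding in a page."""
    return json.dumps(markup, indent=2)


markup = clip_markup("Energy transfers", "KS3", "Physics", "Energy")
json_ld = to_json_ld(markup)
```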
We're currently working on adding more clips to the Beta site to give further coverage across the UK curricula. We're aiming to add localisation support for UK indigenous languages as well as adding new content formats to the site.
If you have any thoughts or queries that you’d like to feed back, please leave a comment below or email us at email@example.com.
Robert Lee is senior technical architect for BBC Knowledge & Learning.