BBC Weather: changes to technical architecture
The technical architecture for the BBC Weather site in a simplified diagram
I work as a Technical Architect with the Weather team, and this post talks about some of the changes we've made to the architecture of the site to ensure the new site stays reliable, performant and able to scale to the traffic levels that BBC Weather attracts.
Weather as a service
The previous version of the BBC Weather website ran on a dedicated two-tier architecture, with business and presentation logic wrapped up in a PHP front-end that communicated directly with a MySQL database. These machines sat behind the BBC News Apache mod_cache head-end servers, providing the scaling necessary for Weather traffic spikes.
For the new site we've moved to a three-tier architecture, keeping presentation logic in a PHP front-end but moving business logic and DB access down to a Java/Spring mid-tier service layer that presents RESTful HTTPS APIs back to the PHP front-end for data reads, and to the BBC Weather Centre's data ingest system for writes.
This move is part of the BBC's wider strategy to move to a Service Oriented Architecture (SOA) to increase cross-product interoperability and data reuse. The diagram at top left gives an idea of how the tiers relate to each other in Weather's case (in practice this arrangement is replicated over two data centres).
The mid-tier REST API makes data available to the presentation layer as JSON, providing separate feeds for different page components. We set different Cache-Control: max-age headers for different feeds according to how frequently the data is updated, and combine this with HTTP ETags to make subsequent requests more lightweight once a feed's max-age has been reached.
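To make the max-age and ETag combination concrete, here is a minimal Python sketch of how a feed endpoint might answer a conditional request. It is not the Weather service code (that is Java/Spring); the feed names, the max-age values and the respond function are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

# Hypothetical per-feed lifetimes in seconds; the real values are not given here.
FEED_MAX_AGE = {"forecast": 300, "observations": 60}


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body, so it changes when the data changes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(feed: str, data: dict,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on one feed."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {
        "Cache-Control": "max-age=%d" % FEED_MAX_AGE[feed],
        "ETag": etag,
    }
    if if_none_match == etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body
```

A client that has held a feed past its max-age sends the stored ETag back in If-None-Match. If the data has not changed it gets a small 304 response instead of the full JSON body.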
The new Weather site now runs on the same dynamic platform that hosts the BBC Homepage and iPlayer. To protect this shared platform from spikes in Weather traffic (up to six million users per day when it snows, and around two million on a typical day) and provide a responsive user experience we've introduced a multi-tier caching strategy:
- the Varnish HTTP accelerator caches fully rendered pages in front of the PHP tier;
- Memcached stores JSON within the PHP tier and reduces calls to the Java service layer;
- Apache mod_cache sits in front of the Java service layer to cache data requested by the PHP tier and other clients;
- Ehcache within the Java tier caches database and third-party service responses.
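The layering above can be sketched as a chain of read-through caches, each of which only calls the layer behind it on a miss. This is a simplified Python illustration rather than how Varnish, Memcached, mod_cache or Ehcache are actually configured: the TtlCache class, the two layers shown and their lifetimes are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """A read-through cache: on a miss it asks the layer behind it."""

    def __init__(self, ttl_seconds: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.fetch = fetch      # the next layer down
        self.clock = clock      # injectable so the behaviour can be tested
        self.hits = 0
        self.store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]
        # Miss or expired: fetch from the layer behind and remember the result.
        value = self.fetch(key)
        self.store[key] = (now + self.ttl, value)
        return value


def load_from_database(key: str) -> str:
    # Stand-in for the real database or third-party service call.
    return "forecast for " + key


# Hypothetical lifetimes: a longer-lived inner cache behind a shorter-lived outer one.
java_tier_cache = TtlCache(60, load_from_database)
php_tier_cache = TtlCache(30, java_tier_cache.get)

print(php_tier_cache.get("2655984"))
```

Repeated requests for the same key are answered by the outermost layer that still holds a fresh copy, so the database is only reached when every layer in front of it has expired.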
Weather pages leaving the PHP tier will typically carry 'Cache-Control: public, max-age=180, stale-while-revalidate=30' to enable caching in Varnish (and beyond) for UK users. The total caching time across the three tiers is around 10 minutes, so that when staff at the Weather Centre edit the data the public see the changes on the live environment soon after. As well as protecting the shared platform from load spikes, the front-end Varnish caches also provide users with a highly responsive experience.
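As a rough illustration of what that header asks a cache to do, the Python sketch below parses it and classifies a stored page by its age. The function names are made up for this example, and real caches such as Varnish apply more rules than this; it only shows the max-age and stale-while-revalidate windows.

```python
from typing import Dict, Union

Directive = Union[int, bool]


def parse_cache_control(value: str) -> Dict[str, Directive]:
    """Split a Cache-Control header into a dict of directives."""
    directives: Dict[str, Directive] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if not name:
            continue
        arg = arg.strip()
        # Numeric directives become ints; flags such as "public" become True.
        directives[name.strip().lower()] = int(arg) if sep and arg.isdigit() else True
    return directives


def cache_state(age_seconds: int, directives: Dict[str, Directive]) -> str:
    """Classify a cached response by its age in seconds."""
    max_age = int(directives.get("max-age", 0))
    swr = int(directives.get("stale-while-revalidate", 0))
    if age_seconds < max_age:
        return "fresh"                    # serve from cache
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"   # serve stale copy, refresh in background
    return "expired"                      # must go back to the origin


header = "public, max-age=180, stale-while-revalidate=30"
directives = parse_cache_control(header)
for age in (100, 195, 240):
    print(age, cache_state(age, directives))
```

With the header above, a page is served as fresh for its first three minutes, may be served stale for a further 30 seconds while the cache fetches a new copy, and after that has to be fetched again before it can be served.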
When the shared platform is under very high load (for example during high profile news events or when it snows) we fail over to a Content Delivery Network (CDN), forcing pages to be cached outside of the BBC's servers. In previous versions of the Weather website this led to a loss of personalisation (favourite locations stored in a cookie), but in this release we handle personalisation on the client side, using XHR to make follow-up requests once the basic page has loaded and the favourite locations cookie has been read. Both the basic page and the location data fragments are cacheable in the CDN.
The previous BBC Weather site used its own location gazetteer, one of several location datasets around the BBC. For the new site we've moved to using a new mid-tier service for location data. This new service (called Locator) provides a REST API that lets front-end products get the data that the BBC has associated with a place (weather forecast ID, TV and radio region, local news area, etc). This service is now used by the new Weather site and the new BBC Homepage, and will soon be used by other BBC products that need to associate data with a place.
Locator draws its gazetteer from the open Geonames dataset. As a result the URL for a location on the new BBC Weather website will be /weather/:geoID, so for example Belfast has the BBC weather forecast page URL www.bbc.co.uk/weather/2655984. I was cautious about using these geoIDs as they aren't web-scale identifiers, plus anyone who's tried it will tell you that managing third-party IDs can be a headache. But for the 20,000 or so populated places that the BBC provides forecast data for, they will make it easier for us to integrate with other BBC (and non-BBC) services, using the geoID as a link. For example, the BBC's semantic data publishing platform used in the World Cup will use geoIDs as identifiers for locations in its event models.
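The /weather/:geoID pattern is simple enough to show in a few lines of Python. This is only a sketch of the URL scheme described above, assuming geoIDs are plain positive integers; the helper names are hypothetical and this is not the site's routing code.

```python
import re
from typing import Optional

# Matches paths of the form /weather/<digits>, for example /weather/2655984.
WEATHER_PATH = re.compile(r"^/weather/(\d+)$")


def forecast_path(geo_id: int) -> str:
    """Build the forecast page path for a Geonames ID."""
    return "/weather/%d" % geo_id


def parse_geo_id(path: str) -> Optional[int]:
    """Extract the geoID from a forecast page path, or None if it does not match."""
    match = WEATHER_PATH.match(path)
    return int(match.group(1)) if match else None


print(forecast_path(2655984))
print(parse_geo_id("/weather/2655984"))
```

Because the same geoID appears in the URL and in other services' data, a consumer can go from a page path to an identifier it can use as a link elsewhere.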
Behaviour Driven Development
Another key aspect of the new site development was a focus on code quality. We were fortunate to have had a Developer In Test embedded in the team for the first few months of the project, who helped get the developers up to speed with the practices of Behaviour Driven Development and worked with the Product Owner to describe the requirements as testable Cucumber features and scenarios.
We used Hudson for Continuous Integration, running unit tests on code commits from our integration environment. Cucumber tests ran at scheduled intervals on our testing environment, so we had a good chance of catching integration bugs quickly. Code review happened through a combination of frequent pair-programming and scheduled review sessions, where developers talked through what they had been working on with each other.
I have no doubt that these practices helped the project to deliver to schedule and specification, and have provided a more stable, maintainable and extendable codebase.
Jeremy Tarling is Technical Architect, BBC Weather, BBC News & Knowledge