
BBC Weather: changes to technical architecture


Jeremy Tarling | 17:00 UK time, Wednesday, 14 December 2011

The BBC Weather website has recently been relaunched after a public beta. Peter Deslandes has blogged about the public beta and response, and Mel Seyer has explained the UX journey.


The technical architecture for the BBC Weather site in a simplified diagram

I work as a Technical Architect with the Weather team, and this post talks about some of the changes we've made to the architecture of the site to ensure the new site stays reliable, performant and able to scale to the traffic levels that BBC Weather attracts.

Weather as a service

The previous version of the BBC Weather website ran on a dedicated two-tier architecture, with business and presentation logic wrapped up in a PHP front-end that communicated directly with a MySQL database. These machines sat behind the BBC News Apache mod_cache head-end servers, providing the scaling necessary for Weather traffic spikes.

For the new site we've moved to a three-tier architecture, keeping presentation logic in a PHP front-end but moving business logic and DB access down to a Java/Spring mid-tier service layer that presents RESTful HTTPS APIs back to the PHP front-end for data reads, and to the BBC Weather Centre's data ingest system for writes.

This move is part of the BBC's wider strategy to move to a Service Oriented Architecture (SOA) to increase cross-product interoperability and data reuse. The diagram above gives an idea of how the tiers relate to each other in Weather's case (in practice this arrangement is replicated over two data centres).

The mid-tier REST API makes data available to the presentation layer as JSON, providing separate feeds for different page components. We set different Cache-Control: max-age headers for different feeds according to how frequently the data is updated, and combine this with HTTP ETags so that revalidation requests are lightweight once a feed's max-age has been reached.
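As a rough illustration of how max-age and ETags work together, here is a minimal sketch in Python (chosen for brevity; the real service layer is Java/Spring). The feed names, the max-age values and the `serve_feed` function are assumptions made up for this example, not the actual Weather API.

```python
import hashlib
import json
from typing import Optional

# Hypothetical per-feed freshness lifetimes in seconds (illustrative values only).
FEED_MAX_AGE = {"observations": 300, "forecast": 1800}


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_feed(feed: str, data: dict, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a request to a JSON feed."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {
        "Cache-Control": f"max-age={FEED_MAX_AGE[feed]}",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The client's copy still matches: answer 304 with no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body
```

A client that has held a feed past its max-age sends the stored ETag back in If-None-Match. If the data has not changed it receives a small 304 response instead of the full JSON body.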

Scaling

The new Weather site now runs on the same dynamic platform that hosts the BBC Homepage and iPlayer. To protect this shared platform from spikes in Weather traffic (up to six million users per day when it snows, and around two million on a typical day) and to provide a responsive user experience, we've introduced a multi-tier caching strategy:

  • the Varnish HTTP accelerator caches fully rendered pages in front of the PHP tier;
  • Memcached stores JSON within the PHP tier and reduces calls to the Java service layer;
  • Apache mod_cache sits in front of the Java service layer to cache data requested by the PHP tier and other clients;
  • Ehcache within the Java tier caches database and third-party service responses.
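The layering above can be pictured as a chain of read-through caches, each falling back to the layer beneath it on a miss. The following Python sketch is only a toy model of that idea: the `TTLCache` class and the lifetimes used with it are hypothetical and do not represent Varnish, Memcached, mod_cache or Ehcache configuration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """A tiny read-through cache whose entries expire after ttl seconds."""

    def __init__(self, ttl: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.fetch = fetch    # the next layer down (another cache or the origin)
        self.clock = clock    # injectable so the behaviour can be tested
        self.store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]   # hit: the lower layers are not touched
        value = self.fetch(key)   # miss or expired: ask the layer below
        self.store[key] = (now, value)
        return value
```

Layers are chained by passing one cache's `get` as the `fetch` of the layer in front of it, for example `outer = TTLCache(180, inner.get)`. Only requests that miss every layer reach the origin.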

Weather pages leaving the PHP tier will typically carry 'Cache-Control: public, max-age=180, stale-while-revalidate=30' to enable caching in Varnish (and beyond) for UK users. The total caching time across the three tiers is around 10 minutes, so that edits made to the data by staff at the Weather Centre appear on the live site soon afterwards. As well as protecting the shared platform from load spikes, the front-end Varnish caches also provide users with a highly responsive experience.
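To make the header quoted above concrete, this small Python sketch classifies a cached page by its age. The function names are invented for the example, and it models only the two directives involved, not everything a real cache such as Varnish takes into account.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into a {directive: value} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else True
    return directives


def cache_state(header: str, age_seconds: int) -> str:
    """Classify a cached response of the given age under the header's rules."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    if age_seconds < max_age:
        return "fresh"                     # serve straight from cache
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"    # serve stale, refresh in background
    return "expired"                       # must go back to the origin
```

With the header from the post, a page is served as fresh for its first 180 seconds, may be served stale while being refreshed in the background for a further 30 seconds, and after that has to be fetched again.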

When the shared platform is under very high load (for example during high profile news events or when it snows) we fail over to a Content Delivery Network (CDN), forcing pages to be cached outside of the BBC's servers. In previous versions of the Weather website this led to a loss of personalisation (favourite locations stored in a cookie) but in this release we handle personalisation on the client side, using XHR to make follow-up requests once the basic page has loaded and the favourite locations cookie has been read; both the basic page and the location data fragments are cacheable in the CDN.
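The reason this approach stays cacheable is that no response depends on who is asking: the cookie is read on the client, and each follow-up request names only a location. The sketch below shows that logic in Python for consistency with the other examples (on the live site it would run as JavaScript in the browser). The cookie name `weather_favs`, its `|`-separated format and the fragment URL pattern are all assumptions invented for illustration.

```python
from typing import List


def favourite_ids(cookie_header: str, name: str = "weather_favs") -> List[str]:
    """Extract favourite location IDs from a Cookie header string.

    Assumes a hypothetical cookie whose value is a '|'-separated list of IDs.
    """
    for pair in cookie_header.split(";"):
        key, _, value = pair.strip().partition("=")
        if key == name:
            return [v for v in value.split("|") if v.isdigit()]
    return []


def fragment_urls(ids: List[str]) -> List[str]:
    """Build one fragment URL per location (hypothetical URL pattern).

    The URL varies by location only, never by user, so every user who
    has favourited the same place can share a single cached copy.
    """
    return [f"/weather/fragments/location/{geo_id}" for geo_id in ids]
```

Two users with different favourites simply request different combinations of the same shared fragments, so the CDN never needs to store a per-user page.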

Location data

The previous BBC Weather site used its own location gazetteer, one of several location datasets around the BBC. For the new site we've moved to a new mid-tier service for location data. This new service (called Locator) provides a REST API through which front-end products can get the data that the BBC has associated with a place (weather forecast ID, TV and radio region, local news area, etc). This service is now used by the new Weather site and the new BBC Homepage, and will soon be used by other BBC products that need to associate data with a place.

Locator draws its gazetteer from the open Geonames dataset. As a result the URL for a location on the new BBC Weather website takes the form /weather/:geoID, so for example Belfast has the BBC weather forecast page URL www.bbc.co.uk/weather/2655984. I was cautious about using these geoIDs as they aren't web-scale identifiers, and anyone who's tried it will tell you that managing third-party IDs can be a headache. But for the 20,000 or so populated places that the BBC provides forecast data for, they will make it easier for us to integrate with other BBC (and non-BBC) services, using the geoID as a link. For example, the BBC's semantic data publishing platform used in the World Cup will use geoIDs as identifiers for locations in its event models.
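As a small illustration of the /weather/:geoID pattern, the Python sketch below converts between a page path and a numeric geoID. The helper names are invented for this example and the site's real routing (in PHP) may well differ.

```python
import re
from typing import Optional

# Matches paths such as /weather/2655984, with an optional trailing slash.
WEATHER_PATH = re.compile(r"^/weather/(\d+)/?$")


def geo_id_from_path(path: str) -> Optional[int]:
    """Return the geoID in a weather page path, or None if it does not match."""
    match = WEATHER_PATH.match(path)
    return int(match.group(1)) if match else None


def path_for_geo_id(geo_id: int) -> str:
    """Build the weather page path for a geoID."""
    return f"/weather/{geo_id}"
```

Because the identifier in the URL is the Geonames ID itself, another service that already knows a place's geoID can construct the forecast link without a lookup.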

Behaviour Driven Development

Another key aspect of the new site development was a focus on code quality. We were fortunate to have had a Developer In Test embedded in the team for the first few months of the project, who helped get the developers up to speed with the practices of Behaviour Driven Development and worked with the Product owner to describe the requirements as testable Cucumber features and scenarios.

We used Hudson for Continuous Integration, running unit tests on code commits from our integration environment. Cucumber tests ran at scheduled intervals on our testing environment, so we had a good chance of catching integration bugs quickly. Code review happened through a combination of frequent pair-programming and scheduled review sessions, where developers talked through what they had been working on with each other.

I have no doubt that these practices helped the project to deliver to schedule and specification, and have provided a more stable, maintainable and extendable codebase.

Jeremy Tarling is Technical Architect, BBC Weather, BBC News & Knowledge

Comments

  • Comment number 1.

    Really interesting article Jeremy. It's great how the BBC share their experiences with techniques such as BDD.

    Now that the project is complete, will the team disband and go on to work on other areas; or will they stay together and support further enhancements to the site?

    I was also wondering whether the final architecture you describe evolved during the course of the project or was it mainly decided up front?

  • Comment number 2.

    Nice gobbledygook in the blog above. However the new weather page now has less info, is less easy to use and looks cluttered, with too much white space which gives you a headache. The only reason to go there now is to watch the video forecast. It is really worrying that this type of poor design now seems to be permeating through all the BBC web sites. I think a new design team has to be found before the BBC's web presence goes further down the tubes.

  • Comment number 3.

    Interesting post. Out of curiosity, are the Java services available externally for developers to play with? Or are they only for BBC sites using Forge?

    Also, you mentioned that you're using Hudson. Most people I've talked to recently seem to have switched to Jenkins (including where I work). Is there a particular reason you chose Hudson over Jenkins? (To be honest, I don't really know much about their differences other than the whole Oracle vs Open source community disagreement)

  • Comment number 4.

    Really interesting to learn that big organizations like the BBC use BDD in their development. I've never used Cucumber for Java - only for Ruby apps.

  • Comment number 5.

    I'm not exactly sure what the purpose of this blog is apart from providing a shopping list of technologies used. Maybe some justification about why the methodologies were used would be more appropriate? I would certainly be more interested in the “whys” rather than a focus on the “whats”.

  • Comment number 6.

    @tinman9898 and @glowin

    Thanks for your comments.

    The technical architecture will be of more interest to other professionals in the field. The design was the subject of Melanie Seyer's blog post, and is off-topic here.

    Cheers,

    Ian

  • Comment number 7.

    Thanks for the comments.

    > 1. At 21:38 14th Dec 2011, Jason wrote:
    > Now that the project is complete, will the team disband and go on to work on
    > other areas; or will they stay together and support further enhancements to the
    > site? I was also wondering whether the final architecture you describe evolved
    > during the course of the project or was it mainly decided up front?

    Hi Jason, no the team will stay together to work through the backlog of features, there's a lot still to do. Some fundamental architectural decisions were made up front but we tried to avoid too much up-front design, preferring early performance testing to guide the process (e.g. trying different caching set-ups and timings).

    > 3. At 00:38 15th Dec 2011, lucas42 wrote:
    > Interesting post. Out of curiosity, are the Java services available externally for
    > developers to play with? Or are they only for BBC sites using Forge?
    > Also, you mentioned that you're using Hudson. Most people I've talked to recently
    > seem to have switched to Jenkins (including where I work). Is there a particular
    > reason you chose Hudson over Jenkins?

    Hi lucas42, we are currently working on making the feeds available externally too, initially as RSS to replicate the previous site's functionality but hopefully in other formats too. The choice of CI tool was made by my colleagues in the Platform Engineering team; they may wish to comment on that.

    > 5. At 10:25 15th Dec 2011, glowin wrote:
    > I'm not exactly sure what the purpose of this blog is apart from providing a
    > shopping list of technologies used. Maybe some justification about why the
    > methodologies where used would be more appropriate? I would certainly be more
    > interested about the “whys” rather than a focus on the “whats”.

    Hi glowin, the purpose was to provide some information about how the architecture differs from the previous site, and some insight into the BBC's technical direction around SOA. If the Editor's ok with it I'd be happy to write a follow-up piece around the decision-making process we went through.

  • Comment number 8.

    @Ian McDonald

    Although Melanie Seyer's blog post is very informative it doesn't go into any detail about the implementation choices such as SOA and the justifications behind why they were chosen, which I had hoped that this blog would have done. In my opinion this amounts to this blog post being pretty pointless for that reason.

  • Comment number 9.

    Interesting post, good to learn the technology choices the BBC makes in its approach to scaling.

    You mention integrating with other services via GeoID, does the BBC plan to make weather forecast data available as Linked Data?

  • Comment number 10.

    This comment was removed because the moderators found it broke the house rules.

  • Comment number 11.

    Hi dnelisen49

    > You mention integrating with other services via GeoID, does the BBC plan
    > to make weather forecast data available as Linked Data?

    We'll certainly look at this; perhaps not full RDF representations of forecast data but something like a minimal set of RDFa or microformat data that links (for example) a five day forecast fragment for a given location to the associated Geonames URI.

 
