
What you don't see and why it's so important...

FM&T Journalism | 10:00 UK time, Friday, 9 January 2009

Mark Drayton, one of our Live Site Systems Administrators, explains the technology behind our sites and how we keep the show on the road.

For the most part this blog and the BBC Internet Blog focus on how the audience-facing parts of our sites work -- developments in embedded media, real-time Sport updates or new ways to find content. But the bits our audiences can see are, as ever, only half the show: what you don't see is the servers, datacentres and people that host the services you use each day. I thought it might be interesting to give a quick glimpse behind the scenes at some of the technology we use and the challenges that running a big, complicated platform presents.


I'm a Systems Administrator (aka 'sysadmin') in the Live Site team, a group of 5 who look after the platform running (amongst others) the main News and Sport website, the iPlayer front-end, /programmes, Search and the Weather beta. Between us we build, install and maintain several hundred servers spread across various BBC buildings and London Docklands datacentres. We're also building a new hosting centre in Watford where we'll consolidate some of the services currently running on older hardware elsewhere. Almost everything we look after runs Linux, the same open source UNIX-like operating system used by many big web names like Google and Yahoo!

If we managed all of these computers as you would a home PC, manually installing the operating system and configuring all the applications, we'd quickly find that we need more staff and, more importantly, we'd risk making mistakes. It'd also be very, very tedious. Larry Wall, the author of the popular Perl programming language, once wrote that the "three great virtues of a programmer [are] laziness, impatience and hubris". The same is even truer of sysadmins: the essentially repetitive nature of our job (we install or configure things a few hundred times in a row) really lends itself to finding faster, more efficient and more elegant ways to do things. For us, this means automation. By automating tasks and running them in parallel we can configure one, ten or hundreds of servers in the same amount of time. As an example, here's how we're building about 100 new servers in our Watford site:

  • physically install the servers in their racks and connect them to the network
  • run a script to tell the lights-out card (a way to manage the server as if you were standing at its keyboard) to boot from a "virtual" CD-ROM which updates the server firmware
  • install the base operating system by PXE ('pixie') booting over the network from a Cobbler server
  • start the Puppet service to configure the server with whatever software and files it needs to do its job.

All of this requires almost no manual intervention and takes about an hour per server, end-to-end. Watford is almost building itself.
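The claim that one, ten or hundreds of servers take about the same time follows from running the per-server steps concurrently. A minimal sketch of that idea is below. It is not our actual tooling: the stage names and the `build_server` and `build_farm` functions are hypothetical, and each stage only records that it ran where real code would drive the lights-out card, the network install and Puppet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names for the automated stages described above.
STAGES = ["update_firmware", "install_os", "configure"]


def build_server(hostname):
    """Run every stage in order for one server and return its build log."""
    log = []
    for stage in STAGES:
        # Placeholder: a real stage would call out to the relevant tool
        # and wait for it to finish before moving on.
        log.append(f"{hostname}: {stage} done")
    return log


def build_farm(hostnames, max_workers=20):
    """Build many servers concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hostnames, pool.map(build_server, hostnames)))


if __name__ == "__main__":
    farm = build_farm([f"web{n:03d}" for n in range(1, 101)])
    print(len(farm), "servers built")
    print(farm["web001"][-1])
```

Because each server's build is independent of the others, the wall-clock time is set by the slowest single build and the size of the worker pool rather than by the number of servers.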

It's worth saying a bit more about Puppet. It's a network-based system which allows us to define 'manifests', or collections of resources that should be applied to a server to make it do a particular job -- for instance, a web server might need the web server software, the configuration files that go with it, a specific directory structure to hold the web content and a way of archiving the server log files. Once we've told Puppet which manifests it should apply to which servers, it does the rest, installing software and configuring services. It runs all the time, making it easy for us to push out changes and ensure that servers that have been down for maintenance are up-to-date before they're returned to production.
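The key idea is that a manifest describes the state a server should end up in rather than the commands needed to get there. The toy model below illustrates that in Python; it is not Puppet's own language or API, and the resource names, the `WEB_SERVER` manifest and the `apply_manifest` function are all made up for the example.

```python
def apply_manifest(manifest, server):
    """Bring `server` in line with `manifest`; return the changes made.

    Both arguments are plain dicts mapping a resource name (a package,
    a file, a directory) to the state it should be in.
    """
    changes = []
    for resource, wanted in manifest.items():
        if server.get(resource) != wanted:
            server[resource] = wanted  # a real tool would act on the host here
            changes.append(resource)
    return changes


# Hypothetical manifest for a web server role.
WEB_SERVER = {
    "package:httpd": "installed",
    "file:/etc/httpd/conf/httpd.conf": "version-42",
    "dir:/var/www/content": "present",
    "cron:archive-logs": "daily",
}

if __name__ == "__main__":
    server = {"package:httpd": "installed"}    # partly configured already
    print(apply_manifest(WEB_SERVER, server))  # first run fills in the gaps
    print(apply_manifest(WEB_SERVER, server))  # second run has nothing to do
```

Applying the same manifest twice changes nothing the second time, which is what makes it safe to run continuously and to catch up servers that were offline when a change went out.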


Of course, being able to make changes to the whole farm at once does mean that any configuration error will potentially affect many servers instead of a handful. We try to avoid this by testing our changes in a staging environment before going live, but occasionally something falls through the gaps. We can go some way to reducing the impact of mistakes by storing all of our configurations (which are text files) in version control, so we're quickly able to spot which change broke things and what we need to do to fix it. Going forward, we're planning on borrowing the ideas of smoke testing and build automation from the software development world. When we want to deploy a new version of some software or make a configuration change, Puppet will first deploy it to our staging environment and then automatically run a battery of tests to make sure we're not introducing a new problem, or reintroducing something we fixed earlier. Again, automation to the rescue!
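As a rough illustration of that planned workflow, the sketch below applies a change to staging, runs a set of smoke tests and only touches live if they all pass. It is an assumption about how such a gate could look, not a description of anything we have built: the environments are plain dicts and the checks in `TESTS` are hypothetical.

```python
def run_smoke_tests(environment, tests):
    """Run each named check against an environment; return the failures."""
    return [name for name, check in tests.items() if not check(environment)]


def deploy_with_gate(change, staging, live, tests):
    """Apply `change` to staging, and to live only if every test passes."""
    staging.update(change)
    failures = run_smoke_tests(staging, tests)
    if failures:
        return failures  # live is left untouched
    live.update(change)
    return []


# Hypothetical checks; real ones might fetch a page or query a service.
TESTS = {
    "web server listens on port 80": lambda env: env.get("port") == 80,
    "log archiving enabled": lambda env: env.get("archive_logs") is True,
}

if __name__ == "__main__":
    staging = {"port": 80, "archive_logs": True}
    live = dict(staging)
    print(deploy_with_gate({"port": 8080}, staging, live, TESTS))
    print(live)
```

Keeping each past problem as a named test is what guards against reintroducing it later. Note that the sketch leaves staging in its failed state; reverting it would be a separate step, for example by checking out the previous configuration from version control.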

So, while servers and operating systems aren't quite as exciting or glamorous as snazzy new technologies like AJAX or embedded video, I hope I've gone some way towards showing you that there's much more to our online offerings than you might realise. The problems we face and the solutions we find behind the scenes are just as important and interesting to work on as those taking centre stage!


Comments

  • 1. At 1:09pm on 18 Jan 2009, ConcernOfTrueTales wrote:

    Hey!
    I was just browsing the BBC site and found this posting, did not expect to find this subject on this page!.....:))))
    I kind of found it of interest because I have recently been reading or following the C++ programming today second edition book Barbara Johnston and its good to see the background to a live working example.


  • 2. At 5:10pm on 17 Feb 2009, virtuousNettys wrote:

    Hey! Very good to give a glimpse behind the scenes at the technology you use. I appreciate it.


  • 3. At 1:29pm on 22 Feb 2009, rxbld1

    This comment was removed because the moderators found it broke the House Rules.

  • 4. At 10:54am on 25 Feb 2009, Bill-Taylor wrote:

    #3 Is displayed as ? marks.

    Thanks for the post.


  • 5. At 8:27pm on 09 Apr 2009, aricson wrote:

    Is this BBC Internet Blog datacenter for all servers or you have other data centers different from this one. It looks pretty decent!
    Web Hosting Review


  • 6. At 1:34pm on 27 Apr 2009, SHLA2UK wrote:

    A nice glimpse behind the scenes. I work in IT, as a data provider / service provider to the ATI (Air Transport Industry) and appreciate the huge amount of unsung heroism (ok, a little over the top - but unsung mainly, all the same) that goes on in support of the customer-facing staff and the customers themselves.

    You do remarkably well with only 5 staff.

