BBC Online Outage on Saturday 19th July 2014

Tuesday 22 July 2014, 20:12

Richard Cooper Richard Cooper Controller Digital Distribution

Tagged with:

Hi, I'm Richard Cooper, the BBC's Controller of Digital Distribution for BBC Future Media.

As many of you will have noticed we suffered a serious incident over the weekend which impacted BBC iPlayer, BBC iPlayer Radio, and audio and video playback on other parts of bbc.co.uk. We also had to use our emergency homepage for prolonged periods of time.

Here’s what happened.

We have a system comprising 58 application servers and 10 database servers that provides programme and clip metadata. This data powers various BBC iPlayer applications for the devices that we support (which is over 1200 and counting) as well as modules of programme information and clips on many sites across BBC Online. This system is split across two data centres in a "hot-hot" configuration (both running at the same time), with the expectation that we can run at any time from either one of those data centres.

At 9.30 on Saturday morning (19th July 2014) the load on the database went through the roof, meaning that many requests for metadata to the application servers started to fail.

The immediate impact of this depended on how each product uses that data. In many cases the metadata is cached at the product level, and can continue to serve content while attempting to revalidate. In some cases (mostly older applications), the metadata is used directly, and so those products started to fail.

At almost the same time we had a second problem. We use a caching layer in front of most of the products on BBC Online, and one of the pools failed. The products managed by that pool include BBC iPlayer and the BBC homepage, and the failure made all of those products inaccessible. That opened up a major incident at the same time on a second front.

Our first priority was to restore the caching layer. The failure was a complex one (we’re still doing the forensics on it), and it has repeated a number of times. It was this failure that resulted in us switching the homepage to its emergency mode (“Due to technical problems, we are displaying a simplified version of the BBC Homepage”). We used the emergency page a number of times during the weekend, eventually leaving it up until we were confident that we had completely stabilised the cache.

Restoring the metadata service was complex. Isolating the source of the additional load proved to be far from straightforward, and restoring the service itself is not as simple as rebooting it (turning it off and on again is the ultimate solution to most problems). Performance of the system remained sufficiently poor that in the end we decided to do some significant remedial work on Saturday afternoon, which ran on until the evening. During that period, BBC iPlayer was effectively not useable.

After that work was complete we were in a walking wounded state that allowed close to normal operation for much of the site, though BBC iPlayer remained down on a number of devices. We chose to run it in this mode throughout the rest of the weekend while planning a full restoration of the service. By the time we were ready to do that we were entering the peak period on Sunday evening, so rather than risk the service further, we chose instead to do it on Monday morning.

We recognise that during this incident, with BBC iPlayer unavailable for some periods for some users you may not have been able to watch or listen to the programmes you wanted. I’m afraid we can’t simply turn back the clock, and as such the availability for you to watch some programmes in the normal seven day catch-up window was reduced. Essentially programmes aired on Saturday 12th July and Sunday 13th July were not available this last weekend for some users. It's small consolation but that was the weekend of the World Cup Final, Scottish Open, Women's Open and other live sporting events which are less likely to be viewed on catch-up. I should also stress that programmes aired this weekend - when the problems occurred - are available now on BBC iPlayer.

BBC iPlayer is an incredibly popular product, last year alone we had 3 billion requests and instances like this are incredibly rare.

We will now be completing the forensics to make sure that we’ve fully understood the root causes, and put in place the measures necessary to minimise the chances of such interruptions in the future.

We're sorry for the inconvenience.

Richard Cooper is Controller of Digital Distribution, BBC Future Media

Tagged with:

Comments

Jump to comments pagination
 
  • rate this
    +2

    Comment number 1.

    Will the missing things be added eventually?
    BBC Radio 1's Dance Anthems from Saturday is still not available via iPlayer.

  • rate this
    +10

    Comment number 2.

    The BBC's Annual Report, coincidentally published yesterday, states that the BBC needs to develop "effective monitoring and forensic capabilities" for the IT systems to reduce risks to content delivery (page 85).

    Why - in 2014 - does the BBC not already have effective network monitoring? Is the lack of forensic capabilities not going to hamper your investigations, as you repeatedly mention that forensic investigation is essential to finding the root cause?

    Remember that the insufficiency of these capabilities is highlighted as a key risk to the BBC in the Annual Report - for the second year running!

    Or are you going to claim that there were effective capabilities in both these areas, and the Annual Report is a misleading document?

  • rate this
    +12

    Comment number 3.

    Next time something like this happens can we have a better error message than the somewhat generic one you put out.

    On and off I probably wasted an hour or so on Sunday evening wondering whether it was our setup and I'm sure I'm not the only one.

  • rate this
    -5

    Comment number 4.

    Some friends in Canada like to listen to Radio 4 comedies and were unable to do so. They are now relieved that they can receive them again.

  • rate this
    +2

    Comment number 5.

    Hey just checked out the Homepage - there is less of it - it's the best thing that has happened to the Homepage in ages.

 

Comments 5 of 134

 

This entry is now closed for comments

Share this page

More Posts

Previous
Testing BBC iPlayer Mobile App

Tuesday 22 July 2014, 13:04

Next
Connected Studio: Coding for Teenagers Build Studio

Thursday 24 July 2014, 10:58

About this Blog

Staff from the BBC's online and technology teams talk about BBC Online, BBC iPlayer, BBC Red Button and the BBC's digital and mobile services. The blog is reactively moderated. Your host is Nick Reynolds.

Blog Updates

Stay updated with the latest posts from the blog.

Subscribe using:

What are feeds?

Links about BBC Online

BBC Internet blog Archive

owl-plain-112.jpg 2012 ι 2011 ι 2010 ι 2009 ι 2008 ι 2007

Tags for archived posts