BBC Online Outage on Tuesday 29 March 2011
As many of you will have noticed (and reported on Twitter), the whole of BBC Online was down last night for about an hour from 22:40 due to a major network incident. We would like to apologise to everyone who was unable to access BBC Online during this outage.
Our systems are designed to be sufficiently resilient (multiple systems, and multiple data centres) to make an outage like this extremely unlikely. However, I'm afraid that last night we suffered multiple failures, with the result that the whole site went down. Enough of the systems were restored to bring BBC Online pretty well back to normal by 23:45, and we were fully resilient again by 04:00 this morning.
For the more technically minded, this was a failure in the systems that perform two functions. The first is the aggregation of network traffic from the BBC's hosting centres to the internet. The second is the announcement of 'routes' onto the internet that allows BBC Online to be 'found.' With both of these having failed, we really were down!
We'll be taking a very hard look at what we need to do to make sure that this doesn't happen again.
Richard Cooper is Controller, Digital Distribution, BBC Future Media.
Comment number 1.
At 10:05 30th Mar 2011, Kit Green wrote:Outside attack?
Comment number 2.
At 10:18 30th Mar 2011, Dougie wrote:Are you also looking at having secondary offsite DNS servers?
I couldn't even get a non-authoritative reply to a `dig bbc.co.uk`. Your DNS was offline (probably due to your routing failure).
You've got four name servers:
ns1.bbc.co.uk. 710 IN A 132.185.132.21
ns1.thdo.bbc.co.uk. 49979 IN A 212.58.224.21
ns1.thls.bbc.co.uk. 49979 IN A 132.185.240.21
ns1.rbsov.bbc.co.uk. 49979 IN A 212.58.227.48
but only two networks hosting them.
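A rough way to check the "two networks" count is to group the four addresses above by network prefix. This is only a sketch: the prefix lengths (/16 and /24) are assumptions chosen for illustration, not the prefixes actually announced for these hosts, and `group_by_prefix` is a made-up helper name.

```python
import ipaddress
from collections import defaultdict

# Name server addresses exactly as listed in the comment above.
NAME_SERVERS = {
    "ns1.bbc.co.uk.": "132.185.132.21",
    "ns1.thdo.bbc.co.uk.": "212.58.224.21",
    "ns1.thls.bbc.co.uk.": "132.185.240.21",
    "ns1.rbsov.bbc.co.uk.": "212.58.227.48",
}


def group_by_prefix(servers, prefix_len):
    """Group server names by the network containing each address.

    prefix_len is an assumed prefix length, not a real announced prefix.
    """
    groups = defaultdict(list)
    for name, addr in servers.items():
        # strict=False lets us derive the network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


by_16 = group_by_prefix(NAME_SERVERS, 16)
by_24 = group_by_prefix(NAME_SERVERS, 24)

print(len(by_16), sorted(by_16))
print(len(by_24), sorted(by_24))
```

Counted at /16 the servers fall into two groups, while at /24 each sits in its own block, so the answer depends on what you take "network" to mean.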
Comment number 3.
At 10:19 30th Mar 2011, dr_salleee wrote:*29th* March...
Comment number 4.
At 10:20 30th Mar 2011, Richard wrote:Do you mean the headline "BBC Online Outage on Tuesday 23rd March 2011" when the article was written on the 30th and mentions "nast night" and "yesterday"? Maybe the BBC Online Server(s)'s clock was also reset?
Comment number 5.
At 10:28 30th Mar 2011, brendan wrote:@dr_salleee, @Richard: give the guy a break, I don't think he got much sleep last night :-/
Comment number 6.
At 10:47 30th Mar 2011, dr_salleee wrote:@brendan :) I won't mention that the article initially even missed out the word March, then!
Comment number 7.
At 10:55 30th Mar 2011, Nick Reynolds wrote:Thanks for your support Brendan. It was of course the 29th (i.e. last night) - my error which I am now endeavouring to correct.
Apologies
Comment number 8.
At 11:16 30th Mar 2011, Synchronium wrote:Beneath this calm blog post, I'm picturing someone receiving a proper Malcolm Tucker style bollocking. Please tell me that's the case?! Hearing everything's fine, all smiles and chamomile tea would be somewhat disappointing...
Comment number 9.
At 11:22 30th Mar 2011, Macin Tosh wrote:The BBC News report on this ( http://www.bbc.co.uk/news/technology-12904586 ) keeps being updated, which is nice. I rather liked the first version I read, which hinted that the technical solution deployed was to switch the BBC off, wait 10 seconds, then switch it back on again. This colourful little detail vanished from later versions.
Curiously however, the 'Last updated' time remains ever fixed at 08:51 through all the several updates to the report.
Comment number 10.
At 12:00 30th Mar 2011, MamaGlide wrote:Synchronium - I agree. I'm going to keep that image in my mind no matter where the truth lies. I still appreciate an honest blog post with comments like this.
Comment number 11.
At 12:55 30th Mar 2011, flapbat wrote:My first port of call when it all went down was to check News 24. I thought there might be an explanation there, or at least an acknowledgment on the on-screen ticker.
I don't know if this is true but I would expect that the BBC's online news reaches far more millions of people than the BBC's TV news these days.
Have we not reached a point where TV should in some instances serve the web? What I mean is, if BBC TV went down, there would be an instant announcement on the website. Why not the other way round??
Comment number 12.
At 13:02 30th Mar 2011, Zal wrote:Let's all get over it.
It's a website. It went down for a couple of hours. Bad things happen. The world keeps turning and, thankfully, the BBC isn't as important as it thinks it is.
Comment number 13.
At 13:11 30th Mar 2011, Jolly Rodger wrote:have to laugh at the dip stick below giving you technical advice. LOL.
Comment number 14.
At 13:28 30th Mar 2011, udp53 wrote:Besides which, he is wrong: the nameservers are on four different networks, but they are all announced from the same Autonomous System.
Comment number 15.
At 13:53 30th Mar 2011, Guy wrote:iPlayer is looking different and behaving oddly (for example, listing programmes that have already been broadcast as coming up next). Either it's a deliberate revamp (inc glitch) or a spin-off from the outage?
Comment number 16.
At 14:11 30th Mar 2011, Parax wrote:"announcement of 'routes' onto the internet that allows BBC Online to be 'found'." What is that? How about in interweb speak: does it mean DNS? I fear the BBC is heading for a 'series of tubes' moment (google it).
Say what it is! Don't explain it with a personal understanding. If you say what it is and people don't understand, they can research and learn, or at least type it into google. If you write nonsense like this, people just have to guess what you mean.
You (BBC) wouldn't replace the word 'orange' with 'a colour half way between yellow and red', so please don't do it with technology. (My point is, it's easier to look up 'orange' than 'a colour half way between yellow and red', especially so if your first language is not English.)
Comment number 17.
At 14:28 30th Mar 2011, Mo McRoberts wrote:@Guy (#15) I *think* the outage may have had a knock-on effect upon some of the processes which feed data into the systems which drive iPlayer — it should catch up after a while.
Comment number 18.
At 14:31 30th Mar 2011, Momo wrote:Is this related to the fact that After Midnight with Linley Hamilton which went out at 12:05 am Monday, 28th March is silent on the iPlayer?
Or was this another 'extremely unlikely' event?
Comment number 19.
At 14:44 30th Mar 2011, Judy wrote:Would these problems have anything to do with the fact that at 7.30 Tuesday evening any recording we tried to set up showed as 'Sunday 27th March'? If we linked our recorder to BBC 1 that was the default date, whereas BBC 2 showed the date and time correctly.
Comment number 20.
At 15:13 30th Mar 2011, neil wrote:In reply to: "announcement of 'routes' onto the internet that allows BBC Online to be 'found'." What is that? how about in interweb speak, does it mean DNS? I fear the BBC is heading for 'series of tubes' moment (google it)"
This is a common term. "Announcing routes" is used globally by members of the technical community, and the blog did flag that part as being for the "technically minded".
To extend upon the blog posting, for announcing / advertising routes have a look at: http://en.wikipedia.org/wiki/Border_Gateway_Protocol
Human error is a common cause of BGP faults, as when Facebook was occasionally blocked in many parts of the Middle East, or when AT&T traffic ended up being routed via China. Sometimes naughty people also attempt to poison and disrupt routing by publishing incorrect routes into the upstream networks.
Seeing a couple of IP addresses for DNS servers doesn't mean they only have 4 servers. IPs can be BGP anycasted to anywhere in the world, e.g. Google's DNS server, 8.8.8.8, is a single address answered by many machines hidden behind the gateways.
http://en.wikipedia.org/wiki/Anycast
If you'd like to see further, google for looking glass servers, or see e.g. http://www.bgp4.as/looking-glasses.
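For anyone who wants to see what "announcing routes" amounts to, here is a toy sketch, not real BGP. It keeps a table of announced prefixes, picks the most specific match for an address, and shows that once the announcements are withdrawn the address simply cannot be found. The `RouteTable` class, the prefixes and the AS number 64500 (from the range reserved for documentation) are all assumptions made up for the example.

```python
import ipaddress


class RouteTable:
    """Toy model of the routes a neighbouring network has heard announced."""

    def __init__(self):
        self.routes = {}  # maps ip_network -> origin AS number

    def announce(self, prefix, origin_as):
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin AS for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the destination cannot be "found"
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
# Hypothetical prefixes and AS number, for illustration only.
table.announce("212.58.224.0/19", 64500)
table.announce("132.185.0.0/16", 64500)

before = table.lookup("212.58.224.21")

# Simulate the announcing systems failing: every route disappears.
table.withdraw("212.58.224.0/19")
table.withdraw("132.185.0.0/16")

after = table.lookup("212.58.224.21")
print(before, after)
```

The servers themselves can be perfectly healthy in this model; with no announced route, nobody outside can reach them.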
Comment number 21.
At 17:49 30th Mar 2011, Verbal Spillage wrote:Just to point out, the hyperlink within "Topical posts on this blog" still states "Tuesday 23rd 2011".
Comment number 22.
At 18:29 30th Mar 2011, Nick Reynolds wrote:Verbal Spillage - this is a bug which we are aware of. Thanks.
Comment number 23.
At 18:57 30th Mar 2011, William Stevenson wrote:Oh dear!- he has almost written the two things which certainly indicate that such and such will certainly happen again:
'lessons have been learned' and 'robust systems are now in place to ensure that this does not happen again'. If you see that coming from the NHS, a recurrence is inevitable.
Comment number 24.
At 19:49 30th Mar 2011, Nick Reynolds wrote:William - Richard has not said either of these things. The NHS is off topic.
Thanks
Comment number 25.
At 00:20 31st Mar 2011, Paul wrote:Stuff happens. No one died, move on.
Comment number 26.
At 10:08 31st Mar 2011, OfficerDibble wrote:7. At 10:55am on 30th Mar 2011, Nick Reynolds wrote:
Thanks for your support Brendan. It was of course the 29th (i.e. last night) - my error which I am now endeavouring to correct.
Apologies
Your error? So who wrote this blog? What part did you play in this blog?
Comment number 27.
At 11:28 31st Mar 2011, Nick Reynolds wrote:OfficerDibble - as the blog editor I am currently doing all the work required to input finished copy and publish it. The text of the blog was written by Richard. I put in the wrong date in the title by accident (as I've already explained).
Thanks
Comment number 28.
At 13:30 31st Mar 2011, Guy wrote:Thanks Mo (#17). The glitches on iPlayer seem to be sorted out but it looks like there has been (another) revamp in design - the change looks v permanent - even though I haven't seen anything said or blogged about it.
A pity, as I don't find it as good as it was. Not only is it more clunky but it is also more black and white, and the info shown when playing, especially if buffering, doesn't stand out so much. It was better as it was, with the playing controls and info below the programme screen rather than above (IMHO).
Comment number 29.
At 15:36 31st Mar 2011, PatagoniaSky wrote:Synchronium: I can absolutely guarantee that there are people getting a massive Malcolm Tucker style bollocking over this! Having worked at the BBC and Siemens for some years in the past, I know this wouldn't have been taken lightly. Rest assured that some people will have been dragged over some serious coals for this - for sure. It won't have been pretty. In fact, the aftermath will still be going on for at least another week from now, I imagine.
Comment number 30.
At 13:55 1st Apr 2011, massroids wrote:This comment was removed because the moderators found it broke the house rules.
Comment number 31.
At 18:58 1st Apr 2011, philocleanthes wrote:On the bright side, since both went down this time, the epic work to fix the outages should mean there is less chance of one or both failing again.
Comment number 32.
At 21:20 1st Apr 2011, David Maginnis wrote:I know the BBC is making cuts, but don't u think u should have a backup team of hamsters for when the first team hamsters get tired out? :)
Comment number 33.
At 15:14 2nd Apr 2011, John_from_Hendon wrote:On the Newsworthiness of the BBC's outage....
I am a little concerned that I failed to hear any mention of the BBC's internet being down on any of the BBC's news services on TV or Radio.
Was this because the BBC did not think it was newsworthy, or because a tragedy at home is best hushed up? The twitterati went into overdrive with conspiracy theories which could have been scotched if only the BBC radio or TV news had made mention of the domestic incident in a timely manner. The lesson is: it is better to 'fess up quickly, as this reduces the collateral damage.
Comment number 34.
At 15:23 3rd Apr 2011, Ian wrote:Huh, just an hour. I'm in Thailand and my local online English newspaper has been on the blink for over a week. You guys should try living in the 3rd world for a bit :D
Comment number 35.
At 12:50 5th Apr 2011, Debbie Rockford wrote:Richard, do you know if the outage affected the commenting system as well... seems to be hanging on occasion despite trying the BBC site on a number of different computers/ browsers? Hopefully this goes through!
Comment number 36.
At 14:10 12th Apr 2011, sohi wrote:What a shambles. The BBC news is still offline, most annoying, as at 14.10hrs Tuesday 12th April.
Comment number 37.
At 20:42 26th Apr 2011, isolinx wrote:I visited this page first time and found it Very Good Job of acknowledgment and a marvelous source of info.........Thanks Admin!
[Unsuitable/Broken URL removed by Moderator]
Comment number 38.
At 10:54 23rd May 2011, Truthadvocate wrote:Unbelievable!? Is this the best you can come out with to earn our millions in TV licence? The BBC does not have the most intelligent on its staff; they are out here watching, reading and listening. Unless your purchase department is corrupt, any credible IT supplier and designer would have ensured that "THE BBC" was well backed up to do the job they did for almost a century. I suspect the BBC needed to get rid of much unbearable criticism from their audience and its handling of news and politics. Shake up! Clean your act!