The Sidekick Cloud Disaster

  • Rory Cellan-Jones
  • 13 Oct 09, 09:48 GMT

A year ago, I visited a giant data centre belonging to Microsoft a hundred miles or so north of Seattle in Washington State.

It was an impressive sight: room after room of servers cooled by electricity which arrived from a nearby hydroelectric scheme, plus two sources of back-up power if the mains connection should somehow fail. All of this was just part of Microsoft's substantial global investment in cloud computing, to ready itself for a future where we'd all keep more and more of our data online in secure locations like this.

But I had a question.

What if, due to some unforeseeable chain of circumstances, the whole place went up in smoke - taking our valuable data with it? Back they came straight away with an answer - "redundancy". No, no, not mass sackings amongst Microsoft employees responsible for data loss, but backup systems.

In other words, every single piece of data stored in the Washington data centre would also be held elsewhere, just in case. It all seemed pretty satisfactory to me - and indeed I've put more and more of my own data into "the cloud", from Google Documents, to web-based e-mail, to photo libraries stored on Facebook.

Now comes the Sidekick story. Users of a popular mobile phone on the American T-Mobile network have lost some of their data, and the apparent cause is a server failure.

It's being called the biggest disaster yet for the whole concept of cloud computing. The software and the services for the Sidekick phone are designed by a company called Danger, which helps users store their contacts, photos and all sorts of other personal data in the cloud. But Danger was bought last year by - guess who? - Microsoft, so the software behemoth is now going to cop a lot of the flak for this disaster.

The latest update on a T-Mobile forum says:

"T-Mobile and Microsoft/Danger continue to do all we can to recover and return any lost information. Recent efforts indicate the prospects of recovering some lost content may now be possible."

But it also talks of compensating people if they suffer "a significant and permanent loss of personal content" - which sounds pretty ominous.

What's not really clear is what happened to the famed redundancy of Microsoft's cloud operation. Reuters is quoting a statement from the company talking of "a confluence of errors from a server failure that hurt its main and backup databases supporting Sidekick users." But does that mean the backup databases were in the same place as the main ones?

If we're all to entrust our most valuable data to Microsoft's - or anyone else's - cloud, we're going to need to be sure that they tend it as if it were their own. If this kind of reassurance is not forthcoming, then all those forecasts of explosive growth in cloud computing will be, well, redundant.


  • Comment number 1.

    The issue isn’t the cloud per se, but the age-old problem of putting all of your important information (even if fairly trivial, it still has personal value and takes time and effort to reconstruct) in one single place and having a synchronisation mechanism which deletes the local copy if it goes away.

    You’re no smarter keeping your only copy of your address book and e-mail folders on your laptop than you are in Microsoft’s (or Google’s, or Yahoo!’s, or whoever’s) cloud.

    If it’s important, back it up. Quite often, cloud services are being used _as_ the backup for information held securely elsewhere (which makes a fair amount of sense: you consider the cloud version temporarily expendable if necessary).

    However, I do think that there’s a responsibility for mobile operators to make it easy and straightforward for customers to back up information from their devices—whether it’s synchronised to a hosted service or not—and to back it up in a device-agnostic fashion (it’s not like we don’t have fairly standard formats for most of the stuff which gets stored on phones, after all).
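
    To make the "standard formats" point concrete, here is a minimal sketch of a device-agnostic contact export using the vCard format. The function name, and the name and number in the usage example, are invented for illustration; a real exporter would handle many more fields.

    ```python
    def make_vcard(full_name, phone):
        """Build a minimal vCard 3.0 record for one contact.

        vCard is one of the standard, device-agnostic formats that
        nearly any phone or address-book application can import.
        """
        return "\r\n".join([
            "BEGIN:VCARD",
            "VERSION:3.0",
            f"FN:{full_name}",
            f"TEL;TYPE=CELL:{phone}",
            "END:VCARD",
        ])

    # Hypothetical contact, purely for illustration.
    print(make_vcard("Jane Example", "+44 7700 900000"))
    ```

    A file of such records can be copied to a memory card, e-mailed to yourself, or imported into a different handset on a different network, which is exactly the kind of portability the comment above is asking for.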

  • Comment number 2.

    I know the 'cloud' is the future, but I would rather keep my data & backups on my external hard-drives. I know where it is, and the chances of all 3 hard-drives breaking at the same time are pretty remote.

    Good blog Rory

  • Comment number 3.

    Chris Robinson said:
    There is no Sass without a rub …

    Sass = Software as a SECURE Service
    RUB = Relocatable user backup

    Unless you can get a full backup of your data (and preferably the application too) in an agnostic format which you can relocate to somewhere other than your service provider, you are not secure. Period. You can get more details of Sass and RUB in Wikipedia or at the Webrecs website.

  • Comment number 4.

    This is exactly why rolling out brand names like Google Health, Microsoft Doctor or whatever is NOT a universal answer to solving IT in our own NHS.

  • Comment number 5.

    I will NEVER entrust anything to so-called cloud computing. It's been a disaster waiting to happen from day one.

    But this story does remind me to do my own backups more frequently. And to keep copies of important stuff at a friend's house. And not to use cheap CDs/DVDs for backups. And all the other simple logical steps one can take...

  • Comment number 6.

    In the last year or so we have progressively used 'the cloud' because it is easy, frees up space on the laptop and means that other people can access it, and all in all we are delighted with it. However, if you put things that your business couldn't run without up there and have them nowhere else, then quite frankly you are silly. There is no one and only totally secure forever place, well not that I know of anyway. But the shift is towards cloud working and I for one am all for it.

    Currently using Google Docs, and even though we are a small two-man team it is just perfect for us to comment on, adjust, delete and basically work side by side on the same document, and then give individual customers access as and when required. How else are we going to do that if we don't use the cloud?

  • Comment number 7.

    There is no real difference between 'the cloud' and any other normal network store - it's just a question of scale and accessibility. At this scale the technology is in its infancy, and Microsoft and Google will be balancing the ability to make data 100% recoverable against the cost of doing so.
    In the end, every user should treat his/her data responsibly - should it be important then you should not rely on one place to back it up (even if that place is a cloud run by MS or Google). It is your data and is more important to you than anyone else. If you always assume that you may lose it, then you can judge how important it is to you and take the necessary steps to secure it elsewhere as well just in case.

  • Comment number 8.

    Rory, although Microsoft must be held accountable for this I would point out that the storage was not at their facility but was subcontracted to Hitachi Data Systems. It was their failure to make a back up (and Microsoft's to check they had) that caused this issue.

    The latest update is that T-Mobile (the carrier operating the Sidekick) also believes it will now be able to restore at least some of the data. For those for whom it cannot, it will offer compensation.

    However, let's make no mistake - this is bad and a poor advert for cloud computing, whoever offers it. It is sheer incompetence not to back up critical data before attempting a migration or other change, and I can only hope the industry learns from this.

  • Comment number 9.

    I have run server farms for a few years and have used a few different suppliers who always assure me there are "backups" for power failures, bandwidth failures and so forth. The trouble is these systems are not really testable. Sure, you can run a test and fix things - but there are lots of things that can go wrong after that test up until when a real failure occurs.

    So these failures really always happen no matter what the contract says - so you should always have your own local backups

  • Comment number 10.

    There's one thing I don't understand - why would it be down for a whole week? Even if the user data is unrecoverable, it doesn't take a whole week to get the servers back up and running.

    From what I've read, Danger's system was designed with an array of many high-availability redundant servers, so this means that EVERY server would have to have been taken out by whatever it was that happened.

    I also find it hard to believe that a company as big as Microsoft could be so lax with backups. They run Hotmail and Microsoft Online Services so it's not like they've never done this kind of thing before.

    It just doesn't add up.

  • Comment number 11.

    I work in business continuity so am pretty paranoid about my own data. All emails, files and pictures on my main computer and my two sons' machines are backed up to an external hard drive and a smaller memory stick which leaves the house with me. I don't rely on the cloud for anything, but I do sometimes forward emails to my Yahoo account so I've got a copy there too.

    A few years ago when all those online office suite providers started springing up, I bookmarked a few sites. Recently I was looking back through them - a few obvious ones are still going (Google Docs, Zoho) but quite a few sites no longer exist or the domain now belongs to someone else. Imagine if you had data on one of those and the plug is pulled.

    The cloud is not yet to be relied on technologically. The whole privacy & ownership issue is a debate for another day.

  • Comment number 12.

    Thanks for a good blog post Rory, but you don't highlight the real issue - COST!

    No IT system can be 100% guaranteed against failure. Professionals talk about RTO (recovery time objective: the seconds of downtime acceptable in the design) and RPO (recovery point objective: the maximum period, in seconds, prior to a failure from which the design accepts data could be lost). Note the use of the word "objective", not "guarantee" - any complex design can turn out to be flawed.

    It's common to turn RTO into "99.x% availability" because it sounds like failure is very unlikely, but that's less specific because it is an average over a time period and might represent more than one individual failure.

    Cloud service providers are trying to deliver at a low price. RTO = 0 and RPO = 0 is expensive, that's just a fact. NASA has such objectives for the IT supporting manned space missions, but they can afford it and they definitely wouldn't use cloud services to achieve it!

    Cloud providers claim they can offer low costs by exploiting their scale, but that's only part of the story. In fact, all have sacrificed both RTO and RPO. Example: GMail has published 99.9% application uptime (for commercial clients) - that's definitely not RTO=0, but nor does it tell you what their RTO actually is. You can be sure RPO is not zero either.
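
    A rough sketch of the arithmetic behind those headline percentages - converting an availability figure into the downtime it still permits over a year (the function name is my own, for illustration):

    ```python
    def max_downtime(availability_pct, period_hours=365 * 24):
        """Hours of downtime a given availability percentage allows
        over a period (default: one non-leap year of 8,760 hours)."""
        return period_hours * (1 - availability_pct / 100.0)

    # 99.9% uptime over a year still permits nearly nine hours offline.
    print(round(max_downtime(99.9), 2))  # → 8.76
    ```

    So "99.9% availability" sounds close to perfect but still leaves room for most of a working day's outage each year - and, as the comment notes, it says nothing about RPO, i.e. how much data could be lost when that outage happens.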

    We all need to be more aware of these weaknesses as our data - especially life-critical healthcare data - is increasingly put into IT services that might in turn be hosted in the cloud. There's not yet a substitute for writing something down and putting it in fireproof storage - think of the Dead Sea Scrolls.


  • Comment number 13.

    I think all would agree this is a poor ad for the cloud. I guess the Sidekick loss story raises the question of whether Microsoft had integrated that data into its normal cloud set-up, or whether it continued on its own independent server without the resilience features that were shown to Rory last year.
    I hesitate to use the cloud because I do not yet feel mobile internet is strong enough to guarantee me access wherever I may go, which means I need a local copy. Perhaps cloud for backup, but not for the main docs under use.
    The story also raises a data security question - not in the conventional sense of some unauthorised individual having access to data, but that too much data in one place becomes a target for a rogue state or terrorism. Perhaps they have dealt with the risk of a car bomb etc, but I am reminded too much of Tom Clancy's story of NYSE data being lost through a deliberate attack on the servers recording the data. Not through an internet attack, but a disreputable software developer installing an update to all servers.

  • Comment number 14.

    Good blog Rory. Far more tech related and better than Jobs@Disney...

    My main gripe with "cloud computing" is that I don't trust corporations to not sell my data on for a profit.

    And also there is the question of how secure it all is when it comes to hacking such backups to obtain personal details.

    These are two points that in my opinion people seriously need to consider instead of blindly adopting "cloud computing" as the way forward.

  • Comment number 15.

    #10 "I also find it hard to believe that a company as big as Microsoft could be so lax with backups."
    Never forget the old joke:
    Q: How many MS employees does it take to change a lightbulb?
    A: None. MS lightbulbs do not fail.
    But it's not fair to pick on MS. There are hundreds of other IT companies, large and small, who do no better.

  • Comment number 16.

    The Sidekick data was maintained on a completely separate system which Microsoft inherited when it bought Danger. The rumor is that Microsoft's mobile management severely neglected the Sidekick in favor of its own preexisting projects and laid off, transferred or alienated much of the Danger staff, leaving nobody who understood how to run Danger's servers. But at any rate, their data wasn't kept in the Microsoft data centers you were shown.

  • Comment number 17.

    The last comment was very good. Business results (priority) rather than competence was probably the problem.

    And I, too, have experienced many websites that were seemingly here forever then gone forever...

  • Comment number 18.

    The last time I remember something like this happening was back in 2000 when Microsoft lost the contact lists of many MSN Messenger users.

    After that, they added the option to export and import contacts, which is still there today. Of course back then we didn't have the buzzword "cloud" (even though many "cloud" services existed) so it wasn't made out to be some catastrophic realignment of the computer industry as seems to be the case these days.

  • Comment number 19.

    What has this got to do with cloud computing? It sounds to me like Danger had a dedicated infrastructure to run an application, and that dedicated infrastructure broke. Media hype is putting the "cloud" tag on anything online. The cloud is not perfect, but let's not blame it for things it has nothing to do with.

  • Comment number 20.

    Substitute the word "fog" for "cloud" and the whole concept starts to sound less appealing. It is also advisable to remember that the only person who has your best interests at heart is, er, you. Excessive reliance on others is seldom a good idea.

  • Comment number 21.

    BrianJohnHunt (11:49 Oct 13) is completely correct: cloud computing is simply an extension of the network. It is part of our future for the same reason data center consolidation/outsourcing is so popular: cost containment, regulatory compliance, and operational efficiencies. The fact that it has such a long pedigree (arguably just the latest name for ASP, SaaS, BSP, etc.) suggests it is a concept with enough appeal and dynamism to remain a factor in the foreseeable future.
    It would not at all surprise me if what contextfree (9:06 Oct 13) commented were true. I have personally seen several examples of technology orphaned by mergers and cost cutting produce results (on a smaller scale) exactly like the Sidekick fiasco.
    Other than some allusions, I have not read any comments on what first occurred to me: the bogeyman single point of failure that is the backup process itself. All the hardware and hot sites in the world won't help you if the backup "tapes" are bad. If the staff cuts resulting from the acquisition aren't the issue, my money is on a corrupt backup replicated to the recovery sites.

  • Comment number 22.

    This comment was removed because the moderators found it broke the house rules.

  • Comment number 23.

    The Sidekick has a peculiar and potentially dangerous way of storing personal data: the data is held in some kind of cache that is deleted when the phone is switched off. This in itself is suicidal.

    Blaming the cloud is not entirely fair. Any data that is of any value to a user should be backed up in multiple places (online, local, on paper) and should be portable. As a user you have a right to your data and should be allowed to export it for safekeeping or use in another service.

    This requirement of not wanting to be locked to a single mobile operator or a phone maker led me to use Rseven. It's cross platform and allows me to sync data between my Windows Mobile Samsung i780 to my Nokia E75, both phones that I use on different networks.

