Loss is not where you find it
Blue Screen of Death - the result of data loss? CC image from Flickr user Justin Marty
There is a plot: system complexity conspiring to make data inaccessible. It was no coincidence, I am sure, that my first complete disc failure in 17 years came within two months of the conversion of my laptop's hard drive to full encryption. Lost laptops and compromised personal details are a national problem. The contents of my own laptop would bore anyone else silly, but I'm sure there are all sorts of laptops carrying private and confidential details that deserve full protection.
I just hope that the new encryption systems that now sit on millions of UK hard drives do indeed give protection, because compromise of data is only one risk. Few of us have data whose loss would compromise national security or even embarrass our employers, but all of us have data that we would hate to lose - as I found out at Denver airport when I was about to start two weeks' work in the USA, and had nothing to work with.
I also hope the UK's major (or not so major) IT departments are collecting statistics on computer failures where encryption is implicated. I know in my case they did not, as my efforts to find a way to diagnose the problem while 8000 miles from base led to various changes, and so my dead laptop is logged as 'system rebuild required', not as 'death by encryption'. It is only through statistics that we can understand the incidence of failures and their types, and thereby understand the real risks posed by digital technology. Without knowledge of the risks, we can only speculate about where to place our collective efforts - and budgets - in the general area of 'digital preservation'.
Of course, my dead laptop is but one data point, as am I, for that matter. But I have more: several times recently I've come across major examples of 'loss of data' - and as with my hard drive, it wasn't the data itself that was lost; it was the complexity above the data that got its knickers twisted and so ceased to function.
- The EBU has a panel, P/DATA, that is looking at asset management systems. A senior engineer of one of the first such systems to specialise in broadcasting came to Geneva in July, to talk about how a broadcaster 'going tapeless' should go about moving into digital asset management. He mentioned entire collections of online content disappearing, owing to corruption of the database - because an asset management system is 'just' a lot of files, and a database with information about those files. He'd personally experienced that situation twice, and in each case there were backups to revert to, to rebuild the collection and get back into business - after up to two weeks of travail.
- Our BBC collaborative project PrestoPRIME had a workshop at the beginning of October, where we asked about examples of loss (because I'm trying to collect evidence in order to establish risk - that's what I do). Again, a fully competent IT company working with a major Spanish broadcaster described a database corruption, of an asset management system, and in that case 80% of the material was recovered from backups (taking several days) - and 20% had to be re-ingested, which took several weeks (part time).
- All of which should remind us of the BBC's online picture store Elvis, which crashed some years ago. Again, it was essentially database corruption, compounded by backups that largely failed to work. There was one very effective backup - Lisa, daughter of Elvis - but that held the video-quality scans and not the full-quality scans (as needed by Radio Times and all the other BBC print publications that Elvis also supports). Something like 100k high-resolution images were lost.
There is a common thread so far - the bits still exist, unaltered, on storage media - but the complexity sitting between the user and the bits has 'ceased to be' in some fashion, and so the whole thing is a dead parrot (and called a storage failure, though it is anything but).
This thread leads to an even greater problem: systems that haven't crashed, but still won't find things, because they are in some way inadequate. We're all now aware of metadata and its purposes, but - just as with data itself - there has to be effective technology using the metadata, or again the result is a 'digital dead parrot'.
You may not know that I'm a prize-winning poet. I was somewhat surprised to learn this myself, but indeed my entry into a competition in Ariel, the BBC's in-house paper, won second prize, and they were so pleased that they asked me to make a podcast. As with many print publications, this one also has an online version, where it sticks extras, like my podcast. The problem is, there is no search engine on the online version, or indeed ANY other search technology. There's a list of recent or popular pages, but once content falls off that list, it falls away completely, and becomes as inaccessible as the data on my dead hard drive. As with the hard drive, the data is still there, but inaccessible. The PDFs of the print version are indexed by a BBC search engine, but the online pages are not. The result is an inaccessible poem: even the person who posted my podcast can no longer find it!
An internal BBC publication is a tiny issue compared to bbc.co.uk itself, the BBC's world-class media website. BBC policy is to hold the text from bbc.co.uk in a sort-of archive, but reasons of space/budget/complexity mean that the audio and video content on bbc.co.uk is not archived. The justification is: all that audio and video goes out on radio and TV, and so gets archived separately. Has the validity of that statement been checked? How much audiovisual content is NOT also broadcast? I wish I knew! The business case to build a real archive (something with comprehensive capture, and access) was chopped and chopped until it was reduced entirely to a 90-day legal requirements system, with just a couple of access points. Meanwhile, anybody who does want to see BBC content that has been taken down from bbc.co.uk has to go to the Internet Archive, where they do monthly (or thereabouts) scans of the entire internet, and make them available to all through their Wayback Machine.
So there are a half-dozen examples, ranging from my laptop to bbc.co.uk, where data can no longer be found because, essentially, of failure or inadequacy of the system sitting between the user and the data. The robust solution to failure is to simplify that technology layer - and unfortunately IT systems are moving in the opposite direction. I fully expect an epidemic of data loss, in direct consequence of the mass installation of encryption on company hard drives. I hope I'm wrong.