BBC BLOGS - dot.Rory

A librarian takes on Google Books

Rory Cellan-Jones | 09:50 UK time, Monday, 7 June 2010

What's the point of a library or a librarian in the digital era? Who needs a physical space for books and archives, and librarians to police their use, when all that material will soon be available to anyone with a decent internet connection at the click of a mouse?

[Image: National Library of Wales in Aberystwyth]

On a visit to the National Library of Wales in Aberystwyth last week, I got a few answers to those questions. This is one of the UK's copyright libraries, able to ask for a copy of every book, newspaper and magazine published in the UK. It is housed in an imposing 1930s building perched above the west-Wales seaside town, and is home to collections of paintings, films and television programmes as well as ancient and modern Welsh texts.

Like other such institutions, it is struggling to find a role in the digital age. The numbers turning up to visit the library and use its facilities are falling, and while millions are coming to its website, there's a sense that these are casual passers-by, and the value they get from the site is hard to measure.

The National Library is not, however, just sitting back and waiting for a graceful demise. It's plunging with enthusiasm into a massive digital project which could give it a sustainable future. A third of the staff now have roles in this project, so there's a range of perhaps unfamiliar job titles, from Imaging Officer to Metadata Manager.

[Image: Scanning]

What they are engaged on is extremely ambitious, as Andrew Green, the Welsh-speaking Yorkshireman who runs the place, explained to me as we toured the building. The idea is to give free online public access to as much as possible of what he describes as "the printed heritage of Wales" from the 16th Century to the present day. So every book, periodical or pamphlet could end up online.

In a large room at the top of the library, I got a glimpse of the extent and the cost of this task. An imaging officer - I think that was her title - was carefully scanning page after page of a bound edition of a 19th-Century Welsh newspaper. The scanner used for this task was imported from Germany at a cost of £80,000, but the process doesn't end there.

[Image: Newspaper]

The next stage involves OCR - optical character recognition - to turn the scan into machine-readable text. Then the output needs to be proof-read, and here there's the possibility of repeating a crowd-sourcing experiment used by an Australian library, which got the public to proof-read scanned texts, and found many people competing to do the most edits.

The National Library of Wales is also involved in a project which helps community groups - schools, sports clubs, even families - scan their archives and make them available online as a kind of living history document. The end results of this massive digitisation project could be an invaluable resource for historians, and for anyone interested in the culture of their country. But this will not happen in a hurry, partly because the money has to come from increasingly scarce public funds, but also because much of the work involves cutting through the complex thickets of copyright.

There is an elephant in the room for this and other mass digitisation projects by libraries around the world, and it is called Google Books. The arguments about the search giant's plan to make millions of out-of-print books from around the world available online are too complex to go into in this post; suffice it to say the project is far more ambitious than anything that could be carried out by an individual public-sector institution.

[Image: Library Wales]

So why doesn't Andrew Green just hand over his digital plans to Google, and let a commercial company bear the cost rather than the public purse? The librarian puts his case against the privatisation of our printed heritage powerfully. "The people of Wales own this collection, they have paid to build it up over the years, why should it just be handed to Google?"

He points out that commercial companies - even those as powerful as Google - can come and go, while the National Library of Wales is likely to be around in the 22nd Century.

The librarian believes he has found a new cause for his profession, to give a secure home to digitised texts produced with the highest quality standards and available freely to all. "These are huge benefits," he says, "and should be fought for by all of those who care about unimpeded public access to knowledge." Google beware - the librarians are getting cross, and they are quiet but patient people.


  • Comment number 1.

    The classic blog post outlining why librarians are dissatisfied with Google Books was this one:

    Google Books: A Metadata Train Wreck

    Basically, GB is good at what it does, but it's not a library in the sense that its contents are not really organised or findable by internationally agreed standards.

    Also, library books are publicly and "freely" available, whereas most of GB is either out of copyright (so you could download it from other sites anyhow) or only available as a preview (so you must buy the book anyhow to browse freely).

    Until search engines read like humans (or vice versa, which looks increasingly possible!) we will need some human intervention in our research and even "leisure" reading.

  • Comment number 2.

    When doing research, I find the library an invaluable source of reliable and correct material. Searching on the web produces a mass of conflicting results.

  • Comment number 3.

    Shouldn't the imaging officer in the above photo be wearing gloves to protect the paper from acid and any other contaminants...

  • Comment number 4.

    " 3. At 11:19am on 07 Jun 2010, Stephen Hill wrote:
    Shouldn't the imaging officer in the above photo be wearing gloves to protect the paper from acid and any other contaminants..."

    I was thinking that when watching David Dimbleby's "Seven Ages of Britain" when he was handling all these old documents and books without any noticeable precautions.

  • Comment number 5.

    I've recently been researching a paper on the history of infrared photography and found Google Books to be invaluable. But it is still patchy and appears to concentrate on American periodicals.

    As always, the best thing is to use as many resources as you can. The Times online archive, which is text searchable thanks to very good OCR, provides a model for any newspaper and is probably free to use through your local library. The SAO/NASA Astrophysics Data System (ADS) is a model for journals. The Royal Society currently has its papers online as part of its anniversary celebrations and I hope they will continue to make them available. In my case it made it easy to read Herschel's description of how he discovered infrared radiation in 1800 in his own words and in full.

    Sometimes there is nothing quite like browsing through the bound journals themselves, because the more you research something, the more you know how to make the connections.

    Finally, there is the librarians/archivists themselves. I got a lot of my information from archives on the other side of the world solely thanks to their interest, enthusiasm and knowledge.

  • Comment number 6.

    "I think that was her title"

    Y'know, there's this brilliant bit of, admittedly non-digital, technology called a notebook and pencil. Journalists used to have them, so they could "write down" information to save effort in "remembering" it later, through a reverse algorithm known as "reading."

  • Comment number 7.

    Stephen Hill, cerebros, and anyone who is rightly concerned about protecting paper: the simplest way to do so is to wash your hands!

    Many librarians and archivists use bare hands because gloves reduce sensitivity as well as collecting dirt themselves. The British Library has a good explanation of their practice: [Unsuitable/Broken URL removed by Moderator]

  • Comment number 8.

    #6 Ian Walker


  • Comment number 9.

    With all due respect to the librarian profession, they are, like many others, struggling to adapt to the digital future. Surely having all books available to all people all the time is the way of the future, and they should start to accept that the concept of the physical library is becoming obsolete.

  • Comment number 10.

    Despite the advances in digital books, it is still much easier and less straining to actually read from a physical book, particularly when revising. At university, the library is more than a collection of books and papers, it is an area to work in peace, with fewer distractions, as well as providing spaces for group work. Plus it is far easier to grab a book off the shelf than to try and Google the topic, and just get a string of Wikipedia articles and course module descriptions from various universities. Perhaps the internet needs a Dewey Decimal system when it comes to finding 'real' information.

    Somehow I don't see the library disappearing any time soon.

  • Comment number 11.

    @Andy Finney
    Spot on: use the widest range of sources possible and you can't go wrong as a historian.
    "Finally, there is the librarians/archivists themselves. I got a lot of my information from archives on the other side of the world solely thanks to their interest, enthusiasm and knowledge."
    So true, great people.

  • Comment number 12.

    With some 2000 books on my shelves, finding a half-remembered reference is difficult. Recently I discovered that GB had one of my modern books in digital searchable format. This gives me a combination that has the handling benefits of a physical book - but with a comprehensive online index.

    Even better - a GB search will often throw up a book title that has an unsuspected association. Unfortunately such books are not always available to buy physically - or are priced as speculative collector rare objects rather than for their information value.

    The time has come to separate information from the medium when it comes to availability and pricing. The author could then get a fee that they currently miss when an original physical book is re-sold on the secondhand or remainder markets.

  • Comment number 13.

    If there had been digital storage in the 16th century do you really think that you would be able to read the data now? The National Library of Wales may well be around in the 22nd Century with well preserved physical archives but for current digital archives to work there will have to be a rolling programme of updating to future standards and this is an ever ongoing cost.
    If we rely on digital storage for the future (and this includes media being created now in digital environments) then there is a very real risk that we are in a new dark ages as nothing will be left in a usable form in just a few decades.

  • Comment number 14.

    One answer is that a book is a valuable aid to concentration. Most new developments, especially computerised ones, steer us further and further into a world of the 'non-stop distraction' that Aldous Huxley warned about in his 'Propaganda in a Democratic Society'.

    Such distraction discourages thought. Consequently, personal judgement is replaced by the swallowing of propaganda, much loved by the advertising world, which of course is unremittingly being tailored to carefully researched individual desires.

    Books can certainly be misleading, but the act of reading them on the page, with the easy ability to switch back to re-read something carefully, does encourage the thought process.

    Reading a book on screen in digitised form has three additional disadvantages: first, the close proximity of other digital applications which the reader can easily switch to (distraction); second, the likelihood that intrusive advertising will soon appear within the digital 'pages' or even the text; and third, a dumbing-down process due to works being available in far too narrow a range of versions.

  • Comment number 15.

    I worked in a library for 10 years up until last year. During that time it became increasingly obvious that the concept of a physical library no longer fulfilled the market's needs. It also became obvious that librarians were too entrenched in the concept of a physical library to make the necessary changes fast enough to compete with electronic information providers. I suspect there are free-thinking librarians all over the UK who realise that the current library model has to change but who are also frustrated that they can't push the necessary paradigm shift past the old guard fast enough to compete with companies like Google.
    Don't get me wrong, libraries are changing and are focusing on the digital age, but they are reluctant to make the huge changes necessary to make them competitive.
    The future of libraries is free, online access to all the information in the world. It's inevitable. Libraries have about 5 years to get on board wholeheartedly with that concept or sink with the rest of old media.

  • Comment number 16.

    The first paragraph shows a basic misunderstanding of what librarians do anyway. Online resources are a massive part of our jobs.
    Yes, lots of things are available at the click of a mouse. But who organises access to these services? Who maintains the list of the institution's IP addresses and Shibboleth federated access? Who maintains the contracts and makes deals for reduced costs? Oh, and who does the budgeting for these materials? Online is not free and resources don't just appear out of nowhere. Librarians do all these things and more for print and online resources, as well as dealing with normal day-to-day requests.
    And even with all these resources at your fingertips, it's amazing how many professors and people with years of experience and knowledge can't even work out how to find a link on the first page of a Google search, let alone do a citation search in Web of Knowledge.

  • Comment number 17.

    We move ever closer to making true the episode of futuristic "Logan's Run" which foresaw a post-Armageddon (Sanctuary) which ultimately proved to have all the answers in the form of pre-apocalypse books - but, with no publishing industry (back to Iron Age hunting and gathering), the people had failed to see any value in teaching the art of reading, and the library had become merely an object of quasi-religious veneration.

    If Google Books gains any REAL power, we may as well start burning the books now.

  • Comment number 18.

    In 100 years time, all current digital formats will be long obsolete and probably unreadable. Books will still be in perfect condition if they've been looked after properly by librarians. Physical media are intrinsically superior for archiving purposes.

  • Comment number 19.

    One important factor of the National Library of Wales's digitization project is the fact that most of the material is Welsh-language material. If you leave it to Google and other private firms they might concentrate all their efforts on material published in major languages such as English, Spanish and French, because you'll never make money out of minority-language projects. So even if Google were given the chance to work on Welsh-language archive material, they would never invest heavily in the project, as their returns would never match investment in major languages.

  • Comment number 20.

    The problem of document standards and file formats is a real one. A recent article detailed problems the US Navy has with nuclear aircraft carrier maintenance because modern software doesn't render their maintenance diagrams correctly. Similarly, lots of WP documents created as little as 10 years ago use file formats which are no longer reliable.
    The probable danger of services like Google Books is that they might be just good enough to undermine traditional libraries but not good enough to serve all our future needs.
    There is also the danger of losing the specifically human skills of those gifted but often eccentric librarians who have encyclopaedic knowledge of specific subject areas in a way even Google cannot turn into a computer algorithm.

  • Comment number 21.

    Good article, interesting.
    But I couldn't help thinking that if we become totally reliant on Internet availability of books, where would we be if the lights went out?
    We need hard copies in case. I mean we don't want to be thrown back into the dark ages if modern technology fails, or if (God forbid) a WMD takes modern technology out.
    Also, there are still huge pockets of the world where IT doesn't reach (even in the UK) and where books are worth their weight...
    Lastly, I never liked turning over digitization of books to Google; this is not about making a buck and commercialization; this is about freedom of access to learning, sharing knowledge, and other good things for all people - rich and poor alike.

  • Comment number 22.

    Even if every single person in the country scanned one book and placed it online, it would still take decades for everything that's ever been published to be digitised. Libraries will be around for a long time yet.

    And in any case, are you proposing that we just burn the actual books after they're scanned? We'd still need somewhere to store them. Maybe people won't need to visit their library and actually borrow books any more, but as it's a free service anyway, then it's not as if that's going to impact greatly on libraries.

    Finally, the danger in allowing a commercial company to control the access to these books is that eventually they'd charge for the service, and simply become a print-on-demand supplier of out-of-print books. And once they've got all the books, there would be nothing we could do about it.

  • Comment number 23.

    There are already well-established academic digitalisation projects available. I used them a fair bit as a history student: if you don't have immediate access to your own institution's library, or, especially in the case of early books and back copies of slightly obscure periodicals (through JSTOR), it doesn't have them, they are quite useful. Most of the major journal publishers have online subscriptions for their latest editions.

    The other benefit with digitalising collections is that less frequently consulted works and fragile books can be stored appropriately off site if needed, whilst still being available to readers. This is particularly useful for copyright libraries, who might run out of space.

  • Comment number 24.

    Google Books quite often cannot be searched for keywords which, as far as I am concerned, renders the exercise useless. And the fact that this is a commercial organisation makes me most suspicious of their motives. Profit-seeking and altruism usually go together like hens and motorbikes!

    But there is a wider issue too, and that is the privatisation of the digitisation of our national heritage. The State Papers relating to the history of these islands have been digitised. But they are only available to researchers through institutions who are charged a hefty annual fee. Unsurprisingly, universities find more pressing priorities for their cash, and since the controlling company, Cengage Learning CMEA, refuses to grant access to individuals, those records are effectively unavailable. One can go to Kew and consult them directly there for free, but for someone like myself in Northern Ireland, the cost of flights and hotels to spend time there on research is prohibitive.

    Commercialism of cultural property is proceeding apace - to the detriment of all of us.


  • Comment number 25.

    I have had the pleasure of being a Reference Librarian in a college for about 25 years. Before that, I spent many years cataloging and helping in public libraries. From working with Faculty and students, I know that one form of material will never fill the need of the patron. Online sources have made research easier in some cases and more difficult in others. It is true that we are now dependent on electricity and on the ability to evolve technology to store information. I have spent as long as 5 months researching a faculty question, using online sources, print, the telephone, email and "snail-mail". This is the most fascinating profession in the world and I would not trade it for any other. Spend some time with your local librarians and remember to take your children along - they need to know resources way beyond Google.

  • Comment number 26.

    I'm an author and I want my books to remain the way they were made - on paper; not the property of some American machine.

  • Comment number 27.

    I think there is also a legitimate concern about the authenticity of information. It's one thing to hold an old periodical or book in your hand, but to trust a version that has been scanned and OCR'd may be dangerous, not so much due to technical flaws but because of the ability to intercept this data and block it, censor it or even change it to suit a purpose. It simply makes a 1984 scenario so much more possible.

  • Comment number 28.

    How quickly the importance of the written word in the form of books and papers is forgotten. Seems we are moving into "Fahrenheit 451" much quicker than Ray Bradbury imagined. My librarian is still more important to me than the mayor or the governor. My librarian performs more services for me than Google or any of the other internet book services can ever perform. Do we need librarians? As much as the air we breathe and the water we drink. They help to stimulate the mind by knowing what it is we are looking for. I am very computer savvy but still use my library as the ultimate resource. My librarian understands what I am looking for better than Google. Google directs me to where Google wants me, not necessarily to what I am looking for. Dollars drive Google; the willingness to know and help drives the librarian.

  • Comment number 29.

    I find it so refreshing that there is this feeling in the air that Google shouldn't be allowed to just do what they want, for all the reasons mentioned here. Now if only we can get this message through the heads of the rest of the world we may start getting somewhere.

    We need to find the balance between old and new media, as nothing will ever compare to actually holding a physical copy of a book. If this is after weeks of actually trying to get hold of it, then this is an even better moment. As people above have pointed out, the digital copy isn't the same, as there are always just too many distractions.

  • Comment number 30.

    I think that the library in this case is taking the role of custodian of artefacts that are in the national interest of identity.

    A library has the duty and care to provide the artefacts as points of physical evidence that events did take place, where and when, and with the empirical evidence that the documents can be dated and verified as witnesses to the events that transpired.

    However the pure medium of text is as transient and nascent as it has ever been or will be in the foreseeable future and should not be location specific. Text is a living and changeable commodity that not only provides insights into so called "historical facts" but also the now and future.

  • Comment number 31.

    Hurrah! Google Books (and a number of other Google sites) is back. It appears that they were blocked by the Turkish government these past few days - Turkey BLOCKS Google Services Indefinitely (Huffington Post / Hurriyet) (not to mention slowing down the Internet - thanks to all the sites which use Google Analytics).

    Which makes me very glad that I do not rely on Google Books (or any other Internet site) for my library.

  • Comment number 32.

    The fundamental issues here are the interests of both parties and who to trust.
    The National Library of Wales will act in the public interest whereas Google will act in its own interest. There's nothing wrong with that per se, but it does mean we can't rely on Google to act in the public interest.
    Secondly, who to trust. I trust the National Library of Wales implicitly and don't trust Google very much at all. I imagine I'm not alone.
    Each body has its own agenda and aims. Let's not lose sight of that.

