Sunday, November 25, 2007

A Universal Library Online?

In a November 6 post, Norm Geras quoted this rather alarming passage:

Nobody really knows how much information there is in the world. According to one extremely rough estimate, if you took every book, newspaper, magazine, TV and radio programme, every music album, every handwritten letter, every filed-away document and every other piece of recorded data in existence, and you stored them all on computer hard drives, the amount of disk space you would need would be somewhere in the region of 2,100 exabytes, or 2,100bn gigabytes. If it helps - and it probably doesn't - this is more than 100m times the amount of data that is thought to be stored, in print form, in the bookstacks of the world's largest library, the Library of Congress in Washington.

The explosion of information in the digital age has fostered hopes that the Internet can become a sort of universal library where all of this data can be organized and preserved for the benefit of future generations. Anthony Grafton explored this notion in an interesting essay for the November 5 New Yorker:

It’s an old and reassuring story: bookish boy or girl enters the cool, dark library and discovers loneliness and freedom. For the past ten years or so, however, the cities of the book have been anything but quiet. The computer and the Internet have transformed reading more dramatically than any technology since the printing press, and for the past five years Google has been at work on an ambitious project, Google Book Search. Google’s self-described aim is to “build a comprehensive index of all the books in the world,” one that would enable readers to search the list of books it contains and to see full texts of those not covered by copyright. Google collaborates with publishers, called Google Publishing Partners—there are more than ten thousand of them around the world—to provide information about books that are still copyright protected, including text samples, to all users of the Web. A second enterprise, the Google Library Project, is digitizing as many books as possible, in collaboration with great libraries in the U.S. and abroad. Among them is Kazin’s beloved New York Public Library, where more than a million books are being scanned.

Google’s projects, together with rival initiatives by Microsoft and Amazon, have elicited millenarian prophecies about the possibilities of digitized knowledge and the end of the book as we know it. Last year, Kevin Kelly, the self-styled “senior maverick” of Wired, predicted, in a piece in the Times, that “all the books in the world” would “become a single liquid fabric of interconnected words and ideas.” The user of the electronic library would be able to bring together “all texts—past and present, multilingual—on a particular subject,” and, by doing so, gain “a clearer sense of what we as a civilization, a species, do know and don’t know.” Others have evoked even more utopian prospects, such as a universal archive that will contain not only all books and articles but all documents anywhere—the basis for a total history of the human race.

I am extremely skeptical of such predictions. The idea of a universal library encompassing all the world's knowledge has been around since the creation of the Alexandrian library, and it is just as impractical now as it was then. As noted above, the same technology that allows more information to be stored than ever before also allows more to be produced than ever before. We are caught in a cycle in which the ability to produce information will almost always outrun the ability to archive it comprehensively.

Grafton shares my skepticism about the "universal library," and he makes the case more eloquently than I can:

In fact, the Internet will not bring us a universal library, much less an encyclopedic record of human experience. None of the firms now engaged in digitization projects claim that it will create anything of the kind. The hype and rhetoric make it hard to grasp what Google and Microsoft and their partner libraries are actually doing. We have clearly reached a new point in the history of text production. On many fronts, traditional periodicals and books are making way for blogs and other electronic formats. But magazines and books still sell a lot of copies. The rush to digitize the written record is one of a number of critical moments in the long saga of our drive to accumulate, store, and retrieve information efficiently. It will result not in the infotopia that the prophets conjure up but in one in a long series of new information ecologies, all of them challenging, in which readers, writers, and producers of text have learned to survive.


The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.

Grafton's essay is fascinating and well worth reading. He briefly analyzes the history of efforts to preserve information and makes the important point that libraries and archives will continue to be needed as storehouses of physical materials. This is both because books and documents need to be studied as physical artifacts and because the number of volumes and pages that would have to be digitized to make "everything" available online is overwhelming.

