This short essay was written for my computing course at Birkbeck, Autumn 1996.
The celebrated library at Alexandria in Egypt contained the greatest collection of literature in the ancient world. Founded at the end of the 4th Century B.C., it had the objective of collecting in one place a copy of every important Greek text ever written. Ever since, a compulsion to compile just such a comprehensive collection of literature has been a feature of every major civilized culture: the British Library, the Library of Congress, the Bibliothèque Nationale, all were founded with similar aims, and now maintain massive collections of books and journals.
The scale of such undertakings, even in times when comparatively few written works were produced, was vast: writ large in these endeavours are exactly the problems of storage, retrieval and security which modern data processing managers wrestle with. The history of the Alexandrian Library itself offers one of the earliest object lessons in the importance of following effective data security procedures: Plutarch records that in 47 B.C. part of the library caught fire, and many books were lost forever that might otherwise have survived.
Like their present-day counterparts, scholars and philologists through the ages have embraced new technological developments which appear to facilitate their Sisyphean task. In the early Christian era the codex, the bound book format with which we are familiar, superseded papyrus and parchment rolls, offering more effective storage and ease of access; in the 8th Century, minuscule cursive letter forms (using small characters, rather than just capitals) were perfected for both Latin and Greek alphabets, bringing further benefits of compression and speed of copying; and, of course, with the introduction of printing some seven centuries later, publishing on a truly industrial scale became possible, with massive implications for every aspect of communication.
The role computers can play in this area has expanded considerably in recent years. The use of computerised catalogues and indexes of existing collections has improved their accessibility, while, for publishers, digitisation has virtually eliminated the need for type to be physically set by hand. But now that fast, cheap random access storage space is available to every level of application, it is clear that machines can effectively and efficiently store not only information about books and journals, but entire texts as well: written works may be published and read by thousands, without using a single sheet of paper or drop of ink. It seems likely that these developments will prove to be an important milestone in the history of the “written” word.
With the digitisation of historical, literary and scientific texts, scholarly chores such as indexing and cross-referencing can be left to a computer, not to mention the chore of following all those cross-references. Huge texts can be scanned almost instantly for a particular phrase. Once a text has been digitised the phenomenal processing power of the computer can be let loose on it. To the studies of literature and history have come new approaches of quantitative textual analysis, where the attribution of a work to a particular author, or of a date to a work, may be substantiated by analysing the frequency and distribution of key words, phrases, or unusual spellings. For example, in 1992 a computer running a neural network program was trained to distinguish accurately between plays by Shakespeare and his contemporary John Fletcher that it had never seen before. The significance of this is that it may assist Shakespeare scholars in correctly attributing sections of the plays (The Two Noble Kinsmen and Henry VIII) on which the two men collaborated in 1613.[note]
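The idea of comparing the frequency of key words can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it is not the neural network method reported in the article cited, the list of marker words is hypothetical, and the function names are invented for the example. It builds a crude profile of how often each marker word occurs in a text, and picks the known author whose profile lies closest to that of an unattributed passage.

```python
import re
from collections import Counter

# Hypothetical marker words; a real study would select these with care.
MARKER_WORDS = ["the", "and", "of", "to", "in"]


def tokenize(text):
    """Split a text into lower-case words."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text, markers=MARKER_WORDS):
    """Return the relative frequency of each marker word in the text."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on an empty text
    return [counts[m] / total for m in markers]


def distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(unknown, samples):
    """Name the author whose sample profile is nearest the unknown text.

    `samples` maps an author's name to a text known to be by that author.
    """
    target = profile(unknown)
    return min(samples, key=lambda name: distance(target, profile(samples[name])))


def contains_phrase(text, phrase):
    """Scan a text for a phrase, ignoring case."""
    return phrase.lower() in text.lower()
```

Called as `closest_author(passage, {"Shakespeare": text_a, "Fletcher": text_b})`, the sketch returns whichever name has the nearer profile. With so few marker words and so simple a measure it would prove nothing about a real play; it shows only the shape of the calculation.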
It’s not by magic, of course, that texts come to be digitised. Although much that is produced now (this essay being no exception) is committed directly to computer, what went before was not. Now commercial interests - software houses like Microsoft, or traditional publishers like Oxford University Press - are committed to producing works old and new in an electronic form. But before them, one major electronic source of public domain literature has been Project Gutenberg. The project’s aim is to create a quality archive of platform-independent electronic source texts. Volunteers have typed in many millions of words, including classic novels, poems, historical and scientific works, and optical character recognition (OCR) technology is increasingly used. It may seem a rather Luddite enterprise, but it is nevertheless important work. The texts produced may be made available for downloading via the Internet, or collected on CD-ROM. And, once entered, they need never be re-entered, re-typed, or re-set.
There is one further way in which computerisation can improve the effective use of textual material. Project Gutenberg’s texts are still largely in what its director, Professor Hart, refers to as “plain vanilla” ASCII text; hypertext goes a step further. Hypertext is electronic text which contains links to other texts, or to reference points within the same text, and the following of these links is handled by the computer. “By creating computer linkages from the citations of an article to the corresponding source documents, users would be able to navigate through a body of related literature online simply by following the ‘electronic footnotes.’” [note] Hypertext is widely used for on-line reference texts, such as dictionaries and encyclopaedias on CD-ROM. Most significantly, a hypertext model is the basis for the World-Wide Web, originally implemented using the Hypertext Markup Language (HTML) and intended for distributing the literature of high-energy physics to researchers. Its application to other disciplines was encouraged, in particular by making the standards open, and few were slow to exploit its possibilities: in the Web we can presently see “a precursor of the networked environment that will permeate libraries in the future.” [note]
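To make the “electronic footnotes” concrete, here is a minimal sketch in Python of how a program might find the links in a piece of HTML. It assumes the standard library’s `html.parser` module; the class name `LinkCollector`, the function `extract_links` and the sample fragment are all invented for illustration, and the file name in the sample is hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> anchor in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link targets in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# An invented fragment: one link to another text, one within the same text.
SAMPLE = (
    '<p>See <a href="notes.html#n1">note 1</a> '
    'and <a href="#top">the top</a>.</p>'
)
```

Here `extract_links(SAMPLE)` yields the two targets, one pointing to a place in another document and one to a reference point in the same text. A browser does the rest of the work, fetching and displaying whichever target the reader chooses.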
The storage and transmission of texts by electronic means is changing the way we use and approach them. They can be available from a single point of access anywhere in the world; the computer can take over the tedium of storing and retrieving the things we want to read or study. Where hypertext is exploited, the machine tirelessly performs the roles of librarian and page-turner. New developments may necessitate the removal of the data to other storage media, but this can be completely automated, minimising the opportunity for human error to play its traditional role in the copying process.
It seems unlikely that the proliferation of electronic texts will ever completely eradicate the urge to keep that hard copy “just in case”. It is hard to believe that there will ever cease to be a place for books in the world of the student or casual reader: there’s still a long way to go to make a device like Apple’s Newton as attractive - or cheap - as a paperback book. Yet now that so many words, not to mention sounds and pictures, are readily available in digitised form, the idea of such an encyclopædic electronic book as, say, The Hitch Hiker’s Guide to the Galaxy, is less strange and wonderful than it was twenty years ago.
Notes
- See Robert Matthews and Tom Merriam, “A bard by any other name”, New Scientist, 22 Jan 1994, pp. 23-27.
- Jeff Barry, “The HyperText Markup Language (HTML) and the World-Wide Web: Raising ASCII Text to a New Level of Usability”, The Public-Access Computer Systems Review, no. 5 (1994), 5-64.
- Barry, op. cit.
