Chapter Two: Dissecting the book
The history of the index: making information accessible
What computers can do well, or how the compact disc edition of the Oxford English Dictionary is better than the printed one
The compact disc edition of the Oxford English Dictionary clearly illustrates the sort of onerous tasks which computers can perform quickly and thoroughly. While in theory nothing prevents a search on paper for any item of information by perusing the twenty thick volumes of the dictionary, in most cases the task will simply not merit the time involved. With a computer, searches which are, in practical terms, impossible with the paper edition can be completed in a few seconds. For example, the reader can easily search for text in the definitions (as well as in the headwords), discover how many times James Joyce is quoted, or search for all specified items dated between 1500 and 1648. To give a concrete example: there is a sense of completeness which a reader gains from holding a book in the hand. Jane Austen discusses this sense of completeness in the context of the tragic inevitability of the end of a novel as the reader approaches the final page.¹ How might one discover the word that best fits this sense of a book's completeness? Using the compact disc version of the second edition of the Oxford English Dictionary, the reader can discover (without leaving the desk) that its definitions contain 73 occurrences of the word 'completeness'. Among these, the closest synonym to the concept at issue is 'perfectiveness'. It would have been an extremely tedious procedure to discover this using the paper edition. In effect, the dictionary can be used as a thesaurus. The production of concordances, indices of various sorts, and the comparison of different versions of a text are all greatly facilitated by the use of computers. Do these enhancements, which are after all nothing really new, add up to a qualitative change? Elizabeth Eisenstein has argued along just these lines about printing: namely, that the introduction of printing technology was a contributory factor in the development of empirical science and the styles of cognition that accompany it. So, as Francis Bacon claimed, printing changed the world.
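The kind of search through definitions described above, and the concordance it makes possible, can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the OED software: it assumes a hypothetical miniature dictionary held in memory as a mapping from headwords to definition text (the entries are invented, not quoted from the OED), finds every definition containing a given word, and builds a simple keyword-in-context (KWIC) concordance of the hits.

```python
import re

# Hypothetical miniature dictionary: headword -> definition text.
# The entries are invented for illustration and are NOT quoted from the OED.
DEFINITIONS = {
    "entirety": "The state of being entire; completeness.",
    "integrity": "The condition of having no part taken away; completeness, wholeness.",
    "finish": "The final touch that gives completeness to a piece of work.",
    "wholeness": "The quality of being whole or complete.",
}


def word_pattern(word):
    """Match `word` as a whole word, ignoring case."""
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


def search_definitions(word, entries):
    """Return the headwords whose definitions contain `word`."""
    pattern = word_pattern(word)
    return [head for head, text in entries.items() if pattern.search(text)]


def kwic(word, entries, width=25):
    """Keyword-in-context concordance: one line per occurrence, with up to
    `width` characters of context on either side, headwords in sorted order."""
    pattern = word_pattern(word)
    lines = []
    for head in sorted(entries):
        text = entries[head]
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append(f"{head:<10} {left:>{width}} [{m.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    hits = search_definitions("completeness", DEFINITIONS)
    print(len(hits), "definitions mention 'completeness':", hits)
    for line in kwic("completeness", DEFINITIONS):
        print(line)
```

Matching whole words (the `\b` boundaries) stops a search for 'complete' from also counting 'completeness'; whether a real search ought to do so is a design choice, and a dictionary search might equally want to catch inflected forms.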
The same revolutionary claims have been made for computers, particularly about the complex webs of linkages between passages and files that they enable. Does the increasing use of computers constitute or enable a new way of thinking? I think not, although it may be too early to tell. Certainly, the use of the machines encourages us to ask new questions. For example, how do particular words occur and co-occur in the entire Greek corpus? How are pronouns used in different parts of Shakespeare's plays? Such exercises have been undertaken and have led to some re-attributions of authorship. Even so, I am not convinced that the style of thought has changed. So much of academic life revolves around reflection upon, and connections between, texts. If the connections are more easily accessible, reflection is certainly made easier. Nonetheless, the machines remain the tools of the researcher, whose input is critical; computers neither think for us nor provide us with a new way of thinking. Connections (as exemplified by the so-called hypertext links discussed below) may be suggestive and fruitful, but they are not necessarily so. The links are created by the programmer; they cannot be predicted or constructed automatically by the computer (although some researchers claim that automatic link generation is possible²). The benefits of hypertext arise only after the research for which it is a tool has been undertaken. The informative links are those which we struggle to create,³ so the only true hypertext may be that of the imagination. Sadly, the imagination is not transferable between people. Such problems with the construction of hypertexts illustrate another way in which the advocates of hypertext make misleading claims. One of Landow's examples is the difference in experience between a young student reading a poem by Milton and a Professor of English Literature reading the same text. Only the Professor fully recognizes the quotations and appreciates the allusions and the repeated imagery.
Admittedly, some of this can be helpfully conveyed by a hypertext system (just as it can by extensive footnotes in a book). However, no such system can replicate the experience of the individual reader. Part of the reason the Professor's reading of the poem is deeper is quite simply phenomenal, a matter of lived experience. The Professor will probably be older than the student and will have read the poem many times before, and will therefore bring to it not only a greater stock of personal experience but also greater familiarity with the poem and with related texts. Thorough knowledge of the literature is achieved, for example, during the writing of a literature Ph.D. It is hubris to suggest that a hypertext system could replace such research, or replicate the experience of conducting it. Claims that hypertext can do this are so exaggerated that they deserve a polemical response. The links which a writer puts into a hypertext system may, in a limited manner, show the connections which the writer perceives; however, a reader will not necessarily be convinced of their significance. You can lead Buridan's Ass to water but you can't teach it to appreciate formal logic. I return to hypertext in greater detail later on. A review of the history of indexing follows, by way of an introduction to some of the techniques which computers have now made widely available.
¹ I am grateful to Jeremy Avis for reminding me of this.
² Kaplan (1990). Waterworth (1992) reports the results of an examination of machine-generated hypertext links. He concludes that they are worse than, or no better than, randomly generated links at establishing connections which readers recognize. (Interestingly, human-made links fared little better.)
³ This applies to all sorts of research, in hard science and humanities alike. It may also indicate why hypertext is a medium more suitable for teaching than for research.
The keyword, the citation index and the concordance
The history of the index
Access and order: the index and the library catalogue
Early sorting strategies
The earliest known forms of writing are the logographic and cuneiform writing systems. These were records made by bureaucrats on clay tablets.
'When [c. 2500 B.C.] they attempted to make an inventory of Sumerian words, the native Mesopotamian scribes faced a problem familiar to any lexicographer in the first stages of planning a dictionary: should the entries be organised thematically, by subjects, or should they be arranged in a serial order based on graphic or phonological characteristics of the words?' (Goody⁴)
If a graphic or phonological basis is chosen, then it is necessary to decide which phonological characteristics should determine the alphabetical order. In our own alphabet, sorting is determined by the beginning of the first syllable of a word. This principle (called acrophony) lies behind all the alphabet primers with which primary school children were taught to read. The fact that our system rests upon what was initially an arbitrary choice is usually overlooked. The possibility of using different sorting systems may have some relevance to the current controversy about how writing and reading are best taught in primary schools. Alphabetization and the principles of sorting are addressed below, after one important distinction has been made.
⁴ Goody (1977: 97-98), quoting Landsberger.
New searching techniques: concordances, keywords and abstracts; access to the sources
The history of the alphabet and the index; the contents page, etc.
It is useful for the contemporary researcher to bear in mind the distinction between order and access. The criteria used by those offering books (e.g. booksellers) differ from the methods used by potential readers seeking a book on a particular topic. This distinction is significant even though the two cases overlap. (I shall say a good deal more in the next few chapters about the ways in which readers search by browsing.) The simplest way of thinking about the difference is to contrast a collection of books sorted in alphabetical sequence of their authors' names with the same collection sorted by subject matter. The first gives 'order', the second 'access'.
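The contrast between 'order' and 'access' can be made concrete with a toy catalogue. In this sketch the book records are invented placeholders; 'order' is modelled as a single sequence sorted by the author's surname, and 'access' as a subject index mapping each topic to the titles that treat it. This is one possible way of modelling the distinction, not a description of any actual catalogue.

```python
from collections import defaultdict
from typing import NamedTuple


class Book(NamedTuple):
    author: str
    title: str
    subject: str


# Hypothetical shelf: authors, titles and subjects are invented placeholders.
SHELF = [
    Book("Marsh", "Tides and Shores", "geography"),
    Book("Abbot", "A Primer of Logic", "philosophy"),
    Book("Young", "Rivers of the North", "geography"),
    Book("Hale", "On Syllogisms", "philosophy"),
]


def shelf_order(books):
    """'Order': one fixed sequence, here by the author's surname."""
    return sorted(books, key=lambda b: b.author)


def subject_access(books):
    """'Access': map each subject to the titles of the books that treat it."""
    index = defaultdict(list)
    for b in books:
        index[b.subject].append(b.title)
    return dict(index)


if __name__ == "__main__":
    for b in shelf_order(SHELF):
        print(b.author, "-", b.title)
    print(subject_access(SHELF)["geography"])
```

The sorted list tells a reader where each book stands; only the subject index answers the question a reader with a topic in mind actually asks.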
Order |
Order enables the researcher to discover which books are on which shelves. (This sense of the researcher as seeker is reflected in the French word for researcher, chercheur.) Order was made practicable by radical innovations such as the title page and the contents page. Until only eight or nine hundred years ago it was impossible to discover either the authorship or the subject matter of a scroll, tablet or codex without reading it.⁵ A codex is the earliest form of the bound volume, resembling the book form which we now recognize. A codex often contained several manuscripts bound together, frequently works by different authors, and because these were often bound without a contents page, the contents of a codex could not easily be ascertained. Small wonder that works in major libraries were overlooked; some of them are still being rediscovered. Order is achieved within a text by the inclusion of a title, a contents page and an index. Beyond the individual text, books are ordered in the catalogues of booksellers and libraries. (In the case of libraries, the catalogue achieves order only when it records authors and titles; sometimes the only catalogue available was a simple list of acquisitions without any information about their contents.) The comprehensively ordered library catalogue as we now know it is a relatively recent invention. The eighth-century catalogue of the library of the Abbey of York consisted of a poem which purported to list the book titles. However, titles which did not fit the metre were omitted.⁶
⁵ The plot of Umberto Eco's 'The Name of the Rose' hinges on this.
Access |
The combination of a contents table and an alphabetical index eventually gave the reader an efficient means of access to any specified passage of a book. Together they enable quick reference, obviating the need to read an entire book in order to find (or fail to find) a particular section. They are also a boon for browsers who are not entirely sure what they are seeking, but who can scan a contents table and index to see whether a text contains any material relevant to their interest. Page numbers and a fixed sequence of the alphabet are prerequisites for the use of both contents pages and indices. Numbers for pages or folios (a folio is a leaf of paper, and there are two pages to one folio) came into use only from the thirteenth century A.D., in the last years of the codex. They became commonplace only after printing enabled the production of uniform editions. Tables of contents were known and used from the late antique period, but were not commonplace until about the same time. In the absence of page numbers, a table of contents can only give a sequence of topics. Instead of marking structure with page numbers, order can be achieved by dividing longer passages into other manageably sized units, provided that the reader is able to locate the longer passage readily. An example is the standardized structure of Biblical chapters and verses. Given the reference Job 28:12⁷ or Ecclesiastes 12:12,⁸ anyone with a basic knowledge of the order of the Books within the Bible knows more or less where to look, so that page numbers are unnecessary. (With an index showing the order of the Books, even that basic knowledge is not needed.) However, the standard layout of the Bible was not established until relatively late. Rouse and Rouse place it in the mid-thirteenth century, little more than two centuries before the advent of printing.⁹
⁶ A recent parallel may be found in the St Andrews Psychology library, where books were kept in order of purchase to encourage serendipity.
⁷ 'But where shall wisdom be found? and where is the place of understanding?'
⁸ 'Of making many books there is no end; and much study is a weariness of the flesh.' Note the anachronism of the King James translation: in the time of the writer of Ecclesiastes, books as we understand them did not exist.
⁹ Rouse and Rouse (1982: 221).
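A chapter-and-verse reference works as a locator that needs no page numbers: given an index showing the order of the Books, any reference can be turned into a key that sorts in Biblical order. The sketch below is illustrative only; it assumes an excerpt of four consecutive books of the King James Old Testament (Job, Psalms, Proverbs, Ecclesiastes) and a simple 'Book chapter:verse' format, and it does not handle numbered or multi-word book names such as '1 Kings' or 'Song of Solomon'.

```python
import re

# Excerpt only: four consecutive books of the King James Old Testament.
BOOK_ORDER = ["Job", "Psalms", "Proverbs", "Ecclesiastes"]
BOOK_INDEX = {name: position for position, name in enumerate(BOOK_ORDER)}

# 'Book chapter:verse', e.g. 'Job 28:12'; single-word book names only.
REF = re.compile(r"^(?P<book>[A-Za-z]+)\s+(?P<chapter>\d+):(?P<verse>\d+)$")


def locate(ref):
    """Turn 'Book chapter:verse' into a sortable (book, chapter, verse) key."""
    m = REF.match(ref.strip())
    if m is None:
        raise ValueError(f"not a chapter-and-verse reference: {ref!r}")
    book = m["book"]
    if book not in BOOK_INDEX:
        raise ValueError(f"book not in this index: {book!r}")
    return (BOOK_INDEX[book], int(m["chapter"]), int(m["verse"]))


if __name__ == "__main__":
    refs = ["Ecclesiastes 12:12", "Psalms 119:105", "Job 28:12"]
    print(sorted(refs, key=locate))
```

Chapters and verses are compared as numbers rather than as text, so that, for instance, chapter 119 correctly follows chapter 28.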
Turning now to the alphabet: it may seem remarkable to us that the fixed order of the alphabet was used to sort items far later than the alphabet itself came into use. From the beginnings of alphabetical writing the symbols have had a fixed order (a, b, c ...). Scribes learned this order as part of learning to read and write, but at first it was not seen as anything more than an arbitrary sequence of characters. It was not used to order words, let alone the objects or concepts they named. Not much has been written about the process whereby people learn to write. It seems that in the western tradition, at least, people originally learnt to write by copying words and phrases. This resembles the Koranic school, in which pupils learn to write Arabic by copying the verses of the Holy Koran. One implication is interesting: the starting point, in a literal sense, is by no means clear. A Christian might begin with the first lines of Genesis, of the Psalms, or of the New Testament:
Table: Some possible first phrases for a Christian
Source          Opening words
Genesis         In the beginning...
Psalms          Blessed is the man...
New Testament   The book of the generation...
I hope I have stressed sufficiently that a fully developed system of alphabetical writing need not imply that other items are arranged in the fixed order of the alphabet. The fixity of the ABC need not be generalized beyond the alphabet itself. In a much-quoted passage, a thirteenth-century Genoese writer claims to have invented 'the' alphabetical ordering of words:
'Amo' comes before 'biblio' because 'a' is the first letter of the former and 'b' is the first letter of the latter and 'a' comes before 'b' ... by the grace of God working in me, I have devised this order.
This claim has been doubted,¹⁰ but the quotation clearly suggests that sorting words by the alphabetical sequence of their first letters was not widespread at that date. What are the implications of the absence of a fixed alphabetical sequence? How could one find one's way around a book or a library? One possibility is to seek a text by the physical characteristics of the volume in question: its size, its age, or perhaps its colour.
¹⁰ Eisenstein (1979: 89).
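The difference between ordering by first letter only, as in the quoted passage, and full letter-by-letter alphabetical ordering can be shown with a short sketch. The word list is invented for illustration, apart from 'amo' and 'biblio', which come from the quotation. Python's built-in sort is stable, so in the first-letter version words that share an initial letter simply keep whatever order they had before sorting.

```python
# Invented word list; only 'amo' and 'biblio' come from the quoted passage.
WORDS = ["biblio", "amor", "bellum", "amo", "aqua"]


def first_letter_order(words):
    """Sort on the first letter only ('a' before 'b').
    sorted() is stable, so words sharing a first letter stay in input order.
    Assumes no empty strings in `words`."""
    return sorted(words, key=lambda w: w[0].lower())


def full_alphabetical_order(words):
    """Compare letter by letter until two words differ; a word that is a
    prefix of another ('amo', 'amor') comes first."""
    return sorted(words, key=str.lower)


if __name__ == "__main__":
    print(first_letter_order(WORDS))
    print(full_alphabetical_order(WORDS))
```

Under first-letter ordering a reader can find the 'a' section but must still scan it from start to finish; full alphabetical ordering fixes the position of every word.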