



  Chapter Two: Dissecting the book

  The History of the index - making information accessible
  What computers can do well or
How the compact disc edition of the Oxford English Dictionary is better
than the printed one

  The compact disc edition of the Oxford English Dictionary clearly illustrates the sort of onerous

tasks which computers can perform quickly and thoroughly. While in theory nothing prevents a

paper search for any item of information by the perusal of twenty thick volumes of dictionary,

in most cases the task will simply not merit the time involved. With a computer, searches which

are, in practical terms, impossible with the paper edition can be achieved in a few seconds. For

example, the reader can search easily for text in the definitions (as well as in the headwords), or

discover how many times James Joyce is quoted, or search for all specified items dated

between 1500 and 1648.
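
The three kinds of search just mentioned can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a description of how the compact disc edition works internally: the entries, the author names, the field names and the three helper functions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Quotation:
    author: str
    year: int


@dataclass
class Entry:
    headword: str
    definition: str
    quotations: list = field(default_factory=list)


# Hypothetical toy entries for illustration only; not real dictionary data.
ENTRIES = [
    Entry("entry-a", "a definition that mentions completeness",
          [Quotation("Author X", 1520), Quotation("Author Y", 1700)]),
    Entry("entry-b", "a definition about something else",
          [Quotation("Author X", 1650)]),
    Entry("entry-c", "the state of Completeness in a work",
          [Quotation("Author Y", 1600)]),
]


def search_definitions(entries, term):
    """Return headwords whose definition contains the term, ignoring case."""
    needle = term.lower()
    return [e.headword for e in entries if needle in e.definition.lower()]


def count_quotations_by(entries, author):
    """Count every quotation attributed to the given author."""
    return sum(1 for e in entries for q in e.quotations if q.author == author)


def headwords_quoted_between(entries, start, end):
    """Return headwords with at least one quotation dated in [start, end]."""
    return [e.headword for e in entries
            if any(start <= q.year <= end for q in e.quotations)]


print(search_definitions(ENTRIES, "completeness"))
print(count_quotations_by(ENTRIES, "Author X"))
print(headwords_quoted_between(ENTRIES, 1500, 1648))
```

Each function is a single pass over every entry, which is exactly the exhaustive reading that is impractical with twenty printed volumes.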

To give a concrete example: there is a sense of completeness which a reader gains from holding

a book in the hand. Jane Austen has discussed this sense of completeness in the context of the

tragic inevitability of the end of a novel as the reader approaches the final page[1]. How to

discover the word that best fits this sense of the completeness of a book? Using the compact

disc version of the second edition of the Oxford English Dictionary, the reader can discover

(without leaving their desk) that it contains within its definitions 73 occurrences of the word

'completeness'. Among these, the closest synonym to the concept at issue is 'perfectiveness'.

It would have been an extremely tedious procedure to discover this using the paper edition.

Effectively the dictionary can be used as a thesaurus.

The production of concordances, indices of various sorts, the comparison of different versions

of a text: all these are greatly facilitated by the use of computers.
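
As a rough indication of why concordances in particular are so readily mechanized, the following Python sketch builds a simple keyword-in-context concordance. It is an assumption-laden illustration: the function name, the `width` parameter and the crude tokenizing rule are hypothetical choices, and the sample sentence is the King James rendering of Ecclesiastes 12:12.

```python
import re
from collections import defaultdict


def build_concordance(text, width=2):
    """Map each word to a list of (left context, right context) pairs.

    Words are lower-cased runs of letters and apostrophes; `width` is the
    number of neighbouring words kept on each side.
    """
    words = re.findall(r"[a-z']+", text.lower())
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        concordance[word].append((left, right))
    # Return the headwords in alphabetical order, as a printed concordance would.
    return dict(sorted(concordance.items()))


SAMPLE = ("Of making many books there is no end; "
          "and much study is a weariness of the flesh")

for left, right in build_concordance(SAMPLE)["is"]:
    print(f"{left:>12} [is] {right}")
```

A real concordance would also record where each occurrence is found (book, chapter and line, for instance), but the principle is the same.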

Do these enhancements which are, after all, nothing really new, add up to producing a

qualitative change? Elizabeth Eisenstein has argued along just these lines: namely that the

introduction of printing technology was a contributory factor in the development of empirical

science and the styles of cognition that accompany it. So, as Francis Bacon claimed, printing

changed the world. The same revolutionary claims have been made for computers, and


[1] I am grateful to Jeremy Avis for reminding me of this.










  particularly about the complex webs of linkages between passages and files that they enable.

Does the increasing use of computers constitute or enable a new way of thinking? I think

not, although it may be too early to tell. Certainly, the use of the machines

encourages us to ask new questions. For example, how do particular words occur and co-occur

in the entire Greek corpus? How are pronouns used in different parts of Shakespeare's plays?

Such exercises have been undertaken and have led to some re-attributions of authorship.

Nevertheless, I am not convinced that the style of thought is different. So much of

academic life revolves around reflection upon, and connections between, texts. Certainly, if the

connections are more easily accessible, then the process of reflection is facilitated. Nonetheless,

the machines remain the tools of the researcher whose input is critical; computers neither think

for us nor provide us with a new way of thinking. Connections (as exemplified by so-called

hypertext links discussed below) may be suggestive and fruitful, but they are not necessarily

so. The links are created by the programmer; they cannot be predicted or constructed

automatically by the computer. (Some researchers, however, claim that automatic link generation

is possible[1].) Hypertext's advantages arise only after the research (for which it is a tool) has

been undertaken. The informative links are those which we struggle to create[2], so the only true

hypertext may be that of the imagination. Sadly, the imagination is not transferable between

people.

Such problems with the construction of hypertexts illustrate another way in which the

advocates of hypertext make misleading claims. One of Landow's examples is of the difference

in experience between a young student reading a poem by Milton and a Professor of English

Literature reading the same text. Only the Professor fully recognizes the quotations, and

appreciates the allusions and the repeated imagery. Admittedly, some of this can be helpfully

conveyed by a hypertext system (just as by extensive footnotes in a book). However, no such

system can replicate the experience of the individual reader. Part of the reason that the


[1] Kaplan (1990). Waterworth (1992) reports the results of an examination of machine-generated hypertext links.

He concludes that they are worse, or no better, than randomly generated links, in terms of establishing

connections which are recognized by readers. (Interestingly, human-made links fared little better.)

[2] This applies to all sorts of research, in hard science and humanities alike. It may also indicate why hypertext is

a medium more suitable for teaching than for research.










  Professor's experience of the poem is deeper is quite simply phenomenal. The Professor will

probably be older than the student and have read the poem many times before, and therefore

will not only bring to it a greater stock of personal experience, but will also be more familiar

with the poem (and with related texts). Thorough knowledge of the literature is achieved, for

example, during the writing of a literature Ph.D. It is hubris to suggest that a hypertext system

could possibly replace (or replicate the experience of conducting) such research. Claims that

hypertext can do this are so exaggerated that they deserve a polemical response. The links which

a writer puts into a hypertext system may, in a limited manner, show the connections which the

writer perceives; however, a reader will not necessarily be convinced of the significance of

these connections. You can lead Buridan's Ass to water but you can't teach it to appreciate

formal logic. I return to hypertext in greater detail later on.

A review of the history of indexing follows, by way of an introduction to some of the

techniques which have now been made widely available by computers.
  The keyword, the citation index and the concordance

The history of the index. Access and order: the index and the library catalogue


Early sorting strategies

  The earliest known forms of writing are the logographic and cuneiform writing systems. These

were records made by bureaucrats on clay tablets.
  'When [c. 2500 B.C.] they attempted to make an inventory of Sumerian words, the
native Mesopotamian scribes faced a problem familiar to any lexicographer in the first
stages of planning a dictionary: should the entries be organised thematically, by
subjects, or should they be arranged in a serial order based on graphic or phonological
characteristics of the words?' Goody[1].

  If a graphic or phonological basis is chosen, then it is necessary to decide which phonological

characteristics should determine the serial order. In our own alphabet, sorting is determined by

the beginning of the first syllable of a word. This principle (called acrophony)

lies behind all the alphabet primers with which primary school children were taught to read.

The fact that our system relies upon what was initially an arbitrary choice is usually

overlooked. The possibility of using different sorting systems may have some relevance in the

current controversy about how writing and reading are best taught in primary schools. The issue

of alphabetization and the principles of sorting are addressed after making one important

distinction.


[1] Goody (1977: 97/98), quoting Landsberger.










  New searching techniques - concordances, keywords and abstracts - access
to the sources.
The history of alphabet and index; the contents page etcetera

  It is useful for the contemporary researcher to bear in mind the distinction between order and

access. The criteria used by those offering books (e.g. booksellers) differ from the methods

used by potential readers seeking a book on a particular topic. This distinction is significant

despite the fact that there is overlap between the two cases. (I shall be saying a lot more about

the ways in which readers search by browsing in the next few chapters). The simplest way of

thinking of the difference is to contrast a collection of books sorted in alphabetic sequence of

their authors' names with the same collection sorted by their subject matter. The first gives

'order', the second 'access'.
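
The contrast can be made concrete with a small Python sketch. It is a hypothetical illustration only: the catalogue records, the author and title placeholders and both function names are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical catalogue records: (author, title, subject).
BOOKS = [
    ("Author M", "Title 1", "indexing"),
    ("Author B", "Title 2", "printing"),
    ("Author T", "Title 3", "indexing"),
]


def shelve_by_author(books):
    """'Order': one alphabetical sequence, so each book has a known place."""
    return sorted(books, key=lambda book: book[0])


def group_by_subject(books):
    """'Access': gather titles under the topic a reader is looking for."""
    subjects = defaultdict(list)
    for author, title, subject in books:
        subjects[subject].append(title)
    return dict(subjects)


print([author for author, _, _ in shelve_by_author(BOOKS)])
print(group_by_subject(BOOKS))
```

The same records serve both purposes; what changes is the key by which they are arranged.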


  Order

  Order enables the researcher to discover which books are on which shelves. (This sense of the

researcher as seeker is reflected in the French word for researcher, chercheur). Order was made

practicable by radical innovations such as the title page and the contents page.

Until only eight or nine hundred years ago it was impossible to discover either the authorship

or the subject matter of a scroll, tablet or codex without reading it[1]. A codex is the earliest form

of the bound volume, resembling the book form which we now recognize. Codices contain a

collection of manuscripts bound together, often including works by several authors. Since

several works within a codex volume were often bound together without a contents page, its

contents could not be easily ascertained. Small wonder that works in major libraries were

overlooked, some of which are still being rediscovered.

Order is achieved within a text by the inclusion of a title, a contents page and an index. Beyond

the individual text, books are ordered in the catalogues of booksellers and libraries. (In the case

of libraries, the catalogue only achieves order when it records authors and titles, rather than a

simple list of acquisitions without any information about their contents, which was sometimes

the only catalogue available).

The comprehensively ordered library catalogue as we now know it is a relatively recent

invention. The eighth-century catalogue of the library of the Abbey of York consisted of a poem



[1] The plot of Umberto Eco's 'The Name of the Rose' hinges on this.










  which purported to list the book titles. However, titles which did not fit the metre were

omitted[1].


  Access

  The combination of a contents table and an alphabetical index eventually provided the means for

the reader efficiently to gain access to any specified passage of a book. They enable quick

reference, obviating the need to read an entire book in order to find (or not) a particular section.

They are also a boon for browsers who are not entirely sure of what they are seeking, but who

can scan a contents table and index to see whether there is any material in a text which is

relevant to their interest.

Page numbers and a fixed sequence of the alphabet are prerequisites for the use of both

contents pages and indices. Numbers for pages or folios (a folio is a leaf of paper, and there are

two pages to one folio) came into use only from the thirteenth century A.D., in the last years

of the codex. They only became commonplace after printing enabled the production of uniform

editions. Tables of contents were known and used from the late antique period, but were not

commonplace until about the same time. In the absence of page numbers, a table of contents

can only give a sequence of topics.

As an alternative to the indication of structure by page numbers, order can be achieved by

creating other manageably sized divisions within longer passages, provided that the reader is

able to locate the longer passage readily. An example is the standardized structure of Biblical

chapters and verses. Given the reference: Job 28:12[2] or Ecclesiastes 12:12[3], anyone with a

basic knowledge of the order of the Books within the Bible knows more or less where to look,

so that page numbers are unnecessary. (With an index showing the order of the Books, even

that basic knowledge is not needed). However, the standard layout of the Bible was not

established until relatively late. Rouse and Rouse place it in the mid-thirteenth century, little

more than two centuries before the advent of printing[4].
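
A reference of the form book, chapter and verse works as a structural address, which a short Python sketch can illustrate. The sketch rests on assumptions: the function names are hypothetical, and `BOOK_ORDER` is a deliberately partial list of four Books, given in their usual relative order, rather than a full canon.

```python
import re

# Partial list for illustration only; a real table would name every Book.
BOOK_ORDER = ["Genesis", "Job", "Psalms", "Ecclesiastes"]


def parse_reference(ref):
    """Split a reference such as 'Job 28:12' into (book, chapter, verse)."""
    match = re.fullmatch(r"\s*(.+?)\s+(\d+):(\d+)\s*", ref)
    if match is None:
        raise ValueError(f"not a chapter-and-verse reference: {ref!r}")
    book, chapter, verse = match.groups()
    return book, int(chapter), int(verse)


def position(ref):
    """Sort key: place of the Book in the sequence, then chapter, then verse."""
    book, chapter, verse = parse_reference(ref)
    return BOOK_ORDER.index(book), chapter, verse


refs = ["Ecclesiastes 12:12", "Job 28:12", "Genesis 1:1"]
print(sorted(refs, key=position))
```

No page number appears anywhere in the address, so the same reference holds for any edition that keeps the standard divisions.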



[1] A recent parallel may be found in the St Andrews Psychology library where books were kept in order of purchase
to encourage serendipity.

[2] 'But where shall wisdom be found? and where is the place of understanding?'

[3] 'Of making many books there is no end; and much study is a weariness of the flesh.' Note the anachronism of

the King James translation. In the time of the writer of Ecclesiastes, books, as we understand them, did not exist.

[4] Rouse and Rouse (1982: 221).






  Turning to the alphabet: it may seem remarkable to us that the use of the fixed order of the

alphabet to sort items was introduced far later than the use of the alphabet itself. From the

beginnings of alphabetical writing the symbols have had a fixed order (a, b, c...). This was

learned by scribes as part of learning to read and write, but was not (at first) seen as anything

more than an arbitrary order of characters. It was not used to provide a sequence for ordering

words, let alone the objects or concepts named.

Not much has been written about the process whereby people learn to write. It seems that in the

western tradition, at least, people originally learnt to write by copying words and phrases.

This resembles the Koranic school, in which one learns to write Arabic by copying the verses of the

Holy Koran. The implications of this are interesting. The starting point, in a literal sense, is by

no means clear. A Christian might begin with the first lines of Genesis, or the Psalms, or the

New Testament:


Table: Some possible first phrases for a Christian: the beginning of Genesis, the Psalms, and

the New Testament

  In the beginning...

Blessed is the man...

The book of the generation...

  I hope I have stressed sufficiently that a fully developed system of alphabetical writing need not

imply that items share the fixity of alphabetical order. The fixity of the ABC need not be

generalized beyond the alphabet. In a much-quoted passage, a thirteenth-century Genoese writer

claims to have invented 'the' alphabetical ordering of words:
  'Amo' comes before 'biblio' because 'a' is the first letter of the former and 'b' is the
first letter of the latter and 'a' comes before 'b' ... by the grace of God working in me, I
have devised this order

  This claim has been doubted,[1] but the quotation clearly suggests that the sorting of words by

the alphabetical sequence of their first letters was not widespread at that date.
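
The rule stated in the quotation compares first letters only. The following Python sketch contrasts that rule with full alphabetical ordering, in which later letters settle ties. It is an illustration under assumptions: both function names are hypothetical, and of the sample words only 'amo' and 'biblio' come from the passage, 'abeo' and 'bene' having been added to show the difference.

```python
def first_letter_order(words):
    """Order by initial letter only.

    Python's sort is stable, so words sharing an initial stay in the
    sequence in which they were supplied.
    """
    return sorted(words, key=lambda word: word[0].lower())


def full_alphabetical_order(words):
    """Order by every letter in turn, as in a modern index."""
    return sorted(words, key=str.lower)


WORDS = ["biblio", "amo", "abeo", "bene"]

print(first_letter_order(WORDS))
print(full_alphabetical_order(WORDS))
```

Under the first rule 'amo' may still precede 'abeo'; only the second rule fixes a single place for every word.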

What are the implications of the absence of a fixed alphabetical sequence? How could one find

one's way around a book or a library? One possibility is to seek a text by hunting for

the physical characteristics of the volume in question: its size, its age, or perhaps its colour.


[1] Eisenstein (1979: 89).



