
Structures of information

There is no such thing as brute or bare text. All the texts that we deal with - especially those

we generate ourselves - include a variety of ancillary information which is vital to their correct

interpretation.

Examples include:

author
title
date and place of writing
language of the text
and so forth.

All such categories make explicit the structures in the texts. The exercise of making them explicit may itself structure our use of the text. If this sounds alarming, consider the way in which

responses to a work are affected by knowledge of the name of the author, which is one of the

important ancillary or structuring pieces of information that I am considering. I suspect most

readers are comfortable with the idea of plucking volumes off shelves in bookshops or libraries

on the basis of no more information than the name of the author or title on the spine. What I am

suggesting is that comparable means of access can be provided for fieldnotes and other types of

field data. With this in mind, we should enter information into the computer in a way that makes these textual structures easy to use. The most important thing is to be as consistent as possible. This is very important as a way of safeguarding oneself against the future: you cannot know what future uses you or a colleague may wish to make of any one text (or of a collection of them). If the texts have a consistent structure, then it is relatively straightforward to have the computer transform them so that they can be used in a variety of different environments.

Minimally, by a consistent structure I mean things such as always including the background information, in the same sequence, in the same typeface and so on. It then becomes possible to instruct a computer to look at the third line of each text (which may be a fieldnote or an archival report) and, since in my example this is consistently the date, to extract all and only the texts within a certain range of years. Suddenly, from word processor documents, we begin to have some of the capabilities of a database. In truth that is all a database does - it imposes a (more or less) rigid structure on a set of data so as to facilitate retrieval.

Anthropological material is often too chaotic to be easily coaxed into a true database (exceptions to this generalisation are considered below), but it is perfectly suited to a semi-structured approach. The challenge is to keep the structure visible or accessible to the computer, and hence to the user, by maintaining a high degree of consistency. To a human reader it seems rather trivial whether the return key has been pressed once or twice to insert a bigger or smaller space between sections, or whether a tab rather than a space was used. However, since these are just the sorts of markers that computers can recognise, it is important that these 'small things' are properly managed. In the absence of a true database in which the structure is automatically controlled, anthropologists must exercise caution and discipline, and use various tricks wherever possible to achieve a consistent overall structure. This will enable them to make better use of the immense amount of work and effort that went into writing the material in the first place!
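The "third line is the date" idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed layout: it assumes, hypothetically, that each note is plain text whose first non-blank lines are author, place and a date written YYYY-MM-DD, and the function names are invented for the example. Blank lines and tabs are normalised before the header lines are counted, which is one way of making the "small things" harmless.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical header layout: line 1 = author, line 2 = place, line 3 = date (YYYY-MM-DD).
DATE_LINE = 3

def header_lines(text: str) -> list[str]:
    """Expand tabs, strip trailing spaces and drop blank lines, so that an extra
    press of the return key or a tab instead of a space does not shift the headers."""
    lines = [line.expandtabs(1).rstrip() for line in text.splitlines()]
    return [line for line in lines if line]

def note_date(text: str) -> Optional[date]:
    """Read the date from the fixed header line; None if it is missing or malformed."""
    lines = header_lines(text)
    if len(lines) < DATE_LINE:
        return None
    m = re.fullmatch(r"\s*(\d{4})-(\d{2})-(\d{2})", lines[DATE_LINE - 1])
    if m is None:
        return None
    try:
        return date(int(m[1]), int(m[2]), int(m[3]))
    except ValueError:  # e.g. an impossible date such as 1994-02-30
        return None

def notes_between(notes: dict[str, str], first_year: int, last_year: int) -> list[str]:
    """Names of all and only the notes dated within first_year..last_year inclusive."""
    selected = []
    for name, text in notes.items():
        d = note_date(text)
        if d is not None and first_year <= d.year <= last_year:
            selected.append(name)
    return sorted(selected)

# Hypothetical sample notes.
notes = {
    "note_001.txt": "A. N. Other\nVillage A\n1990-03-14\nVisited the gardens.",
    "note_002.txt": "A. N. Other\n\nVillage B\n1992-07-02\nArchive visit.",  # extra blank line
    "note_003.txt": "A. N. Other\nVillage C\n1986-11-20\nFirst arrival.",
}
print(notes_between(notes, 1990, 1995))  # ['note_001.txt', 'note_002.txt']
```

The second note shows the point of normalising first: without it, the stray blank line would push the date down to line 4 and the note would silently drop out of the search.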

While there may be very good reasons for not writing fieldnotes on a computer (see above), I am convinced that there are no good reasons for not indexing one's notes, and the only proper place for an index is in a computer.

Useful tricks to achieve a consistent structure

a) Keep it simple




 








  b) Exaggerate the format distinctions so that they are clear on screen - they can always be changed later for printing.

c) Wherever possible, use templates: make a blank file with the basic structure you have chosen. This can be saved1 and used as the basic blank form or template to be filled in.
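Where a word processor's template feature is not available, the same effect can be approximated in Python. This sketch assumes a hypothetical four-field header (author, place, date, language). `new_note` and the field names are illustrative only. Each new note is written from the template, so the blank original is never altered.

```python
import tempfile
from pathlib import Path

# Hypothetical header structure; substitute the fields and order you have chosen.
TEMPLATE = """Author: {author}
Place: {place}
Date: {date}
Language: {language}

{body}
"""

def new_note(folder: Path, name: str, **fields: str) -> Path:
    """Write a fresh note from the template. The template itself is never changed,
    and a missing header field raises KeyError instead of silently leaving a gap."""
    path = folder / name
    if path.exists():
        raise FileExistsError(path)  # never overwrite an existing note
    text = TEMPLATE.format(**fields)  # fails before anything is written
    path.write_text(text, encoding="utf-8")
    return path

folder = Path(tempfile.mkdtemp())  # a throwaway folder for the example
note = new_note(folder, "note_001.txt", author="A. N. Other", place="Village A",
                date="1990-03-14", language="English", body="Notes begin here.")
print(note.read_text(encoding="utf-8").splitlines()[:3])
# ['Author: A. N. Other', 'Place: Village A', 'Date: 1990-03-14']
```

Raising an error on a missing field is a deliberate choice here: a note with an incomplete header is exactly the kind of inconsistency that later defeats automatic retrieval.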

Consider a parallel to the semi-structured nature of anthropological fieldnotes: since they have consistent 'headers' or contextual information, they can be seen as resembling the verses of a poem in which the opening lines of each stanza are very formulaic. The point of this parallel is that there are several quite sophisticated programs available for analysing poetry (for example, OCM, Tact etc.) which anthropologists could usefully employ. Typically we may be looking for certain words or groups of words (the technical term is collocations) in the context of a certain heading line, e.g. within a certain date range or at a particular place.
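Dedicated programs are far more capable, but the basic operation can be sketched briefly: find the words that occur near a given word, but only in notes whose headers meet some condition. The sketch assumes, hypothetically, that headers are `Key: value` lines ending at the first blank line, and that dates are written YYYY-MM-DD so that a date range can be tested by comparing strings. The function names and sample notes are invented.

```python
import re
from collections import Counter
from typing import Callable

def parse_note(text: str) -> tuple[dict[str, str], str]:
    """Headers are 'Key: value' lines before the first blank line; the rest is the body."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip().lower()] = value.strip()
    return headers, body

def collocates(notes: list[str], target: str, window: int,
               keep: Callable[[dict[str, str]], bool]) -> Counter:
    """Count the words within `window` words of `target`, in notes whose headers pass `keep`."""
    counts: Counter = Counter()
    for text in notes:
        headers, body = parse_note(text)
        if not keep(headers):
            continue
        words = re.findall(r"\w+", body.lower())
        for i, word in enumerate(words):
            if word == target:
                counts.update(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return counts

notes = [
    "Place: Village A\nDate: 1990-03-14\n\nThe rice harvest began; rice was stored in the granary.",
    "Place: Village B\nDate: 1991-05-01\n\nRice prices rose at the market.",
]

def in_village_a(h: dict[str, str]) -> bool:
    # ISO dates compare correctly as text, so a date range is a string comparison.
    return h.get("place") == "Village A" and "1990" <= h.get("date", "") < "1996"

print(collocates(notes, "rice", 2, in_village_a).most_common(2))  # [('harvest', 2), ('began', 2)]
```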

Concordances - dealing with texts

As well as fieldnotes, diaries, maps, census returns and archival documents, anthropologists also have to deal with photos, videos and films. All of these need to be indexed in much the same way as has just been discussed for fieldnotes. A common index will facilitate cross-references between written text and visual record.
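A common index can be very simple. The following sketch uses a hypothetical `Item` record and invented sample entries. It lists fieldnotes, photographs and tapes side by side under shared keywords, in date order, so that one lookup retrieves both the written and the visual or sound record.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    kind: str                  # hypothetical categories: "fieldnote", "photo", "tape", ...
    ref: str                   # where to find it: file name, roll/frame, tape side and counter
    date: str                  # ISO date (YYYY-MM-DD), so that text order is chronological
    keywords: tuple[str, ...]

def build_index(items: list[Item]) -> dict[str, list[Item]]:
    """One index for all media: each keyword lists every item, of any kind, in date order."""
    index: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        for keyword in item.keywords:
            index[keyword.lower()].append(item)
    for entries in index.values():
        entries.sort(key=lambda it: it.date)
    return dict(index)  # a plain dict: looking up an unknown keyword raises KeyError

items = [
    Item("fieldnote", "notes/1990-03-14.txt", "1990-03-14", ("harvest", "granary")),
    Item("photo", "roll 12, frame 7", "1990-03-15", ("harvest",)),
    Item("tape", "tape 4, side A, counter 230", "1990-03-12", ("Harvest", "song")),
]
index = build_index(items)
for item in index["harvest"]:
    print(item.date, item.kind, item.ref)
```

Keywords are lower-cased on entry, so "Harvest" and "harvest" end up in the same place. This is one small instance of consistency being enforced by the computer rather than by the indexer.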

Since the work involved in putting images or sounds into computers is not negligible, the best use of time may be to enter just an index into the computer, so that you can benefit from the access that even a word processor file gives. Many anthropologists make little or no use of their photos and sound recordings once they have completed their fieldwork! This is criminal, but no prosecutions are forthcoming; perhaps funding agencies should stop funding these parts of grants. Access is part of the reason for the problem: anthropologists know they have a photo somewhere in a collection of a thousand-odd photos, but have no time to search for it. Just as fieldnotes MUST be indexed as the research progresses, so too must photos, videos and sound recordings. Otherwise, in all too many cases, there is little point in making the record in the first place. It will be used for little more than occasional illustrations and as presents for those in the photographs, which is worthwhile in itself but may not quite justify the effort spent in making them!

1 Different systems use different metaphors for this. Often there is a "Save As" option, which then presents further choices, including the format option "template" or "stationery". Once saved in this format, the file, when opened, creates a fresh copy in which all changes can be saved, preserving the blank original.

Why this insistence on typing up indexes? Partly because it may help others use your material at some indefinite time in the future, but more importantly because it is a great help to you: rather than having to keep in your head some vague notion that a particular tape holds a conversation that is somehow relevant to the problem at hand, you can look it up. This is why the literacy argument is important and finds a place in a practical manual. By externalising thoughts and mental constructs, such as where we have stored a particular bit of information, we are freed (so runs the pious hope) to think of better things. It is easier to analyse well-indexed research material, and hence we may expect more and better analysis from the indexers. In addition, the work of indexing forces a familiarity with the whole, and such a pass over the entire work may not otherwise occur. Indeed, this is why I try to index my notes in the field: the discipline of doing so reveals holes in what I have learnt, inconsistencies and loose ends, which can easily be sorted out while I am still in the field. There may be a downside to this. An anthropologist who refuses to index must rely more on memory, which may encourage analytic insight, and riffling through fieldnotes, photos and tapes may lead to serendipitous discoveries. Yet when I look something up I always find myself drawn to adjacent pages, so I am not convinced that not indexing really increases serendipity. Nor am I convinced by the 'mental index' view, which I believe to be self-serving and to encourage the creation of structures independent of the available evidence.

Case study: Newspaper cuttings from Pakistan

In the course of her doctoral research in Pakistan, a student built up a large collection of newspaper cuttings on the subjects that concerned her: Islam and politics. These were topical during the period of her research, which ended up focusing on public representations of these topics, so the press coverage, and what her informants made of it, were very important to her.

To cope with the collection she first scanned the cuttings, then used an Optical Character Recognition (OCR) program for data entry. This has the same end result as having the cuttings copy-typed: a set of word processor files. OCR is good for bad typists who have access to clean, clear paper copies. However, it should be noted that it is often quicker and more efficient to have things typed by humans. In this case there were initial problems with disc space, since the information is first stored as a graphic image (which occupies a lot of disc space). The OCR program then processes this image to detect the shapes of the typed characters, from which it generates a word processor file as if the text had been typed in. Once this is done the graphic files may be deleted. But a further problem occurred: to begin with, the OCR program failed to recognise the columns in the newspaper articles. Once these initial problems had been resolved, the method worked well and the student ended up with a large, usable collection of texts. She then had to decide whether to spend time coding them. To help her decide, I recommended that she first process the collection with a concordance generator, so that she could start to browse through the texts and see what turned up, both then and as she worked through her fieldnotes in the course of writing up.

It is only worth spending the time and effort on coding if there is a clear research question which requires it. The point of beginning with a concordance is that it is quick and easy to produce and does not commit one to anything. There are computer programs available to help with the analysis of qualitative data, but typically they assume that the user starts with a set of categories or concepts of interest. This is almost never the case in anthropological research: the qualitative analysis programs confuse the goal and the starting point of research, and hence many anthropologists do not find them helpful. Browsing through a concordance is a helpful exercise that can assist the recognition of important themes that one wishes to pursue. The exercise of indexing one's own work can have the same effect.
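A basic keyword-in-context (KWIC) concordance, the kind of listing a concordance generator produces, is easy to sketch. This is a minimal illustration rather than any particular program. It assumes the texts are already plain strings (for instance, the output of OCR), and the sample "cuttings" and the function name are invented.

```python
import re

def kwic(texts: dict[str, str], keyword: str, width: int = 30) -> list[str]:
    """Keyword-in-context concordance: every whole-word occurrence of `keyword`,
    with `width` characters of context on either side, labelled with its source."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for source, text in texts.items():
        flat = " ".join(text.split())  # collapse line breaks and runs of spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            lines.append(f"{source:<12} {left:>{width}} [{m.group()}] {right}")
    return lines

texts = {
    "cutting_01": "The assembly debated the role of Islam\nin public life.",
    "cutting_02": "Islamic parties and Islam: a survey of opinion.",
}
for line in kwic(texts, "Islam"):
    print(line)
```

Matching whole words only means that "Islamic" is not reported as an occurrence of "Islam". A fuller tool would also offer options such as sorting the lines by left or right context.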

The problem for databases

Databases are useful tools for the systematic entry of predictable data (of which more below). They make information much easier to retrieve. Different categories of information will not get lost if, and it is a big if, they have been correctly entered in the first place. Databases are typically used to deal with very different sorts of information from fieldnotes1.





1 There are some overlaps, however: it is possible to enter fieldnotes into some types of database, particularly those without constraints on field size.










A good example would be a database of plant names where, for every specimen, you may wish to record a local name, the scientific name, a sample number, notes about habitat and information about local uses in food, ritual and medicine. In short, there is a predictable set of categories of information, and implementing these in the form of a database helps you to record the information consistently. Once this has been achieved, the same structure can be of great help in finding information. It is conceptually no different from using a set of file cards on which the same sort of information is always recorded in the same place on the card. In a computer database set up for plant names, the scientific name belongs in the 'field' (i.e. category) for scientific names:

Illustration of file card

  sample number     1
  local name        kembar
  scientific name   Ceiba pentandra
  habitat           forest
  notes             kapok collected at end of dry
                    season and spun, was used in
                    mattresses
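The file card translates directly into a database table in which every field has a name. Here is a minimal sketch using Python's built-in `sqlite3` module. The column names are assumptions, and the three `use_*` columns stand for the food, ritual and medicine information mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file name instead to keep the database on disk
conn.execute("""
    CREATE TABLE plant (
        sample_number   INTEGER PRIMARY KEY,
        local_name      TEXT NOT NULL,
        scientific_name TEXT,
        habitat         TEXT,
        use_food        TEXT,
        use_ritual      TEXT,
        use_medicine    TEXT,
        notes           TEXT
    )
""")
# The record from the file card; the use_* columns are left empty (NULL).
conn.execute(
    "INSERT INTO plant (sample_number, local_name, scientific_name, habitat, notes) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "kembar", "Ceiba pentandra", "forest",
     "kapok collected at end of dry season and spun, was used in mattresses"),
)
# Because every record uses the same fields, retrieval is a simple query.
rows = conn.execute(
    "SELECT local_name, scientific_name FROM plant WHERE habitat = ? AND notes LIKE ?",
    ("forest", "%mattress%"),
).fetchall()
print(rows)  # [('kembar', 'Ceiba pentandra')]
```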

Another important type of systematic information that all academics have to deal with is the bibliography, and this is where databases really come into their own. In the course of a couple of years of research activity, you are likely to have looked at many books and even more scattered journal articles in a variety of different libraries. To return to a particular article you need to know not only which library holds it and where it is shelved, but the page numbers as well! This is a classic task for a database. And there is more: if you wish to use a reference in a bibliography, there is the additional chore of formatting. Universities and journals lay down different formatting styles, and it is extremely tedious to change titles that were in italics into plain text within inverted commas, and so on. There are now a number of special bibliography programs that not only act as databases, allowing easy manipulation of the references you collect in the course of your research, but also allow you to print out the references in a variety of different formats. So the same collection of references may be printed first in the Chicago style for a university thesis, then reformatted in JRAI or AA format for submission to a journal. Once a style template has been set up (and the main programs come with a large set of pre-established style templates), the output format can be changed swiftly with little or no further editing. I write with the enthusiasm of a convert: having started to use one of these programs more than five years ago, I can scarcely bear to contemplate the hours I wasted in the past wrestling with bibliography formats. And I bow my head in shame when I hear of new research students being told by their supervisors to keep bibliographic references on file cards. If ever there was a simple task which computers can greatly facilitate, bibliographies are it!
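The principle behind bibliography programs can be sketched in a few lines: store each reference once, as structured fields, and generate the formatted text on demand. The two styles below are invented and much simpler than any real house style. They only show that changing the output format means changing a template, not retyping entries. The sample references are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Reference:
    authors: list[str]   # each written "Surname, Initials"
    year: int
    title: str
    journal: str
    volume: str
    pages: str

# Two made-up styles for illustration only; real styles such as Chicago,
# JRAI or AA specify many more details than this.
STYLES = {
    "quoted": "{authors} {year}. '{title}', _{journal}_ {volume}, {pages}.",
    "plain": "{authors} ({year}) {title}. {journal} {volume}: {pages}.",
}

def format_reference(ref: Reference, style: str) -> str:
    return STYLES[style].format(authors=" and ".join(ref.authors), year=ref.year,
                                title=ref.title, journal=ref.journal,
                                volume=ref.volume, pages=ref.pages)

def bibliography(refs: list[Reference], style: str) -> list[str]:
    """The same stored records, sorted by first author and year, in any style."""
    ordered = sorted(refs, key=lambda r: (r.authors[0], r.year))
    return [format_reference(r, style) for r in ordered]

ref = Reference(["Surname, A.", "Other, B."], 1995, "An example article",
                "Journal of Examples", "12", "34-56")
print(format_reference(ref, "quoted"))
# Surname, A. and Other, B. 1995. 'An example article', _Journal of Examples_ 12, 34-56.
print(format_reference(ref, "plain"))
# Surname, A. and Other, B. (1995) An example article. Journal of Examples 12: 34-56.
```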

Everyone is kin

Genealogies are much, much harder, and the extraordinary thing is that, in 1997, there is still no good program available that is suitable for anthropologists to use. CSAC is working on this1. The commercial market is dominated by those doing family histories in Europe and America, and the programs reflect their interests and cultural biases. This makes them unusable for a wide variety of non-Western family structures, quite apart from the different graphic representations used.

It is worth reflecting upon why managing bibliographies is straightforward but managing genealogies is not. A bibliographic record is an autonomous, independent entity, containing information such as author, date, title, pages, location, journal title and so on. So a bibliographic database consists of a list of entries and is thus conceptually very like a stack of file cards. But a genealogy is very different because it consists of relationships, and these make the problem much more complex.

In essence, a genealogy comprises a list of individuals (who have attributes such as name, sex, date of birth, place of birth, residence, languages spoken etc.) and further lists of marriages or of children. But the entries in these further lists contain references to individual people. So one list refers to another: we are dealing with a relational database. The references between these lists are the essence of the relationships of interest to anthropologists. One irony is that the genealogical tree is an extremely efficient way of representing this information, for all the problems of drawing the trees and the ideology behind them (Bouquet 1993).
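The point that one list refers to another can be made concrete with two tables, sketched here with Python's `sqlite3` module. The table and column names and the sample people are hypothetical. A marriage record stores only the ids of the spouses, and the names are recovered by following those references, which in SQL is a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, sex TEXT);
    CREATE TABLE marriage (
        id         INTEGER PRIMARY KEY,
        husband_id INTEGER REFERENCES individual(id),
        wife_id    INTEGER REFERENCES individual(id)
    );
""")
conn.executemany("INSERT INTO individual VALUES (?, ?, ?)",
                 [(1, "Person A", "M"), (2, "Person B", "F"), (3, "Person C", "F")])
# A marriage record holds only references (ids); each name is stored in one place only.
conn.executemany("INSERT INTO marriage VALUES (?, ?, ?)", [(10, 1, 2), (11, 1, 3)])

# Following the references from one list into the other is a join.
rows = conn.execute("""
    SELECT m.id, h.name, w.name
    FROM marriage AS m
    JOIN individual AS h ON h.id = m.husband_id
    JOIN individual AS w ON w.id = m.wife_id
    ORDER BY m.id
""").fetchall()
print(rows)  # [(10, 'Person A', 'Person B'), (11, 'Person A', 'Person C')]
```

Unlike a bibliographic record, a marriage row means nothing on its own: correcting Person A's name in the `individual` table changes it in every union at once, but a marriage row that pointed to a non-existent person would be meaningless (which is why the foreign-key check is switched on).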

There are two separate aspects to a genealogical database which help demonstrate both its utility and its problems: data entry and access (browsing or examining the data that have already been entered).

1 See the work included in the Experience Rich Anthropology project, which has an online drawing programme.






Data Entry

  When typing in the data it is important to minimise errors and to reduce repetition to a minimum. It should not be necessary to type the names of mother and father repeatedly for each of the ten children of a particular marriage. Attempts to replicate on computer the sheer efficiency of genealogical trees have not been particularly successful. The genealogical tree is an incredibly efficient and simple device for recording relationships. Trees are good for a few tens of people, but can quickly become cluttered and unmanageable once more than a hundred or so individuals are involved. And once there are multiple marriages, which may involve unions between kin and across generations, the diagrams become very hard to maintain on paper. Automatically generated diagrams can be tailored to represent particular views, perhaps omitting categories of people to make the diagram clearer. But that is to move to a discussion of browsing rather than data entry. I take it as basic that genealogical information includes data not easily or neatly represented on tree diagrams, which are good for relationships but not for gossip or more mundane information such as dates. Data entry must then allow basic information (where known) about an individual to be easily typed in. This typically includes categories such as:
  Unique Id
  Sex
  Name(s)
  Date of Birth
  Place of Birth
  Date of Death
  Place of Death
  Current Residence
  Time at current residence
  Education
  Religion
  Economic Data (may be several different fields)
  Genetic/Medical Data (may be several different fields)
  Gossip/Other Information


  All of the information in this list concerns a particular individual. How can we enter data about parents without having to repeat the same information for each sibling? The neatest solution seems to be to have a second set of data entries that record relationships, using the unique individual Id numbers to keep track of them.












  There are two solutions which are more or less equivalent, and which serve to demonstrate the type of approach that can be developed. In the first solution we add to the data on individuals just one field, which records the Marriage (or union) Id of that individual's parents. If nothing is known of either parent, it may be left blank.


A new data type is then created which contains the following fields:
  Marriage Id (note: this also appears in the records of individuals and thus connects individuals to their parents)
  Husband Id (note: this and the following Id connect individuals with their spouses, legitimate and illegitimate)
  Wife Id
  Date of start
  Date of end
  Other information, e.g. place of marriage, dates of prenuptial events, divorce etc.


  This solution does generate a little repetitive typing, but not much: the Marriage Id must be added to the record of each sibling. Note that if someone remarries or is part of a polygamous marriage, a new marriage record must be made for each union. So a man with ten wives and an illegitimate child would have ten marriage records associated with the official wives, and another one for the extra-marital union. The database structure is neutral with respect to legitimacy; extra fields may have to be added if issues such as legitimacy are important.
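The first solution can be sketched with two record types. The class and function names below are hypothetical, and the fields are a subset of the lists above. The example shows that the single parents' Marriage Id on each individual is enough to recover parents and full siblings, and that a man with two wives simply has two marriage records.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Individual:
    id: int
    name: str
    sex: str
    parents_marriage: Optional[int] = None  # Marriage Id of this person's parents; None if unknown

@dataclass
class Marriage:
    id: int
    husband: Optional[int]                  # Individual Id
    wife: Optional[int]                     # Individual Id
    start: str = ""
    end: str = ""
    other: str = ""                         # place of marriage, prenuptial events, divorce ...

def parents(person_id: int, people: dict[int, Individual],
            marriages: dict[int, Marriage]) -> tuple[Optional[int], Optional[int]]:
    m_id = people[person_id].parents_marriage
    if m_id is None:
        return (None, None)
    m = marriages[m_id]
    return (m.husband, m.wife)

def full_siblings(person_id: int, people: dict[int, Individual]) -> list[int]:
    """Others born of the same union; half-siblings from other unions are not included."""
    m_id = people[person_id].parents_marriage
    if m_id is None:
        return []
    return [p.id for p in people.values() if p.parents_marriage == m_id and p.id != person_id]

def unions_of(person_id: int, marriages: dict[int, Marriage]) -> list[int]:
    """Every union a person is part of: one record per spouse or partner."""
    return [m.id for m in marriages.values() if person_id in (m.husband, m.wife)]

people = {p.id: p for p in [
    Individual(1, "Person A", "M"),
    Individual(2, "Person B", "F"),
    Individual(3, "Person C", "F"),
    Individual(4, "Child D", "M", parents_marriage=10),
    Individual(5, "Child E", "F", parents_marriage=10),
    Individual(6, "Child F", "F", parents_marriage=11),
]}
marriages = {m.id: m for m in [Marriage(10, 1, 2), Marriage(11, 1, 3)]}

print(parents(6, people, marriages))  # (1, 3)
print(full_siblings(4, people))       # [5]
print(unions_of(1, marriages))        # [10, 11]
```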

In the second solution the individual record contains no information about parents and no reference to a marriage record. Instead, the marriage records contain references to the individuals that the union produces. A typical format would be:
  Marriage Id
  Husband
  Wife
  Date of start
  Date of end
  Offspring (set of individual Id nos.)
  Gossip/Other Information
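The second solution, in the same hypothetical style, moves the link into the marriage record as a set of offspring Ids. The names are again invented for illustration. The last function derives the first solution's per-individual field from these records, which illustrates why the two approaches are more or less equivalent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MarriageRecord:
    id: int
    husband: Optional[int]                  # Individual Id
    wife: Optional[int]                     # Individual Id
    start: str = ""
    end: str = ""
    offspring: set[int] = field(default_factory=set)  # Individual Ids of the union's children
    other: str = ""                         # gossip/other information

def parents_of(person_id: int, marriages: dict[int, MarriageRecord]):
    """The individual record has no parent field, so search the unions for this child."""
    for m in marriages.values():
        if person_id in m.offspring:
            return (m.husband, m.wife)
    return (None, None)

def children_of(person_id: int, marriages: dict[int, MarriageRecord]) -> list[int]:
    """All children of a person, across every union they are part of."""
    kids: set[int] = set()
    for m in marriages.values():
        if person_id in (m.husband, m.wife):
            kids |= m.offspring
    return sorted(kids)

def to_parent_field(marriages: dict[int, MarriageRecord]) -> dict[int, int]:
    """Derive the first solution's per-individual Marriage Id from these records.
    (If a child were wrongly listed under two unions, the last one would win.)"""
    return {child: m.id for m in marriages.values() for child in m.offspring}

marriages = {
    10: MarriageRecord(10, husband=1, wife=2, offspring={4, 5}),
    11: MarriageRecord(11, husband=1, wife=3, offspring={6}),
}
print(parents_of(6, marriages))   # (1, 3)
print(children_of(1, marriages))  # [4, 5, 6]
print(to_parent_field(marriages))
```

The trade-off is where the searching happens: here finding a person's parents means scanning the marriage records, whereas in the first solution it is a direct lookup, and finding a couple's children is the reverse.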


  This type of cross-referencing may sound onerous, but a well-designed system can mask the chore of copy-typing arbitrary id numbers: the system can display a list of the individuals and the marriages that have already been entered. The user can select a particular individual, then click on a special button or switch on screen which will add the id of that


