Relics and Selves

How to create a virtual museum: the story of 'Relics and Selves'
Patience A. Schell (University of Manchester)
with John Bradley and Paul Spence (Centre for Computing in the Humanities, King's College London)
This virtual exhibition originated with the idea of deconstructing the rarefied and sanctified museum atmosphere, and thus subverting the order and cataloguing of objects which were important to the consolidation of national imaginaires in 1880s Argentina, Brazil and Chile. The Relics and Selves project, then, seeks to take these items out of their cases and out of the order imposed on them, so that visitors themselves can un-order and re-order them. Using database and internet technology, we can bring together thousands of images that it would be impossible to handle via traditional publication methods.

In designing a project like this, the first step is to determine whether on-line publication is the best method. For us, since we wanted a collection of thousands of images, on-line and/or CD publication were the only viable options. The second step is to find local technical support. We have been lucky to be working with King's College London's Centre for Computing in the Humanities, under the direction of Harold Short. King's has worked on similar projects, which combine text and images into an integrated site and search engine. Following initial discussions with King's, we began the long process of searching and browsing the internet, looking at other sites and how they manage their material, so that we had a basis for comparison.

The influence of the technology is evident even in how the research for the project was undertaken. As we worked out research trips to Chile, Brazil and Argentina, planning to photograph the materials which would become the visual database of this site, we had to decide how we were going to collect these images. Our materials are taken either from slide film or from a digital camera. A digital camera was purchased so as to skip the step which slide film involves: scanning to make a digital image. Yet, as so often happens with this type of project, it turned out that the slide images were easier to process, simply because it proved difficult to obtain the peripheral software and equipment for 'capturing' the images from the digital tapes. Mixing media is therefore the best policy, as it means you are not dependent on any one technology. We also collected paper images, which had to be flatbed-scanned into the computer. Having a clear understanding of the final product, then, will help in the design of the entire project, including the research preparation.

Slides were originally scanned at a resolution of 1350 pixels per inch (a fairly high one) and saved as TIFF files. These are our archival-quality digital images, which we have safeguarded and count as our digital originals. The images from the digital tapes were also saved as TIFF files, although their original resolution is lower. Once the images were in TIFF form, we converted each into two sizes of JPEG: a thumbnail for display on the web and for use in the data-entry process, and a larger JPEG for display on the web only.

In order to catalogue and search this material, we needed a database to hold and manage it. Our image database was originally designed by Hafed Walda of King's College London's Centre for Computing in the Humanities in FileMaker Pro. It took about six months for Jens and me to agree on the categories which we would use for the database. At the time, I naively assumed I could draw up a list, discuss it with Jens and we would have the categories in a matter of a week. What we had not realized was that in designing the categories, really a sophisticated index with storage capacity, we had to re-conceptualise the material. This was partly because a computer, even the smartest one, could not understand the subtleties that we found in much of our work, so we had to come up with rules, categories and lists consistent enough for a computer to manage.

In designing the categories, we had several kinds of problem to deal with. First of all, we had to consider what information needed to be attached to the image: technical data about the digital image, its file name and its resolution. We needed to know where the image came from, who had taken the original photograph, who the researcher was, what the copyright situation of the image was and who had entered the data. Then we needed information about the depicted object itself: who the author or painter was, what the title was and what date appeared on the object. Finally, we needed to put the image into more abstract sets of discursive and conceptual terms. In creating this variety of categories, we were thinking of both our own needs and the needs of potential users as we imagined them. In summary, the categories combine theoretical, analytical and practical information about both the image and the object depicted.
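The kinds of category just described can be sketched as a single record structure. The following minimal Python illustration uses field names and sample values invented for this example, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ImageRecord:
    # technical data about the digital image
    file_name: str
    resolution_ppi: int
    # provenance: where the image came from and who handled it
    source: str
    photographer: str
    researcher: str
    copyright_holder: str
    entered_by: str
    # the depicted object itself
    object_author: str
    object_title: str
    object_date: str
    # abstract discursive and conceptual terms
    keywords: List[str] = field(default_factory=list)

# A hypothetical record, for illustration only
record = ImageRecord(
    file_name="slide_0042.tif", resolution_ppi=1350,
    source="museum archive", photographer="project photographer",
    researcher="project researcher", copyright_holder="museum archive",
    entered_by="PS", object_author="unknown",
    object_title="Display case", object_date="c. 1890",
    keywords=["museum display", "national imaginaire"],
)
print(record.object_title)
```

A structure of this sort makes explicit why agreeing on the categories took months: every field is a decision about what the computer needs to be told.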

The other side of our project was the mark-up of texts. In designing the text collection we had to deal with questions similar to those raised by the image database, though altered by the fundamentally different nature of the material. In addition to historical source materials, the text collection holds pieces of critical analysis of topics arising from the image collection. These are indexed by persons, institutions and places, as well as cross-referenced in terms of the discourses, media and forms of display discussed, and of the key terms under which the essays treat the subject of the exhibition.

We have been using a new computer programme for content-oriented as well as structural mark-up of text files, which allows the user to mark up documents in XML (Extensible Markup Language) format. 'XML is a markup language for documents containing structured information' (Norman Walsh). It is a means of standardising the way we exchange information on the web, and it has various advantages over HTML in a project like this:

Extensibility
As its name suggests, XML is 'extensible'. XML is not a fixed language, like HTML, but a meta-language that we can use to create our own languages. In other words, it allows us to define our own tags, as many as we want, and give them names that we find meaningful (e.g. <book>...</book> or <film>...</film>). This means that 'taggers' (the people carrying out the mark-up) on a project like this only need to understand the mark-up system being used, rather than learning HTML. Perhaps more importantly, it allows for far richer mark-up of text than is possible with HTML, and allows more to be done with the texts once they have been created and tagged.
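As a brief illustration, here is a small XML fragment using project-defined <book> and <film> tags of the kind mentioned above, parsed with Python's standard library to show that such tags carry meaning a computer can retrieve:

```python
import xml.etree.ElementTree as ET

# A toy document using invented, meaningfully named tags
doc = """<essay>
  <p>On <book>Facundo</book> and the film <film>La historia oficial</film>.</p>
</essay>"""

root = ET.fromstring(doc)
# Because books and films are explicitly tagged, they can be collected
books = [b.text for b in root.iter("book")]
films = [f.text for f in root.iter("film")]
print(books, films)
```

The titles here are simply examples; the point is that the tag names were chosen by us, not dictated by the markup language.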

Definition of structure
Whereas HTML largely defines style (how a document is rendered), XML only defines the structure of a document. This kind of document analysis is useful for a number of reasons:

Separation of content from design
Firstly, it separates content from design. You mark up a document in XML and then apply a design (using something called a 'stylesheet') in a separate process. This means that if you have 50 text documents marked up in XML, you can use the same stylesheet 50 times. So if you decide to change the design at a later date, all you need to do is change one document, i.e. the stylesheet, instead of re-tagging the 50 text documents. An important aspect of this separation is that, by tagging material according to what it is, rather than how you want it to look, you make explicit to the computer features of the text that allow it to manipulate the material for you. By explicitly tagging persons, for example, you allow the computer to generate an index of persons, because it knows where the references to people are.

Multiple versions from same source document
With XML, you can easily apply more than one design (i.e. attach more than one stylesheet) to each document. In other words, if you want one version of every document to appear in plain text, but you want another version where every book and author is highlighted in a different colour, then you can do that without having to re-tag 50 documents individually.
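These two points can be illustrated together: one marked-up source, two renderings, plus an index derived from the same tags. This sketch uses plain Python functions in place of real XSLT stylesheets, and the <person> and <place> tags are invented for the example:

```python
import xml.etree.ElementTree as ET

# One source document, marked up by what things ARE, not how they look
root = ET.fromstring(
    '<p>A lecture by <person>Sarmiento</person> in '
    '<place>Buenos Aires</place>.</p>')

def render_plain(elem):
    # "Stylesheet" 1: plain text, all markup discarded
    return "".join(elem.itertext())

def render_highlighted(elem):
    # "Stylesheet" 2: person references set off with asterisks
    out = elem.text or ""
    for child in elem:
        text = child.text or ""
        out += f"*{text}*" if child.tag == "person" else text
        out += child.tail or ""
    return out

# The same tags also yield an index of persons, with no extra tagging
persons = [p.text for p in root.iter("person")]
print(render_plain(root))
print(render_highlighted(root))
print(persons)
```

Changing a rendering means editing one function (one stylesheet), never the source document.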

Paul Spence has designed the defining list (or rules sheet) of how we categorise our material, which includes both elements of structure and labelling of content. Paul did this after various conversations with Jens and me, which forced us, yet again, to rethink how to categorise material. Designing the tags has been an exercise in re-conceptualising the material, as we had to think of categories narrow enough to be useful and yet broad enough to accommodate a variety of material. In the end we had to think not only of what we needed from the material, but of what would be useful to someone coming to the project as an outsider. This rules sheet is called a DTD (Document Type Definition), and provides a formal definition not only of what tags are going to appear in our documents, but also of where they may appear. It represents a formal definition of some aspects of our conception of the texts themselves. It provides almost an outline of the textual material and, by offering a way to verify that the tag set we use, and the way we use it, matches the constraints we put into the design, it helps us maintain a high level of consistency between documents.
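A real DTD is written in its own declaration syntax, but its role as a rules sheet can be sketched as a toy table of which child elements may appear inside which parents, with a checker that verifies a document against the rules. The element names here are invented for the example, not taken from the project's DTD:

```python
import xml.etree.ElementTree as ET

# Toy rules sheet: for each element, the child elements it may contain
ALLOWED_CHILDREN = {
    "essay": {"title", "para"},
    "para": {"person", "place", "book"},
    "title": set(), "person": set(), "place": set(), "book": set(),
}

def check(elem):
    # Recursively verify that every child is permitted by the rules
    for child in elem:
        if child.tag not in ALLOWED_CHILDREN.get(elem.tag, set()):
            return False
        if not check(child):
            return False
    return True

sample = ET.fromstring(
    "<essay><title>Relics</title>"
    "<para>On <person>Moreno</person>.</para></essay>")
print(check(sample))
```

Automated checking of this kind is what lets a project keep dozens of documents, tagged by different people, structurally consistent.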

We mentioned above that we wanted to index materials in the text by person, places and other things. One of the important aspects of our DTD, then, was the provision of tags to allow us to explicitly identify references to these things in the text. In addition, we provided tags that allowed us to associate keywords with paragraphs that categorised their meaning. By providing tags that would make this material explicit, it became possible to provide a search engine that would allow for the searching of texts for references to these things, or to the concepts we had attached.

To produce a DTD, we took the following steps:

  • making a list of all the 'elements' (tags)1 that would be needed
  • listing the different types of content possible for each element
  • outlining the relationship between these elements
  • listing the 'attributes' that went with each 'element'. Attributes describe individual elements, e.g. <name first="Angel" surname="Rama">. Here the element 'name' has two attributes: 'first' and 'surname'. To quote David Gulbransen, 'If Elements are Nouns, Attributes are Adjectives'.
  • making a list of possible (parameter and character) entities that might be needed
  • checking that the DTD was syntactically valid.
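The attribute step can be seen in action by parsing the example element from the list above with Python's standard library:

```python
import xml.etree.ElementTree as ET

# The element 'name' with its two attributes, 'first' and 'surname'
elem = ET.fromstring('<name first="Angel" surname="Rama"/>')
print(elem.tag, elem.attrib["first"], elem.attrib["surname"])
```

The element supplies the noun ('name'); the attributes supply the adjectives describing this particular instance of it.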

There then followed a process of checking, testing and making appropriate changes. The DTD was revised about 15 times, sometimes only in minor details, at other times undergoing major surgery. By testing the mark-up on sample documents, we were trying to prepare ourselves for any possible structure that might appear in a text document submitted to the project, and to see how useful each feature was. The resulting XML documents were used in two ways. First, they were converted into HTML documents for viewing on the web. To do this conversion we developed an XSLT ('Extensible Stylesheet Language Transformations') stylesheet. Second, information about persons, places and institutions, together with the keyword information that had been associated with the document as a whole and with individual paragraphs, was 'harvested' to provide the data for the text-oriented search engine.
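The harvesting step might look something like the sketch below, in which index terms are pulled out of a marked-up document. The tag and attribute names are assumptions for illustration, not the project's actual DTD:

```python
import xml.etree.ElementTree as ET

# A toy marked-up essay: keywords on the paragraph, entities in the text
doc = """<essay>
  <para keywords="museums,display"><person>Burmeister</person>
  reorganised the museum in <place>Buenos Aires</place>.</para>
</essay>"""
root = ET.fromstring(doc)

# Harvest persons, places and paragraph keywords into index lists,
# as might be done to populate a search-engine database
index = {"person": [], "place": [], "keyword": []}
for para in root.iter("para"):
    index["keyword"] += para.get("keywords", "").split(",")
for tag in ("person", "place"):
    index[tag] += [e.text for e in root.iter(tag)]
print(index)
```

Once harvested, these terms can be loaded into database tables and queried without touching the XML again.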

Having organized both image and text materials, which were represented in very different ways, John Bradley of King's Centre for Computing in the Humanities created on-line search engines to retrieve them, allowing users to combine in their searches the categories employed in classifying images and texts. The image database, originally created in FileMaker Pro, has been converted to use a relational online database engine called MySQL. A series of processes has been built (using tools and techniques called CGI) to take queries from users over the World Wide Web and to process them against the database, generating results that are again expressed in HTML so that the user can see them in his or her browser. Similarly, tools have been built to take data out of the XML documents to populate the part of the same MySQL database that supports the document-oriented search engine. Here too, connections using CGI scripts have been developed to allow users to query the text-derived material in the MySQL database and to view the results.
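The shape of such a back end can be sketched as follows, using Python's built-in SQLite module as a stand-in for MySQL and a plain function in place of a CGI script; the table and column names are invented for the example:

```python
import sqlite3

# A toy relational table of image records (SQLite standing in for MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER, title TEXT, keyword TEXT)")
conn.executemany("INSERT INTO images VALUES (?, ?, ?)",
                 [(1, "Display case", "museum"),
                  (2, "Portrait", "painting")])

def search(keyword):
    # Process the user's query against the database...
    rows = conn.execute(
        "SELECT id, title FROM images WHERE keyword = ?",
        (keyword,)).fetchall()
    # ...and express the results in HTML, as a CGI script would,
    # so the browser can display them
    return "".join(f"<li>{title}</li>" for _, title in rows)

print(search("museum"))
```

A real CGI script would additionally read the keyword from the web request and emit a complete HTML page, but the query-then-render structure is the same.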

There are not as many connections between the text and image parts of the materials as we had originally hoped, but there are a few. When a viewer clicks on an image in the text, a query goes to the image database to pull up not only the image itself, but also further information stored there about it. Similarly, when one clicks on the name of a person, place or thing in the text, a query is sent to the text part of the database to find all the places in other texts where the same person, place or thing is mentioned. If time permits, a further rethinking of our design, plus the work to put it in place, would allow more sophisticated links between the texts and images to be established.

1 Elements and tags are not exactly the same thing, since a tag merely denotes the fact that an element has been opened or closed, but in this essay I shall use fairly loose terminology in order to make the discussion more accessible to a non-technical audience.