Markup policy and TEI tag usage in the electronic edition of Robert Boyle's Work-diaries

By Charles Littleton


1. Introduction

This document is divided into two parts. The first sets out how the editors of the Boyle Work-diaries project would like to see the material we have worked on presented on the Web, i.e. its various components and their relations. The second part is a more detailed description of the way in which I have used each of the principal TEI elements in these files. For each element it discusses its function in the file structure, the formatting we would prefer for it, and its attributes and the way they have been used. This last topic is treated in some detail, as the attributes contain much of the data and the formatting instructions.

2. Web format and layout of the work-diaries

This is summarized in the section on Web presentation found in the separate Editorial Policies document connected to these files. We will reproduce it here, but will add additional comments for programmers and Web formatters, which are distinguished by being within parentheses and italicised. The section reads:

2.1. Normalized and Diplomatic versions

Readers are offered a 'clean', normalized, and easily readable transcript of an entry which exists parallel to a 'diplomatic' version which includes all the emendations to which the entry was subject. In the normalized version of the work-diary, deletions and alterations to words in the original text have been omitted, unfamiliar abbreviations expanded and Roman numerals expressed as Arabic numbers. Places where there are deletions in the original (marked with <del>) are marked in the normalized text by a small red 'd' in brackets, and places where there are missing letters (marked with <gap>) are marked by a red ellipsis in brackets; other words for which there is expanded textual commentary are also in red typeface (these include text marked with <corr>, <add> with 'hand', 'place' or 'rend' attributes, <unclear> and <supplied>). Similarly, abbreviations and Roman numerals which have been expanded or modernized are marked in green (marked with <abbr> and <num>). These colours indicate that such words or sigla are hypertext links, clicking which will take the reader to the parallel 'diplomatic' version of the text, in which all the emendations made to the text are noted. The editorial commentary provided in the diplomatic version is written in italics, within square brackets and in a smaller font than the surrounding authorial text (the data in these editorial notes is held in the attribute values of the relevant elements). Square brackets around words of the size of the surrounding text, in both the normalized and diplomatic versions, indicate that that word is an uncertain reading or is a place where there are missing letters in the original supplied by the editors by context - unless there is a note in the entry's editorial notes indicating that the square brackets are the author's own (marked by <unclear> and <supplied> elements).
Unclear or supplied text in red in the normalized text indicates that the corresponding word in the diplomatic version will give further information on the causes for its illegibility and will state whether the word is merely unclear or is supplied (contained in the 'reason' attribute of the <unclear> and <supplied> elements). The diplomatic and normalized texts are linked both ways and merely clicking on the bracketed and italicised editorial note in the diplomatic text will take the reader back to the same place in the normalized text. Readers should note, though, that links in the diplomatic version are not indicated by different colours, but are in the same black typeface as the surrounding text. (Obviously, it is up to the individual programmer and designer to determine how the sigla indicating deleted or missing words, unclear and supplied readings, expanded abbreviations, etc. are to be represented. We are not wedded to red or green words, etc. However, we do see the linking capability between normalised and diplomatic texts as the key feature of this edition and indeed all the markup policies have been geared primarily for that purpose.)
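
(For programmers: the following is a minimal sketch, in Python, of how the two parallel versions might be derived from a single marked-up paragraph. It is not our own stylesheet. The sample sentence is invented, the function names are hypothetical, and the plain-text sigla '[d]' and '[...]' merely stand in for whatever coloured or linked forms a designer chooses; only the element and attribute names are taken from the description above.)

```python
import xml.etree.ElementTree as ET

# Invented sample text; only the element and attribute names come from this document.
SAMPLE = (
    '<p>Take <num value="3">iii</num> parts of <del>salt</del> nitre in '
    '<abbr expan="September">7<hi rend="superscript">ber</hi></abbr> '
    '<gap/> done</p>'
)

def flat_text(elem):
    """Return all character data inside elem, ignoring child markup."""
    return "".join(elem.itertext())

def render(elem, mode):
    """Render elem as plain text; mode is 'normalized' or 'diplomatic'."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "del":
            # Normalized: bracketed 'd'. Diplomatic: the deleted words in a note.
            out.append("[d]" if mode == "normalized"
                       else "['" + flat_text(child) + "' deleted]")
        elif child.tag == "gap":
            out.append("[...]")  # same siglum in both versions (an assumption)
        elif child.tag == "abbr" and mode == "normalized":
            out.append(child.get("expan") or flat_text(child))
        elif child.tag == "num" and mode == "normalized":
            out.append(child.get("value") or flat_text(child))
        else:
            out.append(render(child, mode))
        out.append(child.tail or "")
    return "".join(out)

root = ET.fromstring(SAMPLE)
print(render(root, "normalized"))
print(render(root, "diplomatic"))
```

(A real implementation would emit linked HTML rather than plain text, with each siglum in one version pointing to the same place in the other.)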

2.2. Other links and searching aids

We have also included other hypertext links for relevant words within the text - particularly names, places, and books referenced - which are linked to a separate biographical and topographical register (which can also be accessed independently). (These are indicated by the <rs> element with the value of its 'type' attribute indicating whether it is a biographical (type="person"), bibliographical (type="author") or topographical (type="place") reference; all these are to be linked to the biographical register (Register.xml). The relevant link is indicated by the value of the 'key' attribute in <rs>, which matches the 'id' value of the corresponding entry in the register (even though, unfortunately, 'key' is not an attribute of IDREF content)). Boyle was frequently indirect when naming his sources, often just referring to 'An eminent physician whom I know'. Where we have been unable to trace the identities of such mysterious figures, the description in the transcription will not be highlighted. Occasionally we have been able to ascertain who these people are from knowledge of Boyle's life and contacts or from other clues in the text. In these cases, the description is highlighted and the corresponding note will explain how we have determined the actual named identity for Boyle's circumlocution. (Once again, it is up to the individual programmer to determine how to indicate the links of these <rs> elements. They should, however, be distinguished in some way from the sigla for the diplomatic transcription discussed above. We also ask that hypertext links, although they may be differently coloured, not be underlined, as is usually done on other web pages. This would confuse the reader as to Boyle's own original underlining practices.)
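
(Because 'key' is not of IDREF type, a validating parser will not catch a 'key' that has no matching 'id' in the register, so the check has to be made in code. The sketch below, in Python, shows one way this might be done. The structure of the register shown here, the id values, the sample sentence and the 'Register.xml#id' form of the link target are all assumptions made for illustration.)

```python
import xml.etree.ElementTree as ET

# Hypothetical register and entry; the ids and wording are invented.
REGISTER = (
    '<register>'
    '<item id="pers001">A named person</item>'
    '<item id="place001">A named place</item>'
    '</register>'
)
ENTRY = (
    '<p>Told me by <rs type="person" key="pers001">Mr H.</rs> at '
    '<rs type="place" key="place001">Oxford</rs> and by '
    '<rs type="person" key="nokey">a physician</rs></p>'
)

def index_register(register_root):
    """Map every 'id' value in the register to its element."""
    return {el.get("id"): el for el in register_root.iter() if el.get("id")}

def resolve_links(entry_root, register_index):
    """Return (resolved links, keys with no register entry)."""
    resolved, dangling = [], []
    for rs in entry_root.iter("rs"):
        key = rs.get("key")
        if key in register_index:
            resolved.append((rs.get("type"), key, "Register.xml#" + key))
        else:
            dangling.append(key)
    return resolved, dangling

index = index_register(ET.fromstring(REGISTER))
resolved, dangling = resolve_links(ET.fromstring(ENTRY), index)
print(resolved)
print(dangling)
```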

We intend that each entry should also be linked with certain 'keywords' which describe its nature and content, although not necessarily appearing in the entry itself (and therefore failing to produce the entry in a regular word search). We have attached keywords to a few sample entries, but have decided that developing keywords for these entries is a long-term project, due to the variety of topics covered in these notes and the complexity of Boyle's chemical thought (or opacity of his language). We believe that this could be the sort of joint scholarly endeavour which electronic technology is now facilitating; we would like the site to become a place of scholarly and scientific exchange as readers discuss the significance of these entries and the concepts which are most central to them. We hope, then, that readers will assist us in developing the keywords for these entries by contacting us with their ideas for pertinent categories and concepts to incorporate; please email Prof. Michael Hunter at m.hunter@bbk.ac.uk with your views and ideas. (This is obviously a long-term project which we have barely begun. We intend to use the <index> element to indicate keywords. It is not something to worry about at the moment, though.)

2.3. Structure of the Web version of the transcription

The work-diary transcription has both an Editorial Introduction and the transcriptions of the entries themselves. The Editorial Introduction provides general information on the work-diary as a whole (all this information is contained in the <teiHeader>, as follows; we do ask that readers have access to it somehow when consulting a work-diary): the title of the work-diary (in <title>, in <titleStmt>); a brief description of its content (in <note type="content">); notes on the page format on which its entries are recorded (in <note type="format">); its date of composition (as accurately as that can be ascertained) (in <creation>); the hands which contributed to its composition (with a detailed list of which of the numbered entries are ascribed to them) (in <handList>, with the name of each hand in the 'scribe' attribute of each subsidiary <hand> element, and the entries ascribed to him in the 'character' attribute); the manuscript reference, with Boyle Papers volume and page and folio numbers (in the <bibl> element of <sourceDesc>); the languages used, with details of how many entries are in a particular language (in the <langUsage> element; the name of each language is in the subsidiary <language> elements, and the number of entries ascribed to it in the 'usage' attribute); its length, expressed as the number of entries contained in the work-diary (in <note type="length">); and general editorial notes and commentary on the work-diary, often detailing problems with its interpretation or transcription (in <note type="note">; often there will be many of these, each with its own 'id' value to be referred back to by entries in the work-diaries; on some occasions I have even opted to make a subsidiary <list> of points under a larger <note type="note">). (The above describes all the data relevant to each work-diary in the <teiHeader>; other data contained there is generally repeated for each work-diary, and contains information on the editorial aspects of the work-diary.
In particular, the information in <titleStmt>, apart from <title> itself, is not of direct interest to the reader, but should somehow be accessible so that readers can know who funded the project. <extent> and <tagUsage>, if I even decide to include that data, are more for the benefit of programmers.)

The basic unit of each work-diary is the entry, usually a short paragraph (sometimes only a sentence) detailing an experimental account or an anecdote illustrating a natural phenomenon (in the <div type="entry"> element, the basic structural unit of the work-diary files). In this edition, all entries consist of at least two parts: editorial notes (<note resp="editor">) and the text of the entry itself (<p>). In addition there may be two other sections, if their content is present in the original: a list of marginal notes written at the time of the composition of the entry itself, and thus 'integral' to it (<note resp="author"> whose type attribute is marked 'integral'); and a list of retrospective endorsements and notes (see above in the section on Titles and Marginalia) (<note resp="author"> whose type attribute is not marked 'integral'). The editorial notes introduce the entry and include: the number of the entry, if assigned to it by the editors (as opposed to authorial numbers, about which see below) (<note resp="editor" type="number">; every entry has a number assigned to it, whether it is editorial or authorial); general notes on features of the entry (<note resp="editor" type="note">); the hand in which the entry is written (derived from the immediately preceding <handShift/>, which is placed within <p>); an estimate of the date at which the entry was composed, based on evidence of handwriting and dating evidence found elsewhere in the manuscript (derived from either the <note resp="editor" type="date"> found in the entry itself, or the last preceding <note resp="editor" type="date">; if possible, the date could also be said to lie between the last preceding <note resp="editor" type="date"> and the next following); and the full reference to the entry (including Boyle Papers volume and page or folio number) (derived from the ed (the BP volume) and n (the page or folio number) attributes of the last preceding <pb/> element).
The editorial notes are followed by a list of the marginal notes integral to the entry, if there are any. These are listed under the categories 'date' (where the scribe provided the date of composition or performance of the experiment) (<note resp="author" type="integral/date">); 'number' (where he provided a number for the entry) (<note resp="author" type="integral/number">); 'title' (where he provided a brief description of the content of the entry) (<note resp="editor" type="integral/title">); 'reference' (where the bibliographic details of a work referenced are provided) (<note resp="author" type="integral/reference">); and 'note' (for any miscellaneous or stray marginal memoranda that appear to be in the hand and writing medium of the original scribe) (<note resp="editor" type="integral/note">). This section is followed by a list of the retrospective marginal endorsements. Like the integral notes, these include such categories as 'number', 'title', 'date', 'reference' and 'note' (exactly like the authorial integral notes listed above, except without the prefix 'integral/' in the type attribute) and also include the additional categories 'endorsement' (for those marginal notes which indicate with which of Boyle's more formal writings the entry is to be associated) (<note resp="author" type="endorsement">) and 'mark' (for the many ticks, crosses, circles, etc. which appear in the margins) (<note resp="author" type="mark">). 
Details on the writing media and location of these marginal endorsements are included next to the text of the endorsements in slightly smaller square brackets (but not italicised) (these are in the place, hand and rend attributes to each <note>; the place attribute is always present, but is usually 'margin', although sometimes there is additional information detailing the exact location of the note in the margin; when the hand attribute is not present, it is because we cannot make a positive identification of the hand; when the rend attribute is not present, the default of 'ink' should be assumed).
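
(The division of an entry's <note> elements into editorial notes, integral marginal notes and retrospective endorsements can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the note contents and function names are invented, and the bracketed details are shown as a plain comma-separated list. The 'integral/' prefix, the place, hand and rend attributes, and the default of 'ink' for a missing rend follow the description above.)

```python
import xml.etree.ElementTree as ET

# Invented entry; only the element names, attribute names and type values
# follow this document.
ENTRY = '''<div type="entry">
<note resp="editor" type="number">12</note>
<note resp="author" type="integral/title" place="margin">Of nitre</note>
<note resp="author" type="endorsement" place="margin" rend="pencil">Usefulness</note>
<note resp="author" type="mark" place="margin">{cross}</note>
<p>Entry text.</p>
</div>'''

def describe(note):
    """Bracketed details of location, hand and writing medium."""
    parts = [note.get("place"), note.get("hand"), note.get("rend", "ink")]
    return "[" + ", ".join(p for p in parts if p) + "]"

def classify_notes(entry):
    """Sort the <note> children of an entry into its three kinds of note."""
    groups = {"editorial": [], "integral": [], "retrospective": []}
    for note in entry.findall("note"):
        ntype = note.get("type", "")
        if ntype.startswith("integral/"):
            category = ntype.split("/", 1)[1]
            groups["integral"].append((category, note.text, describe(note)))
        elif note.get("resp") == "editor":
            groups["editorial"].append((ntype, note.text))
        else:
            groups["retrospective"].append((ntype, note.text, describe(note)))
    return groups

groups = classify_notes(ET.fromstring(ENTRY))
for name, notes in groups.items():
    print(name, notes)
```

(The type prefix is tested before the resp attribute so that every note typed 'integral/...' is listed with the integral notes, whatever its resp value.)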

These three sections are followed by the entry text itself. It is important to set out briefly here the principles that have guided us in our transcriptions of the entry texts. This section will not include a discussion of the rationale behind these policies, which can be found in the sources listed above (i.e. in sources listed in this section of the document that appears in the Editorial Policy document; I did not think it was necessary to list them all here).

2.4. Editorial and transcription policies

2.4.1. Spelling and Punctuation

Spelling and punctuation have been rendered as they appear in the manuscript and have not been normalized. Where this may lead to confusion an editorial annotation has been supplied to provide a 'modernized' form (using the <sic> element, although this is very rare, and we have not decided how it is to be employed; for the time being it should be ignored). We have however kept this to an absolute minimum and note where such instances occur. Certain letters in the original have been transcribed following modern standards. Thus 'ff' is 'F' and long 's' is the modern 's'. For the interchangeable letters 'u' and 'v', the former letter is rendered when a vowel is used in the modern, printed, form of the word and the latter where a consonant would be used. The same rule applies to the equally interchangeable letters 'i' and 'j'. In the manuscripts, quantities are often expressed by Roman numerals, in which the last in a series of 'i's (i.e. '1') is written as a 'j'. In the transcriptions this final 'j' is also transcribed as 'i'. Ampersands, &, are retained throughout and not modernized, as this is still a commonly recognized symbol.

2.4.2. Textual emendations

We consider here four principal types of textual emendation: insertions (<add>); deletions (<del>); replacements (where an inserted word is apparently replacing a corresponding deleted word) (<rep>, which contains one <add> and one <del> element); and alterations (where a letter or letters of a word are changed in composition) (<corr>). All insertions to the text are retained in the transcription and are surrounded by angled brackets, < and >. If an insertion is in the margin or placed in the line, this is indicated by an italicised editorial note in square brackets immediately following the insertion (in the diplomatic version) (in the place attribute of the <add> element, with value of either 'margin' or 'line'). If there is no editorial note following the insertion (and it is not in red in the normalized version), it should be assumed that the insertion is supralinear, the default status for insertions. Where an insertion replaces a deleted text the insertion is provided within angled brackets and the content of the deleted text given within following italicised editorial brackets (the <rep> element, created specifically for this project; <rep> always has as children one <add> element for the replacing word and one <del> element for the replaced word). Deleted and altered words and passages are recorded and their content or process of alteration explained within following italicised editorial brackets (the content of deleted passages is the content of the <del> element; the description of the alteration which a word has undergone is provided in the sic attribute of the <corr> element, which element surrounds the word in its final, altered form). All the explanatory editorial notes are found in the diplomatic version, with links to and from the normalized version.
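
(As an illustration of the treatment of insertions and alterations in the diplomatic version, here is a minimal Python sketch. The sample sentence, the function name and the wording of the bracketed notes, e.g. '[in margin]', are assumptions; the angled brackets, the supralinear default and the use of the place and sic attributes follow the policy above. Italics and font size are left to the stylesheet.)

```python
import xml.etree.ElementTree as ET

# Invented sample; only the element and attribute names follow this document.
SAMPLE = (
    '<p>the <add>red</add> liquor <add place="margin">once</add> '
    'grew <corr sic="altered from cold">cool</corr></p>'
)

def diplomatic(elem):
    """Plain-text diplomatic rendering of insertions and alterations."""
    out = [elem.text or ""]
    for child in elem:
        inner = diplomatic(child)
        if child.tag == "add":
            out.append("<" + inner + ">")
            place = child.get("place")
            # No place attribute means supralinear, which gets no note.
            if place in ("margin", "line"):
                out.append(" [in " + place + "]")
        elif child.tag == "corr":
            # The element holds the final form; sic describes the alteration.
            out.append(inner + " [" + child.get("sic", "") + "]")
        else:
            out.append(inner)
        out.append(child.tail or "")
    return "".join(out)

result = diplomatic(ET.fromstring(SAMPLE))
print(result)
```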

2.4.3. Abbreviations

Boyle and his various amanuenses used a host of different abbreviations in their writing. Abbreviations can be categorized under three types: suspension, where the first letters only of a word are provided; superscription, where the first letters are provided and the terminal letter or letters are written in superscript; or brevigraph, where missing letters are indicated by a sign or a mark.

Suspensions are transcribed as they appear in the original text. Occasionally, for more obscure suspensions, the normalized text will contain the expanded form of the word, appearing in green to indicate that there is a link to the original suspension in the diplomatic version (expansions of abbreviations are provided in the expan attribute of <abbr>, which element surrounds the abbreviated form). However, there are so many of these suspensions, and most of their expansions are so obvious or commonplace, that we have not systematically expanded every single instance, and trust readers will be able to determine most of the expansions themselves. We have not expanded initials used to designate people; instead, where the identity of the person is known, a hypertext link at the initial will take the reader to the corresponding entry in the biographical register (people so identified are tagged with <rs type="person">, with the key attribute providing the link to the ID reference in the biographical register). In one other particular case, we have purposefully abstained from expanding suspensions. The recipes in Latin often contain many suspensions, with no indications as to what the proper inflected endings of these words should be (see in particular work-diary X). We have not supplied the endings, because we would have had to make assumptions as to the grammatical role of the words based on very little contextual evidence. More importantly, contemporaries themselves habitually thought of the ingredients and processes found in these recipes in terms of their common abbreviations. The abbreviated form was thought of as the word itself, and Boyle and his colleagues did not necessarily consider these terse words merely as shorthand alternatives to longer words.
Thus we have not attempted to expand the majority of abbreviations in Latin recipes, except for a few specific terms of art, mostly those which are included in the basic lexicons of medical terminology, whose abbreviations are so short, usually comprising a single letter, that it would otherwise be impossible to determine its sense even by context (e.g. 'satis quantum' for s.q., or 'satis vis' for s.v.).

Most superscriptions and brevigraphs have been silently expanded, with no notice being taken in the markup of their original form (occasionally though, during those times when I felt it was important to record everything, I marked such expanded abbreviations with the <expan> element, with a description of the abbreviated form in the abbr attribute, often using entities to represent superscripted letters; for the moment, though, the data in the abbr attribute of <expan> is to be ignored and the content of the element processed as normal. At a future date we may decide what to do with these elements, but as they were tagged very inconsistently, their scholarly value is diminished and I am only maintaining them as a matter of record). The most common types of abbreviations that appear in the work-diaries, and our methods of expanding them are as follows:

Where it cannot be determined what the expanded form of the abbreviation should be it has been transcribed as it appears in the original, complete with superscript.

Where Boyle or the amanuensis combines Arabic numbers and Latin or English superscripts - e.g. 5th, 2ly, 7ber, 9es, 4er, etc. - the characters are transcribed as they appear in the manuscript, with the superscript maintained (thus the content of the <hi rend="superscript"> elements in the texts should be superscripted, preferably in a slightly smaller font than the surrounding text). If the meaning is not immediately apparent, an expanded form will be supplied in the normalized version, linked to its corresponding place in the diplomatic version (i.e. the third, fourth and fifth examples above would be given in the normalized versions as 'September', 'nones' and 'quater'; however 5th and 2ly and similar constructions are deemed so common and obvious as not to require expansion) (the expanded version of such a number-superscript form is of course provided by the expan attribute of the <abbr> element which surrounds the abbreviation).
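
(A minimal sketch, in Python, of how number-superscript forms might be turned into HTML for the two versions. The sample phrase and function name are invented, and the use of the HTML <sup> element is only one possible choice of formatting; the <hi rend="superscript"> and <abbr expan="..."> markup follows the description above.)

```python
import html
import xml.etree.ElementTree as ET

# Invented sample: '5th' is left unexpanded, '7ber' carries an expansion.
SAMPLE = (
    '<p>on the 5<hi rend="superscript">th</hi> of '
    '<abbr expan="September">7<hi rend="superscript">ber</hi></abbr></p>'
)

def to_html(elem, mode):
    """Render the content of elem as an HTML fragment for the given version."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag == "abbr" and mode == "normalized" and child.get("expan"):
            # Normalized version: show the expansion instead of the abbreviation.
            out.append(html.escape(child.get("expan")))
        elif child.tag == "hi" and child.get("rend") == "superscript":
            out.append("<sup>" + to_html(child, mode) + "</sup>")
        else:
            out.append(to_html(child, mode))
        out.append(html.escape(child.tail or ""))
    return "".join(out)

root = ET.fromstring(SAMPLE)
print(to_html(root, "normalized"))
print(to_html(root, "diplomatic"))
```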

2.4.4. Symbols

The many chymical symbols Boyle and his amanuenses use in their work present other problems. As the meaning of the symbols beyond the most common (iron, copper, mercury and silver) may not be known to most readers the symbols have been transliterated into English text and their literal definition placed between curly brackets (in the texts the chemical symbols are represented as entity references, the full definitions of which, i.e. the transliterations within curly brackets, are provided in the entity file BoyleEntities.ent. We have expressed them as entity references to aid future flexibility. If we are ever able to develop a complete set of characters for these symbols we may wish to substitute them for the present transliterations). Throughout, the contemporary terms for substances and processes are used, those similar to the terms Boyle uses in his literal descriptions of these substances. Thus the symbol C.C. is transliterated as {hartshorn}, the term Boyle commonly uses in other cases, and not 'ammonia'; AF is {aqua fortis} and not 'nitric acid', and so on.

Where these symbols appear with terminal superscripts, the curly bracketed transliteration has been maintained, to signify that there is a symbol at this place, followed by the terminal superscript. This may not make immediate sense to the reader, as, for instance, when Boyle's intended meaning of '♂ial' (i.e. 'martial') would be thus rendered as '{iron}ial'. Even more troublesome are the instances where the symbols are used in Latin texts with terminal superscripts providing the proper Latin inflected ending. Thus '♂is' (i.e. 'martis') would be rendered as '{iron}is'. In such cases we supply an expanded form of the symbol-superscript construction in the normalized text, which is of course linked to the original form in the diplomatic version (as usual, this expanded form for the normalized version is provided in the expan attribute of the <abbr> element which contains the symbol-superscript construction; the original construction, complete with raised and smaller superscripted letters, is to go in the diplomatic version). We do consider it important, though, to use the transliterations as place-holders to signify the location of symbols in the diplomatic texts, even if that renders reading them more difficult. Eventually we hope to have access to a complete character set of these symbols whereby we can actually represent them electronically, as well as provide a literal rendering.

Symbols are also used to signify the common measures in recipes - pounds, ounces, drachms and scruples. In the original text these symbols are usually written before the quantity itself, which is usually written as small Roman numerals joined in a cursive style to the unit symbol. The pound, ounce, drachm and scruple symbols are similarly transliterated and the quantity maintained in its position after the measure and written in Roman numerals. Thus the transcription {ounce} iii signifies what would be in modern writing '3 ounces'. In the normalized version the roman numeral in the original is replaced by its Arabic equivalent (the Arabic equivalent is given in the value attribute of the <num> element which contains the Roman numeral; we have not yet decided definitively whether we want to provide readers with the Arabic numbers or whether we expect them to read the Roman numerals in both the diplomatic and normalized versions), but the transliterated unit of measurement still appears before the quantity. Furthermore, there is also an old sign for ½ that often appears with these measurements - ß, or two long 's's, presumably standing for 'semi'. This will be rendered as ; in the transcriptions.
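
(Since the Arabic equivalent of each Roman numeral is entered by hand in the value attribute of <num>, programmers may find it useful to check the two against each other. The Python sketch below is one possible check, not part of our own tooling: the function names and the sample are invented, the units are written out as literal '{ounce}' and '{drachm}' text rather than as entity references, and the function assumes that a <num> contains only Roman letters. A final 'j' is treated as 'i', as in the manuscript practice described under Spelling and Punctuation.)

```python
import xml.etree.ElementTree as ET

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral to an integer, reading a final 'j' as 'i'."""
    letters = numeral.strip().lower()
    if letters.endswith("j"):
        letters = letters[:-1] + "i"
    values = [ROMAN[ch] for ch in letters]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (iv = 4).
        if i + 1 < len(values) and values[i + 1] > value:
            total -= value
        else:
            total += value
    return total

def mismatched_nums(root):
    """List (numeral, value) pairs whose value attribute disagrees."""
    bad = []
    for num in root.iter("num"):
        numeral = "".join(num.itertext())
        if str(roman_to_int(numeral)) != num.get("value"):
            bad.append((numeral, num.get("value")))
    return bad

# Invented sample recipe line.
SAMPLE = ('<p>{ounce} <num value="3">iii</num> and '
          '{drachm} <num value="4">iv</num></p>')
problems = mismatched_nums(ET.fromstring(SAMPLE))
print(problems)
```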

The marks - slashes, crosses, lines, stars/asterisks, etc. - that often stand next to entries are also considered 'symbols' that are expressed by transliterations within curly brackets. We have taken this approach rather than try to express them typographically from the existing character sets (see BoyleEntities.ent for a list of these entity references and their definitions). Finally, the symbol for recipe, ℞, is here represented as {Rx}. Even though the symbol does exist in the character sets, its typeface did not seem to mesh very well with the remainder of the text, so we have opted for another straightforward transliteration.

3. Element List

The following provides a list of the elements most commonly used in the <text> section of the files, with a description of how they are used, their attributes and range of attribute values, and their preferred formatting. Much of this information has already been covered in the commentary to the editorial notes above. Please note that this list does not discuss elements in the <teiHeader>.

<abbr>

An abbreviation which we want to display as it stands in the diplomatic version while providing an expansion of it in the normalised version. The element itself contains the abbreviated form, usually incorporating a superscript (<hi rend="superscript">). Also used to provide glosses on forms consisting of combinations of characters (chemical and number symbols) and letters, often in Latin.

<add>

A word or words inserted into the text during Boyle's lifetime (i.e. not editorial annotations by later editors such as Wotton or Miles).

<corr>

Words altered during composition of the text (as distinct from words completely struck out and replaced, which are dealt with using <del> and <add> tags, contained within the parent <rep> element). Not only does this tag indicate words whose individual letter or letters are altered, usually by overwriting, but it also gives details of individual letters inserted or deleted within words. We have taken this route rather than placing <add> and <del> tags within words, whose formatting could be disconcerting to the reader.

<del>

Word or words deleted from the text. The content of the deleted passage is to be included with the diplomatic version.

<div>

Self-contained unit of the work-diary, such as sections and entries

<expan>

The expanded form of an abbreviation. Used here largely as a matter of record, as all the abbreviations so tagged would be provided in their expanded form in both the diplomatic and normalised versions, according to our editorial policies on abbreviations.

<gap/>

Material not transcribed (because illegible or lost).

<handShift/>

Change of scribe/amanuensis in manuscript, signalled by change of handwriting. This tag is used for 'large-scale' changes in hand, i.e. where the new scribe writes at least one complete entry. Brief alterations in hand within entries are considered additions to the text and are marked by <add hand=""> (see above). The <handShift/> appears immediately after the opening <p> tag of the first entry associated with the scribe, although the same scribe is probably associated with the integral notes which may also appear in the entry and actually appear before the <p> in our arrangement of data. For consistency's sake, though, we have always associated <handShift/> with the entry's <p>.
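
(The hand of each entry has to be carried forward from the last <handShift/>, since only the first entry by a scribe contains one. A minimal Python sketch of this follows. It assumes that the hand is identified by the 'new' attribute of <handShift/>, which this document does not specify; the hand identifiers, the sample entries and the function name are likewise invented.)

```python
import xml.etree.ElementTree as ET

# Invented section of three entries; 'handA' and 'handB' are hypothetical ids.
SAMPLE = '''<div type="section">
<div type="entry"><p><handShift new="handA"/>First entry.</p></div>
<div type="entry"><p>Second entry.</p></div>
<div type="entry"><p><handShift new="handB"/>Third entry.</p></div>
</div>'''

def hands_by_entry(root):
    """Return (position, hand) for each entry, carrying the hand forward."""
    current = None
    result = []
    position = 0
    for div in root.iter("div"):
        if div.get("type") != "entry":
            continue
        position += 1
        shift = div.find("p/handShift")
        if shift is not None:
            current = shift.get("new")  # assumed attribute name
        result.append((position, current))
    return result

hands = hands_by_entry(ET.fromstring(SAMPLE))
print(hands)
```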

<head>

Heading of a work-diary, sub-section within a work-diary, or individual entry. The formatting of <head> depends upon within which parent element it appears. A <head> immediately within <div type="section"> is a work-diary heading (or sub-section heading) and should be relatively large and prominent. Within a <div type="entry">, the <head> only refers to that individual entry. It should be formatted separately and differently from the entry text (i.e. usually centred), but should not be prominently larger than the entry text

<hi>

Text to be rendered as typographically distinct in some way (other than by deletion).

<l>

Line of verse. Used very rarely as there are few examples of verse in the work-diaries. <l>s should be single-spaced and made distinct in spacing from <p>.

<lb/>

Line break in headings and prose sections.

<lg>

Line group, i.e. a verse passage or a stanza within a verse passage. Used very rarely as there are few examples of verse in the work-diaries.

<note>

Probably the most frequently and variously used element in the work-diaries, with a multitude of functions. Its uses are best described through a discussion of its many attributes

<p>

Paragraph. The element encloses the text of the entry found in the body of the page. Most entries are only one <p> long, though some have several. A child element of <div type="entry"> and sibling of <note> for each entry.

<pb/>

Page break. Its attributes to be used to generate page references for each entry
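
(A minimal Python sketch of how the page reference of each entry might be generated from the last preceding <pb/>. The volume and folio values, the function name and the 'BP volume, page' form of the reference are invented for illustration; the ed and n attributes are used as described in section 2.3.)

```python
import xml.etree.ElementTree as ET

# Invented sample: the second page break falls inside the first entry.
SAMPLE = (
    '<div type="section"><pb ed="8" n="fol. 1"/>'
    '<div type="entry"><p>One <pb ed="8" n="fol. 1v"/>continues.</p></div>'
    '<div type="entry"><p>Two.</p></div>'
    '</div>'
)

def page_refs(root):
    """Give each entry the reference of the last <pb/> that precedes it."""
    last_pb = None
    refs = []
    # iter() walks the tree in document order, so any <pb/> seen before an
    # entry's start is a preceding page break.
    for el in root.iter():
        if el.tag == "pb":
            last_pb = el
        elif el.tag == "div" and el.get("type") == "entry":
            if last_pb is None:
                refs.append(None)
            else:
                refs.append("BP " + last_pb.get("ed") + ", " + last_pb.get("n"))
    return refs

refs = page_refs(ET.fromstring(SAMPLE))
print(refs)
```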

<ref>

Reference to another text within the document; used to generate links between sections of text within the same file. See <rs> below for those elements which are linked to text in a separate file

<rep>

Replacement of deleted word. This element was included in the DTD specifically for this project. A <rep> element consists only of one <add> element and one <del> element as children. It indicates that the content of the <add> replaces the content of the <del>, usually in the form of words interlineated above struck-through words. In its editorial apparatus this project is keen to distinguish between replaced (and replacing) words and words merely inserted or deleted (that is to say, not every <add> that appears next to a <del> is necessarily replacing it and a distinction needs to be made). Our own stylesheets format the content of <del> within <rep> differently from <del> appearing by itself, using the form, coming immediately after the appearance of the inserted words, [replacing ' ' deleted] (whereas the content of regular <del> is just noted as [' ' deleted]). We ask that programmers incorporate a similar distinction between types of deletions and insertions when devising stylesheets.
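
(The distinction between a replacement and a plain deletion can be sketched in Python as follows. This is an illustration only and not a copy of our stylesheets: the sample sentence and function names are invented, and only the two bracketed forms quoted above are taken from our practice.)

```python
import xml.etree.ElementTree as ET

# Invented sample with one replacement and one plain deletion.
SAMPLE = (
    '<p>a <rep><add>clear</add><del>thick</del></rep> oyle '
    'was <del>then</del> distilled</p>'
)

def text_of(elem):
    """Return all character data inside elem."""
    return "".join(elem.itertext())

def diplomatic(elem):
    """Plain-text diplomatic rendering of <rep>, <del> and <add>."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "rep":
            added = child.find("add")
            deleted = child.find("del")
            out.append("<" + text_of(added) + "> [replacing '"
                       + text_of(deleted) + "' deleted]")
        elif child.tag == "del":
            out.append("['" + text_of(child) + "' deleted]")
        elif child.tag == "add":
            out.append("<" + text_of(child) + ">")
        else:
            out.append(diplomatic(child))
        out.append(child.tail or "")
    return "".join(out)

result = diplomatic(ET.fromstring(SAMPLE))
print(result)
```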

<rs>

Referring string. Text marked for indexing and/or linking purposes. Used for links to text in a file separate from the file in which the <rs> appears. Much of its usefulness derives from its attributes

<sic>

Faulty text: contains the text as it appears on the manuscript. Used very rarely in these files. NO formatting is envisaged for it at the moment.

<space/>

Gap in the text attributable to author or scribe (i.e. not a gap due to illegibility, about which see <gap/>).

<supplied>

Letters or words supplied by the editor where there is damaged text but sense can still be determined by context. Should appear within square brackets, or with the same marking as <unclear> or <gap>

<unclear>

Uncertain or conjectural reading supplied by the editor. Should appear within square brackets, or with the same marking as <supplied> and <gap>
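
(A minimal Python sketch of the bracketing of <unclear> and <supplied> text, with the content of the 'reason' attribute shown only in the diplomatic version as described in section 2.1. The sample phrase, the reasons given and the wording of the bracketed note are invented.)

```python
import xml.etree.ElementTree as ET

# Invented sample; the reason values are hypothetical.
SAMPLE = (
    '<p>a <unclear reason="blotted">pint</unclear> of '
    '<supplied reason="page torn">wine</supplied></p>'
)

def render(elem, mode):
    """Bracket unclear and supplied text; add the reason in diplomatic mode."""
    out = [elem.text or ""]
    for child in elem:
        inner = render(child, mode)
        if child.tag in ("unclear", "supplied"):
            out.append("[" + inner + "]")
            reason = child.get("reason")
            if mode == "diplomatic" and reason:
                # States whether the word is unclear or supplied, and why.
                out.append(" [" + child.tag + ": " + reason + "]")
        else:
            out.append(inner)
        out.append(child.tail or "")
    return "".join(out)

root = ET.fromstring(SAMPLE)
print(render(root, "normalized"))
print(render(root, "diplomatic"))
```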