
Digital Medievalist 1.1 (Spring 2005). ISSN: 1715-0736.
© Jonathan Green, 2005.
Creative Commons Attribution-NonCommercial licence, 2.5
Peer-Reviewed Project Report
The Illustrated Incunable Short Title Catalog on CD-ROM (IISTC), now in its second edition, provides an unrivaled wealth of information on fifteenth-century printing and, as a computer database, allows for rapid searching that would not be possible with printed reference works. However, the database's search interface suffers from numerous problems, as Paul Needham described in a thorough review essay. This article presents a solution to those problems that can be implemented by the end user, and also shows what kind of useful information can be obtained from the IISTC by doing so. The solution entails exporting all records to a very large text file, analyzing the file with scripts written in Perl, importing the information into a full-featured database application, and conducting queries with the database application's more robust and better documented interface. With the IISTC data directly accessible, the database fields can be manipulated to implement features missing in the original IISTC, including separate fields for each part of the imprint data and a count of recorded copies. Query-generated output demonstrated here includes a table of incunables with the highest number of copies recorded in the IISTC; printers of Ulm, the number of their signed editions, and their dates; and the number of signed editions printed each year through the end of the fifteenth century. Sample scripts for recreating the results described here, as well as instructions for implementing them and a discussion of points to consider when doing so, are found in the appendices.
Keywords: Illustrated Incunable Short Title Catalog (IISTC); bibliography; databases; user interfaces; scripting languages (Perl); incunabula; British Library.
§ 1 Compliance with open standards in software and multimedia projects is an excellent thing for the projects' users, and so it is often promoted as a virtue that programmers and digital content creators should strive for. Developers have not always shared this concern, unfortunately, with the result that the users who expected to make use of some electronic resource in their research are occasionally prevented from finding all the answers they had sought. What is the end user to do in such a situation, besides write to the publisher and ask that the needed feature be added in the next version? Sometimes the user can do more, perhaps much more, depending on the program or project. In the case of one invaluable database, the Illustrated Incunable Short Title Catalog on CD-ROM (IISTC) (British Library 1998), there is a wealth of useful information trapped behind an inadequate user interface. Medievalists working on fifteenth-century literature or early printing have two options: they can practice the patience of a recusant or they can seek a radical solution, namely, exporting all 28,360 records, extracting the necessary information, and importing it into the fields of a standards-compliant database. The questions about early printing that can then be more easily answered illustrate one reason why standards compliance is so important for humanities computing projects: designers and developers can never anticipate all the research inquiries that scholars may wish to pursue.
§ 2 As a catalog of all known
incunable editions with an extensive if not yet complete
list of known copies, the IISTC comes closer than any other
presently available reference work to being a worldwide
incunable census. It is therefore an essential tool for
research libraries and scholars in many fields. As a
computer database rather than a printed catalog, the IISTC
promises quick answers to scholars' questions. In a
thorough review article, Paul Needham praises the IISTC as a
milestone in the history of incunable bibliography but also
identifies numerous deficiencies in the database and
particularly in the idiosyncratic user interface (Needham
1999). The application of computer technology to
bibliography makes the IISTC "a revolutionary
publication in incunable study," by allowing
"searches that from printed reference works...can be
made only laboriously, or for practical purposes cannot be
made at all" (479). And yet, because of the
limitations of the software provided for searching the IISTC
database, several types of inquiry remain laborious,
impractical, or at times impossible. Needham specifically
mentions the following shortcomings:
§ 3 In addition, although the
still-incomplete IISTC is unmatched as a worldwide incunable
census by any other resource, including the
Gesamtkatalog der Wiegendrucke,
it lists the present-day locations of
incunable editions but does not display the total number of
copies. This is not a serious problem if one can see at a
glance that there are only one or two copies of a given
incunable, but rather irksome if there are dozens, and a
huge handicap if one wishes to compare the number of
recorded copies for more than a few editions. To answer the
question "For which incunable does the IISTC record
the largest census?" it is best to know the answer
beforehand: Anton Koberger's edition of the Latin
Nuremberg Chronicle, 12 July 1493
(Goff S-307), and even then one must calculate the total
number of copies by hand (Needham arrives at "more
than 780" [Needham
1999, 497]). Even an experienced scholar of
fifteenth-century printing might have difficulty naming the
second-, third-, or tenth-largest incunable census, or the
largest census for works printed in German or another of the
vernacular languages. This information lies within the
database, but the IISTC interface prevents users from
accessing it.
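§ 4 Repetitive tasks, such as adding up the number of copies for 28,360 editions, are best left to computers, and therein lies the solution, which has but four essential steps: exporting all records to a plain-text file, analyzing that file with Perl scripts, importing the extracted information into a full-featured database application, and conducting queries through that application's interface.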
While many different approaches and various software packages could be used to implement these steps, the following discussion is based on readily-available software and consumer applications that are practically standard issue in the computing infrastructure of many colleges and universities. (For a full account of the process of importing the IISTC records into a database, see appendix 1.)
§ 5 Because the IISTC allows any number of records to be selected and exported, a user could choose to export all 28,360 records to a plain-text file, at least in theory. The operation itself can take hours or even days, perhaps as a legacy of the IISTC's providing only a 16-bit Windows interface coded in Visual Basic 3. The minimal requirements of the IISTC software mean that it can run on quite antiquated hardware, although at the cost of increased liability to crash and increasingly uncertain interoperability with a library's newer computer infrastructure. Exporting all the records is nevertheless possible and, as the following will show, quite useful.
§ 6 The export of every IISTC record results in a very long list of records such as the following:
The Illustrated ISTC (2nd Edition)
Author: Aesopus
Title: Vita, after Rinucius, et Fabulae, Lib. I-IV, prose version of Romulus [German]. Add: Fabulae extravagantes. Fabulae novae (Tr: Rinucius). Fabulae Aviani. Fabulae collectae [German] (Tr: Heinrich Steinhöwel). Leonardus Brunus Aretinus: De duobus amantibus Guiscardo et Sigismunda [German] (Tr: Nikolaus von Wyle)
Imprint: [Augsburg: Anton Sorg, about 1479]
Language: German Format: f°
Notes: General+Production:
Woodcuts
Cataloguing Source: Goff A120
Bibliography: HR, Supplement 333; Schreiber, Manuel 3028a; Schramm IV p. 50; GW 353
Locations:
British Isles: London, Victoria and Albert Museum
USA: LC(R); MMu(P)L
Germany: Dresden KupferstichKab
ISTC No: ia00120000
(c) British Library Board and (c) Primary Source Media
One could cut and paste all of the information by hand from this record to a database or spreadsheet table, where Author was one column, Title another, and so on. It would be arduous and repetitious, and therefore best left to a computer. Fortunately, there are well-documented and accessible scripting languages such as Perl exactly suited for this task. (For Perl software and documentation, see <http://www.cpan.org>; <http://www.perl.org>; <http://www.activestate.com>.) One must only tell the computer:
Read through all 495,923 lines in the exported text file; whenever you find a line that begins with Title:, save everything between the non-printing tab character and the end of the line; now look for a line that begins with ISTC No:, and do the same. Finally, print the ISTC number as an index, then a tab character to separate the fields, then the title, and then a new line character. And then get back to work!
The script, in Perl as written by a medievalist (and explained more fully in appendix 2), might look something like this:
use strict;
use warnings;

my $batch = "istc.txt";
open(my $fh, '<', $batch) or die "Cannot open $batch for read: $!";
my ($match, $hit) = ('', 0);
while (<$fh>) {
    if (/^Title:\t(.*?)$/) {
        $match = $1;
        $hit   = 1;
    }
    if (/^ISTC.*(i.\d{8})/ and $hit == 1) {
        $hit = 0;
        my $istc_number = $1;
        print "$istc_number\t$match\n";
    }
}
close $fh;
That is, if the entirety of the IISTC is exported as plain text to the file istc.txt and the Perl script invoked as written, it will produce a very long list that begins in the following way:
ia00000500 Orhot Hayyim
ia00001000 Abbey of the Holy Ghost
ia00001500 Abbey of the Holy Ghost
ia00002000 Abbey of the Holy Ghost
ia00003000 Abbreviamentum statutorum
ia00004000 Abbreviamentum statutorum
ia00004500 Abbreviamentum statutorum
ia00005000 Abbreviamentum statutorum
ia00005500 Abecedarium
ia00008000 Dialogus in astrologiae defensionem cum vaticinio a diluvio ad annos 1702. With additions by Domicus Palladius Soranus
ia00009000 Trutina rerum coelestium et terrestrium. With additions by Augustinus Beganus and Ludovicus Ponticus
ia00009100 De luminaribus et diebus criticis
If the output is redirected to a file, then one is left at the end with a tab-delimited table containing a list of ISTC index numbers and their corresponding titles, which can be imported into the database application of one's choice. With similar scripts that search not for Title: but for Author: or Imprint:, for example, the rest of the information can be extracted as well and then imported in turn. While the IISTC search interface is idiosyncratic, inadequately documented, and crash-prone, the database industry has spent decades and billions of dollars on standardizing, documenting, and crash-proofing its software.
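The same pattern can be extended to collect several fields in a single pass. What follows is only a minimal sketch under stated assumptions, not the script used for this article: it assumes that each wanted field occupies a single line of the form "Field:", a tab, and the value, and that the ISTC No: line closes each record. The sub name extract_fields and the two-record inline sample are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect several single-line fields per record.
# Assumes "Field:<tab>value" lines and that "ISTC No:" closes each record.
sub extract_fields {
    my ($fh, @wanted) = @_;
    my (%field, @rows);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        for my $name (@wanted) {
            $field{$name} = $1 if $line =~ /^\Q$name\E:\t(.*)$/;
        }
        if ($line =~ /^ISTC No:.*(i.\d{8})/) {
            # Fields missing from a record (e.g. no author) become empty strings.
            push @rows, join "\t", $1, map { $field{$_} // '' } @wanted;
            %field = ();
        }
    }
    return @rows;
}

# Inline sample in the assumed export layout; the second record has no author.
my $sample = "Author:\tAesopus\nTitle:\tVita\n"
           . "Imprint:\t[Augsburg: Anton Sorg, about 1479]\n"
           . "ISTC No:\tia00120000\n"
           . "Title:\tAbecedarium\nISTC No:\tia00005500\n";
open(my $fh, '<', \$sample) or die "Cannot read sample: $!";
print "$_\n" for extract_fields($fh, qw(Author Title Imprint));
```

Because records without an author still produce a row with an empty column, a table built this way needs no later join to restore the missing records. Multi-line fields such as Locations would need separate handling.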
§ 7 While using another software
package to replace the IISTC interface is useful, any
spreadsheet or database application will have its own
limitations on what it can do with the records in their
present form. Opening up the IISTC has the added advantage,
however, that the records can be manipulated further. For
example, the Imprint field could be split up
into city, printer, and
date fields; or a flag could be added to mark
each as "signed" or "unsigned"; or fields could be created
for the first, last, or average of all dates attributed to
unsigned imprints. The pattern matching and string
manipulation capabilities of Perl are quite robust and can
even be made to deal with defective IISTC records. (For one
possible implementation of a script to analyze the
Imprint field, see appendix 3.) The same kind of manipulation can be
done on the Locations field to provide a count
of the number of copies identified for each of ten
geographic regions, which can in turn be added to yield an
overall sum. A discussion of the particular challenges here
and sample scripts are provided in appendix 4.
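As a rough illustration of the copy count, the following sketch tallies the entries in the Locations block of the sample record shown in § 6. It rests on two simplifying assumptions that real records may well violate: each region occupies one line, and each semicolon-separated entry stands for exactly one copy. The sub name count_copies is hypothetical, and this is not the script from appendix 4.

```perl
use strict;
use warnings;

# Hypothetical helper: count copies per region in a Locations block.
# Assumption: one region per line, entries separated by semicolons,
# and each entry standing for exactly one copy.
sub count_copies {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /^\s*([^:]+):\s*(\S.*)$/;
        my ($region, $entries) = ($1, $2);
        my @copies = grep { /\S/ } split /;/, $entries;
        $count{$region} += @copies;
    }
    return %count;
}

# Locations lines from the sample record in section 6
my %count = count_copies(
    "British Isles: London, Victoria and Albert Museum",
    "USA: LC(R); MMu(P)L",
    "Germany: Dresden KupferstichKab",
);
my $total = 0;
for my $region (sort keys %count) {
    print "$region\t$count{$region}\n";
    $total += $count{$region};
}
print "Total\t$total\n";
```

Under these assumptions the sample record yields four copies: one in the British Isles, two in the USA, and one in Germany.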
§ 8 Is the effort worth it? While learning enough Perl to write the necessary scripts takes some time, it is much more manageable than learning, say, Latin. Whether that is time well spent depends on one's needs, and how much one prefers to let a computer handle repetitive search and tabulation. As noted above, Needham regrets that there is no way to quickly view a list of recorded print shops for a given city using the IISTC (Needham 1999, 497 n. 58), even though the IISTC holds this information. With a database application as an interface, however, one can quickly extract the required information. Thus we can discover that the IISTC records the following printers for Ulm:
Conrad Dinckmut
Conrad Dinckmut?
Hans Hauser
Johann Reger
Johann Reger, for Justus de Albano
Johann Schäffler
Johann Zainer
Johann Zainer, not before 1478
Johann Zainer?
Lienhart Holle
This example, like the others here, was created using
Microsoft Access. This software is neither a model of
standards compliance nor inexpensive; it is, however, the
most widespread of desktop database applications. With its
"query by design" functionality, one
can graphically select the printers and
cities database fields, specify that the
latter should correspond to Ulm, and let Access
automate the process of generating the correct query
statement; the process requires less than a minute to set in
motion and just seconds to execute. One can just as easily
formulate a more exact question, for example, "For
what Ulm printers do we have incunable editions with
signed city, printer, and year? How many are there? When
were they printed?" By pencil-and-paper methods,
the following table would take quite some time to construct,
but with a database application just a few minutes or, with
experience, seconds:
Table 1: Printers of Ulm and their signed imprints
| Printer | Signed editions | First signed year | Last signed year |
| Conrad Dinckmut | 31 | 1482 | 1496 |
| Johann Reger | 10 | 1486 | 1499 |
| Johann Reger, for Justus de Albano | 1 | 1486 | 1486 |
| Johann Schäffler | 8 | 1492 | 1499 |
| Johann Zainer | 35 | 1473 | 1500 |
| Lienhart Holle | 6 | 1482 | 1484 |
As the IISTC records relatively few imprints after 1501, the last signed year does not, of course, indicate that a printer ceased operation around that time. The SQL statement used for the search may seem complicated at first glance, but one does not have to glance at it even a first time, thanks to the query design system of Microsoft Access and other consumer database applications:
SELECT istc.Printer, istc.City, Min(istc.first_year) AS MinOffirst_year, Max(istc.last_year) AS MaxOflast_year, Count(istc.istc_number) AS CountOfistc_number
FROM istc
WHERE (((istc.Flags) Like "+++"))
GROUP BY istc.Printer, istc.City
HAVING (((istc.City)="ulm"));
(Note that the table name is istc, and the relevant fields are Printer, City, first_year, last_year, istc_number, and Flags; the last of these records whether the city, printer, and date are signed or attributed.)
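For readers without a database application at hand, the grouping performed by the query above can be approximated in Perl directly on a tab-delimited table. This is a sketch only: the column order is assumed to follow the header line of the appendix 3 script, the sub name signed_by_printer is hypothetical, and the four input rows are invented placeholders, not ISTC records.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the query: signed editions per printer
# for one city. Assumed column order: istc_number, imprint, city,
# printer, avg_year, first_year, last_year, flags.
sub signed_by_printer {
    my ($city, @rows) = @_;
    my %stat;
    for my $row (@rows) {
        my ($istc, $imprint, $row_city, $printer, $avg, $first, $last, $flags)
            = split /\t/, $row;
        # Keep only fully signed imprints (+++) from the requested city;
        # the city comparison is made case-insensitive here.
        next unless lc($row_city) eq lc($city) and $flags eq '+++';
        my $s = $stat{$printer} //= { count => 0, first => $first, last => $last };
        $s->{count}++;
        $s->{first} = $first if $first < $s->{first};
        $s->{last}  = $last  if $last  > $s->{last};
    }
    return %stat;
}

# Invented placeholder rows, for illustration only
my @rows = (
    "ix00000001\tUlm: Printer A, 1480\tUlm\tPrinter A\t1480\t1480\t1480\t+++",
    "ix00000002\tUlm: Printer A, 1485\tUlm\tPrinter A\t1485\t1485\t1485\t+++",
    "ix00000003\t[Ulm: Printer B, 1490]\tUlm\tPrinter B\t1490\t1490\t1490\t---",
    "ix00000004\tAugsburg: Printer C, 1479\tAugsburg\tPrinter C\t1479\t1479\t1479\t+++",
);
my %stat = signed_by_printer('ulm', @rows);
for my $printer (sort keys %stat) {
    my $s = $stat{$printer};
    print join("\t", $printer, @{$s}{qw(count first last)}), "\n";
}
```

With the placeholder rows, only Printer A is reported, with two signed editions dated 1480 to 1485; the unsigned Ulm row and the Augsburg row are excluded.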
§ 9 The preceding table should not be confused with an authoritative statement based on extensive research. It is rather a quickly-constructed summary that provides a first impression of the overall situation of early printing in Ulm, but that is by itself a useful function for a computer database.
§ 10 What if one wanted to see a rough overview of the development in number of editions printed each year? (See, for example, Neddermeyer 1998, 2:609-10.) As noted above, the IISTC Year of Publication field is entirely inadequate for this, and the IISTC does not permit searching of editions with signed dates only. After the IISTC records have been imported into a database, one possibility would be to take the average of dates that appear as [1479-81], or one might choose instead to consider only the 12,072 imprints with a signed date. If one takes the latter option, one can quickly paste the resulting data into Microsoft Excel—another omnipresent if not inherently standards-friendly spreadsheet application—to construct a graph such as the following:
Needham notes that a search of the IISTC's Year of Publication field would find a seeming contraction in book printing between 1477 and 1479, but that this reflects idiosyncrasies in the IISTC search software rather than an actual shrinkage in production (Needham 1999, 489). The summary of incunable production for the years 1475-1485 (below) finds that this apparent contraction was indeed spurious, but perhaps not that for 1482 through 1484, when signed editions decline by 18% over two years (the only decline lasting more than a single year). Additional work is required to determine how widespread this phenomenon was or what its causes might have been (see also Neddermeyer 1998, 1:420-22), but the graph at least provides the right place to start, where the numbers provided by the IISTC search interface do not.
Table 2: Editions per year, total (IISTC), and signed only
| Year | Editions (IISTC) | Editions (signed only) |
| 1475 | 835 | 242 |
| 1476 | 589 | 231 |
| 1477 | 672 | 257 |
| 1478 | 657 | 266 |
| 1479 | 563 | 245 |
| 1480 | 1177 | 285 |
| 1481 | 734 | 342 |
| 1482 | 816 | 359 |
| 1483 | 932 | 334 |
| 1484 | 717 | 295 |
| 1485 | 1118 | 312 |
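The decline mentioned above can be checked against the signed-only column of Table 2 with a few lines of Perl. This is a small worked example, not part of the original workflow; the sub name pct_change is hypothetical.

```perl
use strict;
use warnings;

# Signed editions per year, copied from Table 2
my %signed = (
    1475 => 242, 1476 => 231, 1477 => 257, 1478 => 266,
    1479 => 245, 1480 => 285, 1481 => 342, 1482 => 359,
    1483 => 334, 1484 => 295, 1485 => 312,
);

# Percentage change in signed editions between two years
sub pct_change {
    my ($from, $to) = @_;
    return 100 * ($signed{$to} - $signed{$from}) / $signed{$from};
}

printf "1482 to 1484: %.1f%%\n", pct_change(1482, 1484);

# List every year in which signed editions fell from the year before
for my $year (sort { $a <=> $b } keys %signed) {
    next unless exists $signed{ $year - 1 };
    my $delta = $signed{$year} - $signed{ $year - 1 };
    printf "%d: %+d\n", $year, $delta if $delta < 0;
}
```

The first line of output reads "1482 to 1484: -17.8%", which rounds to the 18% cited. The loop reports declines in 1476, 1479, 1483, and 1484, so 1483-1484 is the only run of consecutive declining years in the table.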
§ 11 And what of the Nuremberg Chronicle? It stands at the head of the list of most-preserved incunables, but what follows it? According to the IISTC, the Nuremberg Chronicle vastly outnumbers its closest competitor:
Table 3: Incunables with highest census counts in the IISTC
| Author | Abbreviated title | Reference | Imprint | Copies |
| Schedel, Hartmann | Liber chronicarum | HC 14508* | Nuremberg: Anton Koberger, 12 July 1493 | 786 |
| Aristoteles | Opera [Greek]... | HC 1657* | Venice: Aldus Manutius, Romanus, 1495-98 | 319 |
| | Biblia latina... | HC 3173* | [Strassburg: Adolf Rusch, for Anton Koberger at Nuremberg, not after 1480] | 287 |
| Politianus, Angelus | Opera... | HC 13218* | Venice: Aldus Manutius, Romanus, July 1498 | 270 |
| Euclides | Elementa geometriae... | HC 6693* | Venice: Erhard Ratdolt, 25 May 1482 | 266 |
| | Epistolae diversorum philosophorum...[Greek] | HC 6659* | Venice: Aldus Manutius, Romanus, 1499 | 266 |
| Firmicus Maternus, Julius | Mathesis (De nativitatibus libri VIII)... | HC 14559* | Venice: Aldus Manutius, Romanus, June and [17] Oct. 1499 | 257 |
| Ubertinus de Casali | Arbor vitae crucifixae Jesu Christi | HC 4551* | Venice: Andreas de Bonetis, 12 Mar. 1485 | 252 |
| Boethius | Opera | H 3351* | Venice: Johannes and Gregorius de Gregoriis, de Forlivio, 1491-92 | 251 |
| Antoninus Florentinus | Summa theologica (Partes I-IV)... | HC 1243* | Venice: Nicolaus Jenson, 1477-80 | 242 |
These numbers should not be understood as the number of
copies now existing, nor even as the number of copies
recorded by the IISTC. Rather, one has to interpret them as
one computer script's interpretation of the IISTC
data, which is itself incomplete and sometimes ambiguous.
Martin Davies, ISTC general editor, has stated, however,
that ISTC data collected as of 1992 would proportionally
reflect the total number of surviving copies: "The
numbers of copies of any particular edition...must,
however, bear a fairly constant relation to the total now
extant: the fewer recorded the scarcer an edition will
prove to be" (Davies and Goldfinch 1992, 20). If exact precision
is essential, then one would do well to verify all figures
by hand. By my calculation, the IISTC records over 350,000
individual copies for fifteenth-century editions, or between
65% and 80% of all surviving incunables by various estimates
(see Neddermeyer 1998, 1:79). Based on these provisional
figures, one would expect a complete census of surviving
copies of the Latin Nuremberg
Chronicle to have somewhere between 1000 and 1250
copies. Christoph Reske arrived at ca. 900 copies with
another estimated 135 in private hands (Reske 2000,
CD 275-77), while Paul Needham's ongoing count,
largely restricted to copies in public libraries, already
approaches 1200 (personal correspondence, cited by
permission).
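The projection for the Nuremberg Chronicle can be sketched as simple scaling. The assumption here, which is mine and may not match the exact reasoning behind the figures above, is that the expected total is the recorded count divided by the share of surviving copies that the IISTC is thought to record. The sub name projected_total is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a recorded count up to a projected total,
# assuming the IISTC records the given share of all surviving copies.
sub projected_total {
    my ($recorded, $share) = @_;
    return $recorded / $share;
}

my $recorded = 786;    # copies counted for the Latin Nuremberg Chronicle (Table 3)
for my $share (0.80, 0.65) {
    printf "share %.2f: about %d copies\n",
        $share, int(projected_total($recorded, $share));
}
```

This prints about 982 copies for an 80% share and about 1209 for a 65% share, the same order of magnitude as the rounded range of 1000 to 1250 given above.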
§ 12 There are countless ways to graph, chart, and tabulate the IISTC data, but those that occur to this author may not be the same ones that would hold the interest of the present reader. The previous examples should be enough to demonstrate the utility of allowing other software applications to cooperate with the IISTC's data. Standard database searches will address most of the shortcomings of the IISTC identified by Needham (for example, escaping the asterisk character so that it is not interpreted as a wildcard in searches). The rest can be addressed to the extent the underlying data allow by some additional script writing. If the effort required is justified, the tools at one's disposal are flexible enough to provide an answer.
§ 13 The preceding discussion may hold broader implications for designers and users of other electronic reference works. The general outlines of the solution offered here may be applicable to other electronic resources: exporting records, manipulating them, and re-importing them into another application is by no means a unique process. Even if the software in question has no export function, more sophisticated programming can always automate manual copying and pasting.
§ 14 An important point of application design is that a tabular view of a database often permits important phenomena to be more easily visualized and defective records to be more easily found. Providing unimpeded access to the data offers maximum flexibility and value for an application's users. Database fields that cannot be directly viewed and whose reliability cannot be easily verified, such as the IISTC's Year of Publication field, are necessarily less useful than they otherwise could be.
§ 15 While a search interface can have many uses, it cannot anticipate every question that might be asked, and so it can aid or supplement but never entirely replace access to the underlying data. Much of the effort required to make the IISTC data accessible to other software applications would not have been necessary if the IISTC had maintained consistent formatting and made use of an open format from the beginning. That it did not, however, does not mean that scholars and other end users have to wait for the British Library to redesign its project. If necessary, standards can also be imposed from below.
§ 16 The following discussion assumes that the user has the IISTC, Microsoft Windows, and Microsoft Office installed on his or her computer. While the IISTC runs only under Windows, similar results should be achievable with any database software.
Select all records in the IISTC. First, enter a search on the Search screen that returns all 28,360 records, such as searching for i* in the ISTC Number field. On the List Display screen, click on Select All. Once this choice has been confirmed, the computer may become unusable for an hour or more until the operation has completed.
After all records have been selected, click on Export, which is also on the List Display screen. This may also require considerable time before the dialogue box appears. Do not change the export range, but do change the export format to plain text by clicking on the button marked Using Rich Text Format (RTF); once you click on it, the title will change to read Using Plain Text (TXT), which is the desired format. Click on the button marked Export, then select a location for the exported file and give it a name. The examples here assume a filename of istc.txt. The export process may take many hours and tie up all the resources of the computer during that time. The resulting file will be over 22 megabytes in size.
The various fields in the IISTC can now be turned into tab-delimited tables one at a time using Perl scripts such as that found in appendix 2 and above. If the script is named title.pl, the output can be redirected to create a file named title.txt:
perl title.pl > title.txt
Otherwise the output will appear on screen. Scripts virtually identical to that found in appendix 2 can be used to create a series of files, each a tab-delimited table containing an ISTC number in one column and one additional field in the other.
These tables can now be imported one at a time into a database application. Using Microsoft Access, create a new database file, then open the text files one at a time using the File > Get External Data > Import function. Specify that the file is delimited, and that tabs separate the fields, and that it should be opened in a new table. Import the first field as indexed (no duplicates) and give it an appropriate title, so that the ISTC number can serve as the index of the imported database as well; the next stage of the import process will let you choose the ISTC number as the primary key for the database.
Because some title records are longer than the 255-character limit that Access imposes on text fields, these records will be truncated and an error message will appear. Import the titles as a second table in the same way, but with the title field as a memo data type. The truncated titles can be used when sorting is necessary, while the full memo field will be available when the complete titles are needed, so both are useful.
Import the rest of the text files in the same way, choosing appropriate titles for each table and an appropriate data type for each field. Maintain consistency in the naming of fields.
The tables can now be joined one at a time into one large flat-file database table, which can simplify later searches. Select the title table, because it contains all 28,360 records, and, for example, the authors table, which only contains 20,933. Create a query by selecting these two tables; the identical ISTC Number fields are already automatically joined. Right click on the link between the two tables, examine the join properties, and click on the third option: we need all the records from the title table as well as the corresponding records in author. Under the Query menu, select Make-Table Query and choose a name, such as istc2. Running the query will create a new table that includes all the ISTC numbers and titles for all records as well as the author for works that have one. Repeating this process with the newly created table and the table containing the next field to be imported will eventually result in a large table with all of the fields readily accessible in the ISTC database. The table should have 28,360 records after every step.
Some of the most useful information in the IISTC requires further analysis of its fields using more Perl scripts. A script to analyze the imprint field is found in appendix 3, while a script to provide a copy count is found in appendix 4. Each of these Perl scripts will create new tab-delimited tables that can be imported into the database by following the import and join steps described above.
Because the IISTC's Printing Regions function assigns some incunables to more than one region, this information cannot be imported into the same table. If this information is required, the relevant records would have to be exported from the IISTC separately, the ISTC Numbers extracted, and a new table created that does not use the ISTC Number as an index. One can then limit one's searches according to the IISTC's printing regions by searching out only those records in the larger table for which an ISTC number in the regions table is associated with the desired region.
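The join step described above, which keeps every title and adds the author where one exists, can also be carried out in Perl before anything is imported. The following is a minimal sketch, assuming two tables of "ISTC number, tab, value" lines held in memory; the sub name left_join is hypothetical, and the sample assumes that the first title has no row in the authors table.

```perl
use strict;
use warnings;

# Hypothetical helper: left-join two tab-delimited tables on their first
# column, keeping every row of the left table.
sub left_join {
    my ($left, $right) = @_;    # array refs of "key<TAB>value" lines
    my %lookup;
    for my $row (@$right) {
        my ($key, $value) = split /\t/, $row, 2;
        $lookup{$key} = $value;
    }
    return map {
        my ($key, $value) = split /\t/, $_, 2;
        # Rows with no partner in the right table get an empty last column.
        join "\t", $key, $value, $lookup{$key} // '';
    } @$left;
}

my @titles = (
    "ia00001000\tAbbey of the Holy Ghost",
    "ia00120000\tVita, after Rinucius, et Fabulae",
);
my @authors = ("ia00120000\tAesopus");

print "$_\n" for left_join(\@titles, \@authors);
```

As with the Access query, the row count of the result always equals that of the left table, which gives a quick check after each join.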
§ 17 This is by no means the only possible approach towards creating a database from the IISTC records, or even one that is particularly faithful to the ideal of standards compliance. While Microsoft Access has a large install base, it is quite expensive; open-source and standards-compliant database solutions such as MySQL exist, but none yet matches the ease of use of Access. A monolithic flat file may not be the best database for all circumstances. In addition, some questions are still best handled by recourse to further script writing, particularly if further text manipulation is required, such as numerically sorting entries in the bibliographic standard works.
§ 18 The comment lines, which begin with the pound sign, explain the function of each line of code.
use strict;
use warnings;

# Define the name of the file to search
my $batch = "istc.txt";

# Open the file, or exit with an error if it cannot be opened
open(my $fh, '<', $batch) or die "Cannot open $batch for read: $!";

# $match holds the most recent title; $hit flags that a title
# has been found and not yet printed
my ($match, $hit) = ('', 0);

# As long as there are lines in the file left to search...
while (<$fh>) {
    # ...look for the pattern "Title:<tab character><anything else>"
    # at the beginning of a line
    if (/^Title:\t(.*?)$/) {
        # Save "anything else"...
        $match = $1;
        # ...and set a flag that we've found what we're looking for
        $hit = 1;
    }
    # Now, if we have a match already saved, look for the pattern
    # "ISTC" at the beginning of the line, then anything else, then
    # "i" followed by one more character and eight digits; save
    # those ten characters, as that's the ISTC number
    if (/^ISTC.*(i.\d{8})/ and $hit == 1) {
        # Reset our flag
        $hit = 0;
        # Assign the ISTC number to a variable
        my $istc_number = $1;
        # Print the ISTC number, a tab character, the title, and a
        # new line character
        print "$istc_number\t$match\n";
    }
}
close $fh;
The output, as explained above (§ 6), begins like this:
ia00000500 Orhot Hayyim
ia00001000 Abbey of the Holy Ghost
ia00001500 Abbey of the Holy Ghost
ia00002000 Abbey of the Holy Ghost
This output should be redirected to a file to be saved for further use like this:
perl title.pl > title.txt
Very little needs to be changed in order to extract the rest of the fields. In the line if (/^Title:\t(.*?)$/) {, one need only replace Title with Author, Bibliography, Cataloguing Source, Collective Title, Format, Imprint, Language, Locations, or Notes.
§ 19 On many occasions, it would be useful to turn the IISTC imprint field into separate fields for city, printer, and date of printing, and to clearly distinguish between signed and attributed information. The following script accomplishes this based on the output from the script in appendix 2 as applied to the Imprint field.
# This script takes as input a
# tab-delimited table of istc
# numbers and imprint fields,
# assumed here to be named 'imprint.txt'.
# This script outputs the istc number
# again as an index, followed by the
# first imprint field only, then fields
# containing the city and printer. Then
# it outputs the years: the average
# of all years in all imprint fields,
# the earliest and then the latest such
# year. The last column contains three
# flags, either + or -. Signed cities,
# printers, and dates appear as +++,
# while the opposite would be ---. Years
# appearing in single quotes ('1401')
# have been ignored.
# set imprint data file
$batch="imprint.txt";
# open the file to process, or give an error
# code
open BATCH, $batch or die "Cannot open $batch for read:$!";
# create column titles
print "istc_number\timprint\tcity\tprinter\tavg_year\tfirst_year\tlast_year\tflags\n";
while (<BATCH>) {
# first, reset all variables
undef @allyears;
undef @sort;
$firstyear=0;
$lastyear=0;
$avgyear = 0;
$yearcount = 0;
$flags='+++';
# save the input line as $record for later
# use
$record=$_;
# get first two tab-delimited fields, the
# ISTC Number and first imprint line
/^(.*?)\t(.*?)\t/;
$istc_number=$1;
$imprint=$2;
# search the imprint line for an optional
# opening bracket, then the city, then a
# colon, then the rest of the line
$imprint=~/^(\[|)(.*?)(?:\]: |: )(.*)$/;
$rightpart=$3;
$city=$2;
# if an opening bracket was found, flag
# the city as unsigned
if ($1) {substr $flags, 0, 1, "-"}
# split the rest of the line by commas,
# forming the array @printer
@printer=split /, /, $rightpart;
# fix 3 defective records: if there's
# no comma found in the rest
# of the line, and there's no number
# to be found, add a dummy,
# empty date element to array
# fix defective imprint lines not
# handled correctly: ip01005630
# (no year,), ic00216715 (no year,), ir00334450
if ($#printer==0 and $printer[0]!~/\d/) {push @printer, " "}
# fix for two defective records with no
# imprint data: print the
# istc number and then skip the rest of the loop
if ($record=~/^([^\t]*?)\t$/) {
$istc_number=$1;
print "$istc_number\n";
next;
}
# remove the last element of @printer array;
# it's usually the date field
$date = pop @printer;
# fix for two deficient records containing neither
# city nor printer, just dates
if ($imprint !~/:/) {
$date = $imprint;
undef @printer;
$city = "";
}
# remove all brackets to test for a date; we
# need to find the ca. 150 records of the
# anomalous form 'City: printer, year, month
# and day'
$_ = $date;
s/[\[\]]//g;
$xdate=$_;
# remove all brackets from current last element
# of @printer array
$ydate=$printer[-1];
$ydate=~s/[\[\]]//g;
# if $date doesn't contain a year, then check
# the last element of @printer; if it does,
# pop it onto the front of $date
if ($xdate !~/1[45]\d{2}|undated/i and $ydate=~/1[45]\d{2}/) {
$date=pop(@printer).$date;
$_ = $date;
s/[\[\]]//g;
$xdate=$_;
}
# now obliterate dates in single quotes regarded
# as false
$xdate=~s/'.*?'//g;
# match a year 1400 to 1599; if we find it, use it,
# otherwise we have nothing to test (testing the
# match itself avoids picking up a stale $1 left
# over from an earlier successful match)
if ($xdate=~/(1[45]\d{2})/) {$testyear=$1} else {$testyear=""}
# if we have a date to test, get the last two digits
if ($testyear) {$yeardigits=substr $testyear, 2, 2} else {$yeardigits='####'}
# if the last two digits are surrounded by brackets,
# flag the date as unsigned. [14]94 is treated
# as signed, 14[9]4 as unsigned
$_ = $imprint;
if (/\[[^\]]*$yeardigits[^\]]*\]|\[$yeardigits|$yeardigits\]/ or $yeardigits eq '####') {
substr $flags, 2, 1, "-";
}
# split the input line again on the tabs
@checkdates = split /\t/, $record;
# but discard the first two tabs
$null=shift @checkdates;
$null=shift @checkdates;
# and add the date field previously identified
unshift (@checkdates, $date);
# this next loop extracts all years from each
# imprint field in turn
foreach $possibledate (@checkdates) {
$_=$possibledate;
# remove brackets, get rid of '1401' dates
s/\[|\]|'.*?'//g;
# find simple years, like 1493, 1494-,
# 1498-1505
@simple_years=/(1[45]\d{2})/g;
# add the years found to the list
push (@allyears, @simple_years);
# find dates like 1476-80, also when they end
# the field (hence a lookahead rather than \D)
$_=$possibledate;
@complex_years=/(1[45]\d{2}[\-\/]\d{2})(?!\d)/g;
# first count the simple years in the next loop
foreach $simpleyear(@simple_years) {
$avgyear+=$simpleyear;
$yearcount++;
}
# and add the second part to the list of years
# in the following loop
foreach $complexyear (@complex_years) {
# find the element to split on: either - or /
$split=substr($complexyear,4,1);
# ignoring $temp[0], as it is already a simple_year
@temp = split /$split/, $complexyear;
$temp[1]=substr($temp[0],0,2).$temp[1];
push (@allyears, $temp[1]);
$avgyear+=$temp[1];
$yearcount++;
}
}
# round to nearest year
if ($yearcount) {
$avgyear=int(($avgyear/$yearcount)+.5);
} else {
$avgyear = "";
}
# now sort the years numerically
@sort = sort { $a <=> $b } @allyears;
$firstyear=$sort[0];
$lastyear=$sort[-1];
# put the printer back together
$printer=join ', ', @printer;
# add missing front or back brackets for aesthetics only
$_ = $printer;
if (/^[^\[]+\]/) {$printer='['.$printer}
if (/\[[^\]]+$/) {$printer=$printer.']'}
# now get rid of all brackets and store as $xprinter
$_ = $printer;
s/[\[\]]//g;
# if the printer is enclosed in brackets, or begins with a
# bracket, flag as unsigned
$xprinter=$_;
if ($imprint=~/\[[^\]]*\Q$xprinter\E[^\]]*/ or
$printer=~/^\[/) {
substr $flags, 1, 1, "-";
}
# output the information and continue on to the next
# record
print "$istc_number\t$imprint\t$city\t$xprinter\t";
print "$avgyear\t$firstyear\t$lastyear\t$flags\n";
}
That is, the input file begins like this:
ia00000500 [Spain or Portugal: Printer of Alfasi's Halakhot, before 1492?]
ia00001000 Westminster: Wynkyn de Worde, [about 1496]
ia00001500 Westminster: Wynkyn de Worde, [about 1497]
ia00002000 Westminster: Wynkyn de Worde, [about 1500]
ia00003000 [London: John Lettou and William de Machlinia, about 1482]
ia00004000 [London]: Richard Pynson, 9 Oct. 1499
ia00004500 [London]: Richard Pynson, 9 Oct. 1499
ia00005000 [London]: Richard Pynson, '9 Oct. 1499' [about 1503]
ia00005500 [The Netherlands: Prototypography, about 1465-80]
ia00008000 Venice: Franciscus Lapicida, 20 Oct. 1494
The script's output, in eight tab-delimited fields, then appears as follows:
istc_number imprint city printer avg_year first_year last_year flags
ia00000500 [Spain or Portugal: Printer of Alfasi's Halakhot, before 1492?] Spain or Portugal Printer of Alfasi's Halakhot 1492 1492 1492 ---
ia00001000 Westminster: Wynkyn de Worde, [about 1496] Westminster Wynkyn de Worde 1496 1496 1496 ++-
ia00001500 Westminster: Wynkyn de Worde, [about 1497] Westminster Wynkyn de Worde 1497 1497 1497 ++-
ia00002000 Westminster: Wynkyn de Worde, [about 1500] Westminster Wynkyn de Worde 1500 1500 1500 ++-
ia00003000 [London: John Lettou and William de Machlinia, about 1482] London John Lettou and William de Machlinia 1482 1482 1482 ---
ia00004000 [London]: Richard Pynson, 9 Oct. 1499 London Richard Pynson 1499 1499 1499 -++
ia00004500 [London]: Richard Pynson, 9 Oct. 1499 London Richard Pynson 1499 1499 1499 -++
ia00005000 [London]: Richard Pynson, '9 Oct. 1499' [about 1503] London Richard Pynson 1503 1503 1503 -+-
ia00005500 [The Netherlands: Prototypography, about 1465-80] The Netherlands Prototypography 1473 1465 1480 ---
ia00008000 Venice: Franciscus Lapicida, 20 Oct. 1494 Venice Franciscus Lapicida 1494 1494 1494 +++
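The three year columns can be reproduced in isolation. The following is a minimal sketch rather than the full script above: the subroutine name year_stats is hypothetical, and it looks at a single date string instead of every imprint field in a record.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (average, first, last) for the years
# found in one date string, or an empty list if there are none.
sub year_stats {
    my ($date) = @_;
    # drop brackets and any date given in single quotes
    (my $clean = $date) =~ s/\[|\]|'.*?'//g;
    # simple years such as 1493, or the first half of 1476-80
    my @years = $clean =~ /(1[45]\d{2})/g;
    # abbreviated ranges such as 1476-80 or 1476/80: expand the
    # second half with the century digits of the first half
    while ($clean =~ /(1[45])\d{2}[-\/](\d{2})(?!\d)/g) {
        push @years, $1 . $2;
    }
    return unless @years;
    my $sum = 0;
    $sum += $_ for @years;
    my @sorted = sort { $a <=> $b } @years;
    # round the average to the nearest year
    return (int($sum / @years + 0.5), $sorted[0], $sorted[-1]);
}

print join("\t", year_stats("about 1465-80")), "\n";              # 1473 1465 1480
print join("\t", year_stats("'9 Oct. 1499' [about 1503]")), "\n"; # 1503 1503 1503
```

The two calls correspond to the records ia00005500 and ia00005000 in the sample output above.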
§ 20 Turning the IISTC's Locations field into a numerical count of surviving copies presents new challenges, as the format for recording copies varies considerably between regions. American, German, and Italian libraries are always divided by semicolons; Belgian and Other libraries usually appear as "City, First Library, Second Library"; Dutch, Spanish, and most Other European libraries appear as "City First Library, Second Library"; and French and British records mix both formats. In addition, one hopes but can never be sure that the frequent records describing a library's holdings of a given incunable as "(3, 1 defective)" consistently mean "three copies, of which one is defective" rather than "3 complete copies plus one defective one". Perfect accuracy in automatically counting the IISTC's countless incunables may not be possible, but a high degree of accuracy (verified by comparing computer-generated results with old-fashioned tabulation) is achievable and sufficient for answering many questions and for helping to formulate others.
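How far the two readings of a note such as "(3, 1 defective)" diverge can be made concrete with a small sketch. Both subroutine names and the library name "Library X" are invented for illustration, and only notes that open with a number followed by a comma or a closing parenthesis are treated as copy counts.

```perl
use strict;
use warnings;

# Reading 1: the leading number is the total, and later numbers
# describe some of those same copies.
sub count_inclusive {
    my ($note) = @_;
    return $note =~ /\((\d+)[,)]/ ? $1 : 1;
}

# Reading 2: every number in the note is a separate group of copies.
sub count_additive {
    my ($note) = @_;
    my ($inside) = $note =~ /\((\d+[,)][^()]*)/;
    return 1 unless defined $inside;
    my $total = 0;
    $total += $_ for $inside =~ /(\d+)/g;
    return $total || 1;
}

my $entry = "Library X (3, 1 defective)";   # invented entry
printf "%s: inclusive %d, additive %d\n",
    $entry, count_inclusive($entry), count_additive($entry);
# prints: Library X (3, 1 defective): inclusive 3, additive 4
```

An entry without a numeric note counts as one copy under either reading.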
§ 21 For the sake of simplicity, counting the number of extant copies in the IISTC is broken down into two steps. First, a simple script (or rather, ten minor variations on a simple script) is used to extract only the relevant data from the full export of IISTC records. The following script searches out only copies in American libraries:
$batch="istc.txt"; #name of file to search
open BATCH, $batch or die "Cannot open $batch for read:$!";
while (<BATCH>) {
if (/^[ ]*USA:\t(.*?)$/) {
$match = $1;
$hit=1;
}
if (/^ISTC.*(i.\d{8})/ and ($hit == 1)) {
$hit = 0;
$istc_number = $1;
print "$istc_number\t$match\n";
}
}
The output of this script is a long list of ISTC numbers and the libraries in which copies of the relevant incunable can be found:
ia00000500 JTSL (1 leaf)
ia00001000 PML
ia00002000 FolgSL; PandJG
ia00003000 AmBML; Harv(L)L; LC(L); NewL (-); PML
ia00004000 Harv(L)L; LC; PML; UPaL; YU(B)L; EHLS (sold 1981)
ia00005000 Harv(L)L; HEHL; LC(L)
ia00008000 CPhL; Harv(M)L; PML
With minor variations in the fourth line of the script, similar files can be created for the other locations by which the IISTC organizes its copy attestations: Belgium, British Isles, Other European, France, Germany, Italy/Vatican, Netherlands, Spain/Portugal, and Other. For Spain/Portugal, for example, the output begins:
ia00008000 Avila BP
ia00009200 Barcelona BCatal, BU; Córdoba BP; Madrid BN, BU; Sevilla Colombina, BU; Toledo BP; Vigo Massó; Lisboa BN
ia00012000 Avila BP
ia00014400 El Escorial RMon
ia00016500 Córdoba BCap
ia00017000 Sevilla Colombina; Coimbra BU
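Since the ten variations differ only in the location heading, the extraction could also be written once with the heading as a parameter. The following is a sketch under assumptions: the subroutine name extract_location is hypothetical, it takes the export as a list of lines rather than a file, and the demonstration lines are invented. Only the heading "USA" is attested in the script above; the other headings would have to be copied from the export itself.

```perl
use strict;
use warnings;

# Hypothetical generalisation of the extraction script: $label is the
# location heading as it appears in the export, @lines are the lines
# of the export file.
sub extract_location {
    my ($label, @lines) = @_;
    my ($match, $hit, @out) = ('', 0);
    for (@lines) {
        # remember the holdings line for the requested location
        if (/^ *\Q$label\E:\t(.*?)$/) {
            $match = $1;
            $hit   = 1;
        }
        # the ISTC number closes the record; emit a row only if
        # holdings for this location were seen in the record
        if (/^ISTC.*(i.\d{8})/ and $hit) {
            $hit = 0;
            push @out, "$1\t$match";
        }
    }
    return @out;
}

# invented lines imitating the layout the script above expects
my @export = (
    "  USA:\tPML\n",
    "ISTC number:\tia00001000\n",
    "ISTC number:\tia00001500\n",
);
print "$_\n" for extract_location("USA", @export);   # ia00001000<TAB>PML
```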
The list of libraries in each location that own a given incunable is useful information that can be imported as ten new fields into the database as described in appendix 1. It would also be useful, however, to have a count of copies in each location that can easily be summed to provide a worldwide count of copies (as far as the IISTC is concerned, at least). The following script provides such a count for American, Italian, and German libraries. It is invoked a bit differently from the preceding scripts, in that it expects two command-line arguments: the name of the file to be processed and the name of the file to be written. If this script were given the name count1.pl, it might be invoked as follows to read from the file usa-libraries.txt and create the file usa-count.txt:
perl count1.pl usa-libraries.txt usa-count.txt
The script is as follows:
# script to process library-output
# files for consistently
# semicolon-delimited countries: USA,
# Italy, Germany
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@libraries=split /;/, $locations;
foreach $library (@libraries) {
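# get rid of comments in parentheses, working
# outward through nested parentheses; notes that
# open with a one- or two-digit copy count, such
# as (3), (12) or (3, 1 torn), are kept
while ($library=~/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)/) {
$library=~s/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$library=~s/\((\d{1,2})[^\(]*\)/($1)/g;
if ($library=~/\((\d{1,2})\)/) {$copycount+=$1} else
{$copycount++}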
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
The output of this script includes column headings. For German libraries, for example, it begins:
istc_number locations count
ia00008000 Bamberg SB; München BSB 2
ia00009100 Gotha ForschLB; Tübingen UB 2
ia00009200 Augsburg SStB; Bamberg SB; Berlin SB; Darmstadt LHSB; Freiburg i.Br. UB; Giessen UB; Göttingen SUB; Heidelberg UB; Karlsruhe BLB; Mainz StB; München BSB (3); München UB; Passau SB; Würzburg UB (2) 17
ia00009300 Frankfurt(Main) StUB (imperfect) 1
ia00009900 Hannover KestnerM 1
The IISTC's variability in recording copies requires the script to be adapted for other locations, however. The next two scripts are minor variations on the preceding one. The first addresses locations such as Belgium that insert a comma between the name of the city and the libraries owning a particular incunable:
# script to process library-output files for
# countries delimited as City, Library1, Library2:
# Belgium, Other [usually]
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@cities=split /;/, $locations;
foreach $city (@cities) {
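# get rid of comments in parentheses, working
# outward through nested parentheses; notes that
# open with a one- or two-digit copy count, such
# as (3), (12) or (3, 1 torn), are kept
while ($city=~/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)/) {
$city=~s/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;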
undef @libraries;
if ($city =~ /,/) {
@libraries=split /,/, $city;
$null = shift @libraries;
} else {$libraries[0] = $city}
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {$copycount++}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
The next script is for locations such as Spain/Portugal that separate libraries within a single city from each other with commas, but without a comma after the name of the city:
# Script to process library-output files for countries
# delimited as City Library1, Library2: Other Europe,
# Spain, Netherlands, France (mostly), Britain (usually)
$in=shift; # take input file from command line
$out = shift; # take output filename from command line
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n";
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
@cities=split /;/, $locations;
foreach $city (@cities) {
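# get rid of comments in parentheses, working
# outward through nested parentheses; notes that
# open with a one- or two-digit copy count, such
# as (3), (12) or (3, 1 torn), are kept
while ($city=~/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)/) {
$city=~s/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)//g;
}
# replace (3, 1 torn) with (3)
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;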
undef @libraries;
@libraries=split /,/, $city;
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {$copycount++}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
}
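Once each location has its own count file, the worldwide total mentioned above is a matter of adding up the count columns for each ISTC number. The following is a minimal sketch: the subroutine name sum_counts is hypothetical, and it takes lines from the count files as a list rather than opening the files itself. The three data rows are taken from the USA, Germany, and Spain/Portugal samples shown in this appendix, so the total covers only those three locations.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines in the
# "istc_number\tlocations\tcount" format written by the counting
# scripts (from any number of location files) and returns a hash of
# total copies per ISTC number.
sub sum_counts {
    my %total;
    for my $line (@_) {
        chomp(my $copy = $line);
        my ($istc, $locations, $count) = split /\t/, $copy;
        # skip header rows and anything without a numeric count
        next unless defined $count and $count =~ /^\d+$/;
        $total{$istc} += $count;
    }
    return %total;
}

my %total = sum_counts(
    "istc_number\tlocations\tcount\n",
    "ia00008000\tCPhL; Harv(M)L; PML\t3\n",
    "ia00008000\tBamberg SB; München BSB\t2\n",
    "ia00008000\tAvila BP\t1\n",
);
print "ia00008000\t$total{ia00008000}\n";   # ia00008000<TAB>6
```

In a database application the same result could presumably be had by summing the ten imported count fields in a query.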
As explained above, the IISTC contains some records that are truly ambiguous as to the number of copies in question, and for some locations the formatting is inconsistent. In the case of inconsistent formatting, some further refinement can help reduce the inaccuracies. It is undoubtedly useful for the staff of the British Library for their copies to appear at the head of the list of libraries in the British Isles rather than with other London libraries, and with signatures of all their copies; however, for attempting a count based on this data, it is distinctly annoying. Consider the following data:
ia00017000 London BL, 167.f.13 = IB.27036; Chatsworth; Edinburgh NLS (Inc.207); Oxford Bodley (2), Magdalen, Pembroke (2) Colleges; Stonyhurst College
ia00018600 Cambridge, Trinity Hall; Oxford Bodley, All Souls College
ia00020500 London BL, IC.28708; Barnard Castle, Bowes Museum
ia00021000 Cambridge, Trinity Hall; Oxford, New College
How is a computer to know that "Oxford Bodley, All Souls College" refers to two copies, while "Oxford, New College" refers to just one? The assumption that a comma divides a city and its libraries must be modified with an explicit statement that Oxford is a city, as the following script attempts to implement. As a consequence, the anomalous recording of British Library copies results in an overcount by one that must be individually corrected.
# script to process library-output files
# for British Isles.
# Remove nested parentheses first, to get
# rid of semicolons within comments, and
# then split on semicolons; if there is
# a 'London BL', remove one from the total count
$in="libs-brit.txt"; # input file
$out = "bricount.txt"; # output file
open IN, $in or die "Cannot open $in for read:$!";
open OUT, ">$out" or die "Cannot open $out for write:$!";
print OUT "istc_number\tlocations\tcount\n"; # add column heads
while (<IN>) {
undef @cities;
$copycount=0;
/^(i.\d{8})\t(.*)$/;
$istc_number=$1;
$locations=$2;
# get rid of (digit) in BL signatures
$fixlocations=$locations;
while ($fixlocations=~/[^ ]\(\d\)/) {
$fixlocations=~s/[^ ]\(\d\)//g;
}
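# get rid of comments in parentheses, working
# outward through nested parentheses; notes that
# open with a one- or two-digit copy count, such
# as (3), (12) or (3, 1 torn), are kept
while ($fixlocations=~/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)/) {
$fixlocations=~s/\((?:\D|\d{3,}|\d+[^,\d)])[^\(]*?\)//g;
}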
@cities=split /;/, $fixlocations;
foreach $city (@cities) {
# replace (3, 1 torn) with (3)
$city=~s/\((\d{1,2})[^\(]*\)/($1)/g;
# eliminate e.g. (3 leaves)
$city=~s/\(\d{1,2} lea[^\(]*\)//g;
# correct for multiple BL signatures without
# comma dividers
if ($city=~/London BL,[^,]* and /) {$copycount++}
# eliminate commas after city names
$city=~s/(London|Oxford|Cambridge|Manchester|Dublin|Durham|Hereford|Edinburgh|Cashel|Guernsey|Coleraine|Barnard Castle|Parkminster|Northampton|Reigate|Birmingham|Canterbury|Harpenden|Brasenose|Killiney),/$1/;
undef @libraries;
@libraries=split /,/, $city;
foreach $library (@libraries) {
if ($library=~/\((\d{1,2})\)/) {
$copycount+=$1;
} else {
$copycount++;
}
if ($library=~/London BL/) {
$copycount--;
}
}
}
print OUT "$istc_number\t$locations\t$copycount\n";
#print "$istc_number\t$locations\t$copycount\n";
}
Some sample output illustrates that Perl can deal with a great deal of discrepancy in formatting and still arrive at a correct count, while the question of what ownership of a "copy" of an incunable really means is a separate issue entirely:
ia00425700 Oxford Bodley 1
ia00426000 London BL, IB.21897 (Acquisition 1985, not in BMC. Bound with Nicolaus Perottus, Rudimenta grammatices, Lyons, anonymous press (IB.21897) and Aelius Anthonius Nebrissensis, Introductiones Latinae, Logroño, Arnao de Brocar, 1510. In a Spanish binding); Oxford Bodley 2
ia00426300 Cambridge, St John's College (2 ff.) 1
ia00426500 London BL, IB.21851 1
ia00426600 London BL, Harl.5918(2) = IA.49742 (Colophon only, in the Bagford Collection) 1
ia00426700 Cambridge UL (imperfect, wants a2-7 and all after K6); Oxford Bodley (fragment consisting of ff. f1,6, quire E and ff. I2-5) 2
ia00428000 London BL, IA.20854 (Imperfect, wanting leaf g7 and sheets h4, i4) 1
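The Oxford problem discussed above can be isolated in a few lines. This is a sketch only: the subroutine name libraries_in is hypothetical, the list of place names is deliberately shortened, and neither copy counts in parentheses nor British Library signatures are handled.

```perl
use strict;
use warnings;

# Hypothetical, shortened list of place names; the script above uses
# a longer one.
my $cities = qr/London|Oxford|Cambridge|Edinburgh/;

# Count the libraries named in one semicolon-delimited segment.
sub libraries_in {
    my ($segment) = @_;
    # a comma directly after a known city is not a library separator
    $segment =~ s/($cities),/$1/;
    my @libraries = split /,/, $segment;
    return scalar @libraries;
}

print libraries_in("Oxford Bodley, All Souls College"), "\n";   # 2
print libraries_in("Oxford, New College"), "\n";                # 1
```

Any place name missing from the list is silently counted as a library of its own, which is why the full list has to be compiled from the data.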
The author wishes to thank Alvan Bregmann, Bryce Inouye, and Paul Needham for their kind assistance and helpful suggestions for this article.
The British Library. 1998. The illustrated ISTC on CD-ROM. 2nd ed. London: Primary Source Media, in association with the British Library.
Copinger, Walter Arthur. 1895-1902. Supplement to Hain's Repertorium bibliographicum: or, collections toward a new edition of that work. London: H. Sotheran.
Davies, Martin and John Goldfinch. 1992. Vergil: a census of printed editions 1469-1500. Occasional Papers of the Bibliographical Society 7. London: The Bibliographical Society.
Gesamtkatalog der Wiegendrucke. 1925-. 10 vols. to date. Stuttgart: Hiersemann.
Hain, Ludwig. 1826-1838. Repertorium bibliographicum, in quo libri omnes ab arte typographica inventa usque ad annum MD. typis expressi, ordine alphabetico vel simpliciter enumerantur vel adcuratius recensentur. 2 vols. Stuttgart: J. G. Cotta.
Neddermeyer, Uwe. 1998. Von der Handschrift zum gedruckten Buch: Schriftlichkeit und Leseinteresse im Mittelalter und in der frühen Neuzeit. Quantitative und qualitative Aspekte. Buchwissenschaftliche Beiträge aus dem deutschen Bucharchiv München 61. 2 vols. Wiesbaden: Harrassowitz.
Needham, Paul. 1999. Counting incunables: the IISTC CD-ROM. Huntington Library Quarterly 61: 457-529.
Ohly, Kurt and Vera Sack. 1966-1967. Inkunabelkatalog der Stadt- und Universitätsbibliothek und anderer öffentlicher Sammlungen in Frankfurt am Main. Frankfurt: Klostermann.
Reichling, Dietrich. 1905-1911. Appendices ad Hainii-Copingeri Repertorium bibliographicum. 7 vols. Munich: Rosenthal.
Reske, Christoph. 2000. Die Produktion der Schedelschen Weltchronik in Nürnberg. Wiesbaden: Harrassowitz.