<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">

 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title level="a">Palaeography and Image-Processing: Some Solutions and Problems</title>
    <author>
     <name>Peter A. Stokes</name>
     <address>
              <addrLine>University of Cambridge</addrLine>
            <addrLine>
              <ref target="mailto:pas53@cam.ac.uk">pas53@cam.ac.uk</ref>
            </addrLine>
          </address>
    </author>
    <editor role="acceptingeditor">
     <name>Daniel Paul O'Donnell</name>
     <address>
            <addrLine>University of Lethbridge</addrLine>
          </address>
    </editor>
    <editor role="recommendingreader">
     <name>Melissa Terras</name>
     <address>
            <addrLine>University College London</addrLine>
          </address>
    </editor>
    <respStmt>
     <resp>TEI-encoding by</resp>
     <name>Roberto Rosselli Del Turco</name>
    </respStmt>
   </titleStmt>
   <publicationStmt>
    <publisher>Digital Medievalist, University of Lethbridge</publisher>
    <pubPlace>Lethbridge AB, Canada T1K 3M4 </pubPlace>
    <availability>
     <p>© Peter A. Stokes, 2007. Creative Commons Attribution-NonCommercial licence</p>
    </availability>
    <date n="received" when="2007-01-11">January 11, 2007</date>
    <date n="revised" when="2007-11-06">November 6, 2007 </date>
    <date n="published" when="2007-12-24">December 24, 2007</date>
   </publicationStmt>
   <seriesStmt>
    <title>Digital Medievalist</title>
    <idno type="issue">3</idno>
    <idno type="date">2007-2008</idno>
   </seriesStmt>
   <sourceDesc>
    <p>Original Composition</p>
   </sourceDesc>
  </fileDesc>
  <encodingDesc>
   <projectDesc>
    <p>Article from Digital Medievalist Journal (URL: <ref
      target="http://www.digitalmedievalist.org/"/>)</p>
   </projectDesc>
   <refsDecl>
    <p>Citations from the text of this article should be by paragraph number (found on the ID
     attribute of the p element).</p>
   </refsDecl>
  </encodingDesc>
  <profileDesc>
   <creation/>
   <langUsage>
    <language ident="en-GB">en-GB</language>
    <language ident="lat">Latin</language>
   </langUsage>
   <textClass>
    <keywords scheme="DM">
     <term type="DMType">Article</term>
     <term type="keyword">Palaeography</term>
     <term type="keyword">Imaging</term>
     <term type="keyword">Feature-extraction</term>
     <term type="keyword">Clustering</term>
     <term type="keyword">Forensic Document Analysis</term>
    </keywords>
   </textClass>
  </profileDesc>
  <revisionDesc>
   <change><date when="2008-01-01">Tuesday Jan 1, 2008</date>. <name>Daniel Paul O'Donnell</name>.
    Corrected volume number.</change>
   <change><date when="2008-01-14">Monday Jan 14, 2008</date>. <name>Roberto Rosselli Del
   Turco</name>. Re-encoded with correct content.</change>
   <change><date when="2008-09-16">Tuesday Sept 16, 2008</date>. <name>Arianna Ciula</name>. Made
   corrections reported by author.</change>
   <change><date when="2008-09-18">Thursday Sept 18, 2008</date>. <name>Arianna Ciula</name>. 
    Corrected reference to #sriharichaaroralee2002.</change>
  </revisionDesc>
 </teiHeader>

 <text>

  <front>
   <argument n="abstract">
    <p>This paper considers the application of image-processing and data-mining to the analysis of
     scribal hands. The work of forensic document analysts on feature-extraction is considered,
     particularly the algorithms developed for automatic handwriting-recognition by Srihari, and by
     Bulacu and Schomaker. Automatic clustering is also considered using the AutoClass package.
     Preliminary results of the author’s own experiments with these approaches are presented, and
     some of the obstacles are outlined which must be overcome before a practical system can be
     developed for the automatic identification of medieval scribes. </p>
   </argument>
  </front>

  <body>
   <div xml:id="body">
    <div>
     <p xml:id="stokes.d1e201">“With the aid of technological advances palaeography, which is an art
      of seeing and comprehending, is in the process of becoming an art of measurement” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#bischoff1990">Bischoff 1990, 3</ref>). With this
      seemingly innocuous statement, and with the help of the editors of <title
       xmlns="http://www.tei-c.org/ns/1.0">Scrittura e Civiltà</title>, Bernhard Bischoff sparked a
      furious debate over the role in modern palaeography of objective measurement, and by
      implication of computing (<ref xmlns="http://www.tei-c.org/ns/1.0"
       target="#costamagnagasparrigilissen19951996">Costamagna et al. 1995-1996</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#pratesi1998">Pratesi 1998</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#gumbert1998">Gumbert 1998</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#derolez2003">Derolez 2003, 6-9</ref>). Arianna
      Ciula has already discussed this debate in the inaugural volume of <title
       xmlns="http://www.tei-c.org/ns/1.0">Digital Medievalist</title> and I shall not repeat her
      work here. Instead, I wish to raise questions about the so-called “art of measurement” itself,
      and to see how work in related fields can be applied to palaeography. Ciula has already shown
      us one way in which computers can be used for objective analysis, and a different approach has
      recently been used to help scholars read the Vindolanda tablets (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#terras2006">Terras 2006</ref>). However, one of
      the main difficulties faced by palaeographers is the classification and identification of
      hands, and this is an area which has already received a good deal of attention in other
      disciplines. Specifically, the community of forensic document analysts have been working for
      several years now to develop computer-based systems for identifying and classifying modern
      handwriting, and this raises the question of whether such work can be applied to medieval writing
      as well. The answers to this are complex and cannot possibly be covered in a single paper, but
      instead I wish to consider two techniques which have been developed by forensic document
      analysts and which can be tested relatively easily on medieval script. By doing so I hope to
      show that this related research is indeed useful to medievalists, and in showing this I seek
      also to demonstrate that the “art of measurement” can be used not to replace other techniques
      but to supplement them and to contribute to our understanding in new and previously
      unattainable ways. </p>
    </div>
    <div>
     <head>Analysing the Script: Automatic Feature-Extraction</head>
     <p xml:id="stokes.d1e235"> The first approach to be considered here is automatic
      feature-extraction. In Zurada’s terms, the objective of feature-extraction is to produce
      feature-vectors which “retain the minimum number of data dimensions while maintaining the
      probability of correct classification”, and where “the feature space dimensionality is
      postulated to be much smaller than the dimensionality of the pattern space” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#zurada1992">Zurada 1992, 95</ref>). Despite this
      technical-sounding definition, feature-extraction has long been a part of “traditional”
      manuscript studies and also forensic document analysis; in this context it is simply the
      identification of key features which are used to establish the degree of similarity between
      two hands.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>I follow Malcolm Parkes here in distinguishing between “script”, “the model which the
        scribe has in his mind’s eye when he writes”, and “(scribal) hand,” “what he actually puts down
        on the page.” See <ref target="#parkes1969">Parkes 1969, xxvi</ref>.</p>
      </note> At least among palaeographers, the emphasis on features has been used to introduce
      objectivity and communicability into the field: rather than describing the aspect of a page in
      subjective terms, it is now usually thought more useful to use clear and unambiguous criteria
      which can be easily understood and verified by others (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#derolez2003">Derolez 2003, 1-2 and 6-9</ref>).
      Attempts have been made, therefore, to establish terminologies for describing letter-forms and
      script-systems which reflect the decisions, conscious or otherwise, made by medieval scribes
       (<ref xmlns="http://www.tei-c.org/ns/1.0" target="#bischoff1954">Bischoff 1954</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#brown1990">Brown 1990</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#derolez2003">Derolez 2003, 13-24</ref>).
      However, such features by their nature are the very ones which scribes can easily adopt and
      abandon at will. As Spumar has noted, “the copyist is not a machine programmed to determined
      functions and causing us to consider all confused variants and developments as the result of
      another hand” (<ref xmlns="http://www.tei-c.org/ns/1.0" target="#spumar1976">Spumar 1976,
      64</ref>); on the contrary, it has long been recognised that scribes will deliberately alter
      their writing to conform to the different expectations which accompany different kinds of
       text.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>Examples include scribes deliberately distinguishing between English Vernacular minuscule
        and Anglo-Caroline minuscule, for which see especially <ref target="#dumville1988">Dumville
         1988, 53-54</ref>; <ref target="#dumville1993">Dumville 1993</ref>, particularly 152-54;
        and <ref target="#dumville2001">Dumville 2001, 9</ref>.</p>
      </note> This is by no means to say that such a morphological approach to palaeography is
      invalid: on the contrary, it has proven to be extremely useful.<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>For examples applied to late Anglo-Saxon script, see <ref target="#ker1957">Ker
        1957</ref>, especially xxv-xxxiii; <ref target="#dumville1987">Dumville 1987</ref>; <ref
         target="#dumville1994">Dumville 1994</ref>; and <ref target="#dumville1993">Dumville
        1993</ref>; note also <ref target="#derolez2003">Derolez 2003, 6-9</ref>.</p>
      </note> It does mean, however, that one must take some care in interpreting the evidence
      provided by letter-forms. The problem also remains of how one can determine which features are
      significant. Such a problem is somewhat less intractable when considering script-systems: in
      many cases, a given script can be defined with greater or lesser accuracy by a relatively
      small set of letter-forms.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>For some examples of such definitions, see note 2, above.</p>
      </note> The problem becomes much greater, however, when trying to distinguish between
      different scribes. Such identifications seem to be relatively sound if a cluster of unusual
      letter-forms can be found which occur in a group of related manuscripts and nowhere else: the
      assumption then is that those manuscripts were written by the same scribe, or at least by
      scribes from the same school. However, difficult questions must still be asked. How many
      features are required to secure an identification? How unusual must these features be? How can
      one be certain that a second scribe was not copying these features? Or that this unusualness
      is not an artefact of missing evidence rather than the oddities of a single scribe? Even the
      most highly-regarded palaeographers have slipped up while trying to navigate a path through
      these treacherous grounds.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>Neil Ker, for example, referred to a “characteristic” mark of punctuation used by the
        “Hemming” scribe but which is actually found in the work of several other scribes; see <ref
         target="#stokesforthcoming">Stokes forthcoming</ref>.</p>
      </note>
     </p>
     <p xml:id="stokes.d1e307">One possible path towards solving these problems is to use a computer
      to extract large quantities of precisely defined information which can be analysed
      statistically and which could not be obtained any other way within a practical time-frame.
      This approach has been used by researchers who have been working to develop systems for the
      automatic identification of modern handwriting.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>For a sample of such studies, see <ref target="#arazi1977">Arazi 1977</ref>; <ref
         target="#schomakerbulacuvanerp2003">Schomaker, Bulacu, and van Erp 2003</ref>; <ref
         target="#bulacuschomaker2003">Bulacu and Schomaker 2003</ref>; <ref
         target="#bulacuschomakervuurpijl2003">Bulacu, Schomaker, and Vuurpijl 2003</ref>; <ref
          target="#srihari2001">Srihari 2001</ref>; <ref target="#sriharichaaroralee2002">Srihari et al.
         2002</ref>; <ref target="#srihari2003">Srihari 2003</ref>; and <ref
         target="#zhangsargur2003">Zhang and Sargur 2003</ref>.</p>
      </note> They have experimented with several different statistical measurements which can be
      obtained from a sample of handwriting and which can then be used for comparison and
      identification. One group tested their system on one thousand samples and obtained accuracy of
      about 95% for fully-automated identification and verification of modern handwriting (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#srihari2003">Srihari 2003, iii</ref>). A second
      group, using a much less complex system, achieved comparable results in generating a list of
      ten possible matches to a given sample out of a set of 250 writers (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#bulacuschomakervuurpijl2003">Bulacu, Schomaker,
       and Vuurpijl 2003, 4</ref>). In both cases, however, the results were obtained using samples
      of handwriting which were obtained under carefully controlled conditions: the text was the
      same for each sample and was selected to include all important features of the hand, the pages
      were the same and were laid out in the same way, the same pens, paper, and supports were used,
      and the samples were all digitised in the same way and under the same conditions. While this
      uniformity was necessary for the scientific validity of the experiments, these conditions are
      clearly ideal and thus represent the best results one could hope to achieve. Nevertheless,
      these results do seem promising, and so the application of these methods to medieval
      handwriting deserves investigation. </p>
     <p xml:id="stokes.d1e347">In all of the cases considered, the approach has been to use a
      computer to extract features in the form of statistical measurements from digitised samples of
      handwriting, and to use these measurements to compare different hands. The bulk of the
      research, then, has been in determining which measurements to take, and a number of different
      solutions have been proposed. Some, such as the speed and pressure of the pen, need to be taken
      at the time of writing and so are of no use either to the palaeographer or to the forensic
      document analyst. Others, such as the entropy and the distribution of shades of grey, depend
      on high-quality images which have been digitised under nearly-identical conditions. Although
      libraries are producing high-quality digital images in some quantity, they are still some way
      off producing a complete corpus in any sense.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>The number of large-scale projects to digitise entire manuscripts or even libraries is
        increasing rapidly. For some examples see Codices Electronici Ecclesiae Coloniensis (<ptr
         target="http://www.ceec.uni-koeln.de/"/>), Codices Electronici Sangallenses (<ptr
         target="http://www.cesg.unifr.ch/en/"/>), Irish Script on Screen (<ptr
         target="http://www.isos.dias.ie/"/>), Early Manuscripts at Oxford University (<ptr
         target="http://image.ox.ac.uk/"/>), the Árni Magnússon Institute of Iceland (<ptr
         target="http://www.am.hi.is/WebView/?fl=20"/>) and Parker on the Web (<ptr
         target="http://parkerweb.stanford.edu/"/>). </p>
      </note> Instead, a practical system will need to function with images from a variety of
      sources and should ideally be able to cope with scans of photographs and perhaps even of
      half-tone plates in books. Similarly, at least for the purposes of an initial study, the
      algorithms need to be fairly straightforward and quick to implement; if the results show
      promise, then a longer and more concerted effort can be justified. </p>
     <p xml:id="stokes.d1e367">To this end, I have selected and implemented five measurements in
      order to test their usefulness to the study of medieval handwriting. The first two of these
      are horizontal and vertical run-lengths (<ref xmlns="http://www.tei-c.org/ns/1.0" target="#arazi1977">Arazi 1977</ref>),
      which are demonstrated in Figure 1 below. By scanning through an image, the software can count
      the number of consecutive pixels corresponding either to background or to ink in a given
      direction: for example, Figure 1 shows a run of four background pixels in the horizontal
      direction, and a run of five foreground pixels in the vertical direction. Thus a large number
      of long horizontal runs would indicate more space between vertical strokes, and many long
      vertical runs might suggest a hand with elongated and relatively upright ascenders,
      descenders, and minims.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>A minim is the basic short vertical stroke used to form many letters: the letter
        <emph>i</emph> is formed with one minim, <emph>n</emph> with two, and <emph>m</emph> with
        three. An ascender is the component of a letter which reaches above minim-height and is
        found in letters such as <emph>l</emph>, <emph>h</emph>, and <emph>b</emph>. A descender is
        that which reaches below the line of writing as found in letters like <emph>p</emph> and
         <emph>q</emph>.</p>
      </note> Some degree of scaling and normalisation is required to account for differences in the
      size of the hand and the size and resolution of the image; this is discussed further below. </p>
     <figure>
      <graphic url="support/run-length.png"/>
      <figDesc>Run-lengths</figDesc>
     </figure>
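     <p>By way of illustration, the counting of run-lengths can be sketched as follows. This is a
      minimal Python sketch and not the C++ implementation used for the experiments reported here:
      the function name “run_length_histogram”, the representation of the image as rows of
      true/false values, and the decision to count runs which touch the edge of the image are all
      assumptions made for the example.</p>
     <eg xml:space="preserve"><![CDATA[
```python
def run_length_histogram(image, ink=True, direction="horizontal", bins=100):
    """Count runs of consecutive pixels equal to `ink` along rows or columns.

    `image` is a list of rows of booleans (True = ink, False = background).
    The result is a list in which position k holds the number of runs of
    length k + 1; runs longer than `bins` are placed in the last bin.
    Assumption: runs that touch the edge of the image are counted as well.
    """
    if direction == "horizontal":
        lines = image
    else:
        # Transpose the image so that each column can be scanned like a row.
        lines = [list(column) for column in zip(*image)]

    histogram = [0] * bins
    for line in lines:
        run = 0
        for pixel in line:
            if pixel == ink:
                run += 1
            elif run:
                histogram[min(run, bins) - 1] += 1
                run = 0
        if run:  # a run that reaches the end of the line
            histogram[min(run, bins) - 1] += 1
    return histogram
```
]]></eg>
     <p>Calling the function with “ink” set to false and the horizontal direction counts the
      white space between strokes, while the vertical direction with “ink” set to true counts the
      height of the strokes themselves.</p>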
     <p xml:id="stokes.d1e413">Another measurement is known as autocorrelation: in short, it
      measures the degree of regularity in a hand and indicates the distance between regularly
      occurring elements. It is calculated by overlaying a copy of the image onto itself and
      counting the number of pixels in common; the overlaid image is then moved horizontally by one
      pixel and the count repeated. A page filled entirely with perfectly reproduced and regularly
      spaced examples of the letter <emph xmlns="http://www.tei-c.org/ns/1.0">l</emph>, for example,
      will give an autocorrelation of almost zero for all horizontal shifts except those where the
      letters are all aligned, at which point it will be maximum. This is demonstrated in Figure 2,
      below. The first diagram shows almost no overlap and so the autocorrelation for this
      displacement is very low, the second shows some overlap and so a higher value, and the third
      shows almost complete overlap and so a near-maximum value. </p>
     <figure>
      <graphic url="support/autocorrelation.png"/>
      <!-- Separate line in original text added inside figDesc -->
      <figDesc>Autocorrelation. Note the increasing overlap as the horizontal displacement of the
       blue image changes relative to the black one.</figDesc>
     </figure>
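     <p>The overlay-and-count procedure described above might be sketched in Python as follows.
      The sketch assumes a binary image stored as rows of true/false values and a hypothetical
      function name, “horizontal_autocorrelation”; it returns raw counts only and makes no
      attempt at efficiency.</p>
     <eg xml:space="preserve"><![CDATA[
```python
def horizontal_autocorrelation(image, max_shift):
    """Count coinciding ink pixels for each horizontal displacement.

    `image` is a list of rows of booleans (True = ink). For every shift from
    0 to `max_shift`, a copy of the image is notionally moved `shift` pixels
    to the right and the ink pixels which fall on top of ink pixels in the
    original are counted.
    """
    counts = []
    for shift in range(max_shift + 1):
        total = 0
        for row in image:
            for x in range(len(row) - shift):
                if row[x] and row[x + shift]:
                    total += 1
        counts.append(total)
    return counts
```
]]></eg>
     <p>For a page of single-pixel vertical strokes spaced regularly four pixels apart, the
      counts are zero for every shift except multiples of four, which corresponds to the case of
      the regularly spaced letter <emph>l</emph> described above.</p>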
     <p xml:id="stokes.d1e429">Bulacu and Schomaker have also proposed edge-directions as another
      possible metric. In this case, the edges of every letter are broken down into small, straight
      lines and their directions measured and counted (<ref xmlns="http://www.tei-c.org/ns/1.0"
       target="#bulacuschomakervuurpijl2003">Bulacu, Schomaker, and Vuurpijl 2003</ref>). The
      direction is measured by overlaying a theoretical box on the image centred at the lower tip of
      each line, and the angle is determined by detecting where the line crosses the edge of the
      box. Thus in Figure 3, below, the line has an “angle” of seven. Such a measurement, when
      calculated for all edge-segments in an image, gives an indication of the average direction of
      the strokes in a given hand. A very upright and angular hand will have most lines either
      vertical or horizontal (so with values clustering around 0 and 6), a sloping hand will show
      most edges in the direction of the slope, and a rotund hand will not show much of a peak in
      any direction. </p>
     <figure>
      <graphic url="support/edge-direction.png"/>
      <!-- Separate line in original text added inside figDesc -->
      <figDesc>Edge-direction (3-pixel radius). This edge-segment has an “angle” of 7.</figDesc>
     </figure>
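     <p>A simplified sketch of this measurement in Python is given here. It is not the code used
      for the experiments, and several details are assumptions: the edges are taken to be already
      extracted into rows of true/false values, a box of radius three is used (a fragment of four
      pixels including the centre), directions are numbered anticlockwise from 0 (horizontal)
      through 6 (vertical) to 11, and a direction is counted only when every pixel between the
      centre and the edge of the box is an edge pixel. All function names are hypothetical.</p>
     <eg xml:space="preserve"><![CDATA[
```python
def nearest(numerator, denominator):
    """Round numerator / denominator to the nearest integer (halves round up)."""
    return (2 * numerator + denominator) // (2 * denominator)


def upper_box_offsets(radius=3):
    """Offsets (dx, dy) on the upper half of a square box, anticlockwise from
    due right, with dy counted upwards. The due-left offset is omitted so that
    a horizontal fragment is not counted twice."""
    r = radius
    right = [(r, dy) for dy in range(0, r)]
    top = [(dx, r) for dx in range(r, -r - 1, -1)]
    left = [(-r, dy) for dy in range(r - 1, 0, -1)]
    return right + top + left


def edge_direction_histogram(edges, radius=3):
    """Count, for each direction, the straight edge fragments found.

    `edges` is a list of rows of booleans (True = edge pixel). Each edge pixel
    is treated in turn as the lower tip of a possible fragment.
    """
    offsets = upper_box_offsets(radius)
    height, width = len(edges), len(edges[0])

    def is_edge(x, y):
        return 0 <= x < width and 0 <= y < height and bool(edges[y][x])

    histogram = [0] * len(offsets)
    for y in range(height):
        for x in range(width):
            if not edges[y][x]:
                continue
            for index, (dx, dy) in enumerate(offsets):
                # Image rows grow downwards, so moving "up" subtracts from y.
                if all(is_edge(x + nearest(dx * k, radius),
                               y - nearest(dy * k, radius))
                       for k in range(1, radius + 1)):
                    histogram[index] += 1
    return histogram
```
]]></eg>
     <p>With these assumptions a perfectly vertical edge contributes only to direction 6 and a
      perfectly horizontal edge only to direction 0, in keeping with the numbering used in
      Figure 3.</p>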
     <p xml:id="stokes.d1e445">Finally, the same authors have also proposed hinge-directions (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#bulacuschomakervuurpijl2003">Bulacu, Schomaker,
       and Vuurpijl 2003, 3</ref>). This is an extension of edge-directions but instead it considers
      “hinges”, namely the points where two straight lines meet. By measuring the angles of both
      lines in a “hinge” the metric seeks to characterise the bends and angles in a hand, as
      demonstrated in Figure 4 below. </p>
     <figure>
      <graphic url="support/hinge-direction.png"/>
      <!-- Separate line in original text added inside figDesc -->
      <figDesc>Hinge-directions (3-pixel radius). The “hinge” here has an “angle” of 7,0.</figDesc>
     </figure>
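     <p>The hinge measurement can be sketched in the same spirit. The Python below is an
      illustration under stated assumptions and not the published algorithm: edges are rows of
      true/false values, the box has a radius of three and its whole perimeter is used (24
      directions numbered anticlockwise from due right), and each pair of directions found at a
      pixel is recorded once with the smaller number first. The function names are
      hypothetical.</p>
     <eg xml:space="preserve"><![CDATA[
```python
from collections import Counter


def nearest(numerator, denominator):
    """Round numerator / denominator to the nearest integer (halves round up)."""
    return (2 * numerator + denominator) // (2 * denominator)


def full_box_offsets(radius=3):
    """Offsets (dx, dy) around the whole perimeter of a square box,
    anticlockwise from due right, with dy counted upwards."""
    r = radius
    right_upper = [(r, dy) for dy in range(0, r)]
    top = [(dx, r) for dx in range(r, -r - 1, -1)]
    left = [(-r, dy) for dy in range(r - 1, -r, -1)]
    bottom = [(dx, -r) for dx in range(-r, r + 1)]
    right_lower = [(r, dy) for dy in range(-r + 1, 0)]
    return right_upper + top + left + bottom + right_lower


def hinge_histogram(edges, radius=3):
    """Count pairs of edge fragments which meet at the same pixel.

    Returns a Counter keyed by (first_direction, second_direction) with
    first_direction < second_direction (an assumption of this sketch).
    """
    offsets = full_box_offsets(radius)
    height, width = len(edges), len(edges[0])

    def is_edge(x, y):
        return 0 <= x < width and 0 <= y < height and bool(edges[y][x])

    counts = Counter()
    for y in range(height):
        for x in range(width):
            if not edges[y][x]:
                continue
            # Directions in which a complete fragment leaves this pixel.
            present = [
                index for index, (dx, dy) in enumerate(offsets)
                if all(is_edge(x + nearest(dx * k, radius),
                               y - nearest(dy * k, radius))
                       for k in range(1, radius + 1))
            ]
            for position, first in enumerate(present):
                for second in present[position + 1:]:
                    counts[(first, second)] += 1
    return counts
```
]]></eg>
     <p>For a right-angled corner formed by a vertical stroke rising from the left-hand end of a
      horizontal stroke, the only hinge found is the pair of directions 0 and 6, recorded at the
      corner itself.</p>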
     <p xml:id="stokes.d1e462">The five algorithms were initially implemented in MATLAB using the
       DIP<hi xmlns="http://www.tei-c.org/ns/1.0" rend="italic">image</hi> toolbox provided by the
      Pattern Recognition group in the Department of Applied Physics at the Delft University of
      Technology (for full details see <ref xmlns="http://www.tei-c.org/ns/1.0"
       target="#vanginkelvankempen2004">van Ginkel and van Kempen 2003</ref>). However, the toolbox
      proved to be impractical, due partly to the inherent inefficiencies of MATLAB and partly to
      the quota of CPU cycles which had been imposed on the only system which was accessible at the
      time. Fortunately, the team at Delft have also made available the C library underlying the
       DIP<hi xmlns="http://www.tei-c.org/ns/1.0" rend="italic">image</hi> toolbox. The MATLAB code
      was therefore converted to C++, using the DIP<hi xmlns="http://www.tei-c.org/ns/1.0"
       rend="italic">lib</hi> library instead of the DIP<hi xmlns="http://www.tei-c.org/ns/1.0"
       rend="italic">image</hi> toolbox, and this was found to run at greatly increased speed while
      still allowing rapid prototyping. The five algorithms outlined above were implemented as
      described in the respective literature.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>Histograms of 100 bins were used for the run-length and autocorrelation metrics, and
        edge-fragments of four pixels were used for the edge and hinge directions: each sample was
        therefore represented as an 888-dimensional vector of positive real values.</p>
      </note> Two different measures of distance were tested, the so-called Euclidean and χ<hi
       xmlns="http://www.tei-c.org/ns/1.0" rend="sup">2</hi>, and χ<hi
       xmlns="http://www.tei-c.org/ns/1.0" rend="sup">2</hi> was ultimately used.<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>Bulacu and Schomaker tested Hamming, Minkowski up to fifth order, Hausdorff, χ<hi
         rend="sup">2</hi>, and Bhattacharyya functions to measure distance. Although they did not
        provide details of these tests, they have asserted that “only best-performing distance
        functions” were used in the final results, and their tables include only χ<hi rend="sup"
        >2</hi> and Euclidean distances. The same two functions were used in their second paper, but
        were applied to different features. See <ref target="#schomakerbulacuvanerp2003">Schomaker, Bulacu, and van Erp 2003, 546</ref>,
        and compare <ref target="#bulacuschomaker2003">Bulacu and Schomaker 2003, 3</ref>.</p>
      </note> The autocorrelation histogram was normalised such that the minimum value was zero and
      the first element was one; the others were all normalised to probability density functions
      (PDFs) but multiplied by 100 for convenience. The images were all scaled so that minims in
      each image were the same height, in order to eliminate bias due to differences in the size of
      the hand and the size and resolution of the image. Each image was then converted from
      greyscale to black and white and the edges of each stroke were obtained for the edge-direction
      and hinge-direction algorithms.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>The images were processed by applying an isodata threshold, the MorphologicalRange
        function in DIP<hi rend="italic">lib</hi> with filter parameters of 3, thresholding at 80,
        and then obtaining the Euclidean skeleton with the end-pixel condition set to “natural”. For
        these functions see <ref target="#vankempenvanginkelhendriksvanvliet2003">van Kempen et al.
         2003, 321, 415-16, and 139-40</ref>.</p>
      </note>
     </p>
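     <p>The normalisation and the two measures of distance mentioned above can be sketched as
      follows. This Python is illustrative only: the function names are hypothetical, and the
      form of χ<hi rend="sup">2</hi> shown, the sum over all bins of the squared difference
      divided by the sum of the two values, is assumed here to be the one intended, with bins
      which are empty in both histograms skipped.</p>
     <eg xml:space="preserve"><![CDATA[
```python
import math


def to_percent_pdf(histogram):
    """Scale a histogram so that its values sum to 100 (a PDF multiplied by 100)."""
    total = sum(histogram)
    if total == 0:
        return [0.0] * len(histogram)
    return [100.0 * value / total for value in histogram]


def normalise_autocorrelation(counts):
    """Rescale so that the minimum value is 0 and the first element is 1."""
    low = min(counts)
    span = counts[0] - low
    if span == 0:
        raise ValueError("autocorrelation is flat and cannot be normalised")
    return [(value - low) / span for value in counts]


def chi_squared_distance(p, q):
    """Chi-squared distance (assumed form): sum of (p - q)^2 / (p + q)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def euclidean_distance(p, q):
    """Ordinary Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```
]]></eg>
     <p>One property of this form of χ<hi rend="sup">2</hi> is that a given difference counts
      for more in a sparsely filled bin than in a heavily filled one, whereas the Euclidean
      distance weights all bins equally.</p>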
     <figure>
      <graphic url="support/test1-images.png"/>
      <figDesc>Sample images for Test 1</figDesc>
     </figure>
     <p xml:id="stokes.d1e531"> For the first test, six images were used and are reproduced as
      Figure 5 above; four images are of Anglo-Caroline minuscule written by one scribe (B, D, E and
      F), and two of Insular minuscule by a second scribe (A and C). The images were all 1100×500
      pixels large, 8-bit greyscale, and taken from the same 24-bit RGB image which was scanned at
      300 dpi. In terms of the script, each image was about three lines of text high and about
      fifteen to twenty letters wide; minims were about 45 pixels high. Each set of measurements was
      generated for each sample; the results are shown in Figure 6 below. As these figures represent
      distances, the matching hands should show significantly lower values than other hands in the
      same matrix; these numbers are displayed in bold in the tables. </p>
     <figure>
      <graphic url="support/table-1.png"/>
      <figDesc>Distance matrices for the five methods in Test 1. Numbers in bold should be lower
       than the other numbers in that matrix if the corresponding method was successful.</figDesc>
     </figure>
     <p xml:id="stokes.d1e542">From these tables, it can be seen that correct results were obtained
      in every case except that the horizontal-runs algorithm failed to group Hand D correctly.
      Given the imprecise nature of the problem, it is unreasonable to expect that every metric
      should produce a perfect response every time: instead, a number of different measurements
      could be taken and a voting-mechanism or something similar used to make a final decision. In
      these circumstances, the above results look very promising indeed. </p>
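     <p>One very simple form which such a voting-mechanism might take is sketched below in
      Python. It was not part of the experiments reported here and is offered only as an
      illustration: each metric casts one vote for the sample nearest to the query, and the
      function name “vote_on_match” and the layout of the input (one square distance matrix per
      metric) are assumptions.</p>
     <eg xml:space="preserve"><![CDATA[
```python
from collections import Counter


def vote_on_match(distance_matrices, labels, query):
    """Let each metric vote for the sample closest to `query`.

    `distance_matrices` maps a metric name to a square matrix (list of rows)
    of distances, `labels` names the samples, and `query` is the index of the
    sample to be matched. Returns (label, votes) pairs, most votes first.
    """
    votes = Counter()
    for matrix in distance_matrices.values():
        row = matrix[query]
        candidates = [j for j in range(len(row)) if j != query]
        best = min(candidates, key=lambda j: row[j])
        votes[labels[best]] += 1
    return votes.most_common()
```
]]></eg>
     <p>A more cautious scheme might weight each vote by the reliability of the metric or by the
      margin between the nearest and second-nearest samples; the sketch above ignores both.</p>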
     <p xml:id="stokes.d1e546">One would certainly hope that the results were this good, however,
      since the sample hands were carefully chosen, and all the images are the same size, the same
      resolution, and taken from the same original image. As noted above, however, any useful system
      will need to be able to account for differences in all of these factors. The next test,
      therefore, was designed to see if the system can indeed account for the different sizes of
      images. This time, five images were used, reproduced as Figure 7 below. </p>
     <figure>
      <graphic url="support/test2-images.png"/>
      <figDesc>Sample images for Test 2</figDesc>
     </figure>
     <p xml:id="stokes.d1e557">Samples A and B were written in English Vernacular minuscule, and C,
      D, and E in Anglo-Caroline minuscule; image E is a subset of C, and D also overlaps
      substantially with C. Each image was 8-bit greyscale, and taken from the same 24-bit RGB image
      which was scanned at 150 dpi. All of the images were 500×220 pixels large, except for Sample C
      which measured 500×440 pixels. In terms of the script, each image was about thirty letters
      wide and either five or ten lines of text high; cue-height corresponded to about 18 pixels. </p>
     <figure>
      <graphic url="support/table-2.png"/>
      <figDesc>Distance matrices for Test 2. Numbers in bold should be smaller than others: note
       that this is not so for H-runs, row B column A, and for V-runs is only so for row E column
      C.</figDesc>
     </figure>
     <p xml:id="stokes.d1e568">Once again, the results are good but not perfect. This time, both
      run-length algorithms had some difficulty in correctly identifying the matching hands, but the
      other three all produced correct results. Perhaps more significantly, however, the larger
      image (C) was misclassified no more often than any of the other samples. Indeed, C and E match
      very closely for all of the algorithms, and this suggests that the results are indeed
      approximately independent of the size of the image. Bulacu and Schomaker’s conclusions are
      also confirmed here: the directions of edges and hinges produce superior results to the older
      metrics of run-length and autocorrelation. </p>
     <p xml:id="stokes.d1e571">A third test was conducted to compare samples of different sizes in
      both horizontal and vertical directions, as well as different resolutions. Again, five images
      were used, this time all of the same hand. All the images were 8-bit greyscale and were taken
      from a 24-bit colour image at 300 dpi, but the sizes and resolutions of the images varied, as
      shown in Figures 9 and 10 below. </p>
     <figure>
      <graphic url="support/test3-images.png"/>
      <figDesc>Sample images for Test 3</figDesc>
     </figure>
     <figure>
      <graphic url="support/table-3.png"/>
      <figDesc>Parameters for Test 3</figDesc>
     </figure>
     <p xml:id="stokes.d1e591">Samples D and E are identical except for the different resolutions.
      Since all of the samples were of the same hand, one would hope that no clear identifications
      would emerge and that all of the samples would be approximately the same distance from one
      another. On the other hand, if the system were sensitive to variations in size or resolution,
      then this should become apparent from this test. If C were substantially distant from the
      others, for example, then this would indicate that the results were indeed sensitive to size.
      Alternatively, if E were classified as distinct from the other four samples, then a bias due to
      resolution would be revealed. </p>
     <figure>
      <graphic url="support/table-4.png"/>
      <figDesc>Distance Matrices for Test 3. Note that no single value is consistently lower or
       higher than any others in any of the five matrices: this finding suggests that there is no
       bias in any of the methods. </figDesc>
     </figure>
     <p xml:id="stokes.d1e602">Once again, the results are promising. No strong bias due to size or
      resolution is revealed. Only the autocorrelation correctly reported no difference at all
      between samples D and E; more significantly, no algorithm clearly misassigned E to its own
      group, although all but the autocorrelation function returned slightly greater distances for
      this sample. </p>
     <p xml:id="stokes.d1e605">From these preliminary experiments, it seems that the algorithms in
      question show some promise and are worthy of further attention. However, it should be noted
      that even an untrained person would have had little difficulty in classifying any of the
      samples which have been tested here, and a great deal more work is required before any
      computer-based system could out-perform a human. Nevertheless, advances in image-processing
      are rapid and much more sophisticated techniques are available than those used here.<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>For an overview of some of the recent developments in this field, see the website of the
        9th International Conference on Document Analysis and Recognition (<ptr
         target="http://www.icdar2007.org"/>), and especially <ref target="#schomakerbulacuvanerp2007">Schomaker et al. 2007</ref>.</p>
      </note> What the computer can do is process very large numbers of hands very quickly and
      produce a short-list of likely matches. Even then questions remain as to how to interpret the
      data which the algorithms present. If the distances between samples are all relatively large,
      and if the algorithms all produce much the same classification, then all is well. But how
      similar need two hands be before they are grouped? Or, in more quantitative terms, what is the
      maximum allowable distance between two hands before they are classified as different? As the
      above results have shown, the distances vary from metric to metric, and from dataset to
      dataset, and so no single number can be assigned which will hold good for all situations.
      Instead, an adaptive system must be developed, which can account for these variations and
      decide for itself what values are appropriate, and how many different groups should be formed.
      Fortunately, this is another area in which a computer can be of use.</p>
    </div>

    <div>
     <head>Clustering</head>
     <p xml:id="stokes.d1e625"> The second technique which I shall consider has a somewhat different
      point of origin. As has been discussed above, one of the primary difficulties faced by
      palaeographers is the grouping of related specimens of handwriting. The degree of objectivity
      in such a grouping varies between individuals, but whatever the approach some difficulties
      remain the same. The first results from the sheer volume of data: I am aware of no studies on
      the subject, but expert palaeographers I have spoken to claim to recall no more than perhaps
      thirty or forty scribal hands at most. However, the extant corpus from some scriptoria can
      number in the hundreds. Furthermore, as the previous discussion has demonstrated, it is not
      necessarily clear how such scribal hands should be grouped. A palaeographer can look through a
      large number of hands and collect data on all of the letter-forms used by all of the scribes,
      but this immediately produces a problem: either a small number of features are considered, but
      it is very difficult to determine which features are sufficient to characterise a given hand,
      or every possible feature is recorded, in which case the volume of data is too great for any
      one person to process. A similar difficulty applies to the automatic feature-extraction
      discussed in the previous section: in this case, not only the volume but also the nature of
      the data is prohibitive, since the long lists of numbers which are produced have little
      meaning outside the software which produced them. Several approaches have already been
      developed by palaeographers in an effort to accommodate the volume of data,<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>For two such approaches, see <ref target="#gumbert1976">Gumbert 1976</ref>, and <ref
         target="#davis1998">Davis 1998</ref>.</p>
      </note> but these are relatively crude and are only useful in very simple cases. However, the
      problem of classification has been the subject of extensive research in computer science, and
      a large volume of software has already been developed and made freely available to help solve
      this problem.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>Discussions of such principles have been given by <ref target="#mackay2003">MacKay 2003,
        esp. 300</ref>, for maximum-likelihood; <ref target="#hansonstutzcheeseman2004">Hanson, Stutz, and Cheeseman 1991</ref> and
         <ref target="#stutzcheeseman1996">Stutz and Cheeseman 1996</ref> for Bayesian
        classification; and <ref target="#zurada1992">Zurada 1992</ref>, for neural networks.</p>
      </note> Once again, then, a fundamental question in palaeography has already been examined in
      depth by researchers in another discipline, and so the question must be asked whether such
      research can be usefully applied here. In the remainder of this paper I shall therefore
      consider this question and test one of the many pieces of software in a practical example. </p>
     <p xml:id="stokes.d1e658">The relevant discipline here is an entire field of artificial
      intelligence variously called data-mining, clustering, or unsupervised learning, and which has
      been defined by one author as “the problem of automatic discovery of classes in data” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#stutzcheeseman1996">Stutz and Cheeseman 1996,
       61</ref>).<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>For a similar definition see <ref target="#zurada1992">Zurada 1992, 56-58</ref>, among others.</p>
      </note>
     </p>
     <p xml:id="stokes.d1e670">This is in contrast to “supervised learning”, in which the system is
      presented with a set of training-data, the desired classification of which is already known,
      and the system can then use this to learn how such classifications are to be obtained.
      Supervised learning presents difficulties to the palaeographer since hundreds of known
      examples are normally required, and in most cases nowhere near this number of scribal hands
      has been localised and dated.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>For the use of supervised learning in a similar context see <ref target="#terras2006"
         >Terras 2006</ref>.</p>
      </note> Instead, many automated systems have been developed for unsupervised learning which
      determine their own criteria for categorisation: very few initial assumptions are made about
      the input-data, and the machine is left to make its own decisions about which features are
      significant, how groups should be formed, and even how many groups there should be. The
      applicability of this technique to palaeography need hardly be stated, but such an approach
      introduces complications in interpreting the output of these systems. Without any external
      guidance, the classifications which the system chooses could reflect either simple biases in
      data or important and hitherto unrecognised similarities. It need not be the case that an
      unsupervised classifier will produce exactly the same results as a human expert, and this
      raises the question of whether the machine’s results should be accepted when they differ from
      a person’s, and also how much the software should be forced to conform to preconceived notions
      about interrelations in the data. It may well be true that an eleventh-century documentary
      writ has some degree of commonality with a fourth-century luxury manuscript, however deeply
      buried that connexion is, but such a link is unlikely to be of much value to the
      palaeographer. The answer seems to be something of a compromise: as researchers have found,
      “discovery of important structure is usually a process of finding classes, interpreting the
      results, transforming and/or augmenting the data, and repeating the cycle.” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#stutzcheeseman1996">Stutz and Cheeseman 1996,
      62</ref>).</p>
     <p xml:id="stokes.d1e685">The program chosen for initial experiments with the medieval scripts
      is known as AutoClass. This package was developed by a group at the NASA Ames Research Center
      to implement “unsupervised classification based on the classical mixture model, supplemented
      by a Bayesian method for determining the optimal classes” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#stutzcheeseman1996">Stutz and Cheeseman 1996,
       61</ref>).<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>See also <ref target="#hansonstutzcheeseman2004">Hanson, Stutz, and Cheeseman 1991</ref>, and the project website at <ptr
         target="http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/"/>.</p>
      </note> The program has been carefully designed to be as general as possible, making no
      assumptions about the underlying data or even the number of groups into which the samples
      should be classified. It can accommodate both real and discrete values, and so can be used
      with the measurements discussed in the previous section of this paper but also with lists of
      features which have been gathered by a palaeographer. It was first used by this author to
      classify eleven images of five different hands using data which had been produced by the five
      algorithms discussed above. Although the number of images was not particularly large, the
      distance-matrices were still large enough, and the variation in distances small enough, that a
      grouping was not immediately apparent. The C++ software was therefore modified to produce
      the header, database, and model files required by AutoClass, incorporating all 888 data-points
      for each of the eleven hands. In practice, 327 of these points had only one unique value and
      so were of no use in classification; therefore 561-dimensional vectors of real scalar values
      were employed.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>The software was configured with a zero-point of 0, a relative error of 0.02, and using
        the Single Normal CN model. An explanation of these settings is given in the
         <code>preparation-c.text</code> and <code>models-c.text</code> files which are included in
        the AutoClass distribution, for which see <ptr
         target="http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/"/>.</p>
      </note> The software was allowed to run for 19365 tries, after which time it classified the
      hands into three different groups with an approximate marginal likelihood of -27417.639. The
      samples and their classification are shown in Figure 12 below, and the expected grouping is
      A-D as one, E-H as another (with subgroups E-F and G-H), and I-K as the third. </p>
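     <p>The filtering step mentioned above, in which data-points with only one unique value were
      discarded (888 minus 327 leaving 561), can be illustrated with a short Python sketch. This is
      not the C++ software described in this paper: the function name and the toy values are
      assumptions introduced purely for illustration.</p>
     <eg xml:space="preserve">
```python
def drop_constant_dimensions(vectors):
    """Remove every dimension that takes the same value in all vectors.

    Returns the reduced vectors together with the indices of the dimensions
    that were kept, so that results can be traced back to the original data.
    """
    if not vectors:
        return [], []
    width = len(vectors[0])
    # A dimension is informative only if at least two distinct values occur.
    kept = [d for d in range(width) if len({v[d] for v in vectors}) > 1]
    reduced = [[v[d] for d in kept] for v in vectors]
    return reduced, kept


# Toy data: three hypothetical samples with five measurements each.
# Dimensions 0, 2 and 4 never vary and so carry no information.
samples = [
    [0.0, 0.25, 1.0, 0.5, 0.0],
    [0.0, 0.75, 1.0, 0.125, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
]

reduced, kept = drop_constant_dimensions(samples)
print(kept)
print(reduced)
```
     </eg>
     <p>Here only the second and fourth dimensions survive. Keeping the list of retained indices
      makes it possible to relate any feature which the classifier reports as significant back to
      the algorithm that produced it.</p>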
     <figure>
      <graphic url="support/autoclass.png"/>
      <figDesc>Classification of scribal hands using AutoClass. The expected result was to group
       A-D, E-H (with subgroups E-F and G-H), and I-K.</figDesc>
     </figure>
     <p xml:id="stokes.d1e722">As can be seen, the classification was largely successful, except
      that it associated Samples A and B with E and F on the one hand, and C and D with G and H on
      the other. The precise reasons for this are not clear, but it may be due to the fact that C
      and D are twice the size of A and B: although earlier tests suggested that variations in
      image-size had little impact on the measurements, they may have had enough of an impact to
      affect the more subtle categorisation which is being attempted here. Given that the run-length
      measurements were more likely to return false groupings, it may be that removing the data
      contributed by these algorithms will produce better results. Although more work is certainly
      required, these initial results do suggest that this computer-based approach may be of some
      use. </p>
     <p xml:id="stokes.d1e725">As noted above, AutoClass can also incorporate discrete data in
      addition to the automatically generated feature vectors. This then allows a second
      application: the classification of hands based on features which have been extracted manually
      by a palaeographer. To this end, I constructed a list of some 286 features and identified
      which of those features are present in 466 sample hands, primarily vernacular writing from
      England datable to the late tenth and early eleventh centuries.<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>For a full discussion of the hands and features see <ref target="#stokes2005">Stokes
        2005</ref>.</p>
      </note> In order to facilitate the entry of information into the computer, I created a form
      within a pre-existing database of all manuscripts and scribal hands under consideration.<note
       xmlns="http://www.tei-c.org/ns/1.0">
       <p>The database has not yet been made publicly available but the content is derived from
        <ref target="#gneuss2001">Gneuss 2001</ref>, <ref target="#ker1957">Ker 1957</ref>, <ref target="#sawyer1968"
         >Sawyer 1968</ref>, and my own research. A detailed discussion of the hands, and results
        obtained from the database, can be found in <ref target="#stokes2005">Stokes 2005</ref>.
       </p>
      </note> Since the data to be entered is a simple “Yes/No” value for each field, it might be
      thought that the most appropriate form would contain nothing but a long list of check-boxes.
      In practice, however, this proved to be extremely unwieldy, and a great deal of time was
      initially spent looking through the nearly three hundred boxes in order to find the ones which
      were required. Similarly, it was very difficult to add, remove, or otherwise alter the list of
      features, and such alterations are essential as one’s sense of which features should be
      recorded alters with experience. Instead, a second table was created which simply contained
      hand-feature pairs, and a form was created which contained nothing more than a drop-down list of
      hands and a drop-down list of features; this is shown in Figure 13 below. </p>
     <figure>
      <graphic url="support/form.png"/>
      <figDesc>Database form for the entry of features</figDesc>
     </figure>
     <p xml:id="stokes.d1e763">This proved to be very efficient for data-entry, and just over 17,000
      hand-feature pairs were entered for the 466 hands. However, this specially developed format is
      not recognised by any generic classifier I am aware of, since those classifiers all expect
      input in the form of vectors. To overcome this, a second piece of software was developed which read
      in a file exported from the database, processed all of the hand-feature pairs for each hand,
      and then produced the database, header, and model files which AutoClass could then read.
      Facilities were also added to weight the data, to apply certain rules whereby the presence of
      a given feature could be inferred from another (for example, that horned <emph
       xmlns="http://www.tei-c.org/ns/1.0">a</emph> must also be flat-topped), and to produce
      histograms of features by date and location. Twenty-two of the 286 dimensions had only one
      unique value and so were ignored; the remainder were entered as discrete nominal values with
      the single multinomial model. Results at the time of writing have been somewhat disappointing,
      as the AutoClass software has a strong tendency to group all of the hands together in a
      single class. Somewhat better results have been obtained by reducing the number of features
      and considering only those which I had previously identified as being of greater significance,
      but even this usually produces only two or three classes for the 466 different hands. Indeed,
      the most useful approach so far has been to abandon automatic classification entirely and to
      build forms into the database which allow an expert user to search directly for different
      features, or to obtain the features which are found in scribal hands from a given
       location.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>For the results of this analysis see <ref target="#stokes2005">Stokes 2005</ref>, and for
        a similar approach but with a very different interface see the palaeographic catalogue in
        the MANCASS C11 database (<ptr
         target="http://www.arts.manchester.ac.uk/mancass/C11database/"/>).</p>
      </note> Examples of these forms are shown in Figures 14 and 15 below. </p>
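     <p>The conversion from hand-feature pairs to vectors described above can be sketched as
      follows. This is a minimal illustration in Python and not the software that was actually
      written: the function name, the hand labels, and the way in which rules are represented are
      all assumptions, and only the rule that horned a must also be flat-topped is taken from the
      text.</p>
     <eg xml:space="preserve">
```python
def build_feature_vectors(pairs, features, implications=None):
    """Turn (hand, feature) pairs into one presence/absence vector per hand.

    `features` fixes the order of the dimensions. `implications` maps a
    feature to the features whose presence it entails. Hands with no
    recorded pairs do not appear in the result.
    """
    implications = implications or {}
    present = {}
    for hand, feature in pairs:
        present.setdefault(hand, set()).add(feature)
    for found in present.values():
        # Follow the rules until no further feature can be inferred.
        pending = list(found)
        while pending:
            feature = pending.pop()
            for implied in implications.get(feature, ()):
                if implied not in found:
                    found.add(implied)
                    pending.append(implied)
    return {
        hand: [1 if f in found else 0 for f in features]
        for hand, found in sorted(present.items())
    }


# Hypothetical data for illustration only.
FEATURES = ["horned a", "flat-topped a", "wedged ascenders", "round c"]
PAIRS = [
    ("Hand 1", "horned a"),
    ("Hand 1", "round c"),
    ("Hand 2", "wedged ascenders"),
    ("Hand 2", "round c"),
]
RULES = {"horned a": ["flat-topped a"]}

vectors = build_feature_vectors(PAIRS, FEATURES, RULES)
for hand, vector in vectors.items():
    print(hand, vector)
```
     </eg>
     <p>In this example Hand 1 acquires flat-topped a by inference even though no such pair was
      entered. Storing pairs and deriving vectors only on export also means that the list of
      features can be altered without touching the existing records.</p>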
     <figure>
      <graphic url="support/search-1.png"/>
      <figDesc>Searching for scribal hands by letter-form. Note that eleven of the sixteen hands
       with the features indicated are associated either with Southeast England or with Ælfric, who
       was at Cerne Abbas, Dorset, but had close links to Christ Church, Canterbury
      (CaCC).</figDesc>
     </figure>
     <figure>
      <graphic url="support/search-2.png"/>
      <figDesc>Searching for letter-forms and scribal hands by location. The form tells us that 56
       hands can be localised to Worcester or York, of which 44 show wedged ascenders, 40 show round
       c, 37 show horizontal minim-feet, and so on.</figDesc>
     </figure>
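     <p>The kind of query which lies behind the second form can be approximated in a few lines of
      Python. The sketch below is an assumption-laden illustration and does not reproduce the
      database itself: the function name, the dictionaries, and the sample hands are all invented.
      It selects the hands localised to the chosen places and counts how many of them show each
      feature.</p>
     <eg xml:space="preserve">
```python
from collections import Counter


def feature_counts_by_location(hand_locations, hand_features, locations):
    """Count how many hands from the given locations show each feature.

    Returns the number of hands selected and a list of (feature, count)
    pairs, most frequent first, with ties broken alphabetically.
    """
    wanted = set(locations)
    selected = [h for h, place in hand_locations.items() if place in wanted]
    counts = Counter()
    for hand in selected:
        counts.update(hand_features.get(hand, set()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return len(selected), ranked


# Hypothetical hands, locations and features for illustration only.
HAND_LOCATIONS = {
    "Hand 1": "Worcester",
    "Hand 2": "York",
    "Hand 3": "Canterbury",
    "Hand 4": "Worcester",
}
HAND_FEATURES = {
    "Hand 1": {"wedged ascenders", "round c"},
    "Hand 2": {"wedged ascenders"},
    "Hand 3": {"round c", "horned a"},
    "Hand 4": {"wedged ascenders", "round c", "horizontal minim-feet"},
}

total, ranked = feature_counts_by_location(
    HAND_LOCATIONS, HAND_FEATURES, ["Worcester", "York"]
)
print(total)
for feature, count in ranked:
    print(feature, count)
```
     </eg>
     <p>The output has the same shape as the summary given by the form: a total number of hands
      for the chosen locations, followed by the features in descending order of frequency.</p>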
     <p xml:id="stokes.d1e796">An approach such as this is useful, but it is very time-consuming both
      to build the corpus and to search it. It also depends very heavily on all parties using the
      same terminology when describing letter-forms, but no such standard terminology yet exists
       (<ref xmlns="http://www.tei-c.org/ns/1.0" target="#bischoff1954">Bischoff 1954</ref>; <ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#derolez2003">Derolez 2003, 13-24</ref>). It is
      also almost impossible for a person to assess the relative significance of each of the 286
      dimensions and to judge which combination of features would produce the best results. For
      these reasons a fully automated approach such as AutoClass may seem preferable. However,
      the same difficulties of terminology still apply, since these terms are used to build the
      underlying data. Similarly, the relative significance of features is important in automated classification
      since, as discussed above, the results improved markedly when a relatively small number of
      features was entered into the classifier, these features having been predetermined through
      “traditional” palaeographical research. However, this human intervention eliminates one of the
      primary advantages of using a computer, namely the ability to assess very many different
      elements at once. Furthermore, as observed above, it is an interesting question how much a
      computer-based approach might reveal new relationships and significant features which have not
      hitherto been considered by palaeographers. AutoClass itself reports what it considers to be
      the relative significance of features, and this information could be of use not only to reduce
      the volume of data which is entered, but also to provide clues to the palaeographer regarding
      which features should be considered. However, the software can only consider the data which it
      is given, and if the human user is to filter out much of this information beforehand then he
      or she denies this possibility to the machine. A hierarchical classifier may produce better
      results, since the data can naturally be organised as hands within scribes within scriptoria
      within scripts, but this again imposes a structure which may or may not be valid, and indeed
      in the late Anglo-Saxon period the evidence seems quite clear that such a structure did not
       exist.<note xmlns="http://www.tei-c.org/ns/1.0">
       <p>I am indebted to Prof. David MacKay for this suggestion. For an example of such a
        classifier see the dendrogram presented by <ref target="#ciula2005">Ciula 2005</ref>. For the
        lack of organisation in Anglo-Saxon script see especially <ref target="#ker1985">Ker 1985,
         34</ref>.</p>
      </note> Ultimately, however, it is perhaps unreasonable to expect any software to produce
      useful results without a great deal of effort and experimentation, given such a complex
      data-set. As noted above, the discovery of structure is successful when implemented as a
      process rather than a one-off attempt.</p>
    </div>

    <div>
     <head>Concluding Remarks</head>
     <p xml:id="stokes.d1e823"> The above discussion has concentrated on only two of the many
      possible ways in which research in apparently unrelated fields can be applied to palaeography.
      As I have already suggested, these applications all require careful thought and no small
      effort to ensure that they are carried out appropriately; technology provides tools rather
      than magical solutions, and no tool is useful unless it is properly used. Similarly, I do not
      think that computer-based approaches can or should replace traditional methods of
      palaeography; instead, the technology enables new approaches which provide different types of
      evidence for subsequent (human) interpretation. With proper care, these approaches can make
      significant contributions to our understanding of medieval palaeography and are certainly here
      to stay. Indeed, I can think of no better conclusion than Gumbert’s rephrasal of Bischoff’s
      well-known line: “palaeography, <hi xmlns="http://www.tei-c.org/ns/1.0" rend="italic">and
       codicology</hi>, which are arts of seeing and feeling, are now, <hi
       xmlns="http://www.tei-c.org/ns/1.0" rend="italic">happily</hi>, in the process of becoming
       <hi xmlns="http://www.tei-c.org/ns/1.0" rend="italic">also</hi> arts of measurement.” (<ref
       xmlns="http://www.tei-c.org/ns/1.0" target="#gumbert1998">Gumbert 1998, 404</ref>).</p>
    </div>

   </div>
  </body>

  <back>
   <div>
    <listBibl>
     <bibl xml:id="arazi1977"><author>Arazi, B.</author> 1977. <title>Handwriting identification by
       means of run-length measurements</title>. IEEE Transactions on Systems, Man, and Cybernetics
      SMC-7, no. 12:878-81. </bibl>
     <bibl xml:id="bischoff1954"><author>Bischoff, Bernhard</author>. 1954. <title>Nomenclature des
       écritures livresques du IXe au XIIIe siècle</title>. In Nomenclature des écritures livresques
      du IXe au XVIe siècle, edited by B. Bischoff, G. I. Lieftinck and G. Battelli, 7-14. Paris:
      Centre National de la Recherche Scientifique. </bibl>
     <bibl xml:id="bischoff1990"><author>Bischoff, Bernhard</author>. 1990. <title>Latin
       palaeography: Antiquity and the middle ages</title>. Translated by D. Ó Cróinín and D. Ganz.
      Cambridge: Cambridge University Press. </bibl>
     <bibl xml:id="brown1990"><author>Brown, Michelle P</author>. 1990. <title>A guide to western
       historical scripts from antiquity to 1600</title>. London: British Library. </bibl>
     <bibl xml:id="bulacuschomaker2003"><author>Bulacu, Marius</author>, and <author>Lambert
       Schomaker</author>. 2003. <title>Writer style from oriented fragments</title>. In Proceedings
      of the Tenth International Conference on Computer Analysis of Images and Patterns (Groningen -
      The Netherlands, August), 460-469. </bibl>
     <bibl xml:id="bulacuschomakervuurpijl2003"><author>Bulacu, Marius</author>, <author>Lambert
       Schomaker</author>, and <author>Louis Vuurpijl</author>. 2003. <title>Writer-identification
       using edge-based directional features</title>. In Proceedings of the Seventh International
      Conference on Document Analysis and Recognition (Edinburgh - Scotland, August), 2:937-941. </bibl>
     <bibl xml:id="ciula2005"><author>Ciula, Arianna</author>. 2005. <title>Digital palaeography:
       Using the digital representation of medieval script to support palaeographic
      analysis</title>. Digital Medievalist 1. </bibl>
     <bibl xml:id="costamagnagasparrigilissen19951996">Costamagna, Giorgio, Françoise Gasparri, Léon
      Gilissen, et al. 1995 and 1996. <title>Commentare Bischoff</title>. Scrittura e Civiltà
      19:325-48 and 20:401-7. </bibl>
     <bibl xml:id="davis1998"><author>Davis, Lisa Fagin.</author> 1998. <title>Towards an automated
       system of script classification</title>. Manuscripta 42:193-201.</bibl>
     <bibl xml:id="derolez2003"><author>Derolez, Albert.</author> 2003. <title>The palaeography of
       gothic manuscript books from the twelfth to the early sixteenth century</title>. Cambridge:
      Cambridge University Press.</bibl>
     <bibl xml:id="dumville1987"><author>Dumville, David N.</author> 1987. <title>English square
       minuscule script: The background and earliest phases</title>. Anglo-Saxon England 16:147-179. </bibl>
     <bibl xml:id="dumville1988"><author>Dumville, David N.</author> 1988. <title>Beowulf come
       lately: Some notes on the palaeography of the Nowell Codex</title>. Archiv für das Studium
      der neueren Sprachen und Literaturen 225:49-63. </bibl>
     <bibl xml:id="dumville1993"><author>Dumville, David N.</author> 1993. <title>English caroline
       script and monastic history: Studies in benedictinism, A.D. 950-1030</title>. Woodbridge:
      Boydell. </bibl>
     <bibl xml:id="dumville1994"><author>Dumville, David N.</author> 1994. <title>English square
       minuscule script: The mid-century phases</title>. Anglo-Saxon England 23:133-164. </bibl>
     <bibl xml:id="dumville2001"><author>Dumville, David N.</author> 2001. <title>Specimina codicum
       palaeoanglicorum</title>. In Kansai university collection of essays in commemoration of the
      50th anniversary of the Institute of Oriental and Occidental Studies, 1-24. Suita, Osaka.</bibl>
     <bibl xml:id="gneuss2001"><author>Gneuss, Helmut</author>. 2001. <title>Handlist of Anglo-Saxon Manuscripts: A
      List of Manuscripts and Manuscript Fragments Written or Owned in
      England up to 1100</title>. Tempe, AZ: Arizona Center for Medieval and
      Renaissance Studies.</bibl>
     <bibl xml:id="gumbert1976"><author>Gumbert, J. P.</author> 1976. <title>A proposal for a cartesian
      nomenclature</title>. In Essays presented to G. I. Lieftinck, IV: Miniatures, scripts, collections,
      edited by J. P. Gumbert and M. J. M. de Haan, 45-52. Amsterdam: Van Gendt. </bibl>
     <bibl xml:id="gumbert1998"><author>Gumbert, J. P.</author> 1998. <title>Commentare “Commentare
       Bischoff”</title>. Scrittura e Civiltà 22:397-404. </bibl>
     <bibl xml:id="hansonstutzcheeseman2004">Hanson, Robin, John Stutz, and Peter Cheeseman. 1991.
       <title>Bayesian classification theory: Technical report FIA-90-12-7-01</title>. NASA.
       <ptr target="http://ic.arc.nasa.gov/ic/projects/bayes-group/images/tr-fia-90-12-7-01.ps"/>. </bibl>
     <bibl xml:id="ker1957"><author>Ker, Neil R.</author> 1957. <title>Catalogue of manuscripts
       containing Anglo-Saxon</title>. Oxford: Clarendon. </bibl>
     <bibl xml:id="ker1985"><author>Ker, Neil R.</author> 1985. <title>Books, collectors and
       libraries: Studies in medieval heritage</title>. London: Hambledon. </bibl>
     <bibl xml:id="mackay2003"><author>MacKay, David J. C.</author> 2003. <title>Information theory,
       inference, and learning algorithms</title>. Cambridge: Cambridge University Press. </bibl>
     <bibl xml:id="parkes1969"><author>Parkes, Malcolm</author>. 1969. <title>English cursive book
       hands, 1250-1500</title>. Oxford: Clarendon. </bibl>
     <bibl xml:id="pratesi1998"><author>Pratesi, Alessandro</author>. 1998. <title>Commentare
       Bischoff: Un secondo intervento</title>. Scrittura e Civiltà 22:405-8. </bibl>
     <bibl xml:id="sawyer1968">
      <author>Sawyer, P. H.</author> 1968. <title>Anglo-Saxon charters: An annotated list and
       bibliography</title>. London: Royal Historical Society. Revised electronic version by R.
      Rushforth, S. Kelly, S. Miller et al. available at <ptr target="http://www.esawyer.org.uk"/>.</bibl>
     <bibl xml:id="schomakerbulacuvanerp2003">Schomaker, Lambert, Marius Bulacu, and Merijn van Erp.
      2003. <title>Sparse-parametric writer identification using heterogeneous feature
      groups</title>. In Proceedings of the International Conference on Image Processing (Barcelona
      - Spain, September), 1:545-548. <ptr
       target="http://www.ai.rug.nl/~bulacu/icip2003-schomaker-bulacu-erp.pdf"/>.</bibl>
     <bibl xml:id="schomakerbulacuvanerp2007">Schomaker, Lambert, Marius Bulacu, and Merijn van Erp.
      2007. <title>Advances in writer identification and verification</title>. Keynote paper
      delivered to the 9th International Conference on Document Analysis and Recognition. Curitiba:
      ICDAR. <ptr target="http://www.icdar2007.org/ICDAR2007_KeyNote_LSchomaker.pdf"/>.</bibl>
     <bibl xml:id="spumar1976"><author>Spunar, Pavel</author>. 1976. <title>Palaeographical
       difficulties in defining an individual script</title>. In Essays presented to G. I.
      Lieftinck, IV: Miniatures, scripts, collections, edited by J. P. Gumbert and M. J. M. de Haan,
      62-68. Amsterdam: Van Gendt. </bibl>
     <bibl xml:id="srihari2001"><author>Srihari, Sargur N.</author> 2001. <title>Handwriting
       identification: Research to study validity of individuality of handwriting and develop
       computer-assisted procedures for comparing handwriting</title>. Buffalo, NY: Center of
      Excellence for Document Analysis and Recognition. </bibl>
     <bibl xml:id="srihari2003"><author>Srihari, Sargur N.</author> 2003. <title>Quantitative
       assessment of handwriting individuality [PowerPoint Presentation]</title>. CEDAR. <ptr
       target="http://www.cedar.buffalo.edu/NIJ/Pres/Overview-2003b_files/frame.htm"/>. </bibl>
     <bibl xml:id="sriharichaaroralee2002">Srihari, Sargur N., Sung-Hyuk Cha, Hina Arora, and
      Sangjik Lee. 2002. <title>Individuality of handwriting</title>. Journal of Forensic Sciences
      47:1-17. </bibl>
     <bibl xml:id="stokes2005"><author>Stokes, Peter A.</author> 2005. <title>English vernacular
       script ca 990–ca 1035</title>. Cambridge: unpublished Ph.D. dissertation.</bibl>
     <bibl xml:id="stokesforthcoming"><author>Stokes, Peter A.</author> Forthcoming. <title>The
       “Vision of Leofric”: manuscript, text, and content</title>. Peritia. </bibl>
     <bibl xml:id="stutzcheeseman1996"><author>Stutz, John</author>, and <author>Peter
      Cheeseman</author>. 1996. <title>Bayesian classification (Autoclass): Theory and
      results</title>. In Advances in knowledge discovery and data mining, edited by U. Fayyad, G.
      Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, 61-83. Cambridge, MA: MIT Press. </bibl>
     <bibl xml:id="terras2006"><author>Terras, Melissa</author>. 2006. <title>Image to
       interpretation: An intelligent system to aid historians in reading the Vindolanda
      Texts</title>. Oxford: Oxford University Press. </bibl>
     <bibl xml:id="vanginkelvankempen2004"><author>van Ginkel, Michael</author>, and <author>Geert
       van Kempen</author>. 2004. <title>DIP<hi rend="italic">image</hi> and DIP<hi rend="italic"
        >lib</hi></title>. <ptr target="http://www.ph.tn.tudelft.nl/DIPlib/"/>. </bibl>
     <bibl xml:id="vankempenvanginkelhendriksvanvliet2003">van Kempen, Geert, Michael van Ginkel,
      Cris L. Luengo Hendriks, and Lucas J. van Vliet. 2003. <title>DIP<hi rend="italic">lib</hi>
       function reference</title>. Delft: Delft University of Technology. </bibl>
     <bibl xml:id="zhangsargur2003"><author>Zhang, Bin</author>, and <author>Sargur N.
      Srihari</author>. 2003. <title>Binary vector dissimilarity measures for handwriting
       identification</title>. Document Recognition and Retrieval 10:28-38. </bibl>
     <bibl xml:id="zurada1992"><author>Zurada, Jacek M.</author> 1992. <title>Introduction to
       artificial neural systems</title>. St Paul: West Publishing. </bibl>
    </listBibl>
   </div>
  </back>

 </text>

</TEI>
