﻿<?xml-stylesheet type="text/xsl" href="tei-adhoc-html.xsl"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="ab-743">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Towards Hermeneutic Markup: An architectural outline</title>
        <author><name>Wendell Piez</name> <affiliation>Mulberry Technologies,
          Inc., United States of America</affiliation>
          <email>wapiez@mulberrytech.com</email> </author>
      </titleStmt>
      <publicationStmt>
        <publisher>Centre for Computing in the Humanities, King's College
          London</publisher>
        <address>
          <addrLine>Strand, London WC2R 2LS, England, United Kingdom. Tel:+44
            (0) 20 7836 5454</addrLine>
          <addrLine>http://www.kcl.ac.uk/cch/</addrLine>
        </address>
      </publicationStmt>
      <sourceDesc>
        <p>No source: created in electronic format.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change><date>2010-06-20</date><name>wap</name> <desc>Minimal revisions to
        markup for personal publication</desc> </change>
      <change> <date>2010-04-29</date> <name>GF</name> <desc>CCHLite
        encoding</desc> </change>
    </revisionDesc>
  </teiHeader>
  <text type="paper">
    <body>
      <div>
        <p>By <soCalled>hermeneutic</soCalled> markup I mean markup that is
          deliberately interpretive. It is not limited to describing aspects or
          features of a text that can be formally defined and objectively
          verified. Instead, it is devoted to recording a scholar&apos;s or
          analyst&apos;s observations and conjectures in an open-ended way. As
          markup, it is capable of automated and semi-automated processing, so
          that it can be processed at scale and transformed into different
          representations. By means of a markup regimen perhaps peculiar to
          itself, a text will be exposed to further processing such as text
          analysis, visualization or rendition. Texts subjected to consistent
          interpretive methodologies, or different interpretive methodologies
          applied to the same text, can be compared. Rather than being devoted
          primarily to supporting data interchange and reuse &#8211; although
          these benefits would not be excluded &#8211; hermeneutic markup is
          focused on the presentation and explication of the interpretation it
          expresses.</p>
        <p>Hermeneutic markup in its full form does not presently exist. XML,
          and especially TEI XML, provides a foundation for this work. But due
          to limitations both in currently dominant approaches to XML, and in
          XML itself, a number of important desiderata remain before truly
          sophisticated means can be made available for scholars to exploit the
          full potentials of markup for literary study, as implied, for example,
          by ideas such as Steven Ramsay&apos;s Algorithmic Criticism or what I
          described in 2001 (following Geoffrey Rockwell and John Bradley) as
          <soCalled>exploratory markup</soCalled> (Piez 2001. See also
          especially Buzzetti, 2002 and McGann, 2004.)</p>
        <p>Prototype user interfaces designed to enable one or another kind of
          <hi rend="italic">ad hoc</hi> textual annotation or markup have been
          developed, for the most part independently of one another. (Several
          are cited.) This shows that the idea of hermeneutic markup, or
          something like it, is not new; but none of these have yet made the
          breakthrough. An important reason is that hermeneutic markup in its
          full sense will not be possible on the basis simply of a standard tag
          set or capable user interface, because it will mean not just that we
          can describe a data set using markup (we can already do that), but
          that we can actively develop, for a <hi rend="italic">particular</hi>
          text or family of texts, an appropriate, and possibly highly
          customized, means and methodology for doing so.</p>
        <p>A demonstration of a prototype markup application helps to show the
          potentials and challenges, in a very rudimentary form. [Screenshots
          appear in Figure 1.] This graphical and interactive rendering of
          markup in the source files presents an interpretation of the
          grammatical/rhetorical structure (sentences and phrases) as well as
          verse structure (lines and stanzas) in the text. Unfortunately, while
          the encoding for the sonnets here is not inordinately difficult
          &#8211; <soCalled>milestones</soCalled> are used, in a conventional
          manner, to denote the presence of structures that overlap the primary
          structure of the encoded document &#8211; the code that renders it
          incurs significant extra overhead to run, because XML technologies are
          ill-fitted to manage the kind of information we are interested in
          here, namely the overlapping of these structures that characterizes
          the sonnet form. XML doesn&apos;t do overlap. As long as a sentence or
          phrase overlaps a line &#8211; a very common occurrence and important
          poetic device &#8211; the normative XML data model, a
          <soCalled>tree</soCalled>, cannot capture both together. In order to
          do processing like what happens here, one or another workaround has to
          be resorted to. So while XML is being used here, it is a clumsy means
          to this end.</p>
        <p> <figure>
          <graphic url="743_Fig1a.jpg" rend="width:30%; margin:1%; border: thin solid black" mimeType="image/jpeg"/>
          <graphic url="743_Fig1b.jpg" rend="width:30%; margin:1%; border: thin solid black" mimeType="image/jpeg"/>
          <graphic url="743_Fig1c.jpg" rend="width:30%; margin:1%; border: thin solid black" mimeType="image/jpeg"/>
          <head>Screenshots of three sonnets with rendition of overlapping
            (verse and sentence/phrase) structures. The interface (implemented
            in W3C-standard SVG) is dynamic and responds to user input to
            highlight overlapping ranges of text.</head>
          </figure> </p>
        <p>But overlap is only part of the problem. Consider Alfred Lord
          Tennyson&apos;s <hi rend="italic">Now Sleeps the Crimson Petal, Now
          the White</hi> (the third example). This too is a sonnet, after a
          fashion, although it does not have a conventional sonnet&apos;s
          octave/sestet structure. Since this application does not work with a
          schema for sonnets that enforces this structure, this works well
          enough. Yet as texts or collections grow in scale and complexity,
          having a schema is essential to enforcing consistency and helping to
          ensure that like things are marked up alike. A framework for this
          application must not only find a way to work around the overlap; it
          must also deploy a schema (or at any rate some sort of validation
          technology) flexible enough &#8211; at least if this instance is to be
          valid to it &#8211; that such outliers from regular form are
          permissible, even while attention is drawn to them (see Birnbaum
          1997).</p>
        <p>Currently, XML developers generally (setting aside the problem of
          overlap) do not consider this to be problematic in itself; indeed,
          part of the fun and interest of markup is in devising and applying a
          schema that fits the data, however strange and interesting it may be.
          What is not so fun is to have to repeat this process endlessly, being
          caught in a cycle of amending and adjusting a schema constantly (and
          sooner or later, scripts and stylesheets) in order to account for
          newly discovered anomalies. Sooner or later, when exhaustion sets in
          or the limits of technical knowhow are reached, one ends up either
          faking it with tags meant for other purposes (thereby diluting markup
          semantics in order to <hi rend="italic">pretend</hi> to represent the
          data), or just giving up.</p>
        <p>Extending a schema is found to be a problem not only because
          validating and processing any model more complex than a single
          hierarchy is a headache even for technical experts, but also, more
          generally, because current practices assume a top-down schema
          development process. Despite XML&apos;s support for processing markup
          even without a schema, both XML tools and dominant development
          methodologies assume that schema design and development occurs prior
          to the markup and processing of actual texts. This priority is both
          temporal and logical, reflecting a conception of the schema as a
          declaration of constraints over a set of instances (a
          <soCalled>type</soCalled>), appropriate to publishing systems built to
          work with hundreds or thousands of documents, with a requirement for
          backwards compatibility (documents encoded earlier cannot be revised
          easily or at all) and limited flexibility to adapt to new and
          interesting discoveries. The centrality of the schema within this kind
          of system inhibits, when it does not altogether frustrate, the
          flexible development of a markup practice that is sensitive,
          primarily, to a text under study, and this conception of a
          schema&apos;s authority works poorly when considering a single text
          <hi rend="italic">sui generis</hi> &#8211; the starting point for
          hermeneutic markup. In hermeneutic markup, a schema should be, first
          and last, an apparatus and a support, not a Procrustean bed.</p>
        <p>All these problems together indicate the outline of a general
          solution: <list>
          <item>A data model supporting arbitrary overlap.</item>
          <item>Interfaces, including a markup syntax, that facilitate the
            creation, editing and analysis of texts using this data model, with
            the capability of defining <hi rend="italic">ad hoc</hi> elements
            and properties (attributes) on the fly.</item>
          <item>A transformation technology supporting (in addition to data
            transformations) analytical tools applicable to the markup as such
            (not just the raw text), with the capability of managing elements
            and their properties in sets, locating them, listing them by type,
            sorting, visualizing and comparing them.</item>
          <item>Schema-inferencing capabilities for describing the structural
            relations within either an entire marked-up corpus, or within
            identifiable segments, sections or profiles of it.</item>
          <item>In connection this, a schema technology that supports partial
            and modular validation.</item>
          </list> </p>
        <p>A system with all these features would support an iterative and
          <soCalled>agile</soCalled> approach to markup development. We would
          start by tagging. (In a radical version of this approach we might
          start by tagging for presentation, perhaps using just a lightweight
          HTML or TEI variant for our first cut.) Then we introduce a
          provisional schema or schemas capable of validating the tagging we
          have used. This requires assessing which cases of overlap in the text
          are essential to our document analysis, and which are incidental and
          subject to normalization within hierarchies. Having refined the
          schema, we return to the tagged text, to consider both how its tagging
          falls short (with respect to whatever requirements we have for either
          data description or processing), and how it may be enhanced, better
          structured and regularized. During this process we also begin to
          develop and deploy applications of the markup. We then revise,
          refactor and extend both tagging and schema, applying data
          transformations as needed, in order to better address the triple goals
          of adequate description, processing, and economy of design.</p>
        <p>Such a system would not only be an interesting and potentially
          ground-breaking approach to collaborative literary study; it would
          also be a platform for learning about markup technologies, an
          increasingly important topic in itself. Moreover, hermeneutic markup
          represents an opportunity to capitalize on investments already made,
          as texts encoded in well-understood formats like TEI are readily
          adaptable for this kind of work.</p>
        <p>Many of these capabilities have already been demonstrated or sketched
          in different applications or proposals for applications, including W3C
          XML Schema (partial validation); James Clark&apos;s Trang (schema
          inferencing for XML); LMNL/CREOLE (overlap, structured annotations,
          validation of overlap); JITTs (XML <soCalled>profiles</soCalled> of
          concurrent overlapping structures); and TexMECS (overlap,
          <soCalled>virtual</soCalled> and discontinuous elements).</p>
        <p>The presentation will conclude with a demonstration of various
          outputs from the data sources used in the demo, which provide building
          blocks towards the kind of system sketched here. A range-analysis
          transformation can show which types of structures in a markup instance
          overlap with other structures, and conversely which structures nest
          cleanly. Complementary to this, an <soCalled>XML induction
          processor</soCalled> is capable of deriving well-formed XML
          representations of texts marked up with overlapping structures &#8211;
          from which, in turn, XML schemas can be derived.</p>
        <figure>
          <graphic url="743_Fig2a.jpg" rend="left-img" mimeType="image/jpeg"/>
          <head>A workflow diagram showing the architecture of present
            (XML-based) markup systems. Both schema and processing logic are
            considered to be static; modifying them is an activity extraneous to
            document markup and production.</head>
        </figure>
        <figure>
          <graphic url="743_Fig2b.jpg" rend="left-img" mimeType="image/jpeg"/>
          <head>An architecture capable of supporting hermeneutic markup would
            account directly for document analysis and for the design of schema,
            queries and processing. While in fact this is often done even today,
            one has to work against the current tool set to do it, questioning
            its assumptions regarding the purposes, roles and relations of
            source text, markup and schema.</head>
        </figure>
        <p>A final version of this paper, with the demonstrations, is available
          at <ptr target="http://piez.org/wendell/papers/dh2010/index.html"
          /></p>
      </div>

    </body>
    <back>
      <div>
        <listBibl>
          <bibl> <author>Birnbaum, David</author> <title level="a">In Defense of
            Invalid SGML</title> <title level="m" type="proceedings"
            >ACH/ALLC</title> <name type="venue">Kingston, Ontario</name> <date
            type="conference">1997</date> </bibl>
          <bibl> <author>Buzzetti, Dino</author> <date>2002</date> <title
            level="a">Digital Representation and the Text Model</title> <title
            level="j">New Literary History</title> <biblScope type="issue"
            >33(1)</biblScope> <biblScope type="pp">61-88</biblScope> </bibl>
          <bibl> <title level="m">CATMA. University of Hamburg (Jan Christoph
            Meister)</title> <ptr
            target="http://www.jcmeister.de/html/catma-e.html"/> </bibl>
          <bibl> <author>Caton, Paul</author> <title level="a">LMNL Matters?
            (presenting the Limner prototype tagging platform)</title> <title
            level="m" type="proceedings">Extreme Markup Languages 2005</title>
            <name type="venue">Montr&#233;al, Qu&#233;bec</name> <date
            type="conference">2005</date> </bibl>
          <bibl> <author>Czmiel, Alexander</author> <title level="a">XfOS (XML
            for Overlapping Structures)</title> <title level="m"
            type="proceedings">ACH/ALLC 2004</title> <name type="venue"
            >G&#246;teborg, Sweden</name> <date type="conference">2004</date> </bibl>
          <bibl> <author>Di Iorio, Angelo, Silvio Peroni and Fabio
            Vitali</author> <title level="a">Towards markup support for full
            GODDAGs and beyond: the EARMARK approach</title> <title level="m"
            type="proceedings">Balisage: The Markup Conference 2009</title>
            <name type="venue">Montr&#233;al, Canada</name> <date
            type="conference">2009</date> <ptr
            target="http://www.balisage.net/Proceedings/vol3/html/Peroni01/BalisageVol3-Peroni01.html"
            /> </bibl>
          <bibl> <author>Durusau, Patrick, and Matthew Brooke
            O&apos;Donnell</author> <title level="a">Coming down from the trees:
            Next step in the evolution of markup? (Presenting JITTs,
            Just-in-time Trees)</title> <title level="m" type="proceedings"
            >Extreme Markup Languages 2002</title> <name type="venue"
            >Montr&#233;al, Qu&#233;bec</name> <date type="conference"
            >2002</date> <ptr
            target="http://www.durusau.net/publications/NY_xml_sig.pdf"/> </bibl>
          <bibl> <author>Huitfeldt, Claus</author> <title level="m">Markup
            Languages for Complex Documents (MLCD)</title> <ptr
            target="http://decentius.aksis.uib.no/mlcd/en.htm"/> </bibl>
          <bibl> <title level="m">Image Markup Tool. University of Victoria
            (Martin Holmes)</title> <ptr
            target="http://tapor.uvic.ca/~mholmes/image_markup/"/> </bibl>
          <bibl> <author>Lancashire, Ian</author> <date>1995</date> <title
            level="m">Early Books, RET Encoding Guidelines, and the Trouble with
            SGML</title> <ptr
            target="http://www.ucalgary.ca/~scriptor/papers/lanc.html"/> </bibl>
          <bibl> <title level="m">LMNL wiki</title> <ptr
            target="http://www.lmnl.org/wiki/index.php/Main_Page"/> </bibl>
          <bibl> <author>McGann, Jerome</author> <date>2004</date> <title
            level="a">Marking Texts of Many Dimensions</title> <title level="m"
            >A Companion to Digital Humanities</title> <editor>Susan Schreibman,
            Ray Siemens, John Unsworth</editor> <publisher>Blackwell</publisher>
            <pubPlace>Oxford</pubPlace> </bibl>
          <bibl> <title level="m">NINES project. University of Virginia (Jerome
            McGann, et al.)</title> <ptr target="http://www.nines.org"/> </bibl>
          <bibl> <author>Piez, Wendell</author> <date>2001</date> <title
            level="a">Beyond the Descriptive vs Procedural Distinction</title>
            <title level="j">Markup Languages: Theory and Practice</title>
            <biblScope type="issue">3(2)</biblScope> <ptr
            target="http://www.piez.org/wendell/papers/beyonddistinction.pdf"/> </bibl>
          <bibl> <author>Ramsay, Steven</author> <date>2008</date> <title
            level="a">Algorithmic Criticism</title> <title level="m">A Companion
            to Digital Literary Studies</title> <editor>Susan Schreibman and Ray
            Siemens</editor> <publisher>Blackwell</publisher>
            <pubPlace>Oxford</pubPlace> <ptr
            target="http://www.digitalhumanities.org/companionDLS/"/> </bibl>
          <bibl> <author>Sperberg-McQueen, C. Michael</author> <date>1991</date>
            <title level="a">Text in the electronic age: Textual study and text
            encoding, with examples from medieval texts</title> <title level="j"
            >Literary &amp; linguistic computing</title> <biblScope type="issue"
            >6(1)</biblScope> <biblScope type="pp">34-46</biblScope> </bibl>
        </listBibl>
      </div>
    </back>
  </text>
</TEI>

