HyTime and TEI location ladder examples

These examples are taken from comments submitted by the TEI and some
national bodies to the HyTime committee at the time of the Draft standard
(I've updated them to match the HyTime final form, and hope I haven't let
any typos in). I'm posting the examples because we've had *so* much
discussion of TEI and HyTime location-specifying methods, but almost no
concrete examples.

*** The Example ***

Task: find the destination by traversing in this order:

a)   the element with ID attribute "a25",

b)   within that element, the second child of type "CHAP" (assumed to also
be the third child, due to a title element),

c)   within that element, the fourth child of type "SEC" (assumed to also be
the fifth child, due to a title element),

d)   within that element, the fourth child of type "P" (assumed to also be
the fourth child regardless of type),

e)   within that element, the 5th word-token.


*** Old (P1) locator form ***

This is obsolete, but is often quoted (some software supports it for
backward compatibility). See Section 5.7.3 of the (1990) draft TEI Guidelines. 

   ID a25 TPATH CHAP=2/SEC=4/P=4 TOKENS 5

*** Current (P3) locator form ***

The final TEI Guidelines ("P3", 1994) changed the syntax slightly and added
features (stepping sideways, filtering by attributes, regex matching such as
"DIV[3-5]", and so on):

   ID (a25) CHILD (2 CHAP) (4 SEC) (4 P) TOKEN (5)

Values like this are called "extended pointers" ("extended" because they can
point to more than just an ID). They operate on SGML abstract syntax trees,
which DSSSL and HyTime treat as part of groves. They address pcdata chunks the
same way HyTime does (this differs slightly from DSSSL). Neither HyTime nor
TEI said how processing instructions (PIs) or comments were addressed, though
the intent was that TEI and HyTime work the same way.
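
To make the shape of the P3 form concrete, here is a minimal Python sketch
that splits such a pointer into keywords and their parenthesized steps. It
is only an illustration: it handles just the subset of the syntax shown in
this post (uppercase keywords, each followed by one or more parenthesized
steps), and the names parse_pointer, _TERM and _STEP are made up for the
example rather than taken from any TEI software.

```python
import re

# One uppercase keyword followed by one or more parenthesized steps,
# e.g.  CHILD (2 CHAP) (4 SEC)
_TERM = re.compile(r"([A-Z]+)((?:\s*\([^()]*\))+)")
_STEP = re.compile(r"\(([^()]*)\)")


def parse_pointer(expr):
    """Return [(keyword, [step, ...]), ...]; each step is a list of words."""
    expr = expr.strip()
    terms = []
    pos = 0
    while pos < len(expr):
        m = _TERM.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse pointer at offset {pos}: {expr[pos:]!r}")
        keyword, steps_text = m.group(1), m.group(2)
        # Split the contents of each pair of parentheses on whitespace.
        terms.append((keyword, [s.split() for s in _STEP.findall(steps_text)]))
        pos = m.end()
        # Skip the whitespace that separates one term from the next.
        while pos < len(expr) and expr[pos].isspace():
            pos += 1
    return terms


typed_form = parse_pointer("ID (a25) CHILD (2 CHAP) (4 SEC) (4 P) TOKEN (5)")
```

Here typed_form holds three terms (ID, CHILD, TOKEN), and the CHILD term
carries three steps, each a child number paired with an element type. The
old P1 form is deliberately rejected, since it has no parentheses.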

Stepping by element type is preferable to using bare child numbers because
you are more likely to catch errors: if you are in the wrong place, a 4th SEC
is less likely to exist than a 4th element of any type. But the GI (element
type name) parameter is not required; you could just have:

   ID (a25) CHILD (3) (5) (4) TOKEN (5)

That's 37 bytes for the more robust form or 29 for the unfiltered form (not
counting whitespace), and no values added to the ID name space. HyTime does
not filter steps by GI or attribute value unless you resort to queries (now
via HyQ, but after the Technical Corrigendum (TC), via DSSSL queries), so
only the short form above has a direct HyTime equivalent.
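
The difference between typed and untyped steps can be sketched in Python.
This is not TEI or HyTime software, just a toy walk over an
xml.etree.ElementTree tree. The document built here is hypothetical and only
mimics the shape the example assumes (a title before the CHAPs, a title
before the SECs, no title inside a SEC); the element names DOC, BOOK and
TITLE, and all the function names, are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def build_toy_document():
    """Build a hypothetical document with the shape the example assumes."""
    root = ET.Element("DOC")
    book = ET.SubElement(root, "BOOK", {"ID": "a25"})
    ET.SubElement(book, "TITLE").text = "Book title"
    for c in (1, 2):
        chap = ET.SubElement(book, "CHAP")
        ET.SubElement(chap, "TITLE").text = f"Chapter {c}"
        for s in (1, 2, 3, 4):
            sec = ET.SubElement(chap, "SEC")  # no title here, so the 4th P is child 4
            for p in (1, 2, 3, 4):
                ET.SubElement(sec, "P").text = f"c{c} s{s} p{p} one two three four"
    return root


def find_by_id(root, id_value):
    """Step a): locate the element whose ID attribute has the given value."""
    for el in root.iter():
        if el.get("ID") == id_value:
            return el
    raise LookupError(f"no element with ID {id_value!r}")


def child_step(parent, n, gi=None):
    """Return the n-th child (1-based), counting only children of type gi if given."""
    candidates = [c for c in parent if gi is None or c.tag == gi]
    if not 1 <= n <= len(candidates):
        raise LookupError(f"<{parent.tag}> has no child number {n} of type {gi or 'any'}")
    return candidates[n - 1]


def walk(start, steps):
    """Apply a sequence of (number, element type or None) child steps."""
    node = start
    for n, gi in steps:
        node = child_step(node, n, gi)
    return node


def word_token(el, n):
    """Return the n-th whitespace-separated word (1-based) of the element's text."""
    words = "".join(el.itertext()).split()
    if not 1 <= n <= len(words):
        raise LookupError(f"<{el.tag}> has no word number {n}")
    return words[n - 1]


doc = build_toy_document()
start = find_by_id(doc, "a25")
typed = walk(start, [(2, "CHAP"), (4, "SEC"), (4, "P")])   # CHILD (2 CHAP) (4 SEC) (4 P)
untyped = walk(start, [(3, None), (5, None), (4, None)])   # CHILD (3) (5) (4)

# Starting from the wrong place (the first CHAP instead of a25):
wrong_start = child_step(start, 1, "CHAP")
silently_wrong = child_step(wrong_start, 3)  # succeeds, but lands on some SEC
```

In this toy document both walks reach the same P. From the wrong starting
point, the untyped step (3) still succeeds and quietly returns a SEC,
whereas child_step(wrong_start, 2, "CHAP") raises LookupError at once. That
is the error-catching benefit of typed steps described above.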

An equivalent HyTime link would point to a location ladder something like
this (first, the almost-fully explicit HyTime form -- I've left out some of
the more obscure attributes and any unnecessary recursion):

<nameloc id="step1" hytime="nameloc" docorsub="">
   <nmlist nametype="element"> a25 </nmlist>
</nameloc>
<treeloc id="step2" locsrc="step1" 
         lextype="locsrc (IDREF, (s*, IDREF)+)"
         reflevel="1" hytime="treeloc">
   <marklist HyTime="marklist">3  5 4</marklist>
</treeloc>
<dataloc id="step3" locsrc="step2" 
      hytime="dataloc" quantum="word">
   <dimlist HyTime="dimlist">5 1</dimlist>
</dataloc>


*** Compact HyTime form ***

Here's the HyTime form, about as compact as I can get it using lots of
attribute defaulting and SHORTTAG:

<nameloc id="step1" docorsub="">
   <nmlist>a25 </></>
<treeloc id="step2" locsrc="step1">
   <marklist>3  5 4</></>
<dataloc id="step3" locsrc="step2">
   <dimlist>5 1</></>

That's 135 characters (about 330 for the unminimized form) and 3 IDs.

An advantage of splitting the rungs of location ladders into separate
elements a la HyTime is that you may be able to re-use portions. However,
you can also do that in TEI, since indirection is supported, and in plain
SGML by judicious use of entities.
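
The re-use idea can be sketched in a few lines of Python. This models only
the chaining of rungs through an id and a locsrc reference. It is not an
implementation of HyTime's nameloc, treeloc or dataloc semantics, and it
reads the tree path as plain child numbers, the way the examples in this
post do. The toy document, the dictionary keys ("kind", "path", "start",
"count") and the extra rung "step4" are all assumptions made for the
illustration, and the path is shorter than in the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; the real example's tree is deeper.
DOC = ET.fromstring(
    '<DOC><BOOK ID="a25"><TITLE>T</TITLE>'
    '<CHAP><P>alpha beta gamma</P><P>delta epsilon zeta</P></CHAP>'
    '</BOOK></DOC>'
)

# Each rung has an id (the key) and, except for the first, a locsrc naming
# the rung it starts from. step3 and step4 both re-use step2.
RUNGS = {
    "step1": {"kind": "name", "id": "a25"},
    "step2": {"kind": "tree", "locsrc": "step1", "path": [2, 2]},
    "step3": {"kind": "word", "locsrc": "step2", "start": 2, "count": 1},
    "step4": {"kind": "word", "locsrc": "step2", "start": 3, "count": 1},
}


def resolve(rung_id):
    """Resolve a rung by first resolving the rung named in its locsrc."""
    rung = RUNGS[rung_id]
    if rung["kind"] == "name":
        for el in DOC.iter():
            if el.get("ID") == rung["id"]:
                return el
        raise LookupError(f"no element with ID {rung['id']!r}")
    source = resolve(rung["locsrc"])
    if rung["kind"] == "tree":
        node = source
        for n in rung["path"]:
            node = list(node)[n - 1]  # n-th child, 1-based
        return node
    if rung["kind"] == "word":
        words = "".join(source.itertext()).split()
        first = rung["start"] - 1
        return words[first:first + rung["count"]]
    raise ValueError(f"unknown rung kind {rung['kind']!r}")


second_word = resolve("step3")
third_word = resolve("step4")
```

Both word rungs share the same tree rung, so changing step2 retargets both
of them at once. That is the kind of re-use that separate rung elements
make possible.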

The HyTime TC's new construct for "clink that uses non-IDREF notation to
specify the destination" can serve as an escape hatch to TEI. This would
make the whole link shorter. You get HyTime compatibility, but also TEI
locator functionality, compactness, name space clearance, type-checking, and
more existing implementations. 

*** From locators to links *** 

Extended pointers are typically placed in attributes on link elements. As
one example, the P3 equivalent of HyTime's clink is the xref element, which
takes an extended pointer. The pointer can be as brief as 

   <xref from="ID (a25)">

or as complicated as necessary (bounded and unbounded indirection are also
allowed, and there is a separate form for bare IDREFs). 

Out-of-line links in TEI are much like HyTime ilinks: they take a "targets"
IDREFS attribute that points to a set of anchors (in practice, nearly always
extended pointers). Link collections are linkgrp elements, which are
aggregates of links.

That's a taste of the syntaxes we're discussing. As previously cited, a URL
for info on TEI extended pointers is
http://dynaweb.ebt.com:8080/usrbooks/teip3/26759 (I'd put a period there but
Eudora would think it was part of the URL and not let you click your way to it).

Lou Burnard has written an excellent tutorial on TEI locators, found at
http://users.ox.ac.uk/~lou/wip/XR.htm 

Steve

Received on Tuesday, 14 January 1997 14:49:59 UTC