Summary of New HyTime Features in the TC

All,

As a service to those of you who either don't have access to the TC, which
is still under development, or who, like David, don't have the time to read
it (understandably; I barely have time to work on both the TC and XML), what
follows is a brief summary of the things that are new or different in
HyTime as a result of the TC (not counting all the new stuff in the SGML
Extended Facilities).  We have gone to a great deal of effort to ensure
that HyTime is as flexible as possible, in part with an eye toward how it
can be applied to Internet problems.  The development of the grove
formalism has made HyTime much more rigorous in its definitions and, as a
side effect, potentially much more inclusive than it was before, with a
minimum of hand waving (something Steve DeRose and David Durand very
rightly objected to in the original HyTime standard).

While I'm sure that many of these new features are more than XML needs as
its minimum, I wanted to show everything so you know what's there that's
new and interesting.

Summary of HyTime TC Changes:

1. Grove-based definitions.  All HyTime addressing functions are now
   defined as selections of nodes in groves.  This means several 
   important things:

   A. HyTime location addresses can be used to address any kind of 
      data as long as you can produce a grove for it.
   B. Non-HyTime addressing schemes can be integrated with HyTime
      by defining the node lists they select.
   C. All addressing operations are defined rigorously in terms
      of groves, which are in turn defined rigorously in terms
      of formal property sets.  Thus, if we've done our job right,
      there should be no ambiguity about what a particular
      location address will return if you know the property set and
      grove plan in effect.
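
   As a rough illustration of points A and C, here is a minimal Python
   sketch. The node model (`Node`, `walk`, `select`) is a hypothetical
   simplification, not the SGML property set formalism or any HyTime
   engine's API; it only shows that once data such as a small table is
   represented as a grove, an address is simply a selection of nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical grove node: a class name, named properties, ordered children."""
    class_name: str
    props: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root: Node, pred: Callable[[Node], bool]) -> List[Node]:
    """An addressing function in grove terms: the node list satisfying pred."""
    return [n for n in walk(root) if pred(n)]


# A grove for non-SGML data: a small spreadsheet-like table.
table = Node("table")
for r, row in enumerate([["Kimber", "TC"], ["DeRose", "XML"]]):
    row_node = table.add(Node("row", {"index": r}))
    for c, value in enumerate(row):
        row_node.add(Node("cell", {"column": c, "value": value}))

second_column = select(table, lambda n: n.class_name == "cell" and n.props["column"] == 1)
print([n.props["value"] for n in second_column])  # ['TC', 'XML']
```
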

2. The use of queries for addressing.  Queries are now meaningfully
   integrated with other addressing mechanisms, as long as the
   queries return nodes in a grove.  In practice, this means that
   your query system must provide a grove-based API for defining
   the query domain and returning the results.  This should not be
   hard in most cases, as you presumably already have a definition
   of the data model and need only translate it into the SGML
   property set formalism.

   This means that any non-HyTime addressing scheme can be used in
   HyTime by defining its grove-based results.  E.g., TEI locators
   and URLs become queries that return nodes in groves.  TEI locators
   return nodes in SGML document groves (or whatever else they are
   designed to address); URLs return nodes that are either entity
   nodes (representing documents or data entities) or element nodes
   (named A elements) within documents.
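
   A minimal sketch of a URL treated as a query notation, assuming a
   hypothetical in-memory table of entity nodes (`ENTITIES`) and a
   simplified node model; only `urldefrag` comes from Python's standard
   library. A URL without a fragment returns the entity node, and a
   fragment returns the matching named A element, mirroring the two
   cases above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List
from urllib.parse import urldefrag


@dataclass(eq=False)
class Node:
    """Hypothetical grove node: class name, properties, ordered children."""
    class_name: str
    props: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


# Hypothetical table of entity nodes, keyed by URL without fragment.
ENTITIES: Dict[str, Node] = {}


def url_query(url: str) -> List[Node]:
    """Interpret a URL as a query whose result is a grove node list."""
    base, fragment = urldefrag(url)
    entity = ENTITIES.get(base)
    if entity is None:
        return []
    if not fragment:
        return [entity]  # the whole document or data entity
    # A fragment selects the named A element(s) inside the document.
    return [n for n in walk(entity)
            if n.class_name == "element"
            and n.props.get("gi") == "A"
            and n.props.get("name") == fragment]


anchor = Node("element", {"gi": "A", "name": "sec2"})
doc = Node("entity", {"name": "intro.html"},
           [Node("element", {"gi": "BODY"}, [anchor])])
ENTITIES["http://example.com/intro.html"] = doc
```
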

3. The new "refloc" facility.  This lets you designate attributes
   as semantic references, which can then use any addressing method
   you want, including queries, without using indirection.  All the
   HyTime engine cares about is that the attribute is referential;
   it doesn't care how the reference is made, as long as interpreting
   the reference yields a node list (which the HyTime engine itself
   produces for HyTime-defined addresses, and a query processor
   produces for non-HyTime addresses).

   Refloc makes it possible to directly represent things like the
   HTML A element without changing existing instances.  It essentially
   lets you apply HyTime to almost any rational way of doing addressing
   from elements.  If you can do clever queries, you should be able
   to retrofit even the most twisted markup schemes.

   Refloc replaces the need for the notorious "not-clink" that was in
   the HyTime TC design for a while.
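
   The engine-side view of refloc can be sketched as a dispatch table.
   Everything here (`REFLOC`, `RESOLVERS`, `resolve_attribute`, and the
   toy node identifiers) is hypothetical and stands in for the real SGML
   declarations; the point is only that the engine asks for a node list
   and does not care which addressing method produced it. Note that the
   HTML-style HREF value is used unchanged.

```python
from typing import Callable, Dict, List, Tuple

NodeList = List[str]  # stand-in: node identifiers rather than real grove nodes

# Toy grove contents (hypothetical).
IDS = {"ch1": "element:chapter-1"}
URLS = {"http://example.com/a.sgm": "entity:a.sgm"}

# Pluggable addressing methods: HyTime-defined (ID) and a non-HyTime query (URL).
RESOLVERS: Dict[str, Callable[[str], NodeList]] = {
    "idref": lambda value: [IDS[value]] if value in IDS else [],
    "url": lambda value: [URLS[value]] if value in URLS else [],
}

# refloc-style declarations: (element type, attribute) -> addressing method.
REFLOC: Dict[Tuple[str, str], str] = {
    ("xref", "target"): "idref",
    ("A", "HREF"): "url",
}


def resolve_attribute(gi: str, attr: str, value: str) -> NodeList:
    """Engine view: a referential attribute must yield a node list; how is pluggable."""
    method = REFLOC.get((gi, attr))
    if method is None:
        raise ValueError(f"{gi}/@{attr} is not declared referential")
    return RESOLVERS[method](value)


print(resolve_attribute("A", "HREF", "http://example.com/a.sgm"))  # ['entity:a.sgm']
```
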

4. Redesigned independent links: hylink

   The new HyLink form enhances the ilink design as follows:

   A. Single "linkends" attribute replaced by one referential attribute
      for each named anchor role.  This makes the markup much clearer
      to users and enables the use of refloc to have different
      addressing methods for different anchors if you want.  It also
      lets you do things like use refloc to fix a referential attribute
      to a treeloc that addresses a location address in the body of
      the link, or that addresses something in the link as one of the
      anchors.

   B. Notion of "aggregate links" replaced by notion of "list anchors".
      The whole aggloc/agglink business on location addresses is gone
      (replaced by the new agglink form).  List anchors are anchors that
      can be satisfied by more than one object.  You can define traversal
      rules for the members of a list ("list traversal"). (Agglink lets
      you do semanticless aggregation if you want to.)

   C. Can explicitly designate any anchor as being satisfied by the 
      link element itself (as "self anchor").  This replaces the
      "omitted linkend" defaulting rule of ilink.

   D. New and improved link traversal rules.  More complete and 
      easier to understand (I hope).

   Any existing ilink can be made into a HyLink by adding attributes
   for addressing the anchor roles ("anchor-addressing attributes")
   and, possibly, modifying the anchrole attribute to reflect the new
   keywords (specifically, #AGG replaced by #LIST, #COR replaced by
   #CORLIST).
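
   The mechanical part of that conversion might look like the sketch
   below. It assumes that anchrole keywords follow the role name they
   qualify, that linkends holds exactly one ID reference per role in
   order, and that each anchor-addressing attribute is named after its
   role; all three are simplifying assumptions, and the attribute
   declarations a real conversion needs are not shown.

```python
from typing import Dict

# Keyword changes named in the TC summary; anything else passes through unchanged.
KEYWORD_MAP = {"#AGG": "#LIST", "#COR": "#CORLIST"}


def ilink_to_hylink(anchrole: str, linkends: str) -> Dict[str, str]:
    """Sketch: turn ilink anchrole/linkends values into hylink-style attributes."""
    roles, new_tokens = [], []
    for tok in anchrole.split():
        if tok.startswith("#"):
            # SGML names are usually case-folded, so compare in upper case.
            new_tokens.append(KEYWORD_MAP.get(tok.upper(), tok))
        else:
            roles.append(tok)
            new_tokens.append(tok)
    ends = linkends.split()
    if len(ends) != len(roles):
        raise ValueError("expected one linkend per anchor role")
    attrs = {"anchrole": " ".join(new_tokens)}
    attrs.update(zip(roles, ends))  # one anchor-addressing attribute per role
    return attrs


print(ilink_to_hylink("mother children #AGG", "m1 kids"))
```
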

5. New agglink form for representing simple aggregation.  Represents the 
   case of simply grouping a bunch of things together under a single
   semantic label to enable traversal among them.  Agglinks can be 
   combined with hylinks to represent arbitrarily complex levels of
   grouping.  For example, to represent the relationship between a mother
   and her children, you might have a hylink with the anchor roles
   "mother" and "children". The children anchor might then be satisfied
   by agglink elements of the types "brothers" and "sisters".  These
   agglinks would then address as aggregates the male children and the
   female children.  Or you could just represent the aggregation 
   relationship "brothers" by a brothers agglink by itself; you don't have
   to do something silly like define anchor roles "brother1 brother2 ...".

   Agglink is also useful when you really don't care about the anchor
   roles and just want to connect a bunch of stuff together.  In other
   words, you could have an agglink called "agglink" and leave it at that.
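
   Here is a minimal Python model of the mother/children example, using
   hypothetical `HyLink` and `AggLink` classes rather than real link
   elements. Traversing the children anchor expands the brothers and
   sisters agglinks into the individual children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class AggLink:
    """Hypothetical model of an agglink: members grouped under one semantic label."""
    label: str
    members: List[Union[str, "AggLink"]] = field(default_factory=list)


@dataclass
class HyLink:
    """Hypothetical model of a hylink: each named anchor role maps to its anchors."""
    link_type: str
    anchors: Dict[str, List[Union[str, AggLink]]]


def flatten(objs: List[Union[str, AggLink]]) -> List[str]:
    """Expand agglinks (recursively) into the objects they aggregate."""
    out: List[str] = []
    for obj in objs:
        if isinstance(obj, AggLink):
            out.extend(flatten(obj.members))
        else:
            out.append(obj)
    return out


brothers = AggLink("brothers", ["Tom", "Sam"])
sisters = AggLink("sisters", ["Ann"])
family = HyLink("motherhood", {"mother": ["Mary"], "children": [brothers, sisters]})

print(flatten(family.anchors["children"]))  # ['Tom', 'Sam', 'Ann']
```
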

6. Clink unchanged on the surface but now derived directly from hylink
   with fixed anchor roles, so it picks up all the hylink attributes
   like link traversal and list traversal that it never had before.

7. Implied location sources: any location address element can omit its
   location source, in which case it is taken to be one of three
   possible things: the grove root, the principal tree root in the grove
   (i.e., the document element in SGML documents), or the "referrer", i.e.,
   the non-address element that refers to the location address (or to
   the location ladder of which the location address is the top rung).
   These options can be fixed for a type or selected on an instance
   basis.

   The last option, "referrer", lets you have location addresses that
   are dynamic in that they address relative to the thing that uses them.
   This lets you do things like a "next" location ladder, similar to the
   "next" and "previous" options in TEI locators. (Now if we could just
   declare elements with fixed IDs we'd be in good shape....)
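
   The three defaults, and a "next" address that is relative to its
   referrer, can be sketched as follows. The mode names (`groveroot`,
   `treeroot`, `referrer`) are placeholders rather than the TC's actual
   attribute values, and treating the grove root's first child as the
   document element is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def grove_root(node: Node) -> Node:
    while node.parent is not None:
        node = node.parent
    return node


def implied_locsrc(referrer: Node, mode: str) -> Node:
    """Pick the implied location source; mode names are hypothetical placeholders."""
    root = grove_root(referrer)
    if mode == "groveroot":
        return root
    if mode == "treeroot":
        # Simplification: treat the grove root's first child as the document element.
        return root.children[0]
    if mode == "referrer":
        return referrer
    raise ValueError(f"unknown mode {mode!r}")


def next_sibling(node: Node) -> Optional[Node]:
    """A relative 'next' step: the following sibling, if any."""
    if node.parent is None:
        return None
    sibs = node.parent.children
    i = next(k for k, s in enumerate(sibs) if s is node)
    return sibs[i + 1] if i + 1 < len(sibs) else None


def resolve(referrer: Node, mode: str,
            step: Optional[Callable[[Node], Optional[Node]]] = None) -> Optional[Node]:
    """Resolve a location address with an omitted locsrc, then apply one step."""
    src = implied_locsrc(referrer, mode)
    return step(src) if step else src


doc = Node("sgmldoc")
body = doc.add(Node("body"))
p1 = body.add(Node("p1"))
p2 = body.add(Node("p2"))
```

   The same "next" address resolves to a different node for each element
   that refers to it, which is what makes referrer-relative addresses
   dynamic.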

8. Generalized name-space location address.  With groves, the notion of
   name-space is well defined.  This means we can now address by name in
   any name space, not just ID or entity name.  For example, in a grove
   for a vector graphic format that lets you put names on graphic objects
   (as you can do in CGM Level 4), you could address those graphic objects
   by name using the HyTime name-space location address.

   The default name space is still the element name space (the "elements"
   property of the SGMLDOC object in the SGML property set), so nmsploc
   also provides a simpler syntax for doing what you use nameloc for
   today, e.g.:  <nmsploc locsrc=other-document>chapter-1</nmsploc>.
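
   A sketch of name-space addressing over a generic node model, assuming
   each name space is a named node list mapping names to nodes. The
   `objects` name space on the graphic grove is hypothetical, standing in
   for names such as CGM Level 4 object names; raising an error for
   unresolved names is also a choice made here, not something the TC
   specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Node:
    """Hypothetical grove node; each name space is a named node list (name -> node)."""
    class_name: str
    label: str = ""
    name_spaces: Dict[str, Dict[str, "Node"]] = field(default_factory=dict)


def nmsploc(locsrc: Node, names: List[str], name_space: str = "elements") -> List[Node]:
    """Address nodes by name in any name space of the location source."""
    space = locsrc.name_spaces.get(name_space, {})
    missing = [n for n in names if n not in space]
    if missing:
        raise LookupError(f"not found in {name_space!r}: {missing}")
    return [space[n] for n in names]


# SGML document grove: the default "elements" name space (keyed by ID).
chapter = Node("element", "chapter-1")
doc = Node("sgmldoc", name_spaces={"elements": {"chapter-1": chapter}})

# Graphic grove with named graphic objects (hypothetical "objects" name space).
wheel = Node("graphic-object", "front wheel")
drawing = Node("picture", name_spaces={"objects": {"front-wheel": wheel}})

print(nmsploc(doc, ["chapter-1"])[0].label)                   # chapter-1
print(nmsploc(drawing, ["front-wheel"], "objects")[0].label)  # front wheel
```
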

9. Location sources can be data or subdocument entities.  In the above
   nmsploc example, the locsrc attribute is declared as ENTITY and its
   value is the name of a document entity (a CDATA entity with a notation
   of SGML or a notation derived from SGML, e.g., XML).

10. New location address form: mixedloc.  Mixedloc addresses whatever
    the location addresses in its content address.  This lets you
    group location addresses together within a mixedloc element
    and then use the lot by referring to the mixedloc.  Mixedloc was
    invented in part to model the old nameloc, but it should be useful
    in its own right.
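
    To show how mixedloc can stand in for the old nameloc, here is a
    sketch in which one mixedloc combines a lookup in the ID name space
    with a lookup in the entity name space. The node identifiers and the
    concatenation order (content order, duplicates kept) are assumptions
    for illustration.

```python
from typing import Callable, Dict, List

NodeList = List[str]  # stand-in: node identifiers

# Hypothetical name spaces of one document grove.
IDS: Dict[str, str] = {"sec1": "element:sec1", "sec2": "element:sec2"}
ENTITIES: Dict[str, str] = {"fig1": "entity:fig1"}


def nmsploc(space: Dict[str, str], names: List[str]) -> NodeList:
    """Name-space location address: look each name up in one name space."""
    return [space[name] for name in names]


def mixedloc(*members: Callable[[], NodeList]) -> NodeList:
    """Address whatever the member location addresses address.
    Assumption: results are concatenated in content order, duplicates kept."""
    result: NodeList = []
    for member in members:
        result.extend(member())
    return result


# An old-style nameloc mixing IDs and entity names, remodelled as a mixedloc.
old_nameloc = mixedloc(lambda: nmsploc(IDS, ["sec1", "sec2"]),
                       lambda: nmsploc(ENTITIES, ["fig1"]))
print(old_nameloc)  # ['element:sec1', 'element:sec2', 'entity:fig1']
```
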

11. New location address form: queryloc.  Queryloc is a location address
    that uses a query and returns a node list.  It is through queryloc
    that other addressing notations can be integrated.

    Queryloc replaces notloc: it's simply not meaningful to use a
    location address that doesn't return nodes in a grove, so notloc,
    which was defined as doing something HyTime didn't understand,
    has no place in the new grove world.  At a minimum, a query
    returns a data entity node whose content is opaque to HyTime
    (just as it would be if it were a bitmap image or some other
    unstructured data).

    Note that things like defining hotspots in graphics would be done
    with a queryloc, not with an FCSLoc (whose definition has changed,
    as described in item 12).  In other words,
    you could, if you wanted to, define the various image mapping 
    notations (ISMAP, client-side maps) as query notations and then
    just use them.  All you'd need to define would be a simple property
    set representing the data objects and properties the image describes,
    e.g., selectable regions, their associated addresses, and so on.
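
    As an example of an image-map query notation, the sketch below
    defines a tiny hypothetical query syntax ("point X Y") over region
    nodes whose shapes follow HTML client-side map coordinates (rect as
    x1,y1,x2,y2; circle as cx,cy,r). The `Region` class stands in for the
    simple property set described above; none of these names come from
    the TC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical node class from a minimal 'image map' property set."""
    shape: str                 # only "rect" and "circle" in this sketch
    coords: Tuple[int, ...]
    href: str                  # the address associated with the region

    def contains(self, x: int, y: int) -> bool:
        if self.shape == "rect":
            x1, y1, x2, y2 = self.coords
            return x1 <= x <= x2 and y1 <= y <= y2
        if self.shape == "circle":
            cx, cy, r = self.coords
            return (x - cx) ** 2 + (y - cy) ** 2 <= r * r
        raise ValueError(f"unsupported shape {self.shape!r}")


def queryloc(locsrc: List[Region], query: str) -> List[Region]:
    """Interpret the toy query 'point X Y' against a region node list."""
    kind, xs, ys = query.split()
    if kind != "point":
        raise ValueError(f"unknown query {query!r}")
    x, y = int(xs), int(ys)
    return [r for r in locsrc if r.contains(x, y)]


regions = [Region("rect", (0, 0, 100, 50), "nav.html"),
           Region("circle", (150, 150, 25), "logo.html")]
print([r.href for r in queryloc(regions, "point 10 20")])  # ['nav.html']
```
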

12. New and improved FCSLoc:  FCSLoc now actually addresses objects in
    finite coordinate spaces (rather than imposing FCSes onto objects,
    which was, frankly, bogus).  An FCSLoc essentially applies a
    "marquee" selection bounding area to an FCS and addresses any
    events inside that bounding area (as determined by the selection
    precision set for the FCSLoc).  The FCSLoc returns either the
    events or the objects scheduled by the events.

    FCSLoc lets you hyperlink directly to whatever happens to occur
    in a region of an FCS.  For example, if you had an event schedule
    that represented a historical time line, you could use an FCSLoc to
    link to "1945" and therefore anything that occurs within 1945.
    You could then use that FCSLoc as the location source for a queryloc
    that selected particular nodes, say all element nodes whose
    subject attribute value was "art:socialist" to see what Soviet
    artists were doing in 1945.  
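
    The 1945 example can be sketched with a one-axis time line. This is a
    simplification: real FCSes count integer quanta along declared axes,
    whereas here events are float year intervals, and the precision names
    (`overlap`, `contained`) are hypothetical. The FCSLoc result then
    serves as the location source for a queryloc-style filter on the
    subject attribute.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Element:
    """Stand-in for an element node scheduled on the time line."""
    id: str
    subject: str


@dataclass(frozen=True)
class Event:
    """Hypothetical event on a one-axis FCS: occupies [start, start + extent) in years."""
    start: float
    extent: float
    obj: Element


def fcsloc(schedule: List[Event], start: float, extent: float,
           precision: str = "overlap", result: str = "objects") -> List[Union[Event, Element]]:
    """Apply a 'marquee' [start, start + extent) to the FCS and select events in it."""
    lo, hi = start, start + extent

    def selected(e: Event) -> bool:
        e_lo, e_hi = e.start, e.start + e.extent
        if precision == "contained":      # event must lie wholly inside
            return lo <= e_lo and e_hi <= hi
        return e_lo < hi and lo < e_hi    # any overlap counts

    hits = [e for e in schedule if selected(e)]
    return hits if result == "events" else [e.obj for e in hits]


timeline = [
    Event(1944.5, 1.0, Element("e1", "art:socialist")),
    Event(1945.2, 0.1, Element("e2", "politics")),
    Event(1945.6, 0.2, Element("e3", "art:socialist")),
    Event(1950.0, 1.0, Element("e4", "art:socialist")),
]

in_1945 = fcsloc(timeline, 1945.0, 1.0)
# The FCSLoc result used as the location source of a queryloc-style filter:
socialist_art = [el for el in in_1945 if el.subject == "art:socialist"]
print([el.id for el in socialist_art])  # ['e1', 'e3']
```
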

13. New and improved dataloc: The old dataloc was somewhat bogus in that
    the definition of what it addressed wasn't very rigorous.  We've fixed
    that by providing a generic "data tokenizer" mechanism.  A data 
    tokenizer is any process that parses character data and constructs
    a grove containing the resulting tokens.  You can then address
    the tokens as a node list using a normal listloc.  Thus you can use any
    tokenization method you want (for example, you might have a tokenizer
    that does grammatical analysis or one that knows how to tokenize
    some obscure national language).  Dataloc combines a default tokenizer
    with a listloc to remove the need to explicitly specify the data
    tokenizer.  The tokenization "filters" are any HyLex lexical types
    available to the document (we define a small set of "built-in"
    filters that systems can support without having to support full
    HyLex).

    Thus we have a general data tokenization mechanism and the same old
    dataloc (with a couple of new options, like a "LINE" token) with
    no more hand waving or bogosity.
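
    The tokenizer-plus-listloc structure of dataloc can be sketched in a
    few lines. The `WORD` and `LINE` filters here are simple regular
    expression and line-splitting stand-ins, not HyLex definitions, and
    the positional convention (1-based, negative values counting from the
    end) is an assumption made for the sketch.

```python
import re
from typing import Callable, Dict, List

# Hypothetical built-in tokenizer "filters" (stand-ins for HyLex lexical types).
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "WORD": lambda text: re.findall(r"\S+", text),
    "LINE": lambda text: text.splitlines(),
}


def tokenize(text: str, filter_name: str) -> List[dict]:
    """Build a flat token grove: one node per token, recording its 1-based position."""
    tokens = TOKENIZERS[filter_name](text)
    return [{"class": "token", "pos": i + 1, "text": t} for i, t in enumerate(tokens)]


def listloc(nodes: List[dict], first: int, count: int) -> List[dict]:
    """Select `count` nodes starting at `first` (1-based; negative counts from the end)."""
    if first == 0 or abs(first) > len(nodes):
        raise IndexError(f"position {first} outside 1..{len(nodes)}")
    start = first - 1 if first > 0 else len(nodes) + first
    return nodes[start:start + count]  # a count past the end is truncated here


def dataloc(text: str, first: int, count: int, filter_name: str = "WORD") -> List[str]:
    """dataloc as a default tokenizer combined with a listloc."""
    return [node["text"] for node in listloc(tokenize(text, filter_name), first, count)]


verse = "Rats in the morning,\nrats in the afternoon"
print(dataloc(verse, 2, 2))           # ['in', 'the']
print(dataloc(verse, -1, 1, "LINE"))  # ['rats in the afternoon']
```
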

I realize that many of these new features are beyond the scope of Tim's
minimal requirements and I'm *NOT* suggesting that we necessarily use them
in XML.  I just wanted everyone to know what new things HyTime offers,
because I think they're pretty exciting and could be useful to a lot of you.

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com) 
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"

Received on Friday, 20 December 1996 13:30:31 UTC