
Re: Summary of New HyTime Features in the TC

From: Charles F. Goldfarb <Charles@SGMLsource.com>
Date: Mon, 23 Dec 1996 15:23:00 GMT
To: "W. Eliot Kimber" <eliot@isogen.com>
Cc: w3c-sgml-wg@www10.w3.org
Message-ID: <32c59dff.14349740@mail.alink.net>

This is great stuff and deserves wider distribution. I particularly like the
application examples.

I would soft-pedal knocking the existing design; why should they trust you this
time if you blew it last time? In any case, 20-20 hindsight is easy and things
weren't as clear four years ago; all those designs had good reason for being.

My detailed comments are interspersed below.

In the future, could you please copy me explicitly on anything that you think I
should see on the XML list? I can't monitor it for the next several weeks. (Just
send me stuff like this, and anything you think I should comment on, either to
you or to the list.)



On Fri, 20 Dec 1996 11:28:52 -0900, "W. Eliot Kimber" <eliot@isogen.com> wrote:

>As a service to those of you who either don't have access to the TC still
>under development or who, like David, don't have the time to read it
>(understandably--I barely have time to both work on the TC and XML), what
>follows is a brief summary of the things that are new or different in
>HyTime as a result of the TC (not counting all the new stuff in the SGML
>Extended Facilities).  We have gone to a great deal of effort to ensure
>that HyTime is as flexible as possible, in part with an eye toward how it
>can be applied to Internet problems.  The development of the grove
>formalism has made HyTime much more rigorous in its definitions and, as a
>side effect, potentially much more inclusive than it was before with a
>minimum of hand waving (something Steve DeRose and David Durand very
>rightly objected to in the original HyTime standard).  
>While I'm sure that many of these new features are more than XML needs as
>its minimum, I wanted to show everything so you know what's there that's
>new and interesting.
>Summary of HyTime TC Changes:
>1. Grove-based definitions.  All HyTime addressing functions are now
>   defined as selections of nodes in groves.  This means several 
>   important things:
>   A. HyTime location addresses can be used to address any kind of 
>      data as long as you can produce a grove for it.
>   B. Non-HyTime addressing schemes can be integrated with HyTime
>      by defining the node lists they select.
>   C. All addressing operations are defined rigorously in terms
>      of groves which are in turn defined rigorously in terms
>      of formal property sets.  Thus, if we've done our job right,
>      there should be no ambiguity about what a particular 
>      location address will return if you know the property set and
>      grove plan in effect.
>2. The use of queries for addressing.  Queries are now meaningfully
>   integrated with other addressing mechanisms as long as the 
>   queries return nodes in a grove.  What this means in practice
>   is that your query system must provide a grove-based
>   API for defining the query domain and returning the results.  This
>   should not be hard in most cases, as you presumably already have
>   a definition of the data model and need only translate it into
>   the SGML property set formalism.  
>   This means that any non-HyTime addressing scheme can be used in
>   HyTime by defining its grove-based results.  E.g., TEI locators
>   and URLs become queries that return nodes in groves.  TEI locators
>   return nodes in SGML document groves (or whatever else they are
>   designed to address); URLs return nodes that are
>   either entity nodes (representing documents or data entities) or
>   element nodes (named A objects) within documents.
>3. The new "refloc" facility.  This lets you designate attributes
>   as semantic references with which you can then use any addressing
>   method you want, including queries, without using indirection.

Without *explicitly* using indirection. HyTime still has to be given the same
information as it would for a queryloc, etc.

>   All the HyTime engine cares about is that the attribute is referential;
>   it doesn't care how the reference is made as long as it gets a node
>   list as a result of interpreting the reference (which it would do in
>   the case of HyTime-defined addresses or the query processor would
>   do for non-HyTime addresses).

It needs to know the notation of the query or it can't invoke the processor for
it. Don't pretend that this isn't a location address element; it is a shortcut
for one.

>   Refloc makes it possible to directly represent things like the
>   HTML A element without change to instances. It essentially lets
>   you apply HyTime to almost any rational way of doing addressing
>   from elements.  If you can do clever queries, you should be able
>   to retrofit even the most twisted markup schemes.
>   Refloc replaces the need for the notorious "not-clink" that was in
>   the HyTime TC design for a while.
>4. Redesigned independent links: hylink
>   The new HyLink form enhances the ilink design as follows:
>   A. single "linkends" attribute replaced by one referential attribute
>      for each named anchor role.  This makes the markup much clearer
>      to users and enables the use of refloc to have different 
>      addressing methods for different anchors if you want.  It also
>      lets you do things like use refloc to fix a referential attribute
>      to a treeloc that addresses a location address in the body of the
>      link or that addresses something in the link as one of the
>      anchors.
>   B. Notion of "aggregate links" replaced by notion of "list anchors".
>      The whole aggloc/agglink business on location addresses is gone
>      (replaced by the new agglink form).  List anchors are anchors that
>      can be satisfied by more than one object.  You can define traversal

Please delete "satisfied by" everywhere in this summary. It is incorrect and
unnecessary. "List anchors are anchors that can be more than one object."

>      rules for the members of a list ("list traversal"). (Agglink lets
>      you do semanticless aggregation if you want to.)
>   C. Can explicitly designate any anchor as being satisfied by the 
>      link element itself (as "self anchor").  This replaces the
>      "omitted linkend" defaulting rule of ilink.
>   D. New and improved link traversal rules.  More complete and 
>      easier to understand (I hope).
>   Any existing ilink can be made into a HyLink by adding attributes
>   for addressing the anchor roles ("anchor-addressing attributes")
>   and, possibly, modifying the anchrole attribute to reflect the new
>   keywords (specifically, #AGG replaced by #LIST, #COR replaced by
>   #CORLIST).
>5. New agglink form for representing simple aggregation.  Represents the 
>   case of simply grouping a bunch of things together under a single
>   semantic label to enable traversal among them.  Agglinks can be 
>   combined with hylinks to represent arbitrarily-complex levels of
>   grouping.  For example, to represent the relationship between a mother
>   and her children, you might have a hylink with the anchor roles
>   "mother" and "children". The children anchor might then be satisfied
>   by agglink elements of the types "brothers" and "sisters".  These
>   agglinks would then address as aggregates the male children and the
>   female children.  Or you could just represent the aggregation 
>   relationship "brothers" by a brothers agglink by itself--you don't have
>   to do something silly like define anchor roles "brother1 brother2 ...".
>   Agglink is also useful when you really don't care about the anchor
>   roles and just want to connect a bunch of stuff together.  In other
>   words you could have an agglink called "agglink" and leave it at that.
>6. Clink unchanged on the surface but now derived directly from hylink
>   with fixed anchor roles, so it picks up all the hylink attributes
>   like link traversal and list traversal that it never had before.
>7. Implied location sources: any location address element can omit its
>   location source, in which case it is taken to be one of three
>   possible things: the grove root, the principal tree root in the grove
>   (i.e., the document element in SGML documents), or the "referrer", i.e.,
>   the non-address element that refers to the location address (or to
>   the location ladder of which the location address is the top rung).

the location *path* of which the location address is the first step).
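
The three implied location sources can be sketched as follows. This is a
minimal Python illustration, not the TC's definition: the Node class, the
option keywords ("grovroot", "docelem", "referrer") and the next_sibling
step are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, as grove nodes are unique
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def implied_source(option, grove_root, doc_element, referrer):
    """Pick the location source used when a location address omits one.

    The keywords are made up for this sketch; they stand for the grove
    root, the principal tree root, and the referring element.
    """
    sources = {
        "grovroot": grove_root,
        "docelem": doc_element,
        "referrer": referrer,
    }
    return sources[option]


def next_sibling(source):
    """A relative step: the node list holding the following sibling, if any."""
    if source.parent is None:
        return []
    siblings = source.parent.children
    i = siblings.index(source)
    return siblings[i + 1:i + 2]
```

With the "referrer" option, the same next_sibling address yields a
different node for each element that refers to it, which is what makes
the address dynamic.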

>   These options can be fixed for a type or selected on an instance
>   basis.
>   The last option, "referrer", lets you have location addresses that
>   are dynamic in that they address relative to the thing that uses them.
>   This lets you do things like a "next" location ladder, similar to the
>   "next" and "previous" options in TEI locators. (Now if we could just
>   declare elements with fixed IDs we'd be in good shape....)
>8. Generalized name-space location address.  With groves, the notion of
>   name-space is well defined.  This means we can now address by name in
>   any name space, not just ID or entity name.  For example, in a grove
>   for a vector graphic format that let you put names on graphic objects
>   (as you can do in CGM Level 4), you could address those graphic objects
>   by name using the HyTime name-space location address.
>   The default name space is still the element name space (the "elements"
>   property of the SGMLDOC object in the SGML property set), so nmsploc
>   also provides a simpler syntax for doing what you use nameloc for
>   today, e.g.:  <nmsploc locsrc=other-document>chapter-1</nmsploc>.
>9. Location sources can be data or subdocument entities.  In the above
>   nmsploc example, the locsrc attribute is declared as ENTITY and its
>   value is the name of a document entity (a CDATA entity with a notation
>   of SGML or a notation derived from SGML, e.g., XML).

This is a shorthand that causes construction of the necessary grove.

(Avoid giving the impression that we have a whole bunch of special cases when we
really have some well-designed shortcuts to a single construct.)
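
A minimal Python sketch of name-space addressing with an entity as the
location source, under loud assumptions: a grove is modelled as a plain
dict of name spaces, and "parsers" stands in for whatever constructs a
grove from an entity. The function names and the "graphic-objects" name
space are hypothetical; only "elements" as the default comes from the
summary above.

```python
def grove_for_entity(name, parsers, cache):
    """Build (once) and return the grove for the named entity.

    'parsers' maps an entity name to a zero-argument callable that
    constructs the grove; it is a stand-in for real parsing.
    """
    if name not in cache:
        cache[name] = parsers[name]()
    return cache[name]


def nmsploc(grove, names, namespace="elements"):
    """Address nodes by name within one name space of a grove.

    'grove' is a dict of name spaces, each a dict from name to node.
    'names' is a space-separated list of names; unknown names are skipped.
    """
    space = grove[namespace]
    return [space[n] for n in names.split() if n in space]
```

Under these assumptions, the nmsploc example in item 8 amounts to
nmsploc(grove_for_entity("other-document", parsers, cache), "chapter-1"):
the entity-valued locsrc is only a shortcut for obtaining the grove.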

>10. New location address form: mixedloc.  Mixedloc addresses whatever
>    the location addresses in its content address.  This lets you 
>    group location addresses together within a mixedloc element
>    and then use the lot by referring to the mixedloc.  Mixedloc was
>    invented in part to model the old nameloc, but it should be useful
>    in its own right.
>11. New location address form: queryloc.  Queryloc is a location address
>    that uses a query and returns a node list.  It is through queryloc
>    that other addressing notations can be integrated.
>    Queryloc replaces notloc: it's simply not meaningful to use a
>    location address that doesn't return nodes in a grove, so notloc,
>    which was defined as doing something HyTime didn't understand,
>    is just not meaningful in the new grove world.  At a minimum, you 
>    return a data entity node where the content of the entity is 
>    opaque to HyTime (just as it would be if it were a bitmap image
>    or some other unstructured data).
>    Note that things like defining hotspots in graphics would be done
>    as a queryloc, not as an FCSLoc, which has changed.  In other words,
>    you could, if you wanted to, define the various image mapping 
>    notations (ISMAP, client-side maps) as query notations and then
>    just use them.  All you'd need to define would be a simple property
>    set representing the data objects and properties the image describes,
>    e.g., selectable regions, their associated addresses, and so on.
>12. New and improved FCSLoc:  FCSLoc now actually addresses objects in
>    finite coordinate spaces (rather than imposing FCSes onto objects, 
>    which was, frankly, bogus).  An FCSLoc essentially applies a
>    "marquee" selection bounding area to an FCS and addresses any
>    events inside that bounding area (as determined by the selection
>    precision set for the FCSLoc).  The FCSloc returns either the
>    events or the objects scheduled by the events.  
>    FCSLoc lets you hyperlink directly to whatever happens to occur
>    in a region of an FCS.  For example, if you had an event schedule
>    that represented a historical time line, you could use an FCSLoc to
>    link to "1945" and therefore anything that occurs within 1945.
>    You could then use that FCSLoc as the location source for a queryloc
>    that selected particular nodes, say all element nodes whose
>    subject attribute value was "art:socialist" to see what Soviet
>    artists were doing in 1945.  
>13. New and improved dataloc: The old dataloc was somewhat bogus in that
>    the definition of what it addressed wasn't very rigorous.  We've fixed
>    that by providing a generic "data tokenizer" mechanism.  A data 
>    tokenizer is any process that parses character data and constructs
>    a grove containing the resulting tokens.  You can then address
>    the tokens as a node list using normal listloc.  Thus you can use any
>    tokenization method you want (for example, you might have a tokenizer
>    that does grammatical analysis or one that knows how to tokenize
>    some obscure national language).  Dataloc combines a default tokenizer
>    with a listloc to remove the need to explicitly specify the data
>    tokenizer.  The tokenization "filters" are any HyLex lexical types
>    available to the document (we define a small set of "built-in"
>    filters that systems can support without having to support full
>    HyLex).
>    Thus we have a general data tokenization mechanism and the same old
>    dataloc (with a couple new options, like a "LINE" token) with
>    no more hand waving or bogosity.
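
The tokenizer-plus-listloc arrangement in item 13 can be sketched in
Python as below. This is an illustration under assumptions, not the TC's
mechanism: regular expressions stand in for HyLex lexical types, the
filter names and the choice of WORD as the default are hypothetical, and
listloc is reduced to a 1-based position and a count.

```python
import re

# Hypothetical "built-in" filters; regular expressions stand in for
# HyLex lexical types here.
FILTERS = {
    "WORD": r"\S+",
    "LINE": r"[^\n]+",
}


def tokenize(data, pattern):
    """A data tokenizer: parse character data into a list of token nodes."""
    return [{"token": m.group(0), "start": m.start()}
            for m in re.finditer(pattern, data)]


def listloc(nodes, first, count=1):
    """Select 'count' nodes starting at 1-based position 'first'."""
    return nodes[first - 1:first - 1 + count]


def dataloc(data, first, count=1, filter_name="WORD"):
    """Combine a default tokenizer with listloc, as the summary describes."""
    return listloc(tokenize(data, FILTERS[filter_name]), first, count)
```

Any other tokenizer (grammatical analysis, a language-specific word
breaker) can replace tokenize() as long as it also yields a node list;
listloc does not need to change.
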
>I realize that many of these new features are beyond the scope of Tim's
>minimal requirements and I'm *NOT* suggesting that we necessarily use them
>in XML.  I just wanted everyone to know what new things HyTime offers,
>because I think they're pretty exciting and could be useful to a lot of you.
>W. Eliot Kimber (eliot@isogen.com) 
>Senior SGML Consulting Engineer, Highland Consulting
>2200 North Lamar Street, Suite 230, Dallas, Texas 75202
>+1-214-953-0004 +1-214-953-3152 fax
>http://www.isogen.com (work) http://www.drmacro.com (home)
>"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
>re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"

Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
Received on Monday, 23 December 1996 10:24:46 EST
