- From: W. Eliot Kimber <eliot@isogen.com>
- Date: Fri, 20 Dec 1996 11:28:52 -0900
- To: w3c-sgml-wg@w3.org
All,
As a service to those of you who either don't have access to the TC still
under development or who, like David, don't have the time to read it
(understandably--I barely have time to both work on the TC and XML), what
follows is a brief summary of the things that are new or different in
HyTime as a result of the TC (not counting all the new stuff in the SGML
Extended Facilities). We have gone to a great deal of effort to ensure
that HyTime is as flexible as possible, in part with an eye toward how it
can be applied to Internet problems. The development of the grove
formalism has made HyTime much more rigorous in its definitions and, as a
side effect, potentially much more inclusive than it was before with a
minimum of hand waving (something Steve DeRose and David Durand very
rightly objected to in the original HyTime standard).
While I'm sure that many of these new features are more than XML needs as
its minimum, I wanted to show everything so you know what's there that's
new and interesting.
Summary of HyTime TC Changes:
1. Grove-based definitions. All HyTime addressing functions are now
defined as selections of nodes in groves. This means several
important things:
A. HyTime location addresses can be used to address any kind of
data as long as you can produce a grove for it.
B. Non-HyTime addressing schemes can be integrated with HyTime
by defining the node lists they select.
C. All addressing operations are defined rigorously in terms
of groves which are in turn defined rigorously in terms
of formal property sets. Thus, if we've done our job right,
there should be no ambiguity about what a particular
location address will return if you know the property set and
grove plan in effect.
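To make the idea of "addresses as selections of nodes in groves" concrete, here is a minimal sketch. The Node class, the property names ("gi", "id"), and the select function are illustrative assumptions, not the SGML property set or any HyTime API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One grove node: a class name, named properties, and child nodes."""
    cls: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def walk(node):
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)

def select(root, predicate):
    """An 'address' in this sketch is a function from a grove to a node list."""
    return [n for n in walk(root) if predicate(n)]

# A tiny hypothetical grove for a document with one chapter and two paragraphs.
doc = Node("sgmldoc", children=[
    Node("element", {"gi": "chapter", "id": "c1"}, [
        Node("element", {"gi": "para"}),
        Node("element", {"gi": "para"}),
    ]),
])
paras = select(doc, lambda n: n.props.get("gi") == "para")
```

Because the result is always a node list, any data for which a grove can be built can be addressed the same way.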
2. The use of queries for addressing. Queries are now meaningfully
integrated with other addressing mechanisms as long as the
queries return nodes in a grove. What this means in practice
is that your query system must provide a grove-based
API for defining the query domain and returning the results. This
should not be hard in most cases, as you presumably already have
a definition of the data model and need only translate it into
the SGML property set formalism.
This means that any non-HyTime addressing scheme can be used in
HyTime by defining its grove-based results. E.g., TEI locators
and URLs become queries that return nodes in groves. TEI locators
return nodes in SGML document groves (or whatever else they are
designed to address); URLs return nodes that are
either entity nodes (representing documents or data entities) or
element nodes (named A objects) within documents.
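As a rough illustration of a URL treated as a query that returns nodes, consider the sketch below. The grove layout (an "entity" node plus an "anchors" table keyed by name) and the resolve_url function are assumptions made up for this example; nothing here is a defined query notation.

```python
def resolve_url(groves, url):
    """Return a node list for a URL-like string.

    groves maps a document address to a hypothetical grove, modelled as a
    dict with an 'entity' node and a table of named anchor elements.
    """
    address, _, fragment = url.partition("#")
    grove = groves.get(address)
    if grove is None:
        return []
    if not fragment:
        # No fragment: the whole document, represented by its entity node.
        return [grove["entity"]]
    anchor = grove["anchors"].get(fragment)
    return [anchor] if anchor is not None else []

page = {
    "entity": {"class": "entity", "name": "page"},
    "anchors": {"top": {"class": "element", "gi": "a", "name": "top"}},
}
groves = {"http://example.org/page.html": page}
whole = resolve_url(groves, "http://example.org/page.html")
part = resolve_url(groves, "http://example.org/page.html#top")
```

Either way the caller gets back a list of grove nodes, which is all the rest of the addressing machinery needs.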
3. The new "refloc" facility. This lets you designate attributes
as semantic references with which you can then use any addressing
method you want, including queries, without using indirection.
All the HyTime engine cares about is that the attribute is referential;
it doesn't care how the reference is made as long as it gets a node
list as a result of interpreting the reference (which it would do in
the case of HyTime-defined addresses or the query processor would
do for non-HyTime addresses).
Refloc makes it possible to directly represent things like the
HTML A element without change to instances. It essentially lets
you apply HyTime to almost any rational way of doing addressing
from elements. If you can do clever queries, you should be able
to retrofit even the most twisted markup schemes.
Refloc replaces the need for the notorious "not-clink" that was in
the HyTime TC design for a while.
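The dispatch that refloc implies can be sketched as a lookup from attribute name to addressing method. The table names (reflocs, resolvers), the method names ("url", "idloc"), and the dict-based elements are hypothetical; the real declaration syntax is whatever the TC defines.

```python
def resolve_reference(element, attr_name, reflocs, resolvers):
    """Interpret a referential attribute and return a node list.

    reflocs:   attribute name -> addressing method name (stands in for
               the refloc declaration; names are assumptions)
    resolvers: method name -> callable(value) -> node list
    """
    method = reflocs[attr_name]
    return resolvers[method](element[attr_name])

ids = {"intro": {"gi": "section", "id": "intro"}}
resolvers = {
    "idloc": lambda value: [ids[value]] if value in ids else [],
    "url": lambda value: [{"class": "entity", "system-id": value}],
}
reflocs = {"href": "url", "target": "idloc"}

# An HTML-style A element, used as is, with no change to the instance.
a = {"gi": "a", "href": "http://example.org/"}
nodes = resolve_reference(a, "href", reflocs, resolvers)
```

The caller never needs to know which method produced the nodes, only that a node list came back.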
4. Redesigned independent links: hylink
The new HyLink form enhances the ilink design as follows:
A. single "linkends" attribute replaced by one referential attribute
for each named anchor role. This makes the markup much clearer
to users and enables the use of refloc to have different
addressing methods for different anchors if you want. It also
lets you do things like use refloc to fix a referential attribute
to a treeloc that addresses a location address in the body of the
link or that addresses something in the link as one of the
anchors.
B. Notion of "aggregate links" replaced by notion of "list anchors".
The whole aggloc/agglink business on location addresses is gone
(replaced by the new agglink form). List anchors are anchors that
can be satisfied by more than one object. You can define traversal
rules for the members of a list ("list traversal"). (Agglink lets
you do semanticless aggregation if you want to.)
C. Can explicitly designate any anchor as being satisfied by the
link element itself (as "self anchor"). This replaces the
"omitted linkend" defaulting rule of ilink.
D. New and improved link traversal rules. More complete and
easier to understand (I hope).
Any existing ilink can be made into a HyLink by adding attributes
for addressing the anchor roles ("anchor-addressing attributes")
and, possibly, modifying the anchrole attribute to reflect the new
keywords (specifically, #AGG replaced by #LIST, #COR replaced by
#CORLIST).
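The keyword change in the anchrole attribute is mechanical, so a small migration helper is easy to imagine. This sketch assumes the keywords appear as whitespace-separated tokens in the attribute value; the function name and the sample role names are invented for illustration.

```python
# Old ilink keyword -> new HyLink keyword, as listed above.
KEYWORD_MAP = {"#AGG": "#LIST", "#COR": "#CORLIST"}

def migrate_anchrole(value):
    """Rewrite an anchrole attribute value token by token.

    Working on whole tokens avoids turning an already-migrated
    '#CORLIST' into '#CORLISTLIST'.
    """
    return " ".join(KEYWORD_MAP.get(tok.upper(), tok) for tok in value.split())

migrated = migrate_anchrole("term #AGG definitions #COR")
```

Adding the anchor-addressing attributes themselves still has to be done per link type; this only covers the keyword rename.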
5. New agglink form for representing simple aggregation. Represents the
case of simply grouping a bunch of things together under a single
semantic label to enable traversal among them. Agglinks can be
combined with hylinks to represent arbitrarily-complex levels of
grouping. For example, to represent the relationship between a mother
and her children, you might have a hylink with the anchor roles
"mother" and "children". The children anchor might then be satisfied
by agglink elements of the types "brothers" and "sisters". These
agglinks would then address as aggregates the male children and the
female children. Or you could just represent the aggregation
relationship "brothers" by a brothers agglink by itself--you don't have
to do something silly like define anchor roles "brother1 brother2 ...".
Agglink is also useful when you really don't care about the anchor
roles and just want to connect a bunch of stuff together. In other
words you could have an agglink called "agglink" and leave it at that.
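The mother/children example can be modelled as nested data. In this sketch, links are plain dicts and members() expands any agglink found in an anchor into the objects it groups; that flattening behaviour, the dict keys, and the names are assumptions for illustration rather than defined traversal semantics.

```python
def members(anchor):
    """Flatten an anchor's contents, expanding nested agglinks in order."""
    out = []
    for item in anchor:
        if isinstance(item, dict) and item.get("form") == "agglink":
            out.extend(members(item["members"]))
        else:
            out.append(item)
    return out

brothers = {"form": "agglink", "type": "brothers", "members": ["Tom", "Raj"]}
sisters = {"form": "agglink", "type": "sisters", "members": ["Ana"]}
family = {
    "form": "hylink",
    "anchors": {"mother": ["Mary"], "children": [brothers, sisters]},
}
children = members(family["anchors"]["children"])
```

The brothers agglink is also usable on its own, without the enclosing hylink, which is the "aggregation by itself" case.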
6. Clink unchanged on the surface but now derived directly from hylink
with fixed anchor roles, so it picks up all the hylink attributes
like link traversal and list traversal that it never had before.
7. Implied location sources: any location address element can omit its
location source, in which case it is taken to be one of three
possible things: the grove root, the principal tree root in the grove
(i.e., the document element in SGML documents), or the "referrer", i.e.,
the non-address element that refers to the location address (or to
the location ladder of which the location address is the top rung).
These options can be fixed for a type or selected on an instance
basis.
The last option, "referrer", lets you have location addresses that
are dynamic in that they address relative to the thing that uses them.
This lets you do things like a "next" location ladder, similar to the
"next" and "previous" options in TEI locators. (Now if we could just
declare elements with fixed IDs we'd be in good shape....)
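The three implied location sources, and a referrer-relative "next" address, might look like this in outline. The mode names ("grove-root", "tree-root", "referrer"), the "docelem" key, and both functions are hypothetical stand-ins, not the TC's keywords.

```python
def implied_locsrc(mode, grove_root, referrer):
    """Pick the location source used when an address omits its own."""
    if mode == "grove-root":
        return grove_root
    if mode == "tree-root":
        return grove_root["docelem"]  # the document element in an SGML grove
    if mode == "referrer":
        return referrer
    raise ValueError(f"unknown mode: {mode}")

def next_sibling(siblings, node):
    """Return a node list holding the sibling after node, or an empty list."""
    for i, sibling in enumerate(siblings):
        if sibling is node:
            return siblings[i + 1:i + 2]
    return []

ch1, ch2 = {"gi": "chapter", "id": "c1"}, {"gi": "chapter", "id": "c2"}
book = {"gi": "book", "children": [ch1, ch2]}
grove_root = {"class": "sgmldoc", "docelem": book}

# With "referrer", the same address yields different results per user.
source = implied_locsrc("referrer", grove_root, ch1)
following = next_sibling(book["children"], source)
```

One shared "next" address can thus serve every chapter, since the starting point is whichever element refers to it.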
8. Generalized name-space location address. With groves, the notion of
name-space is well defined. This means we can now address by name in
any name space, not just ID or entity name. For example, in a grove
for a vector graphic format that let you put names on graphic objects
(as you can do in CGM Level 4), you could address those graphic objects
by name using the HyTime name-space location address.
The default name space is still the element name space (the "elements"
property of the SGMLDOC object in the SGML property set), so nmsploc
also provides a simpler syntax for doing what you use nameloc for
today, e.g.: <nmsploc locsrc=other-document>chapter-1</nmsploc>.
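Name-space addressing reduces to a table lookup once the grove exposes its name spaces. In the sketch below each name space is modelled as a dict property on the source node; the "objects" name space for the graphic and the nmsploc function signature are assumptions for illustration.

```python
def nmsploc(source, names, namespace="elements"):
    """Look up whitespace-separated names in one name space of a node.

    The default name space is 'elements', mirroring the default
    described above; unknown names are skipped.
    """
    table = source.get(namespace, {})
    return [table[name] for name in names.split() if name in table]

other_document = {
    "elements": {"chapter-1": {"gi": "chapter", "id": "chapter-1"}},
    "entities": {"logo": {"class": "entity", "name": "logo"}},
}
# A hypothetical grove for a vector graphic with named objects.
drawing = {"objects": {"valve-3": {"class": "graphic-object", "name": "valve-3"}}}

by_id = nmsploc(other_document, "chapter-1")
by_object_name = nmsploc(drawing, "valve-3", namespace="objects")
```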
9. Location sources can be data or subdocument entities. In the above
nmsploc example, the locsrc attribute is declared as ENTITY and its
value is the name of a document entity (a CDATA entity with a notation
of SGML or a notation derived from SGML, e.g., XML).
10. New location address form: mixedloc. Mixedloc addresses whatever
the location addresses in its content address. This lets you
group location addresses together within a mixedloc element
and then use the lot by referring to the mixedloc. Mixedloc was
invented in part to model the old nameloc, but it should be useful
in its own right.
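A mixedloc can be pictured as the combined result of the addresses it contains. This sketch models each contained address as a zero-argument callable returning a node list and simply concatenates the results in order; whether duplicates are removed is not stated above, so none are removed here.

```python
def mixedloc(*locations):
    """Concatenate the node lists of the contained location addresses."""
    nodes = []
    for locate in locations:
        nodes.extend(locate())
    return nodes

# Hypothetical contained addresses: one by ID, one by entity name.
by_id = lambda: [{"gi": "chapter", "id": "c1"}]
by_entity = lambda: [{"class": "entity", "name": "figure-2"}]
group = mixedloc(by_id, by_entity)
```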
11. New location address form: queryloc. Queryloc is a location address
that uses a query and returns a node list. It is through queryloc
that other addressing notations can be integrated.
Queryloc replaces notloc: it's simply not meaningful to use a
location address that doesn't return nodes in a grove, so notloc,
which was defined as doing something HyTime didn't understand,
is just not meaningful in the new grove world. At a minimum, you
return a data entity node where the content of the entity is
opaque to HyTime (just as it would be if it were a bitmap image
or some other unstructured data).
Note that things like defining hotspots in graphics would be done
as a queryloc, not as an FCSLoc, which has changed. In other words,
you could, if you wanted to, define the various image mapping
notations (ISMAP, client-side maps) as query notations and then
just use them. All you'd need to define would be a simple property
set representing the data objects and properties the image describes,
e.g., selectable regions, their associated addresses, and so on.
12. New and improved FCSLoc: FCSLoc now actually addresses objects in
finite coordinate spaces (rather than imposing FCSes onto objects,
which was, frankly, bogus). An FCSLoc essentially applies a
"marquee" selection bounding area to an FCS and addresses any
events inside that bounding area (as determined by the selection
precision set for the FCSLoc). The FCSloc returns either the
events or the objects scheduled by the events.
FCSLoc lets you hyperlink directly to whatever happens to occur
in a region of an FCS. For example, if you had an event schedule
that represented a historical time line, you could use an FCSLoc to
link to "1945" and therefore anything that occurs within 1945.
You could then use that FCSLoc as the location source for a queryloc
that selected particular nodes, say all element nodes whose
subject attribute value was "art:socialist" to see what Soviet
artists were doing in 1945.
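The 1945 example can be sketched on a single time axis. Here events carry numeric start and end positions in fractional years, selection means "fully inside the bounding extent" (one possible reading of the selection precision), and the fcsloc and queryloc functions are hypothetical stand-ins for the real forms.

```python
def fcsloc(events, start, end, return_objects=False):
    """Select events lying entirely within [start, end] on one axis."""
    hits = [e for e in events if start <= e["start"] and e["end"] <= end]
    return [e["object"] for e in hits] if return_objects else hits

def queryloc(nodes, predicate):
    """Filter a node list; the location source is the list passed in."""
    return [n for n in nodes if predicate(n)]

timeline = [
    {"start": 1944.8, "end": 1944.9,
     "object": {"gi": "work", "subject": "art:socialist"}},
    {"start": 1945.2, "end": 1945.4,
     "object": {"gi": "work", "subject": "art:socialist"}},
    {"start": 1945.5, "end": 1945.6,
     "object": {"gi": "work", "subject": "music:jazz"}},
]
in_1945 = fcsloc(timeline, 1945.0, 1946.0, return_objects=True)
art = queryloc(in_1945, lambda n: n.get("subject") == "art:socialist")
```

Chaining the two keeps each step simple: the FCSLoc narrows by position, and the query narrows by property.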
13. New and improved dataloc: The old dataloc was somewhat bogus in that
the definition of what it addressed wasn't very rigorous. We've fixed
that by providing a generic "data tokenizer" mechanism. A data
tokenizer is any process that parses character data and constructs
a grove containing the resulting tokens. You can then address
the tokens as a node list using normal listloc. Thus you can use any
tokenization method you want (for example, you might have a tokenizer
that does grammatical analysis or one that knows how to tokenize
some obscure national language). Dataloc combines a default tokenizer
with a listloc to remove the need to explicitly specify the data
tokenizer. The tokenization "filters" are any HyLex lexical types
available to the document (we define a small set of "built-in"
filters that systems can support without having to support full
HyLex).
Thus we have a general data tokenization mechanism and the same old
dataloc (with a couple new options, like a "LINE" token) with
no more hand waving or bogosity.
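The split between a data tokenizer and a listloc can be sketched as follows. The two tokenizers, the token node shape, and the simplified 1-based (first, count) selection are assumptions for illustration; real listloc dimension specifications and HyLex filters are richer than this.

```python
import re

def word_tokenizer(text):
    """Build token nodes from whitespace-delimited words."""
    return [{"class": "token", "string": w} for w in re.findall(r"\S+", text)]

def line_tokenizer(text):
    """Build token nodes from lines, in the spirit of a LINE token."""
    return [{"class": "token", "string": line} for line in text.splitlines()]

def listloc(nodes, first, count=1):
    """Select count nodes starting at 1-based position first."""
    return nodes[first - 1:first - 1 + count]

def dataloc(text, first, count=1, tokenizer=word_tokenizer):
    """A default tokenizer combined with a listloc, as described above."""
    return listloc(tokenizer(text), first, count)

words = dataloc("Rats in the morning", 2, 2)
second_line = dataloc("a b\nc d", 2, tokenizer=line_tokenizer)
```

Swapping in a different tokenizer changes what counts as a token without touching the selection step.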
I realize that many of these new features are beyond the scope of Tim's
minimal requirements and I'm *NOT* suggesting that we necessarily use them
in XML. I just wanted everyone to know what new things HyTime offers,
because I think they're pretty exciting and could be useful to a lot of you.
Cheers,
E.
--
W. Eliot Kimber (eliot@isogen.com)
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..." --Austin Lounge Lizards, "1984 Blues"
Received on Friday, 20 December 1996 13:30:31 UTC