- From: W. Eliot Kimber <eliot@isogen.com>
- Date: Fri, 20 Dec 1996 11:28:52 -0900
- To: w3c-sgml-wg@w3.org
All,

As a service to those of you who either don't have access to the TC still under development or who, like David, don't have the time to read it (understandably--I barely have time to work on both the TC and XML), what follows is a brief summary of the things that are new or different in HyTime as a result of the TC (not counting all the new stuff in the SGML Extended Facilities).

We have gone to a great deal of effort to ensure that HyTime is as flexible as possible, in part with an eye toward how it can be applied to Internet problems. The development of the grove formalism has made HyTime much more rigorous in its definitions and, as a side effect, potentially much more inclusive than it was before, with a minimum of hand waving (something Steve DeRose and David Durand very rightly objected to in the original HyTime standard).

While I'm sure that many of these new features are more than XML needs as its minimum, I wanted to show everything so you know what's there that's new and interesting.

Summary of HyTime TC Changes:

1. Grove-based definitions. All HyTime addressing functions are now defined as selections of nodes in groves. This means several important things:

A. HyTime location addresses can be used to address any kind of data as long as you can produce a grove for it.

B. Non-HyTime addressing schemes can be integrated with HyTime by defining the node lists they select.

C. All addressing operations are defined rigorously in terms of groves, which are in turn defined rigorously in terms of formal property sets. Thus, if we've done our job right, there should be no ambiguity about what a particular location address will return if you know the property set and grove plan in effect.

2. The use of queries for addressing. Queries are now meaningfully integrated with other addressing mechanisms as long as the queries return nodes in a grove.
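
As a rough illustration of the grove idea (not taken from the TC), the sketch below models a grove as a tree of nodes with named properties and treats every address, HyTime-defined or not, as a function that returns a list of nodes from that grove. The `Node` class, the `walk`, `by_id` and `query` helpers, and the property names `gi` and `id` are all assumptions made for this example; the real SGML property set is far richer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass(eq=False)
class Node:
    """One grove node: a class name plus named properties (hypothetical model)."""
    node_class: str
    props: dict = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def walk(root: Node) -> Iterator[Node]:
    """Yield every node of the grove in pre-order."""
    yield root
    for child in root.children:
        yield from walk(child)


# Any address is modeled as a function from a grove root to a node list.
Address = Callable[[Node], List[Node]]


def by_id(name: str) -> Address:
    """Select element nodes whose (assumed) 'id' property equals name."""
    return lambda root: [n for n in walk(root)
                         if n.node_class == "element" and n.props.get("id") == name]


def query(predicate: Callable[[Node], bool]) -> Address:
    """Wrap an arbitrary query; it fits in because it returns grove nodes."""
    return lambda root: [n for n in walk(root) if predicate(n)]


doc = Node("sgmldoc", children=[
    Node("element", {"gi": "book"}, [
        Node("element", {"gi": "chapter", "id": "chapter-1"}),
        Node("element", {"gi": "chapter", "id": "chapter-2"}),
    ]),
])

first = by_id("chapter-1")(doc)
chapters = query(lambda n: n.props.get("gi") == "chapter")(doc)
print([n.props["id"] for n in first])
print([n.props["id"] for n in chapters])
```

Because both helpers return plain node lists from the same grove, their results can be combined or used as the starting point for further addressing, which is the property the summary relies on.
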
What this means in practice is that your query system must provide a grove-based API for defining the query domain and returning the results. This should not be hard in most cases, as you presumably already have a definition of the data model and need only translate it into the SGML property set formalism. This means that any non-HyTime addressing scheme can be used in HyTime by defining its grove-based results. E.g., TEI locators and URLs become queries that return nodes in groves. TEI locators return nodes in SGML document groves (or whatever else they are designed to address); URLs return nodes that are either entity nodes (representing documents or data entities) or element nodes (named A objects) within documents.

3. The new "refloc" facility. This lets you designate attributes as semantic references with which you can then use any addressing method you want, including queries, without using indirection. All the HyTime engine cares about is that the attribute is referential; it doesn't care how the reference is made as long as it gets a node list as a result of interpreting the reference (which it would do itself in the case of HyTime-defined addresses, or which the query processor would do for non-HyTime addresses). Refloc makes it possible to directly represent things like the HTML A element without change to instances. It essentially lets you apply HyTime to almost any rational way of doing addressing from elements. If you can do clever queries, you should be able to retrofit even the most twisted markup schemes. Refloc replaces the need for the notorious "not-clink" that was in the HyTime TC design for a while.

4. Redesigned independent links: hylink. The new HyLink form enhances the ilink design as follows:

A. The single "linkends" attribute is replaced by one referential attribute for each named anchor role. This makes the markup much clearer to users and enables the use of refloc to have different addressing methods for different anchors if you want.
It also lets you do things like use refloc to fix a referential attribute to a treeloc that addresses a location address in the body of the link or that addresses something in the link as one of the anchors.

B. The notion of "aggregate links" is replaced by the notion of "list anchors". The whole aggloc/agglink business on location addresses is gone (replaced by the new agglink form). List anchors are anchors that can be satisfied by more than one object. You can define traversal rules for the members of a list ("list traversal"). (Agglink lets you do semanticless aggregation if you want to.)

C. You can explicitly designate any anchor as being satisfied by the link element itself (as a "self anchor"). This replaces the "omitted linkend" defaulting rule of ilink.

D. New and improved link traversal rules. More complete and easier to understand (I hope).

Any existing ilink can be made into a HyLink by adding attributes for addressing the anchor roles ("anchor-addressing attributes") and, possibly, modifying the anchrole attribute to reflect the new keywords (specifically, #AGG replaced by #LIST, #COR replaced by #CORLIST).

5. New agglink form for representing simple aggregation. Agglink represents the case of simply grouping a bunch of things together under a single semantic label to enable traversal among them. Agglinks can be combined with hylinks to represent arbitrarily complex levels of grouping. For example, to represent the relationship between a mother and her children, you might have a hylink with the anchor roles "mother" and "children". The children anchor might then be satisfied by agglink elements of the types "brothers" and "sisters". These agglinks would then address as aggregates the male children and the female children. Or you could just represent the aggregation relationship "brothers" by a brothers agglink by itself--you don't have to do something silly like define anchor roles "brother1 brother2 ...".
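
The mother-and-children example can be sketched as a data structure. This is only an illustration under assumed names: `HyLink`, `AggLink`, `resolve` and the member names are invented here and do not correspond to any HyTime API or to the TC's actual attribute definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class AggLink:
    """Simple aggregation: one semantic label over a group of members."""
    label: str
    members: List[str]


# An anchor is either a directly addressed object or an aggregate of objects.
Anchor = Union[str, AggLink]


@dataclass
class HyLink:
    """One list of anchors per named anchor role (a list anchor may hold several)."""
    link_type: str
    anchors: Dict[str, List[Anchor]] = field(default_factory=dict)

    def resolve(self, role: str) -> List[str]:
        """Flatten a role's anchors, expanding any agglinks into their members."""
        out: List[str] = []
        for anchor in self.anchors.get(role, []):
            if isinstance(anchor, AggLink):
                out.extend(anchor.members)
            else:
                out.append(anchor)
        return out


# Hypothetical instance: the "children" role is satisfied by two agglinks.
family = HyLink("family", {
    "mother": ["maria"],
    "children": [AggLink("brothers", ["tom", "sam"]),
                 AggLink("sisters", ["ann"])],
})
print(family.resolve("mother"))
print(family.resolve("children"))
```

Keeping the agglinks as objects of their own, rather than flattening them when the link is built, preserves the "brothers" and "sisters" labels so that traversal can stop at either level of grouping.
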
Agglink is also useful when you really don't care about the anchor roles and just want to connect a bunch of stuff together. In other words, you could have an agglink called "agglink" and leave it at that.

6. Clink is unchanged on the surface but is now derived directly from hylink with fixed anchor roles, so it picks up all the hylink attributes, like link traversal and list traversal, that it never had before.

7. Implied location sources: any location address element can omit its location source, in which case it is taken to be one of three possible things: the grove root, the principal tree root in the grove (i.e., the document element in SGML documents), or the "referrer", i.e., the non-address element that refers to the location address (or to the location ladder of which the location address is the top rung). These options can be fixed for a type or selected on an instance basis. The last option, "referrer", lets you have location addresses that are dynamic in that they address relative to the thing that uses them. This lets you do things like a "next" location ladder, similar to the "next" and "previous" options in TEI locators. (Now if we could just declare elements with fixed IDs we'd be in good shape....)

8. Generalized name-space location address. With groves, the notion of name space is well defined. This means we can now address by name in any name space, not just ID or entity name. For example, in a grove for a vector graphic format that lets you put names on graphic objects (as you can do in CGM Level 4), you could address those graphic objects by name using the HyTime name-space location address. The default name space is still the element name space (the "elements" property of the SGMLDOC object in the SGML property set), so nmsploc also provides a simpler syntax for doing what you use nameloc for today, e.g.:

    <nmsploc locsrc=other-document>chapter-1</nmsploc>

9. Location sources can be data or subdocument entities.
In the above nmsploc example, the locsrc attribute is declared as ENTITY and its value is the name of a document entity (a CDATA entity with a notation of SGML or a notation derived from SGML, e.g., XML).

10. New location address form: mixedloc. Mixedloc addresses whatever the location addresses in its content address. This lets you group location addresses together within a mixedloc element and then use the lot by referring to the mixedloc. Mixedloc was invented in part to model the old nameloc, but it should be useful in its own right.

11. New location address form: queryloc. Queryloc is a location address that uses a query and returns a node list. It is through queryloc that other addressing notations can be integrated. Queryloc replaces notloc: it's simply not meaningful to use a location address that doesn't return nodes in a grove, so notloc, which was defined as doing something HyTime didn't understand, is just not meaningful in the new grove world. At a minimum, you return a data entity node where the content of the entity is opaque to HyTime (just as it would be if it were a bitmap image or some other unstructured data).

Note that things like defining hotspots in graphics would be done as a queryloc, not as an FCSLoc, which has changed. In other words, you could, if you wanted to, define the various image mapping notations (ISMAP, client-side maps) as query notations and then just use them. All you'd need to define would be a simple property set representing the data objects and properties the image describes, e.g., selectable regions, their associated addresses, and so on.

12. New and improved FCSLoc: FCSLoc now actually addresses objects in finite coordinate spaces (rather than imposing FCSes onto objects, which was, frankly, bogus). An FCSLoc essentially applies a "marquee" selection bounding area to an FCS and addresses any events inside that bounding area (as determined by the selection precision set for the FCSLoc).
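
A minimal sketch of marquee selection over a single-axis FCS follows, assuming a hypothetical timeline measured in years. The `Event` class, the `fcs_select` function and the `whole_only` flag are inventions for this example; in particular, `whole_only` is only a stand-in for the selection precision, whose real semantics are defined by the TC and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """An event occupying [start, end) on one axis and scheduling an object."""
    start: int
    end: int
    obj: str


def fcs_select(schedule: List[Event], lo: int, hi: int,
               whole_only: bool = False, want_objects: bool = False) -> list:
    """Return the events (or their scheduled objects) touched by [lo, hi).

    whole_only=True keeps only events lying entirely inside the bounding
    extent; otherwise any overlap counts (an assumed reading of precision).
    """
    picked = []
    for ev in schedule:
        inside = lo <= ev.start and ev.end <= hi
        overlaps = ev.start < hi and lo < ev.end
        if inside or (overlaps and not whole_only):
            picked.append(ev.obj if want_objects else ev)
    return picked


# Hypothetical event schedule on a time axis measured in years.
timeline = [
    Event(1943, 1946, "war-reporting"),
    Event(1945, 1946, "poster"),
    Event(1950, 1951, "mural"),
]

touched = fcs_select(timeline, 1945, 1946, want_objects=True)
contained = fcs_select(timeline, 1945, 1946, whole_only=True, want_objects=True)
print(touched)
print(contained)
```

The same function extends to more axes by testing each axis in turn; one axis is enough to show how the bounding extent, rather than the objects, carries the coordinates.
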
The FCSLoc returns either the events or the objects scheduled by the events. FCSLoc lets you hyperlink directly to whatever happens to occur in a region of an FCS. For example, if you had an event schedule that represented a historical time line, you could use an FCSLoc to link to "1945" and therefore to anything that occurs within 1945. You could then use that FCSLoc as the location source for a queryloc that selected particular nodes, say all element nodes whose subject attribute value was "art:socialist", to see what Soviet artists were doing in 1945.

13. New and improved dataloc: The old dataloc was somewhat bogus in that the definition of what it addressed wasn't very rigorous. We've fixed that by providing a generic "data tokenizer" mechanism. A data tokenizer is any process that parses character data and constructs a grove containing the resulting tokens. You can then address the tokens as a node list using normal listloc. Thus you can use any tokenization method you want (for example, you might have a tokenizer that does grammatical analysis or one that knows how to tokenize some obscure national language). Dataloc combines a default tokenizer with a listloc to remove the need to explicitly specify the data tokenizer. The tokenization "filters" are any HyLex lexical types available to the document (we define a small set of "built-in" filters that systems can support without having to support full HyLex). Thus we have a general data tokenization mechanism and the same old dataloc (with a couple of new options, like a "LINE" token) with no more hand waving or bogosity.

I realize that many of these new features are beyond the scope of Tim's minimal requirements and I'm *NOT* suggesting that we necessarily use them in XML. I just wanted everyone to know what new things HyTime offers, because I think they're pretty exciting and could be useful to a lot of you.

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com)
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be re-educated soon..."
--Austin Lounge Lizards, "1984 Blues"
Received on Friday, 20 December 1996 13:30:31 UTC