- From: W. Eliot Kimber <eliot@isogen.com>
- Date: Tue, 17 Dec 1996 12:00:02 -0900
- To: Derek Denny-Brown <ddb@criinc.com>, Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Cc: w3c-sgml-wg@w3.org
At 09:33 AM 12/17/96 -0800, Derek Denny-Brown wrote:
> ...I am just worried that the proposals being brought forth will break
>HyTime when applied to XML. Given that I was actively involved with
>drafting the forthcoming HyTime TC, it is important to me that HyTime is not
>completely abandoned, when it need not be.

I don't think it's a HyTime-specific issue, both because the problems are not unique to HyTime and because the use of HyTime is not dependent on how the parsing process is defined.

All location addressing, whether HyTime-defined or not, operates on an abstraction of the data, not the original source data. This means that you have to choose your abstraction carefully, which is what we're really talking about in this whole RS/RE fracas.

There are two levels of abstraction that we usually work with:

1. The immediate result of parsing.
2. The result of applying application-specific semantics to the results of parsing.

There may be more levels of abstraction, but we haven't exposed those yet in our discussions of XML processing.

Abstraction (1) is what HyTime and DSSSL call the "SGML document grove" or the "pGrove" (for parse grove). What can occur in this grove is completely defined by the SGML property set (published in the DSSSL standard and soon to be published again in the HyTime TC) and reflects simply applying the SGML parsing rules to the input document. It is roughly equivalent to "ESIS" except that the grove may be more complete and you have a formal way to say what you want to be in the grove (the "grove plan").

Abstraction (2) is what HyTime calls the "extended SGML document grove", or "epGrove". This is a new grove with HyTime-specific semantics applied. It uses the same property set as the first but may either suppress or remove some things or modify the content to reflect HyTime-specific semantics. Any application is free to create its own extended document grove.
XML processors will, presumably, provide their own XML-specific extended document groves to reflect XML-specific semantics (for example, that whitespace is collapsed when the -xml-space attribute is in effect).

The -xml-space attribute is a good example of how this works in practice: an XML parser parses a document and creates a pGrove that contains all the data characters it found. If the document has no DTD, then this means all white space characters, not just those in what we know to be mixed content (white space that is not taken as data is held in "markup" properties, which are not, by definition of the SGML property set, content of the objects that exhibit them--thus these characters may be in the grove but they aren't part of the content of the elements in which they occur).

An XML processor then operates on the pGrove to produce an XML epGrove in which the rules for -xml-space are applied, i.e., in the content of elements where white space gets collapsed, lists of white space character nodes get replaced with single space character nodes. (Notice I didn't say "characters get replaced": the operations are on nodes in groves, not characters in strings, and each character is a node.)

Any location addressing applied against XML documents would, presumably, be applied against the XML epGrove (or possibly a location-method-specific grove derived from the epGrove), not against the pGrove.

Of course, the problem of knowing how to produce the XML epGrove consistently remains. However, having these two stages can make it clearer where the processing can happen and *why* using attributes to control it is not necessarily a hack: the attributes are *not* feeding back into the base parsing process (at least conceptually)--they are affecting the construction of application-specific groves, and applications are free to use any information at their disposal to control grove construction.
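
The two-stage model above (pGrove first, then an application-specific epGrove built from it) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the CharNode and ElementNode classes, the build_ep_grove function, the "collapse" attribute value, and the rule that descendants inherit the setting are all hypothetical, and none of them comes from the SGML property set, HyTime, DSSSL, or any XML draft. The point is only that collapsing replaces runs of character nodes with a single space node in a new grove, leaving the parse grove untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

WHITESPACE = frozenset(" \t\r\n")


@dataclass
class CharNode:
    """One data character; in a grove every character is its own node."""
    char: str


@dataclass
class ElementNode:
    """An element with attributes and an ordered content node list."""
    gi: str
    attributes: Dict[str, str] = field(default_factory=dict)
    content: List["Node"] = field(default_factory=list)


Node = Union[CharNode, ElementNode]


def chars(text: str) -> List[Node]:
    """Helper: turn a string into a list of character nodes."""
    return [CharNode(c) for c in text]


def build_ep_grove(element: ElementNode, collapsing: bool = False) -> ElementNode:
    """Build a new (extended) grove from a parse grove without modifying it."""
    # Assumption: "collapse" is a made-up attribute value, and the setting
    # is inherited by descendants unless they carry their own value.
    setting = element.attributes.get("-xml-space")
    if setting is not None:
        collapsing = setting == "collapse"

    new_content: List[Node] = []
    in_run = False  # True while inside a run of white space nodes
    for child in element.content:
        if isinstance(child, ElementNode):
            new_content.append(build_ep_grove(child, collapsing))
            in_run = False
        elif collapsing and child.char in WHITESPACE:
            if not in_run:
                # The whole run is replaced by one space character node.
                new_content.append(CharNode(" "))
            in_run = True
        else:
            new_content.append(CharNode(child.char))
            in_run = False
    return ElementNode(element.gi, dict(element.attributes), new_content)


def text_of(element: ElementNode) -> str:
    """Concatenate the character nodes in an element's content, in order."""
    return "".join(
        text_of(c) if isinstance(c, ElementNode) else c.char
        for c in element.content
    )
```

Because the function returns a fresh grove, both groves remain available afterwards, so an address can be resolved against whichever one is wanted.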

Note also that grove plans are not sufficient to solve this problem, because grove plans only include or exclude entire classes of object or entire property values--they can't selectively exclude things: that requires a specific grove construction process.

There is absolutely no requirement that applications actually perform the grove constructions described above as discrete steps--most XML processors will go directly from source data to XML epGrove without first constructing the intermediate pGrove. But HyTime (and DSSSL) can operate with equal ease on either grove, and it could be possible to have both available and indicate which one you actually want to address when doing addressing. (Whether this is practically useful or not, I wouldn't want to speculate at this point.)

Finally, I'd like to point out that from a HyTime perspective (in the new grove-based world) any addressing notation that can be defined in terms of node lists selected from groves can be naturally integrated into a HyTime-based system. For example, TEI locators, whose grove-based results should be obvious given knowledge of the grove plan used, could be easily, meaningfully, and usefully used in conjunction with other HyTime-defined location addresses.

Thus, it's not really useful to talk about "HyTime addressing" versus other forms of address: it's all the same stuff at its core and the problems posed by the data abstractions we're creating are the same. The issue of, for example, whether we should prefer TEI locators over SDQL queries is then an issue of appropriate syntax and user interface, not functionality [for what it's worth, I will probably end up preferring TEI locators over SDQL for XML use because they were specifically designed to meet the requirements we presume the main XML audience to have].

Cheers,

E.
--
W. Eliot Kimber (eliot@isogen.com)
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away,
I'll be re-educated soon..."
--Austin Lounge Lizards, "1984 Blues"
Received on Tuesday, 17 December 1996 14:09:16 UTC