- From: Robin Cover <robin@ACADCOMP.SIL.ORG>
- Date: Tue, 17 Sep 1996 09:49:52 -0500
- To: w3c-sgml-wg@w3.org
Steve DeRose and others have made a good case for removing the burden of reading a DTD from the shoulders of XML applications, especially if reading an XML DTD is to be no easier than reading an arbitrary SGML (ISO 8879) DTD. I don't know how the equation changes if XML DTDs are "easy" to parse, by comparison, but I understand the motivation. Still, I am wondering how this strategy would fit into the larger picture: (a) how XML instances will actually be processed, and (b) what dynamics will likely be set up for XML applications development if the pronouncement is made: "Rejoice at last, programmers of the world: today we are liberated from the tyranny of the SGML DTD."

To elaborate on this question, I first need to ensure that I have not made a fatal mistake in assuming that "Language" in "XML" is meant in the same way as "Language" in "SGML," viz., elliptical for "metalanguage." I have taken "eXtensible" to be the equivalent of "meta-". Further, I have assumed that the markup languages based upon XML would set up basic lexical semantics as well as can be done using semantically perspicuous GIs, attribute names, attribute values (etc.), but that XML itself would be as innocent of processing semantics as SGML is.

If both assumptions are warranted, and if I have not failed to reckon with some influence from "SGML Extended Facilities" (e.g., property sets that are assumed to apply to XML in some way), then I have the following observations and questions.

Observation: if XML instances are to be processed without the DTD, then it's not possible to know with certainty from an instance what the data types (declared values) are for the attributes. This is aside from the fact that some declarations might need to be sent with the XML instance to provide information about defaulted attributes, notations, etc. In this sense, an XML document instance is weaker semantically than an SGML document with its required DTD.
See below, where the attributes 'uff' or 'ko' might encode an IDREF as a (HyTime) clink, but we can't be sure. So, suppose this XML instance is acquired by my Net client:

  <ucod>xx<kok uff=ak>xx<iuk>xx</iuk>xx<kak>xx<kcico>xx</kcico>xx</kak>xx<voc>
  <cbb>xx</cbb>xx<qdd ko=ak>xx</qdd>xx<koik>xx</koik>xx</voc>xx</kok>xx<cdaw>
  <koik>xx</koik>xx<voc>xx<qdd>xx</qdd>xx<koik>xx<qkqr>xx<riwo>xx</riwo>
  </qkqr>xx<riwo>xx</riwo>xx</koik>xx</voc>xx</cdaw>xx<kob>xx</kob>xx</ucod>

If the client assumes (or can know from a MIME type?) that the instance is a candidate for meaningful display, what does it do? (Ignore the fact that the content represented by "xx" is dummy text.) The server sending the instance will also need to send a stylesheet which says that (e.g.) "qdd is a line-breaking element" and "qkqr is to be displayed in italic typeface" and so forth. Something also has to tell my browser that QDD's attribute 'ko' encodes a HyTime clink, and that 'uff' is (therefore) for storing an ID. Etc.

Thus, I still don't understand the point of enabling "DTDless" processing of XML instances, as per Tim Bray's posting #172, where "search, display, analyze" are examples of such processing, *if* it is necessary to have and process, in addition to the instance:

  * a rendering stylesheet
  * a collection of declarations to account for attribute defaulting, notations, etc.
  * a set of other specifications of semantics, like (HyTime) reftype and clink, which express fundamental relational semantics that are not expected to be in a specific stylesheet

Granted that it's nice to be able to create a parse tree for an XML instance simply by having the instance in hand: what's the point of all this economy if we can't make any useful sense of the tree without also having (and processing) the stylesheet, as well as having (and processing) the other declarations that tell us how to interpret the tree?
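
The attribute-typing problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for a DTD-less XML processor, a shortened copy of the instance above with the attribute values quoted (a present-day parser requires that), and a hypothetical DECLARED_TYPES table standing in for whatever declarations would have to accompany the instance.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the instance from the message. Attribute values are
# quoted here (assumption): the message writes uff=ak and ko=ak unquoted,
# which a present-day XML parser would reject.
INSTANCE = (
    '<ucod>xx<kok uff="ak">xx<iuk>xx</iuk>xx'
    '<voc><cbb>xx</cbb>xx<qdd ko="ak">xx</qdd>xx</voc>xx</kok>'
    'xx<kob>xx</kob>xx</ucod>'
)


def attribute_report(xml_text):
    """Parse with no DTD; list (element, attribute, value, Python type)."""
    root = ET.fromstring(xml_text)
    report = []
    for elem in root.iter():  # document order
        for name, value in elem.attrib.items():
            report.append((elem.tag, name, value, type(value).__name__))
    return report


# Hypothetical side table (not from the message): the declared values that
# the instance alone cannot supply.
DECLARED_TYPES = {("kok", "uff"): "ID", ("qdd", "ko"): "IDREF"}


def resolve_links(xml_text, declared_types):
    """Pair each IDREF-bearing element with the element owning that ID."""
    root = ET.fromstring(xml_text)
    ids = {}   # ID value -> tag of the element carrying it
    refs = []  # (tag of referring element, ID value referred to)
    for elem in root.iter():
        for name, value in elem.attrib.items():
            kind = declared_types.get((elem.tag, name))
            if kind == "ID":
                ids[value] = elem.tag
            elif kind == "IDREF":
                refs.append((elem.tag, value))
    return [(source, ids.get(target)) for source, target in refs]


if __name__ == "__main__":
    print(attribute_report(INSTANCE))
    print(resolve_links(INSTANCE, DECLARED_TYPES))
    print(resolve_links(INSTANCE, {}))
```

Without the table, both 'uff' and 'ko' come back as plain strings with the same value, and nothing in the parse tree says which one is the ID and which the reference; the link from qdd to kok appears only once the declarations are supplied from outside the instance.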
It seems ironic to me that a revised SGML could be depicted in this way: "here's the instance, here's the stylesheet, here's a small collection of declarations to let you know about defaulted attrs and notations, and here's a set of HyTime mappings based upon archForms semantics to augment the stuff in the stylesheet -- what, you want the full XML DTD too? -- naw, sorry: irrelevant and too expensive."

Someone says:

(1) "No need to ship a stylesheet: the XML document can just reference one, and if it's common, it should be on the receiving system's local machine." OK: why not also for the DTD?

Or:

(2) "Naw, we assume a common tag set, so that we know <p> means 'paragraph' and <li> means 'list item'." OK: how then is XML much of an improvement over HTML?

I still maintain that there is a broad range of applications for which *having the DTD* (as opposed to the parse tree for the exact XML instance that just arrived), or *having the option of addressing the DTD* rather than the instance parse tree, allows a great deal more interesting and meaningful processing. I know I don't stand a chance of convincing anyone of the significance of this point, so I won't try. But I fully expect that *most* XML applications will ignore markup declaration processing if XML says it's possible to have meaningful processing without recognizing declarations. What worries me most is that XML will lead to a fixed language with semantics (by virtue of a fixed -- yet chaotic/anarchical -- tag set, just like "HTML"), and thus to a fixed, not-very-extensible "language" instead of a metalanguage.

-robin
Received on Tuesday, 17 September 1996 10:33:32 UTC