- From: Nils Klarlund <klarlund@research.att.com>
- Date: 22 Apr 2002 15:46:14 -0400
- To: www-xml-query-comments@w3.org
Regarding the unified data model
--------------------------------

I have just read the newest documents describing the XML data model (http://www.w3.org/TR/2001/WD-query-datamodel-20011220/) and XSLT 2.0 (http://www.w3.org/TR/2001/WD-xslt20-20011220/). This is great work. It is a pleasure to see how the models are being simplified and the vocabularies unified.

Let me offer one suggestion, which I will try to motivate from my own experience using and explaining XML.

Like a lot of people, I have been annoyed by the strange distinction between so-called XML documents and the bizarrely named "XML external parsed entities". The former denote trees; the latter denote sequences of trees, variously called tree fragments, forests, or hedges. There are other important differences between the two that I can never remember.

I have not liked or understood the "document" node that exists in the modeling of XML documents; for most purposes, this node seems rather superfluous. On the other hand, there is no model for XML external parsed entities, where such a document node would make perfect sense: as a way of assembling the sequence of trees into a tree.

Even for the simplest data-logging purposes, XML documents are a struggle: there is no fast way to append to a file that is an XML document, because the new log data has to go inside the top-level element, that is, before its end tag. Instead, one has to resort to a dummy file, itself an XML document, that contains an external entity reference to a second file holding the data as an XML external parsed entity. That external parsed entity is, of course, a file that can be appended to in constant time.

I noticed this wording in XSLT 2.0: "The normal restrictions on the children of the document node are relaxed for the result tree and for temporary trees constructed during the evaluation of the stylesheet. The document node of such a tree may have any sequence of nodes as children that would be possible for an element node.
In particular, it may have text node children, and any number of element node children. When written out using the XML output method (see [18 Serialization]), it is possible that a result tree will not be a well-formed XML document; however, it will always be a well-formed external general parsed entity."

I think this reasoning leads to an (obvious?) solution:

1) Keep the meaning of the "document node" for compatibility reasons.

2) Introduce the concept of a "file node". The file node is the "meta-node" that allows a sequence of nodes to be written out to a file. In particular, the document node is a file node, and so is the "document node" under the relaxation above. A file node denotes a tree that can be written out as an XML file: either as an XML document or as a well-formed external general parsed entity. An XSLT processor should be able to work directly on an external general parsed entity. In my opinion, it makes little sense to restrict XSLT to XML files whose file node happens to have only one child that is an element node.

3) We now have a terminology that is, I would hope, compelling. It allows for a consistent, visualizable, and serializable use of the "extra node" that the document node was. It will also help promote the stepchild "external general parsed entity" so that it naturally takes on the role of representing intermediate results. Eventually, in a couple of years, this concept will be called an XML file; if it has only one element child, it can be called a document, and the various byzantine entity concepts will long have been forgotten.

I did not check these thoughts with my XML Query colleague (Mary), so maybe they will turn out to be out of whack?

/Nils

-- 
This message was composed using ShortTalk, the editing language that makes text composition a breeze. Occasional strange words are speech recognition errors. http://www.research.att.com/~klarlund/ShortTalk
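The constant-time logging point in the message can be illustrated with a small Python sketch, under stated assumptions: the log file is kept as a well-formed external parsed entity (a sequence of top-level elements with no single root and no text declaration), and each record is appended with an ordinary append-mode write. For reading, the sketch swaps in a simpler technique than the dummy document with an external entity reference: it wraps the file's text in a synthetic root element in memory, which plays the role of the "extra node" that assembles the sequence of trees into one tree. The function names `append_entry` and `read_entries` and the element names `entry` and `log` are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def append_entry(path, level, message):
    """Append one log record as a new top-level element.

    Assumption: the file is a well-formed external parsed entity, i.e. a
    sequence of elements with no root element and no text declaration.
    Appending is a plain write at the end of the file; nothing already
    written has to be read or rewritten.
    """
    record = ET.Element("entry", level=level)  # hypothetical element name
    record.text = message
    with open(path, "a", encoding="utf-8") as f:
        f.write(ET.tostring(record, encoding="unicode") + "\n")


def read_entries(path, wrapper="log"):
    """Parse the whole log by wrapping it in a synthetic root element.

    The wrapper (hypothetical name "log") stands in for the node that
    turns the sequence of trees into a single tree.
    """
    with open(path, encoding="utf-8") as f:
        fragment = f.read()
    root = ET.fromstring(f"<{wrapper}>{fragment}</{wrapper}>")
    # Iterating over the root yields its element children in document order.
    return [(e.get("level"), e.text) for e in root]


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "log.xml")
    append_entry(log_path, "info", "started")
    append_entry(log_path, "warn", "disk 90% full")
    print(read_entries(log_path))
    # [('info', 'started'), ('warn', 'disk 90% full')]
```

The writer never has to find and move a closing root tag, so each append costs the same regardless of file size; the price is paid by the reader, which must supply the wrapper. The in-memory wrapping only works because the fragment carries no XML or text declaration and no DOCTYPE.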
Received on Monday, 22 April 2002 15:46:28 UTC