issue 41 or permissiveness is good, give it a nice name

Regarding the unified data model
--------------------------------

I have just read the newest documents describing the XML data model
(http://www.w3.org/TR/2001/WD-query-datamodel-20011220/) and XSLT 2.0
(http://www.w3.org/TR/2001/WD-xslt20-20011220/).

This is great work.  It is a pleasure to see how the models are being
simplified and vocabularies unified.

Let me offer one suggestion, which I will try to motivate from my own
experiences using and explaining XML.

Like a lot of people, I have been annoyed with the strange distinction
between so-called XML documents and the bizarrely named "XML external
parsed entities".  The former denote trees, the latter denote
sequences of trees, variously called tree fragments, forests, or
hedges.  There are other important differences between the two that I
can never remember.

I have not liked or understood the "document" node that exists in the
modeling of XML documents---for most purposes, this node seems to be
rather superfluous.  On the other hand, there is no model for XML
external parsed entities, where such a document node would make
perfect sense: as a way of assembling the sequence of trees into a
tree.

Even for the simplest of data logging purposes, one struggles with
XML documents: there is no fast way of appending to a file that is an
XML document (because, intuitively, the new log data goes inside the
top-level element).  Instead, one has to resort to a dummy file, which
is an XML document, containing an external entity reference to a file
that holds the data in the form of an XML external parsed entity.
That XML external parsed entity is of course a file that can be
appended to in constant time.
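
To make the workaround concrete, here is a minimal sketch in Python.
The file names log.xml and log-data.xml, the element names <log> and
<entry>, and the helper functions are all hypothetical, chosen only
for illustration.  Since not every parser resolves external entities,
the reading step imitates the expansion of &data; by wrapping the
entity's text in the top-level element by hand; that is an assumption
of the sketch, not a statement about any particular parser.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical file names, used only in this illustration.
WRAPPER_NAME = "log.xml"       # the dummy XML document
ENTITY_NAME = "log-data.xml"   # the external parsed entity with the records

# The dummy document: one top-level element whose whole content is a
# reference to the external parsed entity.
WRAPPER_TEXT = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE log [\n'
    '  <!ENTITY data SYSTEM "' + ENTITY_NAME + '">\n'
    ']>\n'
    '<log>&data;</log>\n'
)


def write_wrapper(directory):
    """Write the dummy document once; it never changes afterwards."""
    with open(os.path.join(directory, WRAPPER_NAME), "w", encoding="utf-8") as f:
        f.write(WRAPPER_TEXT)


def append_record(directory, message):
    """Append one <entry> to the entity file without rewriting old data."""
    entry = ET.Element("entry")
    entry.text = message
    with open(os.path.join(directory, ENTITY_NAME), "a", encoding="utf-8") as f:
        f.write(ET.tostring(entry, encoding="unicode") + "\n")


def read_entries(directory):
    """Imitate entity expansion: wrap the entity text in <log> and parse it."""
    with open(os.path.join(directory, ENTITY_NAME), encoding="utf-8") as f:
        root = ET.fromstring("<log>" + f.read() + "</log>")
    return [entry.text for entry in root.findall("entry")]
```

Each call to append_record only opens the entity file in append mode,
so the cost does not depend on how much has been logged already.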

I noticed the wording in XSLT 2.0:

  "The normal restrictions on the children of the document node are
   relaxed for the result tree and for temporary trees constructed
   during the evaluation of the stylesheet. The document node of such
   a tree may have any sequence of nodes as children that would be
   possible for an element node. In particular, it may have text node
   children, and any number of element node children. When written out
   using the XML output method (see [18 Serialization]), it is
   possible that a result tree will not be a well-formed XML document;
   however, it will always be a well-formed external general parsed
   entity."

I think this reasoning leads to an (obvious?)  solution:

1) Keep the meaning of the "document node" for compatibility reasons.

2) Introduce the concept of a "file node".  The file node is the
   "meta-node" that allows a sequence of nodes to be written out to a
   file.  In particular, the document node is a file node.  And, the
   "document node" under the relaxation above is a file node.  A file
   node denotes a tree that can be written out as an XML file: either
   as an XML document or as a well-formed external general parsed
   entity. An XSLT processor should be able to work directly on an
   external general parsed entity.  It makes little sense in my
   opinion that XSLT should be restricted to XML files whose file node
   happens to have only one child that is an element node.

3) We now have a terminology that is, I would hope, compelling.  It
   allows for a consistent, visualizable, and serializable use of the
   "extra node" that the document node was.  It will also help promote
   the stepchild of "external general parsed entity" so that it
   naturally takes on the role of representing intermediate results.
   Eventually, in a couple of years, this concept will be called an
   XML file; if it has only one element, it can be called a document,
   and various byzantine entity concepts will long have been
   forgotten.
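
As a rough illustration of points 1) to 3), here is a minimal Python
sketch of how a "file node" might be modelled.  The names
parse_file_node and is_document and the synthetic <file> element are
hypothetical stand-ins for the proposed meta-node; nothing like them
appears in the drafts.  The sketch assumes the input carries no XML
or text declaration and no DOCTYPE, since those cannot be wrapped
inside an element.

```python
import xml.etree.ElementTree as ET


def parse_file_node(text):
    """Parse an external general parsed entity under a synthetic root.

    The <file> element plays the role of the proposed file node: its
    children are whatever sequence of nodes the file contains.
    """
    return ET.fromstring("<file>" + text + "</file>")


def is_document(file_node):
    """Return True if the file node could also serve as a document node.

    That is the case when it has exactly one element child and no
    non-whitespace text beside that child.
    """
    children = list(file_node)
    if len(children) != 1:
        return False
    stray_text = (file_node.text or "") + (children[0].tail or "")
    return stray_text.strip() == ""
```

With this model, a hedge such as "<a/>text<b>x</b>" parses to a file
node with two element children and is not a document, while
"<root><a/></root>" parses to a file node that is also a document.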

I have not checked these thoughts with my XML Query colleague (Mary),
so maybe they will turn out to be out of whack.

/Nils

-- 
This message was composed using ShortTalk, the editing language that
makes text composition a breeze.  Occasional strange words are speech
recognition errors.   http://www.research.att.com/~klarlund/ShortTalk

Received on Monday, 22 April 2002 15:46:28 UTC