RE: "Roots" of confusion introduced at W3C

>Thus, for this foundational concept of the "root" of an XML document we
>find multiple terms being, apparently, used for the same thing and certain
>terms being used for more than one thing.

An XML document can be represented as various kinds of trees, which serve
different purposes, and any tree has a root that serves an important role in
the use of that tree. There is no single concept of an XML document "root"
that serves all purposes; the meaning of "root" depends on the type of tree
representation of the XML document being discussed. 
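To make that concrete, here is a minimal sketch using two tree representations that ship with Python's standard library. They are chosen purely for illustration; the sample document and the describe_roots helper are hypothetical. minidom builds a DOM-style tree whose root is a Document node, with the document element as one of its children. ElementTree hands back the document element itself as "the root", with nothing above it.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document; the comment sits outside the document element.
SOURCE = "<!-- prolog comment --><book><title>Roots</title></book>"


def describe_roots(source):
    """Report what each library treats as the root of its tree (hypothetical helper)."""
    # DOM-style tree: the root is a Document node *above* the document element,
    # and it can also hold prolog material such as the comment.
    dom_root = xml.dom.minidom.parseString(source)
    # ElementTree: the returned object *is* the document element; there is no
    # separate node above it, and (by default) the prolog comment is dropped.
    et_root = ET.fromstring(source)
    return {
        "dom_root_is_document": dom_root.nodeType == dom_root.DOCUMENT_NODE,
        "dom_root_children": [n.nodeName for n in dom_root.childNodes],
        "dom_document_element": dom_root.documentElement.tagName,
        "etree_root": et_root.tag,
    }


print(describe_roots(SOURCE))
```

Same document, two different "roots": asking either library for its root without saying which kind of tree you mean leads to the confusion described above.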

When switching back and forth between XML-related specs, the difference in
the types of trees being discussed can be confusing. I don't understand them
all, but I do know the XML 1.0 spec pretty well, and it's much more
internally consistent than you make it out to be.

The XML 1.0 Rec says that documents have a physical structure and a logical
structure. The document entity is the root of the physical structure. It's
the entity (on most operating systems, a file) that the parser reads in
first, looking for references to additional external entities to read in.
The root element is the root of the logical structure: it's the element that
contains all the other elements, also called the document element. The
logical structure doesn't care about the physical structure, and the physical
structure only cares about the logical structure when each component of the
physical structure (each entity) wants to qualify as a well-formed entity.
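The split between the two structures can be sketched with Python's expat binding, which leaves it to the application to read external entities. This is only an illustration under stated assumptions: the FILES dictionary stands in for files on disk, and read_document, book.xml and chap1.xml are hypothetical names. The sketch records the entities in the order they are read (document entity first) alongside the elements, each tagged with the entity whose text held its start-tag.

```python
import xml.parsers.expat

# Hypothetical in-memory "files" standing in for entities stored on disk.
FILES = {
    "book.xml": (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE book [<!ENTITY chap1 SYSTEM "chap1.xml">]>\n'
        "<book><title>Roots</title>&chap1;</book>"
    ),
    "chap1.xml": "<chapter><para>Hello</para></chapter>",
}


def read_document(document_entity):
    """Return (entities read, elements seen) for a hypothetical document."""
    entities = []  # physical structure: entities in the order they are read
    elements = []  # logical structure: (element name, entity holding its start-tag)

    def configure(parser, current_entity):
        def on_start(name, attrs):
            elements.append((name, current_entity))

        def on_external_ref(context, base, system_id, public_id):
            # Read the referenced external entity with a sub-parser.
            entities.append(system_id)
            sub = parser.ExternalEntityParserCreate(context)
            configure(sub, system_id)
            sub.Parse(FILES[system_id], True)
            return 1  # nonzero tells expat to continue parsing

        parser.StartElementHandler = on_start
        parser.ExternalEntityRefHandler = on_external_ref

    entities.append(document_entity)  # the document entity is read first
    root_parser = xml.parsers.expat.ParserCreate()
    configure(root_parser, document_entity)
    root_parser.Parse(FILES[document_entity], True)
    return entities, elements


entities, elements = read_document("book.xml")
print(entities)
print(elements)
```

Both chapter and para belong to the logical structure rooted at the book element, yet neither appears in the text of the document entity book.xml.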

>XML 1.0 - "document entity" (Section 4.8). The terms "root node" and 
>"document root" do not occur in the XML 1.0 Recommendation.

The DOM came after the XML spec, so the term "node" doesn't appear in the
Rec except for a reference in Appendix E to a classic computer science
work's description of finite state algorithms. The XML Rec never set out to
define things in terms of nodes. Representations of XML documents that serve
certain purposes, like XPath and the DOM, later used the concept of a tree
of nodes to describe their representations.

>In addition XML 1.0 confuses the issue by using the term "document entity"
>to, apparently, refer to both the root of the tree (Section 4.8) and also the
>whole serialised document.

The XML 1.0 Rec never mentions serialization either. Section 4.8 clearly
states that the document entity is the root of the *entity* tree (i.e. the
physical structure). Nowhere does the Rec imply that the document entity is
the whole document; a document entity can easily have references to other
entities that act as components of the document without being part of the
document entity.

>XML 1.0 further confuses the issue by using the term "root" (with no 
>qualifier) to refer to the "document element", a child of the "document 
>entity".

The XML 1.0 spec *never* refers to the document element as a child of the
document entity. Describing it that way confuses the physical and logical
structure of an XML document. (In XSLT, the document element node is a child
of the root node of the source tree, but this is unrelated. Entities in
general are meaningless to XSLT, because the XML parser that passes an input
document to an XSLT processor resolves all entities as it builds the source
tree that XSLT actually works on.)
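A quick way to see that entities are gone by the time a tree exists: parse a document that uses an internal entity and look at the result. This sketch uses Python's ElementTree only as a stand-in for "the parser that feeds the XSLT processor"; the memo document and the greeting entity are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document declaring one internal general entity.
SOURCE = (
    '<!DOCTYPE memo [<!ENTITY greeting "Hello from an entity">]>'
    "<memo><body>&greeting;</body></memo>"
)

root = ET.fromstring(SOURCE)
body = root.find("body")

# Only the replacement text survives in the tree; nothing records that it
# came from an entity reference, and the DOCTYPE is not part of the tree.
print(body.text)
print(ET.tostring(root))
```

Anything that works on this tree, as XSLT works on its source tree, never sees the entity at all.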

Outside of the XML Rec, the XPath Rec says that "XPath models an XML
document as a tree of nodes." This is the model that XSLT uses, and while
the DOM also talks in terms of trees of nodes, a DOM tree is different: for
example, the DOM has node types, such as entity reference nodes, that the
XPath data model lacks.
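One concrete difference between the two node-tree models: the DOM can keep a CDATA section as its own node, while XPath 1.0 has no CDATA node type and treats that content as ordinary text. A minimal sketch with Python's minidom (a DOM implementation) and ElementTree, which is used here only as a rough stand-in for the XPath view because it also folds CDATA into plain text; the sample document is hypothetical.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample: ordinary text followed by a CDATA section.
SOURCE = "<code>a &lt; b <![CDATA[and c < d]]></code>"

# minidom keeps the CDATA section as a separate node after the text node.
dom_element = xml.dom.minidom.parseString(SOURCE).documentElement
dom_children = [node.nodeName for node in dom_element.childNodes]

# ElementTree has no CDATA node type: the section's content is merged
# into the element's ordinary character data.
et_text = ET.fromstring(SOURCE).text

print(dom_children)
print(et_text)
```

Code that walks one of these trees can't assume the other's node structure, even though both describe the same document.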

I'm not claiming that it's all well organized. If it were, there would have
been no need for the Infoset document, and Paul Prescod's talk of groves
wouldn't sound so useful. There is plenty of potential for confusion, but
remembering that different tree representations of a document (each with its
own root) serve different purposes is a big help in keeping track of what's
what.

Bob DuCharme          www.snee.com/bob           <bob@snee.com>
"The elements be kind to thee, and make thy spirits all of comfort!"
Antony and Cleopatra, III ii

Received on Wednesday, 20 September 2000 16:18:14 UTC