RE: "Roots" of confusion introduced at W3C (long) from DuCharme, Robert on 2000-09-22 (xsl-editors@w3.org from July to September 2000)

From: DuCharme, Robert <Robert.DuCharme@moodys.com>
Date: Fri, 22 Sep 2000 10:13:47 -0400
To: "'xsl-list@mulberrytech.com'" <xsl-list@mulberrytech.com>
Cc: xsl-editors@w3.org
Message-ID: <01BA10F0CD20D3119B2400805FD40F9F0249DD7B@MDYNYCMSX1>
>Given the nature of our present discussion, I suggest the claim in the first
>sentence of the Recommendation that XML "is completely described in this
>document" can be seen to be a dubious claim, at best.

It was true enough in February of 1998, when it was published. The W3C has
acknowledged the lack of coordination between the various groups who have
since developed add-on technologies. (See below for more on that.)

>If, as you suggest, the meaning of "root" depends on the tree representation
>of that document why is that not explained in the XML 1.0 Recommendation?
>...
>Further, if you are correct that the concept of "root" depends on the tree
>representation, why do the editors of the Recommendation use the term "root"
>in two distinct usages in Section 2 and Section 2.1 of the Recommendation
>without adequate explanation?

From 2: "Each XML document has both a logical and a physical structure.
Physically, the document is composed of units called entities. An entity may
refer to other entities to cause their inclusion in the document. A document
begins in a 'root' or 'document entity'."

That first sentence is the explanation about the two structures. Yes, it's
terse; I can only respond to complaints that it's too terse with a crass
commercial plug (see http://www.snee.com/bob/xmlann/). The second and third
sentences go on about the physical structure. I mentioned in my last post
that the physical structure doesn't care about the logical structure except
to specify how an entity qualifies as being well-formed; that's what 2.1 is
about. Item 2 in the second list ("There is exactly one element...") is
talking about logical structure, because it's talking about elements. 

>The Recommendation goes on to describe the document entity as the root of the
>"entity tree". Is there a physical "entity tree"? I think not. It is a
>logical relationship. The "document entity" which Section 2 claims is a
>"physical structure" is also, so it seems, the root of a logical "entity
>tree". Or would you wish to claim that a "physical" entity tree exists?

An XML parser must read in a document entity and resolve external entity
references by locating and reading in referenced external entities. If
document entity A refers to external entities B and C, and B refers to D and
E, and C refers to F and G, the parser must open each of those files and
read them off the disk into its memory. That's what they mean by physical.
If you sketch out the relationship between these entities, that's the
physical tree they're talking about. 
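
To make that sketch concrete, here is a rough Python illustration. The
entity names and the references table are just the hypothetical A through G
from the paragraph above, and it assumes each reference is resolved as soon
as the parser encounters it. It is not how any particular parser is written.

```python
# Hypothetical entity references, matching the A..G example above.
# Each key is an entity; its value lists the external entities it refers to.
references = {
    "A": ["B", "C"],  # A is the document entity, the root of the entity tree
    "B": ["D", "E"],
    "C": ["F", "G"],
}

def read_order(entity, refs):
    """List entities in the order they would be opened, depth-first."""
    order = [entity]
    for child in refs.get(entity, []):
        order.extend(read_order(child, refs))
    return order

print(read_order("A", references))
```

The tree here is nothing more than "which entity pulls in which other
entity"; elements never enter into it.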

>But the Recommendation later claims the document entity "has no name". In my
>file system the document entity, as a "physical structure" does have a name -
>"sample.xml", for example. So in what sense does Section 4.8 refer to the
>document entity having no name?

If I declare an entity like this,

  <!ENTITY foo SYSTEM "bar.xml">

the entity's name is foo. The filename bar.xml is not the entity's name, but
its system identifier. A document entity may have a filename (the more
generic term "system identifier" is used because it all still works on
operating systems that don't use the concept of a "file") but as an entity,
it has no entity name. (And remember, a document entity doesn't even have to
have a filename or other system identifier--it might be handed to the XML
parser in memory from a database manager or from a Perl script via a pipe.)
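
If you want to see the name/system identifier distinction from a parser's
point of view, here is a small sketch using Python's xml.parsers.expat
module. The wrapper document and the handler name on_entity_decl are my own
inventions for illustration; only the foo declaration comes from the example
above. Note that the document entity itself is handed to the parser as a
string in memory, with no filename at all.

```python
import xml.parsers.expat

# A document entity passed in from memory: it has no filename and no name.
doc = b'<!DOCTYPE doc [<!ENTITY foo SYSTEM "bar.xml">]><doc/>'

declared = []  # collects (entity name, system identifier) pairs

def on_entity_decl(name, is_parameter, value, base,
                   system_id, public_id, notation):
    # expat reports the entity's name and its system identifier separately.
    declared.append((name, system_id))

parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.Parse(doc, True)

print(declared)
```

The parser never opens bar.xml here, because nothing in the document
references the entity; it only records the declaration.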

>Section 4.8 is referring to the document 
>entity in another usage - when it is being **logically** combined with any 
>other entities

I don't see that in 4.8. It says that it's "a starting-point for an XML
processor"--it's the A in the A, B, C, D, E, F, G that I described above,
which are physically read in and combined in memory.

>Can you see how the Recommendation blurs and confuses the supposed 
>distinction between "logical structure" and "physical structure"?

Nope!

>I appreciate that the DOM 1 Recommendation came later. It refers back to XML
>1.0 in the definition of "root node"

Where does it do that? I couldn't find it in
http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html.

>I admit I could have missed this but does any W3C document adequately explain
>those differences or the practical consequences of them? Should some W3C
>document not actually do so?

A single document that explains the differences in approach among five
different specs would have to be a five-by-five matrix, with a new column
and row added every time a new related spec was written, and that would be
impractical. It would have been better if the XML spec had laid
out, in addition to an explanation of a document's logical and physical
structure, the details of its "information" structure--exactly what
information a processing program could expect of it. 

This failure (and the XML Working Group can't be blamed too much for
it--they set out to design a stripped-down version of SGML that could be
shipped over the Web more easily than full SGML, and had no idea of the uses
that people would put XML to) meant that the Working Groups for additional
XML technologies had to make up and assume certain things, and this led to
subtle and not-so-subtle conflicts between those additional specs. The
Infoset spec is an attempt to make up for this. These conflicts are also a
key reason that so many important specs have been held up in the Candidate
Recommendation stage lately.

Bob DuCharme          www.snee.com/bob           <bob@  
snee.com>  "The elements be kind to thee, and make thy
spirits all of comfort!" Anthony and Cleopatra, III ii
Received on Friday, 22 September 2000 10:34:37 UTC