[DM] white space handling in October 04 draft [6.7 Text Nodes]

This is a follow-up to the white space comments made on earlier drafts:

http://lists.w3.org/Archives/Public/public-qt-comments/2004Jul/0074.html
http://lists.w3.org/Archives/Public/public-qt-comments/2003Dec/0085.html

In the new October 2004 draft

6.7.3 Construction from an Infoset

states that the infoset property [element content white space] is
optionally used, but does not explicitly state where or how it is used.

It does contain the phrase
 "and the Text Node occurs in Element contentXM"
but that is rather different: it is a reference to the XML Rec, not the infoset.

I think that this phrase should be changed to say

 and the text node consists solely of characters for which the [element
 content white space] property (if known) is true.

The above change (which is probably editorial) would make the mapping
well defined. However, it would have a big effect on compatibility with
XPath 1, which should be mentioned either here or in the XPath Rec
compatibility appendix.

In XPath 1 it is clear that, given
<x>
 <a/>
 <a/>
</x>

x has 5 child nodes: three white space text nodes and two element nodes.
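The count above can be checked with a small sketch. It uses Python's xml.dom.minidom only as a convenient stand-in for an XPath 1 style tree, since it keeps white space text nodes; the helper name kinds is made up for this illustration.

```python
from xml.dom import minidom

# The example document from the message, with its line breaks and indentation.
doc = minidom.parseString("<x>\n <a/>\n <a/>\n</x>")
children = doc.documentElement.childNodes


def kinds(nodes):
    """Label each child node as 'text' or 'element' (hypothetical helper)."""
    return ["text" if n.nodeType == n.TEXT_NODE else "element" for n in nodes]


child_kinds = kinds(children)
print(len(children), child_kinds)
```

The white space before, between and after the two a elements shows up as three separate text nodes.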

In XPath 2 that might still be the case. But even with Backward
Compatibility mode enabled, x will have just the two element nodes and no
text node children if all of the following hold: there is a DOCTYPE that
refers to a DTD that declares x to have element content; a DTD-validating
parser was used (so [element content white space] is reported); and the
XPath system takes the option of using [element content white space].

The large number of conditions above implies that interoperability is
likely to be rather harder to achieve in XPath 2 than in XPath 1. If the
WG can't agree to fix this definition so that all processors act the same
way, the variability should at least be clearly documented.

This isn't a minor edge case: it affects the majority of XPath
expressions. Anything doing any kind of numbering or filtering based on
position() or last() is affected by this.
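As a rough illustration of the effect on position() and last(), the sketch below builds the children of x under both models. Python's standard library has no DTD-validating parser, so the strip_element_content_ws flag is an assumption that merely simulates a processor which is told the white space is element content white space and chooses to drop it. The function names are hypothetical, and the positions are those of a child::node() step.

```python
from xml.dom import minidom

SOURCE = "<x>\n <a/>\n <a/>\n</x>"


def child_nodes(xml_text, strip_element_content_ws):
    """Children of the document element; optionally drop white-space-only text.

    The flag simulates a processor that uses [element content white space];
    no real DTD validation happens here.
    """
    root = minidom.parseString(xml_text).documentElement
    nodes = list(root.childNodes)
    if strip_element_content_ws:
        nodes = [
            n for n in nodes
            if not (n.nodeType == n.TEXT_NODE and n.data.strip(" \t\r\n") == "")
        ]
    return nodes


def position_of_second_a(nodes):
    """1-based position of the second <a> among all child nodes."""
    seen = 0
    for pos, n in enumerate(nodes, start=1):
        if n.nodeType == n.ELEMENT_NODE and n.tagName == "a":
            seen += 1
            if seen == 2:
                return pos
    return None


xpath1_model = child_nodes(SOURCE, strip_element_content_ws=False)
stripped_model = child_nodes(SOURCE, strip_element_content_ws=True)

# last() for child::node() is the length of each list.
print(len(xpath1_model), position_of_second_a(xpath1_model))
print(len(stripped_model), position_of_second_a(stripped_model))
```

Under the first model the second a is the fourth of five child nodes; under the second it is the second of two, so the same positional test can select different nodes depending on the processor.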

David


Received on Monday, 1 November 2004 15:15:57 UTC