[Bug 2729] Whitespace text nodes

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2729

           Summary: Whitespace text nodes
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Data Model
        AssignedTo: Norman.Walsh@Sun.COM
        ReportedBy: davidc@nag.co.uk
         QAContact: public-qt-comments@w3.org


This is essentially a re-raising of bug #1309 which was explicitly
deferred for comment on the CR drafts.

I agree with the requirement to strip white space text nodes in trees
built from schema-validated input. This report just concerns the
default mapping from a non schema-validated infoset.

The requirement to strip white space text nodes from elements declared
in a DTD introduces a large incompatibility between XPath 1 and XPath 2.
This incompatibility is highlighted in the XSLT draft (J.1.1) but not in the
XPath draft. If no changes are made to the specification to remove the
incompatibility then similar wording to XSLT J.1.1 should be added to
XPath I.1, as otherwise the small list of edge cases in appendix I.1
gives a rather over-optimistic view of the compatibility between the
two versions.


However, perhaps even more important than the compatibility between
XPath 1 and XPath2 is compatibility between XPath2 (and XQuery)
systems. The current requirement makes such compatibility rather hard to
achieve.

Typically a system will document which XML parser it uses, or give the
user a choice of which to use, or give a choice of whether to use the
parser in non-validating or validating mode.

If a validating parser is used, the [element content whitespace]
property will be reported, so in this case all XPath2 (and XQuery)
systems will act in the same way. That way is incompatible with
XPath1, but it is something I could "live with" (in W3C working
group consensus-speak).

However traditionally the most common type of parser used with XSLT
(in particular) has been a non-validating-parser-which-reads-a-dtd
(as the structure of the XSLT language means that this type of parser
is more or less required to read the XSLT file, and typically the same
parser is used on input documents). For this kind of parser there is,
as far as I can tell, no specification at all that says whether
it should, or should not, report the [element content whitespace]
property on elements for which it has read a DTD declaration.
So typically a user will have no way of knowing whether or not white
space will be stripped and no way of changing the behaviour if it is
unwanted. Incompatibility with XPath1 is something that will hopefully
become less important over time, but incompatibility between
different XPath2/XQuery systems is something that should be avoided if
at all possible.
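
To make the difference concrete, here is a minimal sketch in Python (my
choice of language for illustration, not anything taken from the drafts).
It parses one small document with a DTD and lists the children of the root
element twice: once keeping whitespace text nodes, as XPath 1 does, and
once dropping them, as a system would if its parser reported
[element content whitespace] for the element. The sample document, the
function name and the flag are all hypothetical, and the stripping is
simulated by hand rather than taken from any parser property.

```python
from xml.dom import minidom

# Hypothetical sample: <list> is declared with element-only content.
DOC = """<!DOCTYPE list [
<!ELEMENT list (item*)>
<!ELEMENT item (#PCDATA)>
]>
<list>
  <item>one</item>
  <item>two</item>
</list>"""


def child_kinds(element, strip_element_content_whitespace):
    """List the kinds of child nodes a data model would contain.

    The flag simulates a parser that does (True) or does not (False)
    report [element content whitespace] for this element.
    """
    kinds = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            # XML whitespace is space, tab, CR and LF only.
            is_whitespace_only = node.data.strip(" \t\r\n") == ""
            if strip_element_content_whitespace and is_whitespace_only:
                continue
            kinds.append("text")
        elif node.nodeType == node.ELEMENT_NODE:
            kinds.append("element")
    return kinds


root = minidom.parseString(DOC).documentElement
kept = child_kinds(root, strip_element_content_whitespace=False)
stripped = child_kinds(root, strip_element_content_whitespace=True)

print(len(kept), kept)
print(len(stripped), stripped)
```

With the whitespace kept, the root has five children (text, element, text,
element, text); with it stripped, it has two. An expression such as
count(/list/node()) would therefore give different answers on two systems
that differ only in what their parser reports.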

I offer three options:

A. Do not change the specification.
   In this case, the XPath compatibility appendix should document the
   incompatibility.

B. Change the requirement to strip white space nodes so that it only
   applies to infosets constructed by a _validating_ XML parser. (DTD
   validated, so that if you validate with a DTD, the whitespace
   behaviour matches that of schema validation).

C. Remove the requirement to strip white space when building from an
   Infoset (keeping it in the case of building from a PSVI).
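
As a rough sketch of how the three options differ for trees built from an
Infoset, the function below (in Python, with a hypothetical name and
parameters of my own) says whether a whitespace-only text node would be
stripped. It is only my reading of the options as worded here, not text
from any draft.

```python
def strips_whitespace(option, reported_ecw, validating):
    """Would a whitespace-only text node be stripped under this option?

    option       -- "A", "B" or "C", as listed above
    reported_ecw -- True if the parser reported [element content whitespace]
    validating   -- True if the parser ran in (DTD) validating mode
    """
    if option == "A":
        # Status quo: depends entirely on what the parser reports.
        return reported_ecw
    if option == "B":
        # Strip only when the infoset came from a validating parser.
        return reported_ecw and validating
    if option == "C":
        # Never strip when building from an Infoset.
        return False
    raise ValueError("unknown option: " + option)


# A non-validating parser that read the DTD may or may not report the property.
for option in ("A", "B", "C"):
    outcomes = {strips_whitespace(option, reported, validating=False)
                for reported in (True, False)}
    print(option, "is consistent across such parsers:", len(outcomes) == 1)
```

Only option A lets the result vary with the parser's reporting choice;
under B and C the non-validating case always keeps the whitespace.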



The status quo (A) has the largest incompatibility with XPath 1 and
introduces similarly large incompatibilities between XQuery and XPath2
systems running on different XML parsers.

Taking either option (B) or (C) would cause all XPath2 and XQuery systems
to work the same way.

Option (C) is the most compatible with XPath1, and the one that I
personally prefer, but perhaps option (B) would be a useful compromise
position that should be considered.

David

Received on Thursday, 19 January 2006 14:04:27 UTC