[Bug 1309] New: white space in the DM

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1309

           Summary: white space in the DM
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Data Model
        AssignedTo: Norman.Walsh@Sun.COM
        ReportedBy: davidc@nag.co.uk
         QAContact: public-qt-comments@w3.org


Some of the following issues have been raised on earlier drafts, but it
seems safest to raise them again as Last Call issues in Bugzilla.



6.7.3 Construction [of text nodes] from an Infoset

says

   If the resulting Text Node consists entirely of white space and the
   Text Node occurs in Element content[XML], the content of the Text Node
   is the zero-length string.  

The reference to the Element content production of [XML] is inappropriate,
as the input to this procedure is an infoset rather than a literal XML
document. The [element content whitespace] infoset property is flagged
a few lines up as being optionally used, so this could say

   If the resulting Text Node consists entirely of characters with an
   [element content whitespace] property with value true, the content
   of the Text Node is the zero-length string.

This would make the document consistent. However, with either wording,
this clause introduces a very large incompatibility with XPath 1.0.
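
The proposed wording can be read as a simple predicate over character
information items. The following is only a rough sketch of that reading:
the function name and the (character, flag) pair representation are
made up for illustration, with None standing in for a property value
of unknown.

```python
def text_node_content(chars):
    """Return the Text Node content under the proposed wording.

    chars is a list of (character, ecw) pairs, where ecw models the
    [element content whitespace] property: True, False, or None for
    unknown. This representation is an assumption of the sketch.
    """
    # Only a node made up entirely of characters flagged true is emptied.
    if chars and all(ecw is True for _, ecw in chars):
        return ""
    return "".join(ch for ch, _ in chars)


# Parser reported the property: the white space node is emptied.
print(repr(text_node_content([("\n", True), (" ", True)])))
# Parser did not report it: the same characters are kept.
print(repr(text_node_content([("\n", None), (" ", None)])))
```

The same input characters give different results depending only on
what the parser chose to report.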

I think it would be better to drop this clause altogether. Systems
requiring white space nodes to be dropped can use the PSVI mapping
or a proprietary mapping to the data model, neither of which has any
XPath 1.0 compatibility implications.

Dropping white space from declared element content in schema-validated
(PSVI) input makes sense and is something that could be tested in a
conformance test. Dropping white space in the infoset mapping if
[element content whitespace] is reported isn't really testable, as
non-validating parsers may or may not report this property and don't
need to document whether they do.

As it stands, it means that given
<!DOCTYPE x [
<!ELEMENT x (x*)>
]>
<x>
  <x/>
  <x/>
</x>

a simple XPath expression such as /x/node()[2] is completely undefined:
it may pick up the first empty x node (if the white space text nodes
are kept) or the second (if they are dropped).
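
To make the ambiguity concrete, here is a small sketch using Python's
xml.dom.minidom. It is not a data model implementation: the DOCTYPE is
left out, the reports_ecw flag merely simulates whether a parser
reported [element content whitespace], the id attributes are
hypothetical labels added to tell the two children apart, and emptied
Text Nodes are assumed to disappear from the children.

```python
from xml.dom import minidom

# Same shape as the example document; the id attributes are
# hypothetical labels, not part of the original example.
DOC = '<x>\n  <x id="first"/>\n  <x id="second"/>\n</x>'


def children(reports_ecw):
    """Children of the outer x as the data model might see them."""
    root = minidom.parseString(DOC).documentElement
    nodes = list(root.childNodes)
    if reports_ecw:
        # Simulate the 6.7.3 clause: white-space-only Text Nodes are
        # emptied, which this sketch treats as removing them.
        nodes = [n for n in nodes
                 if not (n.nodeType == n.TEXT_NODE and not n.data.strip())]
    return nodes


def second_node(reports_ecw):
    # XPath positions are 1-based, so node()[2] is list index 1.
    node = children(reports_ecw)[1]
    if node.nodeType == node.ELEMENT_NODE:
        return "element " + node.getAttribute("id")
    return "text " + repr(node.data)


print(second_node(False))  # element first
print(second_node(True))   # element second
```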

If this clause is kept, it should be highlighted here that it is
incompatible with XPath 1.0's data model, and the XPath (and XSLT)
compatibility appendices should also mention this.





For the reverse mapping, 6.7.5 (and J7) states that all characters get
mapped to infoset items with [element content whitespace] of unknown.

The infoset has a constraint that all non-white-space characters have a
value of false for this property.
http://www.w3.org/TR/xml-infoset/#infoitem.character
says: ... It is always false for characters that are not white space.

So I think the mapping from the DM to the infoset should set this
property to false or to unknown depending on whether the character is
white space.
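
The suggested rule could look roughly like this. It is only a sketch:
the function names and the dict layout are invented here, and the
strings "false" and "unknown" simply stand in for the infoset property
values.

```python
# The four XML white space characters: space, tab, line feed,
# carriage return.
XML_WHITE_SPACE = frozenset(" \t\n\r")


def element_content_whitespace(ch):
    """Suggested [element content whitespace] value for one character."""
    # Non-white-space characters must be false per the infoset
    # constraint; for white space the data model cannot tell, so the
    # value stays unknown.
    return "unknown" if ch in XML_WHITE_SPACE else "false"


def text_node_to_character_items(text):
    """Map Text Node content to simplified character information items."""
    return [
        {
            "character code": ord(ch),
            "element content whitespace": element_content_whitespace(ch),
        }
        for ch in text
    ]


for item in text_node_to_character_items("a b"):
    print(item)
```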

David

Received on Saturday, 7 May 2005 15:44:56 UTC