- From: David Carlisle <davidc@nag.co.uk>
- Date: Fri, 5 Dec 2003 16:30:26 GMT
- To: public-qt-comments@w3.org
The doc() function in F&O (and indirectly the document() function in
XSLT) specifies that if the representation of a resource returned from
some URI is an XML file then the input tree should be constructed as
specified in DM, modulo some specific implementation-dependent features
such as which URI schemes are supported.
In DM it says:
6.7.3 Construction from an Infoset
Applications may construct text nodes in the data model to represent
insignificant white space. This decision is considered outside the scope
of the data model, consequently the data model makes no attempt to
control or identify if any or all insignificant white space is ignored.
This appears to be contradictory. Unless the document has been validated
(and so some element is known not to have mixed content) all space is
significant. But this is describing building a datamodel from the
infoset not from the PSVI, so it hasn't been schema validated at least,
and I'm not sure if the DM really takes note of DTD validation as
currently written.
The only occurrence of the word "significant" in the infoset document is
White space within start-tags (other than significant white space in
attribute values) and end-tags.
which clearly is irrelevant here.
In current XSLT1 applications more or less the only significant
incompatibility between implementations (barring bugs) is msxsl's
tendency to drop spaces. (If called from an API a more conforming
behaviour can be specified, but notably _not_ if called via the
xml-stylesheet PI.) This means that the (in most ways excellent) msxsl
implementation will render an xml fragment such as
<p><b>Bold</b> <span>words</span> <i>italic</i></p>
as
Boldwordsitalic
if given an "identity transform" to html, as it will decide that
inter-word spaces are insignificant. Arguably this is conformant (if
confusing) behaviour as XSLT/XPath 1 said essentially nothing about how the
tree should be built. I believe that in version 2 of the language the
wording should be clarified so that this unfortunate loss
of interoperability (and usability) is clearly not allowed without some
specific user option that requests it.
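To make the <p> example concrete, here is a minimal sketch in Python.
It assumes xml.dom.minidom as a stand-in for a parser that keeps every
text node, and it only simulates the dropping of whitespace-only text
nodes; it does not claim to reproduce msxsl's actual algorithm. The
names SOURCE and text_value are hypothetical, chosen for illustration.

```python
from xml.dom import minidom

# The fragment from the example above.
SOURCE = "<p><b>Bold</b> <span>words</span> <i>italic</i></p>"


def text_value(node, drop_whitespace_only=False):
    """Concatenate descendant text nodes in document order.

    With drop_whitespace_only=True, text nodes consisting solely of
    white space are skipped, simulating a tree builder that treats
    them as "insignificant" (an assumption for illustration).
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if drop_whitespace_only and child.data.strip() == "":
                continue
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_value(child, drop_whitespace_only))
    return "".join(parts)


p = minidom.parseString(SOURCE).documentElement
preserved = text_value(p)                            # spaces kept
stripped = text_value(p, drop_whitespace_only=True)  # spaces dropped

print(repr(preserved))
print(repr(stripped))
```

The two spaces are the only children of <p> that are text nodes, so
dropping whitespace-only nodes removes exactly the inter-word spacing.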
I fear that the wording in 6.7.3 was intended to authorise the dropping
of the interword spaces in my <p> example. It fails to do that, as
it refers to a term "insignificant white space" that is apparently
undefined; however, I believe that the comment should be deleted rather
than fixed. It is an unnecessary optional clause that stops
interoperability. Systems storing documents in efficient database
storage forms can construct the data model instance in any way they
like; there is no need to allow systems that are parsing explicit XML
documents to have the same flexibility.
There is some discussion of this on xml-dev
http://lists.xml.org/archives/xml-dev/200307/msg00148.html
(and any number of posts on xsl-list where users have fallen into this
trap and asked where their spaces went, or why some node count that
went 1,2,3 on msxsl goes 2,4,6 on every other processor).
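The node-count difference can be sketched the same way. This is an
illustration only: the indented <list> document and the helper
item_positions are hypothetical, and xml.dom.minidom is again assumed
as a parser that keeps whitespace-only text nodes. The helper reports
each element's position among all child nodes of its parent, with and
without the whitespace-only text nodes.

```python
from xml.dom import minidom

# A hypothetical, conventionally indented document.
SOURCE = (
    "<list>\n"
    "  <item>a</item>\n"
    "  <item>b</item>\n"
    "  <item>c</item>\n"
    "</list>"
)


def item_positions(parent, drop_whitespace_only=False):
    """Return the 1-based position of each element child among all
    child nodes, optionally ignoring whitespace-only text nodes."""
    children = [
        c for c in parent.childNodes
        if not (
            drop_whitespace_only
            and c.nodeType == c.TEXT_NODE
            and c.data.strip() == ""
        )
    ]
    return [
        i for i, c in enumerate(children, start=1)
        if c.nodeType == c.ELEMENT_NODE
    ]


root = minidom.parseString(SOURCE).documentElement
with_whitespace = item_positions(root)
without_whitespace = item_positions(root, drop_whitespace_only=True)

print(with_whitespace)
print(without_whitespace)
```

Each <item> is preceded by a newline-and-indent text node, which is why
the positions double when those nodes are kept in the tree.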
David
Received on Friday, 5 December 2003 11:30:48 UTC