RE: [DM] white space from Michael Rys on 2003-12-05 (public-qt-comments@w3.org from December 2003)

From: Michael Rys <mrys@microsoft.com>
Date: Fri, 5 Dec 2003 11:57:27 -0800
To: "David Carlisle" <davidc@nag.co.uk>, <public-qt-comments@w3.org>
Message-ID: <EB0A327048144442AFB15FCE18DC96C70174488C@RED-MSG-31.redmond.corp.microsoft.com>
I think the intended meaning of the remark about insignificant white
space is:

Boundary whitespace (a maximal sequence of character information items
that consists only of whitespace) may be considered insignificant by
fn:doc() if xml:space is not set to "preserve".

So this is independent of the element content whitespace notion and
in line with my interpretation of xml:space="default" handling of XML 1.0
(the data model generation being an application that takes xml:space
into account).
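
To make the rule above concrete, here is a minimal Python sketch. It is an
illustration only, not spec text and not how any particular processor is
implemented. It uses the standard xml.dom.minidom module; the helper names
(space_preserved, strip_boundary_whitespace, text_content) are hypothetical,
and the sample input is the <p> fragment from the quoted message below.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_WHITESPACE = " \t\r\n"  # the four whitespace characters of XML 1.0


def space_preserved(element):
    """True if the nearest xml:space attribute in scope says 'preserve'."""
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.hasAttributeNS(XML_NS, "space"):
            return node.getAttributeNS(XML_NS, "space") == "preserve"
        node = node.parentNode
    return False  # no xml:space in scope: treated like "default"


def strip_boundary_whitespace(element):
    """Drop whitespace-only text nodes unless xml:space='preserve' applies."""
    for child in list(element.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.ELEMENT_NODE:
            strip_boundary_whitespace(child)
        elif child.nodeType == Node.TEXT_NODE:
            only_whitespace = child.data.strip(XML_WHITESPACE) == ""
            if only_whitespace and not space_preserved(element):
                element.removeChild(child)


def text_content(node):
    """Concatenate all text node data below (and including) node."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


def load_text(xml_source, strip):
    """Parse xml_source and return its text, optionally after stripping."""
    doc = minidom.parseString(xml_source)
    if strip:
        strip_boundary_whitespace(doc.documentElement)
    return text_content(doc.documentElement)


SAMPLE = "<p><b>Bold</b> <span>words</span> <i>italic</i></p>"
PRESERVED = SAMPLE.replace("<p>", '<p xml:space="preserve">')

print(load_text(SAMPLE, strip=False))    # Bold words italic
print(load_text(SAMPLE, strip=True))     # Boldwordsitalic
print(load_text(PRESERVED, strip=True))  # Bold words italic
```

Under these assumptions the inter-element spaces disappear only when
stripping is enabled and no xml:space="preserve" is in scope.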

And I am (obviously) strongly opposed to changing the semantics for XSLT
2.0 in a non-backwards-compatible way, especially if it starts impacting
the data model. Also note that the section you comment on is not
specific to fn:doc() or XSLT 2.0. If you want to push for a change,
please propose it as part of the semantics of fn:doc().

Best regards
Michael



> -----Original Message-----
> From: public-qt-comments-request@w3.org
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: Friday, December 05, 2003 8:30 AM
> To: public-qt-comments@w3.org
> Subject: [DM] white space
> 
> 
> 
> 
> The doc() function in F&O (and indirectly the document() function in
> XSLT) specifies that if the representation of a resource returned from
> some URI is an XML file then the input tree should be constructed as
> specified in DM, modulo some specific implementation-dependent features
> such as which URI schemes are supported.
> 
> In DM it says:
> 
>   6.7.3 Construction from an Infoset
> 
>   Applications may construct text nodes in the data model to represent
>   insignificant white space. This decision is considered outside the scope
>   of the data model, consequently the data model makes no attempt to
>   control or identify if any or all insignificant white space is ignored
> 
> 
> This appears to be contradictory. Unless the document has been validated
> (and so some element is known not to have mixed content) all space is
> significant.  But this is describing building a datamodel from the
> infoset not from the PSVI, so it hasn't been schema validated at least,
> and I'm not sure if the DM really takes note of DTD validation as
> currently written.
> 
> 
> The only occurrence of the word "significant" in the infoset document is
> 
>     White space within start-tags (other than significant white space in
>     attribute values) and end-tags.
> 
> which clearly is irrelevant here.
> 
> 
> In current XSLT1 applications more or less the only significant
> incompatibility between implementations (barring bugs) is msxsl's
> tendency to drop spaces. (If called from an API a more conforming
> behaviour can be specified, but notably _not_ if called via the
> xml-stylesheet PI) This means that the (in most ways excellent) msxsl
> implementation will render an xml fragment such as
> <p><b>Bold</b> <span>words</span> <i>italic</i></p>
> as
> Boldwordsitalic
> if given an "identity transform" to html as it will decide that
> inter-word spaces are insignificant. Arguably this is conformant (if
> confusing) behaviour as XSLT/XPath 1 said essentially nothing about how
> the tree should be built. I believe that in version 2 of the language it is
> clear that the wording should be clarified so that this unfortunate loss
> of interoperability (and usability) is clearly not allowed without some
> specific user-option that requests it.
> 
> 
> I fear that the wording in 6.7.3 was intended to authorise the dropping
> of the interword spaces in my <p> example. It fails to do that as
> it refers to a term "insignificant white space" that is apparently
> undefined; however, I believe that the comment should be deleted rather
> than fixed. It is an unnecessary optional clause to stop
> interoperability: systems storing documents in efficient database
> storage forms can construct the data model instance in any way they
> like; there is no need to allow systems that are parsing explicit XML
> documents to have the same flexibility.
> 
> 
> There is some discussion of this on xml-dev
> 
> http://lists.xml.org/archives/xml-dev/200307/msg00148.html
> 
> (and any number of posts on xsl-list where users have fallen into this
> trap and asked where their spaces went, or why some node count that
> went 1,2,3 on msxsl goes 2,4,6 on every other processor)
> 
> 
> David
> 
Received on Friday, 5 December 2003 14:57:45 UTC