- From: David Carlisle <davidc@nag.co.uk>
- Date: Fri, 5 Dec 2003 16:30:26 GMT
- To: public-qt-comments@w3.org
The doc() function in F&O (and indirectly the document() function in XSLT) specifies that if the representation of a resource returned from some URI is an XML file then the input tree should be constructed as specified in DM, modulo some specific implementation-dependent features such as which URI schemes are supported.

In DM it says:

  6.7.3 Construction from an Infoset

  Applications may construct text nodes in the data model to represent insignificant white space. This decision is considered outside the scope of the data model, consequently the data model makes no attempt to control or identify if any or all insignificant white space is ignored.

This appears to be contradictory. Unless the document has been validated (and so some element is known not to have mixed content) all space is significant. But this is describing building a data model from the infoset, not from the PSVI, so it hasn't been schema validated at least, and I'm not sure if the DM really takes note of DTD validation as currently written.

The only occurrence of the word "significant" in the infoset document is

  White space within start-tags (other than significant white space in attribute values) and end-tags.

which clearly is irrelevant here.

In current XSLT1 applications more or less the only significant incompatibility between implementations (barring bugs) is msxsl's tendency to drop spaces. (If called from an API a more conforming behaviour can be specified, but notably _not_ if called via the xml-stylesheet PI.) This means that the (in most ways excellent) msxsl implementation will render an XML fragment such as

  <p><b>Bold</b> <span>words</span> <i>italic</i></p>

as

  Boldwordsitalic

if given an "identity transform" to HTML, as it will decide that the inter-word spaces are insignificant. Arguably this is conformant (if confusing) behaviour, as XSLT/XPath 1 said essentially nothing about how the tree should be built.
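As a rough illustration of the difference (a sketch only, not a description of how msxsl or any other processor is implemented), the following Python uses the standard library's xml.dom.minidom to build the tree twice: once keeping every text node, and once after removing whitespace-only text nodes. The helper names text_of, strip_whitespace_only_text and element_positions, and the <list>/<item> sample document, are hypothetical and introduced only for this example.

```python
from xml.dom import minidom

P_SOURCE = "<p><b>Bold</b> <span>words</span> <i>italic</i></p>"
# Hypothetical second sample: an indented document with three child elements.
LIST_SOURCE = "<list>\n  <item/>\n  <item/>\n  <item/>\n</list>"


def text_of(node):
    """Concatenate all descendant text nodes (the node's string value)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def strip_whitespace_only_text(node):
    """Remove whitespace-only text nodes in place, imitating a space-dropping parser."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip() == "":
            node.removeChild(child)
        else:
            strip_whitespace_only_text(child)


def element_positions(parent):
    """1-based positions of element children among all child nodes."""
    return [i for i, child in enumerate(parent.childNodes, start=1)
            if child.nodeType == child.ELEMENT_NODE]


# The <p> example: all text nodes kept versus whitespace-only nodes dropped.
preserved_p = minidom.parseString(P_SOURCE).documentElement
stripped_p = minidom.parseString(P_SOURCE).documentElement
strip_whitespace_only_text(stripped_p)

preserved_text = text_of(preserved_p)   # "Bold words italic"
stripped_text = text_of(stripped_p)     # "Boldwordsitalic"

# The indented sample: positions of the <item> elements among their siblings.
preserved_list = minidom.parseString(LIST_SOURCE).documentElement
stripped_list = minidom.parseString(LIST_SOURCE).documentElement
strip_whitespace_only_text(stripped_list)

preserved_positions = element_positions(preserved_list)  # [2, 4, 6]
stripped_positions = element_positions(stripped_list)    # [1, 2, 3]

print(preserved_text, "|", stripped_text)
print(preserved_positions, "|", stripped_positions)
```

The same input therefore yields different string values and different node positions depending solely on a tree-construction choice that the stylesheet author cannot see or control.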
I believe that in version 2 of the language the wording should be clarified so that this unfortunate loss of interoperability (and usability) is clearly not allowed without some specific user option that requests it.

I fear that the wording in 6.7.3 was intended to authorise the dropping of the inter-word spaces in my <p> example. It fails to do that, as it refers to a term, "insignificant white space", that is apparently undefined. However, I believe that the comment should be deleted rather than fixed. It is an unnecessary optional clause that only serves to hinder interoperability. Systems storing documents in efficient database storage forms can construct the data model instance in any way they like; there is no need to allow systems that are parsing explicit XML documents to have the same flexibility.

There is some discussion of this on xml-dev

  http://lists.xml.org/archives/xml-dev/200307/msg00148.html

(and any number of posts on xsl-list where users have fallen into this trap and asked where their spaces went, or why some node count that went 1,2,3 on msxsl goes 2,4,6 on every other processor).

David
Received on Friday, 5 December 2003 11:30:48 UTC