
Re: [DM] white space

From: David Carlisle <davidc@nag.co.uk>
Date: Mon, 8 Dec 2003 10:25:50 GMT
Message-Id: <200312081025.KAA04262@penguin.nag.co.uk>
To: mrys@microsoft.com
Cc: public-qt-comments@w3.org


> For the data model: the WG, otherwise the data model spec would be
> different.

Not necessarily; some things just slip through by accident. That's the
point of a public review, isn't it?

Ideally XQuery would adopt some version of xsl:strip-space into its
prolog, and then the XSLT and XQuery declarations would be specified as
passing a flag to data model construction that causes the specified
white space text nodes to be dropped. Note that this only needs to apply
to building a data model instance by parsing an XML file (the subject of
the section commented on in this thread). If the data model instance
comes from some other source (e.g. straight from a database), then its
white space behaviour is out of scope for this spec, and I have no
objection to that.
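
To make the proposal concrete, here is a minimal Python sketch
(illustrative only, not proposed spec text) of a hypothetical
strip_space flag passed to data model construction. It uses
xml.etree.ElementTree as a stand-in for the data model builder and, like
xsl:strip-space, takes a set of element names whose whitespace-only text
children are dropped. The real xsl:strip-space also deals with
wildcards, namespaces, xsl:preserve-space and xml:space, all of which
this sketch ignores.

```python
import xml.etree.ElementTree as ET

# XML's white space characters (production S in XML 1.0).
XML_WS = " \t\r\n"


def _ws_only(s):
    """True if s is a non-empty string made only of XML white space."""
    return bool(s) and s.strip(XML_WS) == ""


def build_tree(xml_text, strip_space=frozenset()):
    """Parse xml_text; for every element whose tag is in the (hypothetical)
    strip_space set, drop its whitespace-only text children. All other
    text, including white space elsewhere, is passed through untouched."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in strip_space:
            continue
        # Text before the first child (or the whole content if no children).
        if _ws_only(elem.text):
            elem.text = None
        # ElementTree stores text that follows a child on child.tail,
        # but it is still a text node belonging to *this* element.
        for child in elem:
            if _ws_only(child.tail):
                child.tail = None
    return root


doc = "<list>\n  <item>a b</item>\n  <item>c</item>\n</list>"
kept = build_tree(doc)                             # flag not set
stripped = build_tree(doc, strip_space={"list"})   # like strip-space "list"
print(ET.tostring(stripped, encoding="unicode"))
# <list><item>a b</item><item>c</item></list>
```

Without the flag every character is kept, and even with it the space
inside "a b" survives, because only whitespace-only text nodes of the
named elements are affected.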

The text clearly cannot stand as it is. It is defined in terms of
"insignificant white space", but this term is not defined in any spec
that I have looked at (DM, XML Rec, Infoset), although the XML spec
says

  On the other hand, "significant" white space that should be preserved
  in the delivered version is common, for example in poetry and source
  code.

This is just an aside, and not part of any definition that can be
referenced.

It is not acceptable to leave the interpretation of this definition
open to the implementor, especially as this thread has shown wide
differences in interpretation. I, for example, believe that inter-word
spaces in English-language sentences are significant, but apparently
Michael Rys does not.


If for some reason the working groups do want to define "insignificant
white space" and allow implementations freedom to silently drop such
spaces (sacrificing interoperability for some unspecified gain) then any
definition will break the spirit of the XML recommendation which clearly
states:

  An XML processor must always pass all characters in a document that
  are not markup through to the application. A validating XML processor
  must also inform the application which of these characters constitute
  white space appearing in element content. 

(I believe this was chosen as the XML processing model to avoid the
problems revealed by many years of SGML experience with parsers trying
to decide automatically which spaces to drop.)

As Michael Rys indicated, you may claim to be following the letter of
the specification by saying that the parser preserves the spaces
(without showing them to anybody or anything) and that they are only
dropped while building the data model instance. However, this is clearly
just a legalistic fudge that does not help the end user, and any
browsing of xsl-list will quickly show that failure to achieve
interoperability in this area seriously inconveniences the end user.
If you really do want to define this term, I believe that the only
workable definition would be the one alluded to in the quotation above
from the XML Rec:

 white space appearing in element content. 

i.e. white space nodes appearing in elements _declared_ (in a DTD or,
now, a schema) to have element (not mixed) content. Allowing processors
to silently drop such spaces would still harm interoperability, but at
least it is unlikely to produce results that are simply wrong, such as
losing inter-word spaces in English.
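
A small Python sketch (illustrative only) of why the two readings
differ: treating every whitespace-only text node as "insignificant"
versus only those in elements declared to have element content.
ELEMENT_CONTENT is a hypothetical stand-in: ElementTree does not read
DTD or schema content models, so the set plays the role of what a
validating parser would report.

```python
import xml.etree.ElementTree as ET

# XML's white space characters (production S in XML 1.0).
XML_WS = " \t\r\n"


def _ws_only(s):
    """True if s is a non-empty string made only of XML white space."""
    return bool(s) and s.strip(XML_WS) == ""


def drop_ws(root, should_strip):
    """Remove whitespace-only text children of every element e for
    which should_strip(e) is true."""
    for elem in root.iter():
        if not should_strip(elem):
            continue
        if _ws_only(elem.text):
            elem.text = None
        for child in elem:          # text after a child lives on child.tail
            if _ws_only(child.tail):
                child.tail = None
    return root


def text_of(root):
    """Concatenated character data of the tree (like an XPath string value)."""
    return "".join(root.itertext())


doc = "<doc>\n  <p><b>white</b> <i>space</i></p>\n</doc>"

# Reading 1: any whitespace-only text node is "insignificant".
naive = drop_ws(ET.fromstring(doc), lambda e: True)

# Reading 2: only white space in elements *declared* with element content.
# Hypothetical: <doc> declared element-only, <p> declared mixed.
ELEMENT_CONTENT = {"doc"}
declared = drop_ws(ET.fromstring(doc), lambda e: e.tag in ELEMENT_CONTENT)

print(repr(text_of(naive)))     # 'whitespace'
print(repr(text_of(declared)))  # 'white space'
```

The first reading silently merges the two words; the second only removes
the indentation inside <doc>.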

David



Received on Monday, 8 December 2003 05:27:44 UTC
