- From: Michael Rys <mrys@microsoft.com>
- Date: Mon, 8 Dec 2003 10:17:33 -0800
- To: "David Carlisle" <davidc@nag.co.uk>
- Cc: <public-qt-comments@w3.org>
See below. And to the person who claims I am trying to pass off
Microsoft's position as the WG position, I can only say "hogwash".
While I am certainly representing MS inside the WG, I am trying to
explain the spec and its motivation on this list (unless I indicate
otherwise, e.g., when I submit our own comments).

Regards
Michael

> -----Original Message-----
> From: David Carlisle [mailto:davidc@nag.co.uk]
> Sent: Monday, December 08, 2003 2:26 AM
> To: Michael Rys
> Cc: public-qt-comments@w3.org
> Subject: Re: [DM] white space
>
>
> > For the data model: the WG, otherwise the data model spec would be
> > different.
>
> Not necessarily, some things just slip through by accident, that's the
> point of a public review isn't it?
>
> Ideally XQuery would adopt some version of xsl:strip-space into its
> prologue and then the XSLT and XQuery commands would be specified as
> passing a specified flag to the data model building which would cause
> white space text nodes to be dropped. Note that this only needs to apply
> to building a data model instance by parsing an XML file (the point of
> the section commented on in this thread). If the data model instance is
> coming from some other source (e.g. straight from a database or
> whatever), then its white space behaviour is out of scope for this
> spec, and I have no objection to that.

[Michael Rys] I don't think adding such a flag to the prolog is a good
idea. I think you probably would like the fn:doc() function to get an
additional argument to indicate whitespace handling. In that way, we
would have the semantics and the flag closer together. However, even
then you have the problem of fn:doc() implementations just referring to
a cached document that already had its whitespace either preserved or
stripped when it was loaded. What should the flag do in that case?

I agree with you that the process of generating the data model should
give the user the choice (and I am trying to get that into some of our
products, currently with little success due to schedule issues), but
given that this really affects a stage that is often outside the data
model specification's realm, I think all we can do is call out this
dependency and let the users demand support for either behaviour.

> The text clearly can not stand as it is. It is defined in terms of
>   "insignificant white space"
> but this term is not defined in any spec that I have looked at (DM, XML
> rec, infoset). Although the XML spec says
>
>   On the other hand, "significant" white space that should be preserved
>   in the delivered version is common, for example in poetry and source
>   code.
>
> this is just an aside, and not part of any definition that can be
> referenced.

[Michael Rys] I agree that we need to base the spec on defined terms.

> It is not acceptable to leave open the interpretation of this definition
> to the implementor, especially as this thread has shown there are wide
> differences in interpretation. I for example believe that inter-word
> spaces in English language sentences are significant, but apparently
> Michael Rhys does not.

[Michael Rys] I find inter-word spaces significant (as I do the absence
of the h in my last name :-)) if there is either an explicit indication
that they should be preserved or they occur with other words inside the
same text node. They are not significant if they occur between markup
tags without anything else.

> If for some reason the working groups do want to define "insignificant
> white space" and allow implementations freedom to silently drop such
> spaces (sacrificing interoperability for some unspecified gain) then any
> definition will break the spirit of the XML recommendation which clearly
> states:
>
>   An XML processor must always pass all characters in a document that
>   are not markup through to the application.
>   A validating XML processor
>   must also inform the application which of these characters constitute
>   white space appearing in element content.

[Michael Rys] This goes into the definition of an XML processor. In our
interpretation the XML processor is the process that generates the
information set. The data model generation is an application...

> (Which I believe was chosen as the XML processing model to avoid the
> problems shown up after many years of SGML experience of problems with
> parsers trying to decide automatically which spaces to drop.)
>
> As Michael Rhys indicated you may claim that you are following the letter
> of the specification if you claim that the parser is preserving the
> spaces (but not showing them to anybody or anything) but they are being
> dropped while building the data model instance. However this is clearly
> just a legalistic fudge that does not help the end user, and any
> browsing of xsl-list will quickly show that failure to achieve
> interoperability in this area does seriously inconvenience the end
> user. However if you really want to define this term I believe that the
> only workable definition would be the definition alluded to in the
> quotation above from the XML rec,
>
>   white space appearing in element content.
>
> i.e. white space nodes appearing in elements _declared_ (in DTD, or now,
> schema) to take element (not mixed) content. Allowing processors to
> silently drop such spaces would still harm interoperability but at least
> it is unlikely to produce results that are simply wrong, such as losing
> inter-word spaces in English.
>
> David
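
The two readings discussed above can be tried out with a short Python
sketch. It is only an illustration under stated assumptions, not text
from either specification or from either correspondent: it uses the
standard-library xml.dom.minidom parser, and the helper names
(strip_between_markup, stripped_xml, text_content) and both sample
documents are made up for the example. The rule it implements is the
"white space between markup tags without anything else" reading, applied
without consulting any DTD or schema.

```python
from xml.dom import minidom

XML_WHITESPACE = " \t\r\n"  # the four white space characters of XML 1.0

# Hypothetical sample documents, invented for this illustration.
ELEMENT_ONLY = "<poem>\n  <line>Hello world</line>\n  <line>again</line>\n</poem>"
MIXED = "<p><b>Hello</b> <i>world</i></p>"


def whitespace_only_text_nodes(node):
    """Yield every text node below `node` that holds only XML white space."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                yield child
        else:
            yield from whitespace_only_text_nodes(child)


def strip_between_markup(doc):
    """Drop white-space-only text nodes, ignoring any DTD or schema."""
    # Materialise the list first so removal does not disturb the traversal.
    for text in list(whitespace_only_text_nodes(doc)):
        text.parentNode.removeChild(text)
    return doc


def text_content(node):
    """Concatenate the data of all text nodes below `node`."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


def stripped_xml(xml):
    """Parse `xml`, strip white-space-only nodes, and serialise the result."""
    doc = strip_between_markup(minidom.parseString(xml))
    return doc.documentElement.toxml()


if __name__ == "__main__":
    # Element-only content: only the indentation is lost.
    print(stripped_xml(ELEMENT_ONLY))
    # -> <poem><line>Hello world</line><line>again</line></poem>

    # Mixed content: the single space between the two words is lost too.
    print(text_content(strip_between_markup(minidom.parseString(MIXED))))
    # -> Helloworld
```

For the element-only poem the stripping removes nothing but indentation,
and the space inside "Hello world" survives because it shares a text
node with other characters. For the mixed-content paragraph the same
rule removes the only space between two words, which is the case the
quoted message objects to. Restricting the stripping to elements
declared to take element content would need the DTD or schema, which
this sketch deliberately does not consult.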
Received on Monday, 8 December 2003 13:17:37 UTC