- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Fri, 4 Jan 2002 21:33:11 +0000
- To: www-xml-query-comments@w3.org
- CC: David Carlisle <davidc@nag.co.uk>
David C. wrote:
> 4.2.2 xf:normalisedString
> Is there any use case for this? It seems to be rather a bizarre
> thing. The normalisation could be done by the user using translate()
> if desired.

I believe that the xf:normalizedString constructor is there because the xs:normalizedString data type exists, which in turn exists because XML makes a distinction between replacing whitespace (which is done for attributes with a type of CDATA) and collapsing whitespace (which is done for attributes with other types).

In other words, if you have a document that adheres to a DTD, and the type of the bar attribute is CDATA, then the type of that attribute in XPath 2.0 should, I think, be an xs:normalizedString. I think that you therefore need an xs:normalizedString constructor to create the value with which you're comparing it, so that if you have:

  <foo bar="a b c" />

then you can do something along the lines of:

  @bar eq normalizedString('a b c')

and get the answer true.

I admit, though, that it isn't clear to me why certain of the built-in derived data types from XML Schema get their own constructors while others (e.g. xs:positiveInteger) don't.

> The restriction on not having #xD in the argument will be almost
> impossible to maintain in non XML uses of Xpath. XML normalises all
> line ends to #xA but in a non XML setting line ends may well be #xD
> or #xD#xA pairs, in which case normalising just #xA and declaring
> #xD an error will mean that an Xquery breaks just by moving the text
> file containing it from one place to another (unless every host
> language for xpath does a similar line end normalisation)

I agree that the definition given within the F&O WD is off the mark. Partly, I think, this is because the definition of xs:normalizedString in XML Schema is slightly strange, but partly it's to do with how white space is handled.

The only difference between an xs:string and an xs:normalizedString in XML Schema is the whiteSpace facet, which has a value of "preserve" for xs:string and "replace" for xs:normalizedString. According to the definition of the whiteSpace facet in XML Schema (http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace), replacing whitespace involves replacing all whitespace characters (tab, line feed, and carriage return) with spaces. This differs markedly from deleting all newline characters, which is what is described for xf:normalizedString().

The XML Schema Datatypes Recommendation further says that the lexical space of xs:normalizedString cannot contain the carriage return or tab characters. However, this is guaranteed by the fact that normalizedString values have white space replaced -- given the value of an attribute or element, XML Schema will first replace all the whitespace characters with a space character, and then check to see whether the result is a valid normalizedString (with no carriage returns or tab characters in), which logically it has to be anyway. Therefore the extra assertion that normalizedStrings must not contain carriage returns or tab characters is superfluous.

In short, the xf:normalizedString() constructor should not limit what characters are allowed in the argument, and should permit both carriage returns and tab characters. To create the normalizedString, it should replace all whitespace characters in the argument string with space characters.

---

Whitespace in the XPath constructors is now handled properly (in my opinion) for numeric values, where whitespace is collapsed (leading and trailing whitespace stripped, sequences of whitespace replaced by a single space) prior to the value being assessed to see if it fulfils the lexical requirements of the data type. However, it is treated incorrectly for most other values.

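To make the two whiteSpace facet behaviours described above concrete, here is a minimal sketch in Python. It is only an illustration of "replace" and "collapse" as XML Schema defines them, not anything taken from the F&O draft; the function names ws_replace and ws_collapse are hypothetical.

```python
import re

# The characters the whiteSpace facet rewrites:
# tab (#x9), line feed (#xA) and carriage return (#xD).
XML_WS = "\t\n\r"


def ws_replace(value: str) -> str:
    """whiteSpace="replace": each tab, LF and CR becomes one space.

    Nothing is deleted and nothing is rejected, so the result has the
    same length as the input.
    """
    return value.translate(str.maketrans(XML_WS, "   "))


def ws_collapse(value: str) -> str:
    """whiteSpace="collapse": replace first, then squeeze runs of
    spaces to a single space and strip leading/trailing spaces."""
    replaced = ws_replace(value)
    return re.sub(r" +", " ", replaced).strip(" ")
```

With these definitions, ws_replace("a\tb\r\nc") gives "a b  c" (two spaces where the carriage return and line feed were), while ws_collapse("  2002-01-04 \n") gives "2002-01-04", which is the string that would then be checked against the lexical form of the data type.
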
Aside from xs:string and xs:normalizedString, all data types in XML Schema have a 'collapse' value for their whiteSpace facet. As with numbers, whitespace collapsing should occur prior to the format of the value being assessed.

The reason this is important is that the following is valid:

  <date xsi:type="xs:date">
    2002-01-04
  </date>

since the leading and trailing whitespace is stripped prior to checking. It will be incredibly confusing if:

  cast as xs:date(date)

raises an error because of the whitespace in the date element, despite the fact that it validates fine using a schema validator.

This applies to all the constructors in the F&O document aside from xf:string() (which should not undergo any changes to its whitespace) and xf:normalizedString() (as above).

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Friday, 4 January 2002 16:33:14 UTC