- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Wed, 11 Dec 2002 12:01:21 +0100
- To: xquery@attbi.com, public-qt-comments@w3.org
- Cc: mrys@microsoft.com
Personal replies to some of your points...

> Mostly editorial comments on the F&O Nov 15 draft (these also
> still apply to the internal Dec 10 draft; section numbers
> refer to the Dec 10 draft for your convenience).
>
> - 6.3.1: The definition of compare() explains what happens
> when one string differs in length from the other; but this
> should be up to the collation.

I've made this point in the past, and I agree with it. I think we have
now established that functions like contains() and starts-with() do
need a collation that has this property (described in the last NOTE in
section 6.3), but functions that purely compare for equality and
ordering do not.

> - 6.4.6, 6.4.7, 6.4.14: Surrogate pairs are irrelevant.
> You've already defined things in terms of code points -- so
> the underlying bytes (and therefore, surrogate pairs) never
> come into play.

Technically, you are correct that this note is redundant. However,
since so many other programming languages that claim to have Unicode
support actually treat a char as a 16-bit code unit rather than a
Unicode character, I think it's important that we make this point.
Some XSLT 1.0 implementations are non-compliant in this area and it's
very useful to have a definitive statement in the spec that proves
they are wrong.

> - 9.2.1, 10.2.1, 12.1.1: should all compare according to the
> context collation

9.2.1: QNames should NOT be compared using a collation, they should be
compared using Unicode code points, as described in the XML 1.0 (or
perhaps XML 1.1) specification.

10.2.1: There is still some debate about exactly how anyURIs should be
compared, for example how escapes are handled. We're monitoring the
discussion on this in the W3C TAG. However, URIs are not natural
language text and it certainly doesn't make sense to use the same
algorithm as when comparing strings.
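To make the code point versus 16-bit code unit distinction above concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here comes from the F&O draft). The helper names utf16_units, codepoint_compare and code_unit_compare are hypothetical. It shows that a character outside the Basic Multilingual Plane is one code point but two UTF-16 code units, and that the two views can order the same pair of strings differently.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def codepoint_compare(a: str, b: str) -> int:
    """Compare by Unicode code point (Python's native str ordering)."""
    return (a > b) - (a < b)


def code_unit_compare(a: str, b: str) -> int:
    """Compare by UTF-16 code unit, as a 16-bit-char implementation would.

    Big-endian bytes sort in the same order as the code units themselves.
    """
    ka, kb = a.encode("utf-16-be"), b.encode("utf-16-be")
    return (ka > kb) - (ka < kb)


supplementary = "\U00010000"  # LINEAR B SYLLABLE B008 A, outside the BMP
bmp_high = "\uFF5E"           # FULLWIDTH TILDE, inside the BMP

print(len(supplementary), utf16_units(supplementary))  # 1 2
print(codepoint_compare(bmp_high, supplementary))      # -1
print(code_unit_compare(bmp_high, supplementary))      # 1
```

The surrogate pair for U+10000 starts with 0xD800, which sorts below 0xFF5E, so an implementation that compares code units gets the opposite answer from one that compares code points.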
12.1.1: NOTATION is an XML concept (and a pretty obscure one at that)
and we should follow the XML rules for comparison, which are based on
code point comparison.

> - 6.3, etc.: As Jeni Tennison already brought up [1], URIs as
> collation names are unusual (and not even followed by the
> draft itself). Although the idea has merit for WS-I, almost
> every collation implementation I can find uses RFC 1766
> (locale names like en-US and fr-FR). Perhaps some
> implementations will invent a URI syntax for their
> collations, but I expect most Java and .NET implementations
> will rely on java.text.RuleBasedCollation and
> System.Globalization.CultureInfo, both of which are based on
> RFC 1766. If you're going to insist on URIs, then at least
> make the draft examples consistent with that.

We've been through a few rounds on this and no-one has come up with a
satisfactory alternative. Locale names do not identify collations,
they only identify communities that may have preferences for a
particular collation. Within a locale such as en-GB, you will find
that lexicographers, geographers, and compilers of telephone
directories use completely different collations. So a locale name can
only be a hint.

I think that all the examples do use valid anyURI values (or at least,
strings that can be cast to anyURI). The big question in this area is
issue 44, which asks about the meaning of relative URIs and suggests
that we should require the anyURI to be absolute.

For use in XSLT, it would be much more consistent with existing
practice to use a QName, but it would be difficult to define a meaning
for this outside the context of a stylesheet.

I think the biggest problem we face in this area is how to achieve a
level of interoperability.
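As a rough illustration of a URI naming a collation rather than a locale, here is a minimal Python sketch of a user-configurable registry. Everything in it is an assumption made for the example: the example.org URIs, the COLLATIONS table and the sort_with helper are hypothetical, and a "collation" is reduced to a sort-key function. It is not how any particular product does it.

```python
from typing import Callable, Dict, List

# In this sketch a collation is just a function producing a sort key.
SortKey = Callable[[str], str]


def codepoint_key(s: str) -> str:
    """Order strings by Unicode code point."""
    return s


def caseless_key(s: str) -> str:
    """Order strings ignoring case (a stand-in for a richer collation)."""
    return s.casefold()


# Hypothetical, user-chosen URIs mapped to whatever the implementation offers.
COLLATIONS: Dict[str, SortKey] = {
    "http://example.org/collation/codepoint": codepoint_key,
    "http://example.org/collation/caseless": caseless_key,
}


def sort_with(uri: str, items: List[str]) -> List[str]:
    """Sort items using the collation registered under the given URI."""
    if uri not in COLLATIONS:
        raise ValueError(f"unknown collation URI: {uri}")
    return sorted(items, key=COLLATIONS[uri])


words = ["apple", "Banana", "cherry"]
print(sort_with("http://example.org/collation/codepoint", words))
# ['Banana', 'apple', 'cherry']
print(sort_with("http://example.org/collation/caseless", words))
# ['apple', 'Banana', 'cherry']
```

Both collations could plausibly serve the same locale, which is why the lookup key is the URI and not a locale name.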
I hope that vendors will provide mechanisms that allow the URI used in
a query to be defined by the user and mapped to some collation offered
by the implementation - see the way saxon:collation works in Saxon 7.x
for an example of how this might be done.

> - Speaking of Jeni's prior feedback, I'd like to echo the
> request for title-case().

My Personal View Is That Title Case Is Used Only In North America, and
we are trying to restrict ourselves to functions that have global
appeal. But in the end, deciding whether to include or exclude
particular functions is a matter of judgement.

Michael Kay
Received on Wednesday, 11 December 2002 06:01:28 UTC