- From: Michael Kay <mhk@mhk.me.uk>
- Date: Fri, 9 Apr 2004 19:38:14 +0100
- To: <public-qt-comments@w3.org>
In message http://lists.w3.org/Archives/Public/public-qt-comments/2004Feb/0972.html, Henry Zongaro raised the question of ignorable collation units and their effect on functions such as contains() and substring-before(). I've been investigating what Java does.

Using the collation

    let $coll := "http://saxon.sf.net/collation?lang=en;strength=primary"

(which maps in a fairly obvious way to a Java comparator), I get

    compare("in-scope", "inscope", $coll) = 0

so it appears this is a collation in which hyphen is "ignorable". But it turns out that Java actually generates 8 collation units for the first string and only 7 for the second. It treats the strings as equal because (I think) the difference between "-" and "" is a tertiary difference, and tertiary differences are ignored when the collation strength is primary.

Using the same collation, I get:

    contains("in-scope", "-", $coll) = true
    contains("in-scope", "inscope", $coll) = false
    substring-before("in-scope", "-", $coll) = "in"
    substring-after("in-scope", "-", $coll) = "scope"

So as far as I can see, Java side-steps the problem in the Unicode algorithm that Henry's comment refers to. The hyphen is not really an "ignorable" character at all. It generates a collation unit that is ignored at certain levels. Therefore the fact that hyphen is ignored in equality testing at a certain level does not affect the results of contains() and its friends, which produce the expected results.

Having established that Java has no problem handling ignorables here, I'm not sure what our specs need to say about the situation. I think it's a non-problem and we should avoid mentioning it. (It's interesting, though, that A eq B can be true when contains(A,B) is false under the same collation.)

Michael Kay (personal contribution)
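To make the asymmetry concrete, here is a minimal Java sketch of a toy collation. It assumes, hypothetically, that hyphen maps to one collation element whose primary weight is 0 (with a nonzero tertiary weight), while letters get nonzero primary weights. Equality at primary strength drops zero-primary elements before comparing. The contains()/substring-before()/substring-after() helpers instead match collation-element sequences position by position, so the hyphen's element is still there to be found. The weights, the class name and the method names are illustrative assumptions, not Saxon's code and not the weights in Java's built-in collation rules.

```java
import java.util.ArrayList;
import java.util.List;

class IgnorableCollationDemo {

    // One collation element with primary, secondary and tertiary weights.
    // Secondary and tertiary are carried along but never compared below,
    // because every operation here works at primary strength.
    static final class Element {
        final int primary, secondary, tertiary;
        Element(int primary, int secondary, int tertiary) {
            this.primary = primary;
            this.secondary = secondary;
            this.tertiary = tertiary;
        }
    }

    // HYPOTHETICAL toy table: each char yields exactly one element.
    // Hyphen still produces an element, but its primary weight is 0,
    // so it is invisible at primary strength.
    static List<Element> elements(String s) {
        List<Element> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            if (c == '-') {
                out.add(new Element(0, 0, 1));
            } else {
                out.add(new Element(c, 1, 1)); // char code used as primary weight
            }
        }
        return out;
    }

    // Equality at primary strength: skip elements whose primary weight is 0.
    static List<Integer> primaryKey(String s) {
        List<Integer> key = new ArrayList<>();
        for (Element e : elements(s)) {
            if (e.primary != 0) {
                key.add(e.primary);
            }
        }
        return key;
    }

    static boolean equalAtPrimary(String a, String b) {
        return primaryKey(a).equals(primaryKey(b));
    }

    // First element index in a where b's elements match one-for-one on
    // primary weight. Zero-primary elements are NOT skipped here.
    static int indexOf(List<Element> a, List<Element> b) {
        outer:
        for (int i = 0; i + b.size() <= a.size(); i++) {
            for (int j = 0; j < b.size(); j++) {
                if (a.get(i + j).primary != b.get(j).primary) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    static boolean contains(String a, String b) {
        return indexOf(elements(a), elements(b)) >= 0;
    }

    // Element index equals char index only because the toy table is 1:1.
    static String substringBefore(String a, String b) {
        int i = indexOf(elements(a), elements(b));
        return i < 0 ? "" : a.substring(0, i);
    }

    static String substringAfter(String a, String b) {
        int i = indexOf(elements(a), elements(b));
        return i < 0 ? "" : a.substring(i + b.length());
    }

    public static void main(String[] args) {
        System.out.println(equalAtPrimary("in-scope", "inscope")); // true
        System.out.println(contains("in-scope", "-"));             // true
        System.out.println(contains("in-scope", "inscope"));       // false
        System.out.println(substringBefore("in-scope", "-"));      // in
        System.out.println(substringAfter("in-scope", "-"));       // scope
    }
}
```

In this model the two strings compare equal, yet neither string contains the other, which mirrors the Java results reported in the message. Because each character maps to exactly one element here, an element index can double as a character index. A real collation with contractions or expansions would need an explicit mapping between the two.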
Received on Friday, 9 April 2004 14:38:19 UTC