- From: Jim Melton <jim.melton@acm.org>
- Date: Wed, 09 Jun 2004 18:27:36 -0600
- To: Igor Hersht <igorh@ca.ibm.com>
- Cc: "Michael Kay" <mhk@mhk.me.uk>, ashokmalhotra@alum.mit.edu, public-qt-comments@w3.org, Stephen.Buxton@oracle.com
- Message-Id: <6.0.0.22.2.20040609180312.03466540@gmstimap.oraclecorp.com>
Gentlepeople,

My apologies for entering this discussion rather late, even though it is one of my favorite topics; I've been out of the country on business and am just now catching up on email.

The most pithy definition of "collation" that I can devise would read something like this:

   collation: A specification of the manner in which character strings are compared and, by extension, ordered.

That definition says absolutely nothing about the technology used to perform the comparisons/orderings, nor about how to specify a collation in any particular context. I think those omissions are a strength of the definition. Such a definition does not preclude collations based on the Unicode Collation Algorithm (UCA), proprietary mechanisms, or even so-called "phone book" collations. I, in agreement with the XML Query WG (and, presumably, the XSL WG), would oppose any definition that might preclude some collation that our implementations might use or that our customers might demand.

Igor, you have raised some interesting points, but I don't think that we are in any disagreement about the goals. Nonetheless, I think that I do disagree with your statement that the UCA "cannot be implemented correctly when you compare or match just parts of the strings (represented by the collation units)". Perhaps it's a matter of interpretation, because I believe that such comparisons can be done, but (as I think you said) the collation units for the entire set of strings must be computed in order for them to be done. You argue that this cannot be implemented in a reasonable period of time, but others may well disagree (indeed, some may already have implemented such facilities), so this is not a useful argument against such a requirement. As Mike Kay said, "Whether a real collation actually operates in this way is irrelevant, it only needs to produce the same results as if it did so".
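To make the "as if" reading concrete, here is a minimal sketch in Python. It is not the UCA and not a description of any product; the helper names collation_elements and contains are hypothetical, and the "collation" is a toy that ignores case and combining accents. The point is only that both strings are first mapped, in full, to sequences of collation units, and the substring test is then run on those sequences.

```python
import unicodedata


def collation_elements(s):
    """Toy collation (an assumption for illustration, not the UCA):
    one key per base character, ignoring case and combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return [ch.casefold() for ch in decomposed if not unicodedata.combining(ch)]


def contains(haystack, needle):
    """Substring test carried out purely on collation-unit sequences."""
    h = collation_elements(haystack)
    n = collation_elements(needle)
    if not n:
        # An empty search string is treated as matching everywhere.
        return True
    # Compare the needle's units against every window of the haystack's units.
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))
```

Under this toy collation, contains("Résumé writing", "resume") is true, because both strings reduce to the same run of units. A real collation would produce richer units (multi-level weights, contractions, expansions), but the matching step would still operate on the unit sequences rather than on raw characters.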
I especially disagree with your statement that "Just anoter example from the Unicode specs which theoretically cannot be implemented (for contains or any other collation function) using just collation elements". I am convinced, after inspecting UTR #10 and spending a bit of time thinking about this, that matching such as that required by fn:contains() can readily be implemented using just collation elements. It might (or might not) be claimed that doing so would be time-consuming or perhaps inefficient, but "theoretically cannot be implemented" is very hard to swallow. That seems tantamount to a claim that the UCA cannot be implemented ("for...any other collation function"), even in theory, which is patently absurd. Surely fn:compare($arg1, $arg2, $collation) is "any...collation function". Do you really mean to imply that it is theoretically impossible to implement that function?

Mike also asked what assumptions a system can make about a collation, such as transitivity, symmetry, etc. This is an area fraught with peril, except when the nature of the collation is generally known. For example, I share your belief that collations based on the UCA are transitive, symmetrical, etc., but I can easily imagine other collations that do not share all of, perhaps any of, those properties. That makes it dangerous for a system to make universal assumptions. Of course, a partial solution (which I think I could support) to this problem is to say that the results of using collations that do not have those properties are implementation-defined.

(Mike, with respect, I am troubled by your lengthy, almost algorithmic, definition of a collation, in part because it seems to presume something very like the UCA, but also in part because I see no need for such detailed semantics to be included in the definition. I strongly prefer my much more terse and general definition.
For similar reasons, I am uncomfortable with Ashok's proposed definition; again, it goes into too much detail and I don't think we need to provide a tutorial on the possible behaviors that collations can be built to provide.)

Hope this helps,
   Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063      Personal email: jim at melton dot name
USA                       Fax : +1.801.942.3345
========================================================================
=  Facts are facts.  However, any opinions expressed are the opinions  =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================
Received on Wednesday, 9 June 2004 20:29:38 UTC