- From: Michael Kay <mhk@mhk.me.uk>
- Date: Wed, 9 Jun 2004 22:09:19 +0100
- To: "'Igor Hersht'" <igorh@ca.ibm.com>
- Cc: <ashokmalhotra@alum.mit.edu>, <public-qt-comments@w3.org>, <Stephen.Buxton@oracle.com>

> > Our collations aren't restricted to conform to the
> > Unicode Collation Algorithm. This is a much more
> > abstract definition of what a collation is.
>
> I agree that a collation definition should not be restricted to
> the Unicode collation (there is no such restriction in my proposal).
> In my opinion the definition should also:
> 1. Include the Unicode collation.
> 2. Be implementable within a reasonable time limits.
> 3. Have reasonable performance.

I wasn't thinking at all of a definition that was implementable. I was looking for an abstract definition of what we mean by the word "collation". I was trying to find a definition that would ensure some predictable properties in terms of defining total ordering, transitivity, etc., without placing any constraints on the implementation.

In other words, I am trying to answer the question: what can a system assume about a collation? Is it allowed to assume that if X<Y, then Y>X? Is it allowed to assume that if X=Y, then X contains Y? Can it assume that if X contains Y and Y contains Z, then X contains Z?

A collation that can be described as a mapping from strings to sequences of integers clearly has some nice properties of this kind. If we think that there are collations in ordinary use that can't be described as a mapping from strings to sequences of integers, then we need to ask which of the above properties still hold for such collations. If we can't assume any of these properties then this has a major impact on the feasibility of optimisation.

Whether a real collation actually operates in this way is irrelevant; it only needs to produce the same results as if it did so.

Michael Kay
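
To make the abstract model concrete, here is a minimal sketch in Python of a collation described purely as a mapping from strings to sequences of integers. Everything in it is an assumption for illustration: the names key, compare, contains and check_properties are hypothetical, the case-insensitive mapping is a toy and not the Unicode Collation Algorithm, and "contains" is taken to mean that one key sequence occurs contiguously inside the other, which is only one possible reading. Under those assumptions, the three properties asked about above follow directly from ordinary comparison of integer sequences.

```python
from itertools import product


def key(s: str) -> tuple:
    """Hypothetical collation: map a string to a sequence of integers.

    Toy rule (an assumption, not any standard): one integer per
    casefolded character, so "a" and "A" collate as equal.
    """
    return tuple(ord(c) for c in s.casefold())


def compare(x: str, y: str) -> int:
    """Return -1, 0 or 1 by lexicographic comparison of the key sequences."""
    kx, ky = key(x), key(y)
    return (kx > ky) - (kx < ky)


def contains(x: str, y: str) -> bool:
    """True if the key of y occurs as a contiguous run inside the key of x."""
    kx, ky = key(x), key(y)
    n = len(ky)
    return any(kx[i:i + n] == ky for i in range(len(kx) - n + 1))


def check_properties(samples) -> bool:
    """Check the three properties from the message over all sample combinations."""
    for x, y in product(samples, repeat=2):
        # If X<Y then Y>X (and vice versa).
        assert compare(x, y) == -compare(y, x)
        # If X=Y then X contains Y.
        if compare(x, y) == 0:
            assert contains(x, y)
    for x, y, z in product(samples, repeat=3):
        # If X contains Y and Y contains Z, then X contains Z.
        if contains(x, y) and contains(y, z):
            assert contains(x, z)
    return True


if __name__ == "__main__":
    print(check_properties(["", "a", "A", "ab", "AB", "b", "abc", "xABCx"]))
```

A check over a handful of samples is of course not a proof; the point is only that a system which may assume the "as if" model can rely on these properties without knowing how the collation is actually implemented.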

Received on Wednesday, 9 June 2004 17:10:01 UTC