RE: PLease define 'collation' from Michael Kay on 2004-06-09 (public-qt-comments@w3.org from June 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Wed, 9 Jun 2004 22:09:19 +0100
To: "'Igor Hersht'" <igorh@ca.ibm.com>
Cc: <ashokmalhotra@alum.mit.edu>, <public-qt-comments@w3.org>, <Stephen.Buxton@oracle.com>
Message-Id: <20040609211001.19D1FA107C@frink.w3.org>

> > Our collations aren't restricted to conform to the
> > Unicode Collation Algorithm. This is a much more
> > abstract definition of what a collation is.
> 
> I agree that a collation definition should not be restricted to
> the Unicode collation (there is no such restriction in my proposal).
> In my opinion the definition should also:
> 1. Include the Unicode collation.
> 2. Be implementable within reasonable time limits.
> 3. Have reasonable performance.

I wasn't thinking at all of a definition that was implementable. I was
looking for an abstract definition of what we mean by the word "collation".
I was trying to find a definition that would ensure some predictable
properties in terms of defining total ordering, transitivity, etc., without
placing any constraints on the implementation.

In other words I am trying to answer the question, what can a system assume
about a collation? Is it allowed to assume that if X<Y, then Y>X? Is it
allowed to assume that if X=Y, then X contains Y? Can it assume that if X
contains Y and Y contains Z, then X contains Z?

A collation that can be described as a mapping from strings to sequences of
integers clearly has some nice properties of this kind. If we think that
there are collations in ordinary use that can't be described as a mapping
from strings to sequences of integers, then we need to ask which of the
above properties still hold for such collations. If we can't assume any of
these properties then this has a major impact on the feasibility of
optimisation.
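
To make the "mapping from strings to sequences of integers" idea concrete,
here is a minimal sketch in Python. It assumes a toy collation that simply
lower-cases the string and uses one code point per character; the names
collation_key, compare, equals and contains are hypothetical and chosen
only for illustration, not taken from any specification.

```python
from typing import Tuple


def collation_key(s: str) -> Tuple[int, ...]:
    """Toy collation (an assumption): lower-case, then one integer per character."""
    return tuple(ord(c) for c in s.lower())


def compare(x: str, y: str) -> int:
    """Return -1, 0 or 1 by comparing the integer sequences lexicographically."""
    kx, ky = collation_key(x), collation_key(y)
    return (kx > ky) - (kx < ky)


def equals(x: str, y: str) -> bool:
    """Two strings are equal under the collation when their keys are identical."""
    return collation_key(x) == collation_key(y)


def contains(x: str, y: str) -> bool:
    """True if y's key occurs as a contiguous run inside x's key."""
    kx, ky = collation_key(x), collation_key(y)
    return any(kx[i:i + len(ky)] == ky for i in range(len(kx) - len(ky) + 1))
```

Because every operation is defined on the integer sequences, the properties
in question follow from the corresponding properties of integer tuples:
compare("a", "B") and compare("B", "a") have opposite signs, equals(x, y)
implies contains(x, y), and contains is transitive.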

Whether a real collation actually operates in this way is irrelevant; it
only needs to produce the same results as if it did so.
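
As a rough illustration of the "as if" point, the Python sketch below puts a
key-based comparison next to a direct one that never builds an integer
sequence. It assumes a toy collation in which ASCII letters compare
case-insensitively; weight, key_compare and direct_compare are hypothetical
names used only for this example.

```python
def weight(c: str) -> int:
    """Toy weight (an assumption): ASCII upper case folds onto lower case."""
    return ord(c) + 32 if "A" <= c <= "Z" else ord(c)


def key_compare(x: str, y: str) -> int:
    """Reference behaviour: build both integer sequences, then compare them."""
    kx = [weight(c) for c in x]
    ky = [weight(c) for c in y]
    return (kx > ky) - (kx < ky)


def direct_compare(x: str, y: str) -> int:
    """Never materialises a key: stops at the first differing character."""
    for cx, cy in zip(x, y):
        wx, wy = weight(cx), weight(cy)
        if wx != wy:
            return -1 if wx < wy else 1
    # One string is a prefix of the other: the shorter one sorts first.
    return (len(x) > len(y)) - (len(x) < len(y))
```

The two functions are implemented differently but return the same result for
every pair of strings, so a system may reason about direct_compare as though
it were defined by the integer-sequence mapping.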

Michael Kay

Received on Wednesday, 9 June 2004 17:10:01 UTC