RE: Created tracking/issues page for draft-newman-i18n-comparator from Michael Kay on 2004-08-25 (public-ietf-collation@w3.org from August 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Wed, 25 Aug 2004 09:29:19 +0100
To: "'Jim Melton'" <jim.melton@acm.org>, <public-ietf-collation@w3.org>
Message-ID: <E1BztAH-0003RK-Vp@frink.w3.org>

Comments on Jim's comments:
> 
> On 2004-08-24, Mike Kay said the following, on which I would like to 
> comment; my comments are preceded by "Jim:":
> 
> Some comments on the draft:
> 
> (a) I think we should be defining a function on character strings, not on
> octet strings. ..
> 
> Jim: I mostly agree with Mike on this one.  The "mostly" involves my
> understanding (or not) of Mike's use of "Unicode codepoints".  It is my
> understanding of Unicode that the phrase explicitly excludes surrogate
> values...  With that understanding, then Mike and I agree.

Yes, we agree.
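
To illustrate why the choice between character strings and encoded strings
matters, here is a minimal Python sketch. It is not taken from the draft; the
helper names and the two sample characters are chosen purely for illustration.
It shows that ordering by code point and ordering by UTF-16 code unit can
disagree once surrogates are involved.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(value):
    """True if value lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= value <= 0xDFFF


a = "\uFF5E"        # a BMP character, one UTF-16 code unit
b = "\U0001F600"    # a supplementary character, two surrogate code units

# Comparing sequences of code points: U+FF5E sorts before U+1F600.
by_code_point = [ord(c) for c in a] < [ord(c) for c in b]

# Comparing sequences of UTF-16 code units: the lead surrogate 0xD83D
# sorts before 0xFF5E, so the order is reversed.
by_utf16 = utf16_units(a) < utf16_units(b)

print(by_code_point, by_utf16)
```

A function defined on characters avoids this dependence on the encoding.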
> 
> (b) Because the protocol is out of scope, discussions of how to search for a
> collation using wildcards are also out of scope.
> 
> Jim: I don't understand this point.  

I'm basically trying to map the idea to the way XPath/XQuery use collations.
I find it hard to see us supporting a collation value of
"http://www.*.com/french" - it's just not the way URIs work. One might
instead want to have a much more sophisticated mechanism in which collations
have properties and one can search for them by their properties. My basic
point was that I think the query language used to access a repository of
collations is out of scope for this spec.
> 
> (c) Typically a collation is the combination of a basic algorithm (such as
> UCA) and a set of parameters (such as language, collation strength).
> 
> Jim: I understand Mike's point, but I prefer that a single "name"
> (actually, a URI, at least in the XQuery context) identify a single
> collation.  If the URI has some sort of internal structure, which URLs
> certainly may, then I think both of our requirements might be solved.  What
> I specifically do not want is to have to supply anything more complicated
> than a single character string to uniquely identify the exact (most
> specific) collation to be used for a given operation.

I agree with Jim that one needs to be able to construct a single URI that
encapsulates all the parameters. However, if an algorithm has four
parameters with an average of three values each, that gives you 81 different
collation URIs, and the registry might get rather unwieldy if each one has
its own entry. If one of the parameters is an ISO language code, it moves
from being unwieldy to being infeasible.
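
A rough sketch of the arithmetic, in Python. The base URI, the parameter names
and their values are all invented for the example and do not come from the
draft or from any registry; the point is only that one URI per combination
grows multiplicatively.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical base URI and parameters: four parameters, three values each.
BASE = "http://example.org/collation/uca"
PARAMS = {
    "strength": ["primary", "secondary", "tertiary"],
    "casefirst": ["upper", "lower", "off"],
    "alternate": ["shifted", "non-ignorable", "blanked"],
    "numeric": ["on", "off", "auto"],
}

# One URI per combination of parameter values.
uris = [
    BASE + "?" + urlencode(dict(zip(PARAMS, combo)))
    for combo in product(*PARAMS.values())
]


def registry_size(n_languages):
    """Entries needed if a language parameter with n_languages values is added."""
    return len(uris) * n_languages


print(len(uris))           # 3 * 3 * 3 * 3 combinations
print(uris[0])
print(registry_size(100))  # every language code multiplies the total
```

A single URI with internal structure still identifies each collation uniquely,
but the registry would only need one entry for the algorithm and its
parameters rather than one per combination.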
> 
> (f) The definition of the term "collation" should include some invariants
> that all collations must satisfy: for example, if compare(A,B)=-1, then
> compare(B,A)=+1.
> 
> Jim: I agree with Mike that this characteristic is highly desirable, but I
> believe that I have been told of applications in which certain collations
> might not hold that characteristic.

I agree, it's a tricky area. Perhaps we should define a set of desirable
invariants, and require the registration of a collation to indicate if any
of the desirable invariants are not satisfied.
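
As a sketch of what such a declaration could be checked against, the Python
below tests a comparison function for three candidate invariants over a set of
sample strings. The invariant names, the sample data and both comparators are
assumptions made for this example; the draft does not define them, and a
sample-based check can only find violations, not prove their absence.

```python
from itertools import product


def sign(n):
    """Reduce any number to -1, 0 or +1."""
    return (n > 0) - (n < 0)


def codepoint_compare(a, b):
    """Compare two strings by their sequences of code points."""
    ka, kb = [ord(c) for c in a], [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


def length_only_compare(a, b):
    """A deliberately ill-behaved comparator: it never returns 0."""
    return -1 if len(a) <= len(b) else 1


def violated_invariants(compare, samples):
    """Return the names of the invariants that fail on the given samples."""
    failed = set()
    for a in samples:
        if compare(a, a) != 0:
            failed.add("reflexive")
    for a, b in product(samples, repeat=2):
        # If compare(A,B) = -1 then compare(B,A) must be +1, and vice versa.
        if sign(compare(a, b)) != -sign(compare(b, a)):
            failed.add("antisymmetric")
    for a, b, c in product(samples, repeat=3):
        if compare(a, b) < 0 and compare(b, c) < 0 and not compare(a, c) < 0:
            failed.add("transitive")
    return sorted(failed)


samples = ["", "a", "b", "ab"]
print(violated_invariants(codepoint_compare, samples))
print(violated_invariants(length_only_compare, samples))
```

A registration could then list the names returned for its collation, with an
empty list meaning that none of the desirable invariants is known to fail.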
> 
> (g) The document needs to be clear whether the description of a collation in
> the registry needs to be descriptive or prescriptive. Can I register a
> collation without specifying precise details of the algorithm that it uses
> (that is, sufficient information to allow a third party to implement the
> collation, with predictable and repeatable results)? There are potential IPR
> issues here.
> 
> Jim: I think that one should be able to register such collations, which (I
> presume) means that my answer to the implied question is "descriptive".

I agree.

Michael Kay
Received on Wednesday, 25 August 2004 08:29:54 UTC