RE: Created tracking/issues page for draft-newman-i18n-comparator from Jim Melton on 2004-08-24 (public-ietf-collation@w3.org from August 2004)

From: Jim Melton <jim.melton@acm.org>
Date: Tue, 24 Aug 2004 17:36:34 -0600
To: public-ietf-collation@w3.org
Cc: jim.melton@acm.org
Message-Id: <6.0.0.22.2.20040824171636.04bee9c0@gmstimap.oraclecorp.com>

Gentlepeople,

On 2004-08-24, Mike Kay said the following, on which I would like to 
comment; my comments are preceded by "Jim:":

Some comments on the draft:

(a) I think we should be defining a function on character strings, not on
octet strings. The encoding of the strings is a matter for the protocol to
negotiate, and the protocol should be out of scope for this document. (By
"character string", I mean a list of integers being the Unicode codepoints).

Jim: I mostly agree with Mike on this one.  The "mostly" involves my 
understanding (or not) of Mike's use of "Unicode codepoints".  It is my 
understanding of Unicode that the phrase explicitly excludes surrogate 
values, requiring direct use of codepoints of the characters that would be 
referenced by surrogate pairs.  With that understanding, Mike and I 
agree.
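
To make the distinction in (a) concrete, here is a minimal sketch in Python
of the same string viewed as a list of code points and as UTF-16 code
units. (The Unicode Standard's own term for code points excluding the
surrogate range is "Unicode scalar value".) The helper names and the sample
string are illustrative assumptions and are not taken from the draft.

```python
def code_points(s):
    """Return the string as a list of Unicode code points."""
    return [ord(ch) for ch in s]


def utf16_code_units(s):
    """Return the string as 16-bit UTF-16 code units (surrogates included)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# 'a' followed by U+10400, a character outside the Basic Multilingual Plane.
text = "a\U00010400"

print([hex(cp) for cp in code_points(text)])      # two code points
print([hex(cu) for cu in utf16_code_units(text)])  # three code units
```

A collation defined over the first form sees U+10400 as a single element;
one defined over the second form sees it as a surrogate pair.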

(b) Because the protocol is out of scope, discussions of how to search for a
collation using wildcards are also out of scope.

Jim: I don't understand this point.  Surely, wildcards that might be used 
to search for collations are no less applicable when the collations are 
defined on character strings.  (I recognize that the existing draft used 
"single wildcard" to search for collations that differ only in the 
protocol, but that is not the only possible use.)

(c) Typically a collation is the combination of a basic algorithm (such as
UCA) and a set of parameters (such as language, collation strength). This
doesn't fit well into a single-level naming structure. It would be better to
identify a collation by means of a collation algorithm name supplemented by
a set of keyword/value pairs. A registered collation would define the
keywords that are recognized and the allowed values for each keyword.

Jim: I understand Mike's point, but I prefer that a single "name" 
(actually, a URI, at least in the XQuery context) identify a single 
collation.  If the URI has some sort of internal structure, which URLs 
certainly may have, then I think both of our requirements might be satisfied.  What 
I specifically do not want is to have to supply anything more complicated 
than a single character string to uniquely identify the exact (most 
specific) collation to be used for a given operation.
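
As a rough sketch of how one string could carry both views in (c), the
Python below splits a collation URI into an algorithm name and a set of
keyword/value pairs. The URI, its path layout, and the keywords "lang" and
"strength" are hypothetical; neither the draft nor any registry defines
them.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_collation_uri(uri):
    """Split a collation URI into (algorithm name, keyword/value dict).

    Assumes, purely for illustration, that the last path segment names the
    algorithm and that the query string carries its parameters.
    """
    parts = urlsplit(uri)
    algorithm = parts.path.rstrip("/").rsplit("/", 1)[-1]
    params = dict(parse_qsl(parts.query))
    return algorithm, params


# Hypothetical URI; example.org is a reserved example domain.
uri = "http://example.org/collation/uca?lang=sv&strength=secondary"
algorithm, params = parse_collation_uri(uri)
print(algorithm, params)
```

The caller still supplies a single character string; any structure is
recovered only by software that chooses to look inside it.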

(d) The document is trying both to describe the registry and to give it some
initial content. These two things should be separated.

Jim: No disagreement, but also no really strong position.

(e) The description of a registry needs to include some process definitions
for how the registry is maintained.

Jim: Absolutely.

(f) The definition of the term "collation" should include some invariants
that all collations must satisfy: for example, if compare(A,B)=-1, then
compare(B,A)=+1.

Jim: I agree with Mike that this characteristic is highly desirable, but I 
believe that I have been told of applications in which certain collations 
might not have that characteristic.  I cannot produce any such application 
or requirement at the moment, but want to urge others to think about the 
possibility.
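
The invariant in (f) can be stated as compare(A,B) = -compare(B,A) for all
A and B. A minimal Python sketch of a check for it follows, assuming a
comparison function that returns -1, 0, or +1. Both comparison functions
and the sample strings are made up for illustration; the second function
is deliberately flawed to show a collation that fails the check.

```python
from itertools import product


def codepoint_compare(a, b):
    """Compare two strings by their code point sequences: -1, 0, or +1."""
    ka = [ord(c) for c in a]
    kb = [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


def broken_compare(a, b):
    """Deliberately flawed comparison that never returns +1."""
    return -1 if a != b else 0


def is_antisymmetric(compare, samples):
    """Check compare(a, b) == -compare(b, a) for every pair of samples."""
    return all(compare(a, b) == -compare(b, a)
               for a, b in product(samples, repeat=2))


samples = ["", "a", "b", "ab", "B"]
print(is_antisymmetric(codepoint_compare, samples))
print(is_antisymmetric(broken_compare, samples))
```

A check over sample strings can only find counterexamples; it cannot prove
that a collation satisfies the invariant for all inputs.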

(g) The document needs to be clear whether the description of a collation in
the registry needs to be descriptive or prescriptive. Can I register a
collation without specifying precise details of the algorithm that it uses
(that is, sufficient information to allow a third party to implement the
collation, with predictable and repeatable results)? There are potential IPR
issues here.

Jim: I think that one should be able to register such collations, which (I 
presume) means that my answer to the implied question is "descriptive".

Hope this helps,
    Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
   Editor of XQuery F&O, XQueryX, etc.              Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================
Received on Tuesday, 24 August 2004 23:38:34 UTC