- From: Jim Melton <jim.melton@acm.org>
- Date: Tue, 24 Aug 2004 17:36:34 -0600
- To: public-ietf-collation@w3.org
- Cc: jim.melton@acm.org
Gentlepeople,
On 2004-08-24, Mike Kay said the following, on which I would like to
comment; my comments are preceded by "Jim:":
Some comments on the draft:
(a) I think we should be defining a function on character strings, not on
octet strings. The encoding of the strings is a matter for the protocol to
negotiate, and the protocol should be out of scope for this document. (By
"character string", I mean a list of integers being the Unicode codepoints).
Jim: I mostly agree with Mike on this one. The "mostly" involves my
understanding (or not) of Mike's use of "Unicode codepoints". It is my
understanding of Unicode that the phrase explicitly excludes surrogate
values, requiring direct use of codepoints of the characters that would be
referenced by surrogate pairs. With that understanding, Mike and I
agree.
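To make the distinction in point (a) concrete, here is a minimal sketch, assuming Python and a sample string chosen only for illustration. It shows the same text once as a list of code points and once as UTF-16 code units, where a character outside the Basic Multilingual Plane appears as a surrogate pair. The helper names are hypothetical and do not come from the draft. (The Unicode Standard's term for code points with the surrogate range excluded is "Unicode scalar value".)

```python
from typing import List


def code_points(text: str) -> List[int]:
    """Return the string as a list of integers, one per Unicode code point."""
    return [ord(ch) for ch in text]


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units; supplementary characters become surrogate pairs."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(value: int) -> bool:
    """True if the value lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= value <= 0xDFFF


# Illustrative sample: 'a' followed by U+10400, which lies outside the BMP.
sample = "a\U00010400"

points = code_points(sample)       # two integers, no surrogates
units = utf16_code_units(sample)   # three integers, two of them surrogates
```

A collation defined on the first representation never sees the surrogate values; they exist only in one particular encoding of the string.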
(b) Because the protocol is out of scope, discussions of how to search for a
collation using wildcards are also out of scope.
Jim: I don't understand this point. Surely, wildcards that might be used
to search for collations are no less applicable when the collations are
defined on character strings. (I recognize that the existing draft used
"single wildcard" to search for collations that differ only in the
protocol, but that is not the only possible use.)
(c) Typically a collation is the combination of a basic algorithm (such as
UCA) and a set of parameters (such as language, collation strength). This
doesn't fit well into a single-level naming structure. It would be better to
identify a collation by means of a collation algorithm name supplemented by
a set of keyword/value pairs. A registered collation would define the
keywords that are recognized and the allowed values for each keyword.
Jim: I understand Mike's point, but I prefer that a single "name"
(actually, a URI, at least in the XQuery context) identify a single
collation. If the URI has some sort of internal structure, which URLs
certainly may, then I think both of our requirements might be solved. What
I specifically do not want is to have to supply anything more complicated
than a single character string to uniquely identify the exact (most
specific) collation to be used for a given operation.
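As a rough sketch of how both requirements in point (c) might be met, the following assumes a collation URI whose query string carries the keyword/value pairs. The URI, the keywords "lang" and "strength", and the function name are invented for illustration; nothing here is taken from the draft or from any registry.

```python
from urllib.parse import parse_qsl, urlsplit


def split_collation_uri(uri):
    """Split a single collation URI into (algorithm URI, parameter dict).

    Assumption: parameters are carried in the query string. This is one
    possible convention, not something the draft defines.
    """
    parts = urlsplit(uri)
    # Drop the query and fragment to obtain the base algorithm identifier.
    algorithm = parts._replace(query="", fragment="").geturl()
    params = dict(parse_qsl(parts.query))
    return algorithm, params


# Hypothetical URI: the caller supplies one string, yet the structure
# (algorithm plus keyword/value pairs) is still recoverable from it.
uri = "http://example.org/collation/uca?lang=sv&strength=secondary"
algorithm, params = split_collation_uri(uri)
```

The caller still passes exactly one character string, while an implementation that cares about the internal structure can recover the algorithm and its parameters from it.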
(d) The document is trying both to describe the registry and to give it some
initial content. These two things should be separated.
Jim: No disagreement, but also no really strong position.
(e) The description of a registry needs to include some process definitions
for how the registry is maintained.
Jim: Absolutely.
(f) The definition of the term "collation" should include some invariants
that all collations must satisfy: for example, if compare(A,B)=-1, then
compare(B,A)=+1.
Jim: I agree with Mike that this property is highly desirable, but I
believe that I have been told of applications in which certain collations
might not have it. I cannot produce any such application or requirement
at the moment, but I want to urge others to think about the possibility.
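The invariant in point (f) can be checked mechanically over a set of sample strings. The sketch below is illustrative only: the compare functions and sample data are made up, and the check looks only at the sign of each result, so it also covers the compare(A,B)=0 case.

```python
from itertools import combinations


def sign(n):
    """Reduce a comparison result to -1, 0, or +1."""
    return (n > 0) - (n < 0)


def antisymmetry_violations(compare, samples):
    """Return the pairs (a, b) where compare(a, b) is not the mirror of compare(b, a)."""
    return [
        (a, b)
        for a, b in combinations(samples, 2)
        if sign(compare(a, b)) != -sign(compare(b, a))
    ]


def codepoint_compare(a, b):
    """A simple collation: order strings by their code points."""
    return (a > b) - (a < b)


def broken_compare(a, b):
    """A deliberately faulty collation: unequal strings always compare as 'less'."""
    return 0 if a == b else -1


samples = ["apple", "Apple", "banana", ""]
ok = antisymmetry_violations(codepoint_compare, samples)
bad = antisymmetry_violations(broken_compare, samples)
```

Passing such a check on a finite sample does not prove the invariant in general; it only helps to find collations that violate it.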
(g) The document needs to be clear whether the description of a collation in
the registry needs to be descriptive or prescriptive. Can I register a
collation without specifying precise details of the algorithm that it uses
(that is, sufficient information to allow a third party to implement the
collation, with predictable and repeatable results)? There are potential IPR
issues here.
Jim: I think that one should be able to register such collations, which (I
presume) means that my answer to the implied question is "descriptive".
Hope this helps,
Jim
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
Editor of XQuery F&O, XQueryX, etc. Fax : +1.801.942.3345
Oracle Corporation Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA Personal email: jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================
Received on Tuesday, 24 August 2004 23:38:34 UTC