- From: Jim Melton <jim.melton@acm.org>
- Date: Tue, 24 Aug 2004 17:36:34 -0600
- To: public-ietf-collation@w3.org
- Cc: jim.melton@acm.org
Gentlepeople,

On 2004-08-24, Mike Kay said the following, on which I would like to comment; my comments are preceded by "Jim:":

Some comments on the draft:

(a) I think we should be defining a function on character strings, not on octet strings. The encoding of the strings is a matter for the protocol to negotiate, and the protocol should be out of scope for this document. (By "character string", I mean a list of integers being the Unicode codepoints.)

Jim: I mostly agree with Mike on this one. The "mostly" involves my understanding (or not) of Mike's use of "Unicode codepoints". It is my understanding of Unicode that the phrase explicitly excludes surrogate values, requiring direct use of the codepoints of the characters that would be referenced by surrogate pairs. With that understanding, Mike and I agree.

(b) Because the protocol is out of scope, discussions of how to search for a collation using wildcards are also out of scope.

Jim: I don't understand this point. Surely, wildcards that might be used to search for collations are no less applicable when the collations are defined on character strings. (I recognize that the existing draft used "single wildcard" to search for collations that differ only in the protocol, but that is not the only possible use.)

(c) Typically a collation is the combination of a basic algorithm (such as UCA) and a set of parameters (such as language, collation strength). This doesn't fit well into a single-level naming structure. It would be better to identify a collation by means of a collation algorithm name supplemented by a set of keyword/value pairs. A registered collation would define the keywords that are recognized and the allowed values for each keyword.

Jim: I understand Mike's point, but I prefer that a single "name" (actually, a URI, at least in the XQuery context) identify a single collation. If the URI has some sort of internal structure, which URLs certainly may, then I think both of our requirements might be satisfied.
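Jim's compromise in (c), a single URI whose internal structure carries Mike's algorithm name plus keyword/value pairs, can be sketched in Python as follows. This is only an illustration: the base URI, the `uca` path segment, and the `lang`/`strength` keywords are hypothetical placeholders, not names from the draft or any registry, and the helper functions are invented for this sketch.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical collation URI: the last path segment names the base
# algorithm and the query string carries keyword/value parameters.
# None of these names come from the draft; they are placeholders.
EXAMPLE_URI = "http://example.org/collation/uca?lang=sv&strength=primary"


def parse_collation_uri(uri: str) -> tuple[str, dict[str, str]]:
    """Split one collation URI into (algorithm name, parameters)."""
    parts = urlsplit(uri)
    # Assumption: the final path segment is the algorithm name.
    algorithm = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Each key=value pair in the query becomes one collation parameter.
    params = dict(parse_qsl(parts.query))
    return algorithm, params


def build_collation_uri(base: str, algorithm: str, params: dict[str, str]) -> str:
    """Build a canonical URI: keywords are sorted so that one collation
    maps to exactly one string (an assumption about such a scheme)."""
    return f"{base}/{algorithm}?{urlencode(sorted(params.items()))}"


algorithm, params = parse_collation_uri(EXAMPLE_URI)
print(algorithm, params)  # uca {'lang': 'sv', 'strength': 'primary'}
```

Sorting the keywords in `build_collation_uri` is one way to keep the "single character string uniquely identifies the collation" property while still exposing Mike's keyword/value structure; a real registry would have to define its own canonical form.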
What I specifically do not want is to have to supply anything more complicated than a single character string to uniquely identify the exact (most specific) collation to be used for a given operation.

(d) The document is trying both to describe the registry and to give it some initial content. These two things should be separated.

Jim: No disagreement, but also no really strong position.

(e) The description of a registry needs to include some process definitions for how the registry is maintained.

Jim: Absolutely.

(f) The definition of the term "collation" should include some invariants that all collations must satisfy: for example, if compare(A,B)=-1, then compare(B,A)=+1.

Jim: I agree with Mike that this characteristic is highly desirable, but I believe that I have been told of applications in which certain collations might not have that characteristic. I cannot produce any such application or requirement at the moment, but want to urge others to think about the possibility.

(g) The document needs to be clear whether the description of a collation in the registry needs to be descriptive or prescriptive. Can I register a collation without specifying precise details of the algorithm that it uses (that is, sufficient information to allow a third party to implement the collation, with predictable and repeatable results)? There are potential IPR issues here.

Jim: I think that one should be able to register such collations, which (I presume) means that my answer to the implied question is "descriptive".

Hope this helps,
   Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)   Phone: +1.801.942.0144
  Editor of XQuery F&O, XQueryX, etc.           Fax  : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================
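The invariant Mike proposes in (f) can be checked mechanically against any candidate comparison function. The Python sketch below assumes a compare function returning -1, 0 or +1 and tests the slightly stronger form compare(A,B) == -compare(B,A), which covers the -1/+1 case in (f) and also requires equal pairs to compare as 0 both ways. `codepoint_compare` is a hypothetical toy collation (plain code point order), not a registered one; a check like this would also flag the kind of collation Jim suspects some applications need.

```python
from itertools import product
from typing import Callable

# A comparison function returns -1, 0 or +1, as in Mike's point (f).
Compare = Callable[[str, str], int]


def codepoint_compare(a: str, b: str) -> int:
    """Hypothetical toy collation: order strings by code point sequence.
    Python str comparison is code point by code point."""
    return (a > b) - (a < b)


def violates_antisymmetry(compare: Compare, samples: list[str]) -> list[tuple[str, str]]:
    """Return every ordered pair (A, B) from samples for which
    compare(A, B) is not the negation of compare(B, A)."""
    bad = []
    for a, b in product(samples, repeat=2):
        if compare(a, b) != -compare(b, a):
            bad.append((a, b))
    return bad


samples = ["a", "b", "B", "\u00e9", "\U0001D11E"]
print(violates_antisymmetry(codepoint_compare, samples))  # []
```

Relating this to (a): in Python 3, `len("\U0001D11E")` is 1 even though UTF-16 encodes that character as a surrogate pair, so a comparison written over `str` sees one code point for it rather than two surrogate values.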
Received on Tuesday, 24 August 2004 23:38:34 UTC