F&O comments: collations, code points, and comparisons

Mostly editorial comments on the F&O Nov 15 draft (these also still
apply to the internal Dec 10 draft; section numbers refer to the Dec 10
draft for your convenience).


- The link on the internal group page to the latest internal F&O draft
is wrong (points to the July 15 draft instead of the Dec 10 draft).

- Don't use "codepoint", use "code point".  Both the W3C Character Model
and the Unicode Consortium use "code point" in all their docs (as far as
I can tell).  [Also, codepoint gets a red squiggle, and I refuse to add
it to my spellchecker's dictionary :-)]

- 6.3: The "Unicode codepoint collation" is named but not defined
anywhere.  Confusingly, it's introduced around the same time as the
Unicode Collation Algorithm (which is unused by XQuery).

- 6.3.1: The definition of compare() explains what happens when one
string differs in length from the other; but this should be up to the
collation.

- 6.4.6, 6.4.7, 6.4.14: Surrogate pairs are irrelevant.  You've already
defined things in terms of code points -- so the underlying encoding
(and therefore, its surrogate pairs) never comes into play.

- 9.2.1, 10.2.1, 12.1.1: These should all compare according to the
context collation.

- 6.3, etc.: As Jeni Tennison already brought up [1], URIs as collation
names are unusual (and not even followed by the draft itself).  Although
the idea has merit for WS-I, almost every collation implementation I can
find uses RFC 1766 (locale names like en-US and fr-FR).  Perhaps some
implementations will invent a URI syntax for their collations, but I
expect most Java and .NET implementations will rely on
java.text.RuleBasedCollator and System.Globalization.CultureInfo, both
of which are based on RFC 1766.  If you're going to insist on URIs, then
at least make the draft examples consistent with that.

- Speaking of Jeni's prior feedback, I'd like to echo the request for
title-case().  Aside from newspapers and poems, I think customers really
want it -- I see a ton of Java and .NET questions about title case [2,
3].  I think I said this yesterday, but it seems arbitrary to omit
title-case when you're already implementing most of the rest of the
Unicode Case Mapping.  (And it shouldn't be a big implementation burden
-- both Java and .NET provide it in their class library/frameworks.)
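
To make the code point and title-case comments above a little more
concrete, here is a minimal sketch.  It is in Python only because
Python strings are already sequences of code points; the helper names
(code_points, utf16_units, compare_by_code_point) are my own
illustrations and do not come from the draft.  It shows that a
character outside the BMP is a single code point even though UTF-16
needs a surrogate pair for it, that code point order can differ from
UTF-16 code unit order, and that title case is a distinct mapping from
upper case.

```python
def code_points(s):
    """Return the Unicode code points of s, one integer per character."""
    return [ord(ch) for ch in s]


def utf16_units(s):
    """Count the UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def compare_by_code_point(a, b):
    """Return -1, 0, or 1, comparing a and b code point by code point.

    A shorter string that is a prefix of a longer one sorts first.
    This is one plausible reading of a "code point collation", not
    the draft's definition.
    """
    ka, kb = code_points(a), code_points(b)
    return (ka > kb) - (ka < kb)


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP
private = "\uE000"    # a BMP code point above the surrogate range
dz = "\u01C6"         # LATIN SMALL LETTER DZ WITH CARON

# One code point, but two UTF-16 code units (a surrogate pair).
print(len(code_points(clef)), utf16_units(clef))

# Code point order puts U+E000 first; UTF-16 code unit order does not.
print(compare_by_code_point(private, clef))
print(private.encode("utf-16-be") < clef.encode("utf-16-be"))

# Title case and upper case are different characters for this letter.
print(dz.upper() == "\u01C4", dz.title() == "\u01C5")
```

The second pair of prints is the reason a definition in terms of code
points has no need to mention surrogates: the surrogate pair only
shows up if you look at the encoded form.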


Cheers,

Michael Brundage
xquery@attbi.com


[1] http://lists.w3.org/Archives/Public/public-qt-comments/2002May/0052.html
[2] http://www.google.com/search?q=%22title+case%22+Java+%22how+do%22
[3] http://www.google.com/search?q=%22title+case%22+NET+%22how+do%22

Received on Tuesday, 10 December 2002 22:24:29 UTC