Re: comments on draft-newman-i18n-comparator-05.txt

From: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Date: Thu, 22 Sep 2005 10:24:32 +0200
Message-Id: </AlEUqj9pKntMV6tySbn4Q.md5@libertango.oryx.com>
To: public-ietf-collation@w3.org
Cc: Philip Guenther <guenther+collation@sendmail.com>


Two things. First, we discussed the collators in the IMAPEXT WG in Paris 
in August. The draft is not moving, and lots of IMAP work is blocked on 
it, so it was easy for the WG chair to hold people hostage: «If we 
don't work to get collators out as RFC, this group's work is blocked 
forever.» She's right: an RFC has been ready to publish since December 
2003; it's just waiting for one definition in the collator draft.

So, to answer Philip.

I read the draft and pondered your confusion, but I didn't really 
understand until Cyrus talked about date collation. Thank you for 
uncovering this.

1. Collators should get octet strings from the protocol.

2. Collators operate on a collator-specified type. Those (most?) 
collators which operate on character strings have to convert the octet 
string to a character string. (For example, a collator which operates 
on Unicode strings has to decode UTF-8 before it can sort.)
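A minimal Python sketch of the split in points 1 and 2, assuming a hypothetical `Collator` base class that is not taken from the draft. The protocol hands the collator raw octets, the collator converts them to its own type in `parse()`, and only then compares. `UnicodeCodepointCollator` decodes UTF-8 and orders by code point; that ordering is an illustrative choice only.

```python
from abc import ABC, abstractmethod
from typing import Any


class Collator(ABC):
    """Hypothetical interface: octets in, collator-specific values inside."""

    @abstractmethod
    def parse(self, octets: bytes) -> Any:
        """Convert the protocol's octet string into the collator's own type."""

    def compare(self, a: bytes, b: bytes) -> int:
        """Return -1, 0 or 1 after converting both inputs with parse()."""
        x, y = self.parse(a), self.parse(b)
        return (x > y) - (x < y)


class UnicodeCodepointCollator(Collator):
    """Illustrative collator: decodes UTF-8, then compares by code point."""

    def parse(self, octets: bytes) -> str:
        # Decoding happens before any comparison; malformed UTF-8 raises here.
        return octets.decode("utf-8")
```

With this split, `UnicodeCodepointCollator().compare("é".encode("utf-8"), b"z")` returns 1, because the comparison sees the characters U+00E9 and U+007A rather than the raw bytes.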

Some collators don't operate on character strings. Ascii-numeric is a 
case in point. Those have to parse the octet string and work on the 
resulting value.
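For ascii-numeric, the split means parsing first and comparing the resulting numbers rather than the digit strings, so "10" sorts after "9" and "007" equals "7". The function names below are hypothetical. The rule that input without a leading digit sorts after every number is an assumption of this sketch; this message does not specify it.

```python
import math
from typing import Union

# Assumption for this sketch: a string that does not start with an ASCII
# digit sorts after every number (modelled here as positive infinity).
Numeric = Union[int, float]


def parse_ascii_numeric(octets: bytes) -> Numeric:
    """Parse the leading ASCII digits of an octet string as an unbounded integer."""
    digits = bytearray()
    for byte in octets:
        if 0x30 <= byte <= 0x39:  # '0'..'9'
            digits.append(byte)
        else:
            break  # stop at the first non-digit
    if not digits:
        return math.inf
    return int(digits.decode("ascii"))  # Python ints are arbitrary precision


def compare_ascii_numeric(a: bytes, b: bytes) -> int:
    """Compare the parsed numbers, never the digit strings themselves."""
    x, y = parse_ascii_numeric(a), parse_ascii_numeric(b)
    return (x > y) - (x < y)
```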

Cyrus Daboo mentioned a collator which sorts dates. That collator has to 
specify a date format (perhaps by reference), parse that format, and 
sort/compare the dates in its internal format.
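A date collator follows the same pattern. The sketch below assumes a hypothetical collator whose specified format is DD/MM/YYYY, chosen only because it shows why parsing matters: the raw octets and the dates disagree about order. The class name and format are illustrative, not from the draft or from Cyrus's proposal.

```python
from datetime import date, datetime


class DayMonthYearCollator:
    """Hypothetical date collator; the DD/MM/YYYY input format is an assumption."""

    FORMAT = "%d/%m/%Y"

    def parse(self, octets: bytes) -> date:
        # Octets -> ASCII text -> date; raises ValueError on malformed input.
        return datetime.strptime(octets.decode("ascii"), self.FORMAT).date()

    def compare(self, a: bytes, b: bytes) -> int:
        # Compare the parsed dates, never the original octets.
        x, y = self.parse(a), self.parse(b)
        return (x > y) - (x < y)
```

As raw octets, `b"01/12/2005"` precedes `b"22/09/2003"`, but `DayMonthYearCollator().compare(b"01/12/2005", b"22/09/2003")` returns 1, because 1 December 2005 is later than 22 September 2003.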

The ascii-numeric collator needs rewriting so it speaks of numeric 
comparison, rather than digit strings. No logical change, just a change 
of wording to emphasise the numeric nature of the objects more than the 
ASCII representation. (I'll specify unbounded integers. Not 32-bit, not 
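To illustrate why the integer bound matters, the sketch below contrasts an unbounded parse with a hypothetical implementation that stores the value in an unsigned 32-bit field and silently wraps. The wrapping variant is an assumed failure mode for illustration, not something the draft describes.

```python
def parse_unbounded(digits: bytes) -> int:
    # Python ints have arbitrary precision, so no value is ever truncated.
    return int(digits.decode("ascii"))


def parse_wrapping_u32(digits: bytes) -> int:
    # Hypothetical buggy implementation: keeps only the low 32 bits.
    return int(digits.decode("ascii")) & 0xFFFFFFFF


big, small = b"4294967296", b"1"  # 2**32 and 1
```

With the unbounded parse, `big` sorts after `small`; with the wrapping parse, `big` becomes 0 and sorts before it.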

3. Any implementation is of course free to optimise. This is about the 
specification of collators only.

I'll rewrite the draft to improve ascii-numeric, describe the split 
between the octet strings the protocol supplies and the type each 
collator operates on, specify what happens when the octet string doesn't 
follow the collator's expected format or isn't within the collator's 
domain, then republish as draft-ietf-imapext-collators-00.txt. Lisa 
Dusseault (IMAPEXT WG chair) will coerce reviews and issue a WGLC soonish 
(weeks, not months). All reviews welcome.
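What a collator should do with octets outside its domain is left open here. One possible policy, sketched purely as an assumption, is to map such input to a sentinel that sorts after every valid value, so that sorting never fails. All names below are hypothetical.

```python
from functools import cmp_to_key
from typing import List, Optional


def parse_utf8(octets: bytes) -> Optional[str]:
    """Return the decoded string, or None if the octets are not valid UTF-8."""
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return None


def compare_with_invalid_last(a: bytes, b: bytes) -> int:
    x, y = parse_utf8(a), parse_utf8(b)
    if x is None or y is None:
        # Hypothetical policy: invalid input sorts after all valid input,
        # and two invalid inputs compare equal.
        return (x is None) - (y is None)
    return (x > y) - (x < y)


def sort_octet_strings(items: List[bytes]) -> List[bytes]:
    return sorted(items, key=cmp_to_key(compare_with_invalid_last))
```

Other policies (rejecting the command, or treating invalid input as never matching) are equally plausible; the point is that the specification has to pick one.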

Received on Thursday, 22 September 2005 08:29:20 UTC