Re: comments on draft-newman-i18n-comparator-05.txt

From: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Date: Thu, 22 Sep 2005 17:34:32 +0200
Message-Id: <+wyNiSfvvnsddts2VDusbg.md5@libertango.oryx.com>
To: Mark Davis <mark.davis@icu-project.org>
Cc: Philip Guenther <guenther+collation@sendmail.com>, Martin Duerst <duerst@it.aoyama.ac.jp>, public-ietf-collation@w3.org

Mark Davis writes:
> The goal and work so far is good. I'll need to read the document over 
> more carefully, but one quick point. The specification should make 
> very sure that some formal properties are observed.

Yes, but which? I didn't add any on my watch, because I felt 
uncomfortable establishing new requirements on running code without 
understanding all of that code.

To illustrate, your suggested list contains one item which I know is 
problematic:

> Matching MUST be defined such that if there is a match, the substring 
> meets the equality criteria. Note: there are some real gotchas in 
> matching, see http://www.unicode.org/reports/tr10/#Searching

If you ask an IMAP server to search for messages «FROM "<mark.davis@"», 
a message which contains «From: mark.davis@icu-project.org (Mark 
Davis)» may very well match.
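
A minimal sketch of why this breaks the proposed rule, assuming (purely for illustration) a server that searches a re-rendered form of the parsed address rather than the raw header bytes. The function names are hypothetical; only `email.utils.parseaddr` and `email.utils.formataddr` are real standard-library calls, and no actual IMAP server is claimed to work this way.

```python
from email.utils import formataddr, parseaddr


def raw_substring_match(header_value: str, key: str) -> bool:
    """Strict reading: the key must literally occur in the header text."""
    return key.lower() in header_value.lower()


def normalized_match(header_value: str, key: str) -> bool:
    """Hypothetical server behaviour: parse the address, re-render it as
    'Name <addr>', and search that rendering instead of the raw header."""
    name, addr = parseaddr(header_value)
    rendered = formataddr((name, addr))
    return key.lower() in rendered.lower()


header = "mark.davis@icu-project.org (Mark Davis)"
key = "<mark.davis@"

# The raw header has no "<", so no substring of it equals the key.
strict_result = raw_substring_match(header, key)
# The re-rendered form is "Mark Davis <mark.davis@icu-project.org>",
# so the hypothetical server reports a match anyway.
server_result = normalized_match(header, key)
```

Here the server reports a match although no substring of the stored header meets the equality criteria, which is exactly the case the quoted MUST would forbid.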

I'll add them, though. Some I'll add in the main specification, some 
I'll add in one or more defined collators, and maybe I'll flag some as 
«discussion needed».

There is one thing I'm considering relaxing: When sorting, a collator 
need not leave malformed items in any particular order. That is, when 
sorting ten items, two of which are malformed, both malformed items 
must be at the end, but not in any particular order. I haven't quite 
made up my mind on that. Perhaps a stable sort should leave malformed 
items in their original order, and unstable sorts can do what they want.
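
The stable variant could be sketched roughly as follows. This is an illustration under stated assumptions, not text from the draft: `toy_numeric_key` is a made-up stand-in for a collator's validity check and sort key, and `None` is an assumed convention for "malformed".

```python
from typing import Callable, List, Optional


def sort_malformed_last(
    items: List[str],
    sort_key: Callable[[str], Optional[int]],
) -> List[str]:
    """Sort well-formed items by key; put malformed items (key is None)
    at the end, in their original relative order."""
    keyed = [(sort_key(item), item) for item in items]
    valid = [(k, item) for k, item in keyed if k is not None]
    malformed = [item for k, item in keyed if k is None]
    # list.sort is stable, so well-formed items with equal keys also
    # keep their original relative order.
    valid.sort(key=lambda pair: pair[0])
    return [item for _, item in valid] + malformed


def toy_numeric_key(s: str) -> Optional[int]:
    """Hypothetical collator key: well-formed only if s is a non-empty
    string of ASCII digits."""
    return int(s) if s.isascii() and s.isdigit() else None


result = sort_malformed_last(["10", "x", "2", "", "7"], toy_numeric_key)
```

With this input the three numeric strings come out in numeric order, followed by "x" and "" in the order they arrived. An unstable sort under the relaxed rule would only have to guarantee that those two end up last.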

Arnt
Received on Thursday, 22 September 2005 15:39:36 GMT
