draft-newman-i18n-comparator: issue chars-or-octets-06 from Martin Duerst on 2004-10-19 (public-ietf-collation@w3.org from October 2004)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 19 Oct 2004 15:48:52 +0900
To: "Michael Kay" <mhk@mhk.me.uk>, <public-ietf-collation@w3.org>
Message-Id: <6.0.0.20.2.20041019153411.04d16400@localhost>

Hello Michael, Jim,

Many thanks for your comments, and sorry for the late response.

I'm going to reply to your mails separately for each issue.

At 19:11 04/08/24, Michael Kay wrote:
 >
 >Some comments on the draft:
 >
 >(a) I think we should be defining a function on character strings, not on
 >octet strings. The encoding of the strings is a matter for the protocol to
 >negotiate, and the protocol should be out of scope for this document. (By
 >"character string", I mean a list of integers being the Unicode codepoints).

I have assigned this issue chars-or-octets-06
(http://www.w3.org/2004/08/ietf-collation#chars-or-octets-06).

It seems that some protocols (ACAP? experts, please help) use
UTF-8 only, but then also want a 'binary' sort order for cases
where the UTF-8 data might not be clean (or perhaps also for
binary data?).
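To make the distinction concrete, here is a minimal Python sketch
(illustrative only, not taken from the draft). The function names
octet_compare and codepoint_compare are hypothetical. An octet
('binary') comparison is defined for any byte sequence. A comparison
on characters, that is, on lists of Unicode code points, first has
to decode the data, and it fails when the UTF-8 is not well-formed.

```python
def octet_compare(a: bytes, b: bytes) -> int:
    """Hypothetical 'octet' collation: compare raw octets lexicographically."""
    # Python compares bytes objects as sequences of unsigned 8-bit values.
    return (a > b) - (a < b)


def codepoint_compare(a: bytes, b: bytes) -> int:
    """Hypothetical character comparison: decode UTF-8, compare code points."""
    # Raises UnicodeDecodeError if either input is not well-formed UTF-8.
    ca = [ord(c) for c in a.decode("utf-8")]
    cb = [ord(c) for c in b.decode("utf-8")]
    return (ca > cb) - (ca < cb)


clean = "abc".encode("utf-8")
dirty = b"ab\xff"  # 0xFF never occurs in well-formed UTF-8

print(octet_compare(clean, dirty))  # -1, because 0x63 ('c') < 0xFF
try:
    codepoint_compare(clean, dirty)
except UnicodeDecodeError:
    print("character comparison is undefined for malformed input")
```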

In my view, quite a bit of the description in the draft can be
cleaned up, and I'll work on that. Overall, the draft should
probably leave open the possibility of using an octet collation,
but a protocol should have to declare explicitly that it is using
one; the default should be characters only.

I'm wondering how important it is to be able to register additional
'octet' collations; there is currently only one, and it's difficult
to imagine needing more (unless we wanted one for UTF-16).
So most likely we can declare the 'octet collation' to be a
very special, one-of-a-kind case that has to be explicitly invoked.
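One reason a single octet collation may suffice is that, for
well-formed UTF-8, octet order coincides with Unicode code point
order, whereas the order of UTF-16 code units does not (supplementary
characters, encoded as surrogates D800-DFFF, sort before BMP
characters in the range E000-FFFF). The following Python sketch is
an illustration under that reading; the helper names are
hypothetical.

```python
def cp_key(s: str) -> list:
    # Character string as a list of Unicode code points.
    return [ord(c) for c in s]


def utf8_key(s: str) -> bytes:
    # Octets of the UTF-8 encoding.
    return s.encode("utf-8")


def utf16_key(s: str) -> bytes:
    # Big-endian, so byte order matches 16-bit code unit order.
    return s.encode("utf-16-be")


# ASCII 'z', FULLWIDTH LATIN CAPITAL LETTER A (U+FF21), GRINNING FACE (U+1F600)
words = ["\uFF21", "\U0001F600", "z"]

by_cp = sorted(words, key=cp_key)
by_utf8 = sorted(words, key=utf8_key)
by_utf16 = sorted(words, key=utf16_key)

print(by_cp == by_utf8)   # True: UTF-8 octet order preserves code point order
print(by_cp == by_utf16)  # False: U+1F600 (D83D DE00) sorts before U+FF21
```

A UTF-16 octet collation would therefore give results that differ
from character order, which is the only case where a second octet
collation seems imaginable.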

Regards,     Martin.

Received on Tuesday, 19 October 2004 06:54:07 UTC