Re: draft-newman-i18n-comparator: issue chars-or-octets-06 from Arnt Gulbrandsen on 2004-10-19 (public-ietf-collation@w3.org from October 2004)

From: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Date: Tue, 19 Oct 2004 10:57:28 +0200
To: Martin Duerst <duerst@w3.org>
Cc: public-ietf-collation@w3.org, Michael Kay <mhk@mhk.me.uk>
Message-Id: <w49uuJyyXXYZ7O68qJL6zA.md5@prosecco.oryx.com>

Martin Duerst writes:
> I'm wondering how important the possibility to register additional 
> 'octet' collations is; there is currently only one, and it's 
> difficult to imagine needing more (unless we would want one for 
> UTF-16).

Not for UTF-16. UTF-16 doesn't even use octets, it uses, uh, 
hexadectets? My Greek isn't too hot, sorry. 16-bit thingammajiggies 
anyway. Byte order varies, so an octet-based collation won't get a good 
grip on UTF-16.
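
To illustrate the byte-order point, here is a minimal Python sketch. The two sample characters are chosen only for illustration and are not taken from the draft. It compares the same pair of characters octet by octet, once encoded as big-endian UTF-16 and once as little-endian.

```python
# Two characters whose code point order is a < b.
a = "\u00ff"  # LATIN SMALL LETTER Y WITH DIAERESIS
b = "\u0100"  # LATIN CAPITAL LETTER A WITH MACRON

# Python compares bytes objects octet by octet, as unsigned values.
be_order = a.encode("utf-16-be") < b.encode("utf-16-be")  # 00 FF vs 01 00
le_order = a.encode("utf-16-le") < b.encode("utf-16-le")  # FF 00 vs 00 01

print(be_order, le_order)
```

With big-endian octets the pair sorts in code point order; with little-endian octets the same pair sorts the other way round, so the result depends on a detail that an octet-level comparison cannot see.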

I think exactly one octet-based collation is necessary. It covers the 
case where the only thing known about the octets is that they're 
octets. One octet-based collation is necessarily sufficient to cover 
this case.
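
As a rough sketch of what such a collation amounts to, assuming plain lexicographic comparison of unsigned octet values with a shorter prefix sorting first: the function name octet_compare is hypothetical, and this is not claimed to be the draft's exact definition.

```python
def octet_compare(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 by comparing two octet strings.

    Nothing is assumed about the data beyond its being octets:
    no charset, no normalisation, no case folding.
    """
    # bytes comparison in Python is lexicographic on unsigned octet values,
    # and a proper prefix sorts before the longer string.
    return (a > b) - (a < b)


print(octet_compare(b"abc", b"abd"))
print(octet_compare(b"\xff", b"\x7f"))
```

Since the comparison uses no knowledge about the octets, there is nothing a second octet-based collation could do differently for this case.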

If anything more is known, whatever it is, then I think the data are no 
longer adequately described as octets. Saying "character" makes more 
sense.

Arnt

Received on Tuesday, 19 October 2004 08:57:46 UTC