
RE: Proposed changes to UTF-8 draft

From: Francois Yergeau <FYergeau@alis.com>
Date: Mon, 13 Jan 2003 10:46:04 -0500
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB7366A5082C@alis-2k.alis.domain>

Keld Jørn Simonsen wrote:
> It is because UTF-8 in the ISO 10646 definition only encodes characters
> defined in 10646. And "surrogates" are not characters. So they "do not
> occur" in UTF-8.

Yes, you're just repeating what the Note in Annex D says.  It's not wrong.
It's just insufficient: it's a Note (non-normative) and it does not forbid
(or even warn against) interpreting encoded surrogates or overlong
sequences.  There is a section that describes certain error cases, but it
misses those two, thereby implying that they might not be errors.  The
Unicode 3.2 text is just much tighter (at long last!) and therefore should
be chosen.
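
To make the two missing error cases concrete, here is a minimal sketch in
Python of a decoder that treats both overlong sequences and encoded
surrogates as errors.  It is an illustration only, not text from either
standard; the function name decode_utf8_strict and the error messages are
made up for this example, and it assumes the code space ends at U+10FFFF.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, rejecting overlong forms and encoded surrogates.

    Hypothetical helper for illustration; not an API from any standard.
    """
    # Smallest code point that legitimately needs each sequence length.
    # Anything smaller encoded at that length is an overlong sequence.
    min_for_length = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:
            # Stray continuation byte (0x80..0xBF) or 0xF8 and above.
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in tail:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)

        # The two cases discussed above.
        if cp < min_for_length[length]:
            raise ValueError(f"overlong sequence at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate U+{cp:04X} at offset {i}")
        # Assumption: nothing above U+10FFFF is accepted.
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")

        out.append(chr(cp))
        i += length
    return "".join(out)
```

With this sketch, the two-byte form C0 AF (an overlong encoding of "/") and
the three-byte form ED A0 80 (an encoding of the surrogate U+D800) both
raise ValueError, while ordinary input decodes normally.  A decoder that
only "does not expect" such input, without rejecting it, would silently
accept both.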

-- 
François
Received on Monday, 13 January 2003 10:47:47 GMT
