RE: Proposed changes to UTF-8 draft from Francois Yergeau on 2003-01-13 (ietf-charsets@w3.org from January to March 2003)

From: Francois Yergeau <FYergeau@alis.com>
Date: Mon, 13 Jan 2003 10:46:04 -0500
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB7366A5082C@alis-2k.alis.domain>

Keld Jørn Simonsen wrote:
> It is because UTF-8 in the ISO 10646 definition only encodes characters
> defined in 10646. And "surrogates" are not characters. So they "do not
> occur" in UTF-8.

Yes, you're just repeating what the Note in Annex D says.  It's not wrong.
It's just insufficient: it's a Note (non-normative) and it does not forbid
(or even warn against) interpreting encoded surrogates.  Or overlong
sequences.  There is a section that describes certain error cases, but it
misses those two, thereby implying that they might not be errors.  The
Unicode 3.2 text is just much tighter (at long last!) and therefore should
be chosen.
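
To make the two missed error cases concrete, here is a minimal sketch of a
strict decoder that treats both encoded surrogates and overlong sequences as
errors. It is an illustration only, not text from the draft or from either
standard; the function name decode_strict and the error messages are
hypothetical, and the 4-byte limit and the U+10FFFF ceiling are assumptions
that follow the tighter Unicode-style rules rather than the original 10646
definition.

```python
def decode_strict(data: bytes) -> list:
    """Decode UTF-8 bytes into code points, rejecting overlong forms
    and encoded surrogates instead of silently interpreting them."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # Pick the sequence length, the payload bits of the lead byte, and
        # the smallest code point that legitimately needs this length.
        if b0 < 0x80:
            length, cp, minimum = 1, b0, 0x0
        elif 0xC0 <= b0 < 0xE0:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + length > n:
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte at offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        # Overlong: the value would have fit in a shorter sequence.
        if cp < minimum:
            raise ValueError("overlong sequence at offset %d" % i)
        # Surrogates are not characters and must not appear in UTF-8.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("encoded surrogate at offset %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("code point out of range at offset %d" % i)
        out.append(cp)
        i += length
    return out
```

With this sketch, decode_strict(b"\xc0\xaf") (an overlong "/") and
decode_strict(b"\xed\xa0\x80") (an encoded U+D800) both raise ValueError,
whereas a decoder that only checked the byte patterns would accept them.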

-- 
François

Received on Monday, 13 January 2003 10:47:47 UTC