- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Mon, 13 Jan 2003 18:54:55 +0100
- To: Francois Yergeau <FYergeau@alis.com>
- Cc: ietf-charsets@iana.org
On Mon, Jan 13, 2003 at 10:46:04AM -0500, Francois Yergeau wrote:
> Keld Jørn Simonsen wrote:
> > It is because UTF-8 in the ISO 10646 definition only encodes
> > characters defined in 10646. And "surrogates" are not characters. So they "do not
> > occur" in UTF-8.
>
> Yes, you're just repeating what the Note in Annex D says. It's not wrong.
> It's just insufficient: it's a Note (non-normative) and it does not forbid
> (or even warn against) interpreting encoded surrogates. Or overlong
> sequences. There is a section that describes certain error cases, but it
> misses those two, thereby implying that they might not be errors. The
> Unicode 3.2 text is just much tighter (at long last!) and therefore should
> be chosen.

That is not how I read it. The note spells out for the reader what is already obvious from the architecture: you cannot encode surrogates in UTF-8. It does not, however, warn against overlong sequences; that much is true.

Kind regards
keld
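The two cases under discussion are easy to show concretely. An encoded surrogate such as `ED A0 80` would decode to U+D800, and an overlong form such as `C0 AF` is a two-byte encoding of `/` (U+002F). Below is a minimal sketch of a strict validator that rejects both, assuming the well-formed byte ranges of the Unicode Standard (the same table later published in RFC 3629). The function name `is_well_formed_utf8` is hypothetical, not part of any standard or library.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical strict check: True only if every sequence is well-formed UTF-8."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:  # ASCII, single byte
            i += 1
            continue
        # The lead byte fixes the sequence length and the allowed range of the
        # second byte; the narrowed ranges are what exclude the bad cases.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects encoded surrogates U+D800..U+DFFF
        elif 0xE1 <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects values above U+10FFFF
        else:
            # 0x80-0xBF: stray continuation byte; 0xC0/0xC1: always overlong;
            # 0xF5-0xFF: would encode values above U+10FFFF
            return False
        if i + length > n:
            return False  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += length
    return True
```

For example, `is_well_formed_utf8("€".encode("utf-8"))` is `True`, while `is_well_formed_utf8(b"\xed\xa0\x80")` (encoded surrogate) and `is_well_formed_utf8(b"\xc0\xaf")` (overlong `/`) are both `False`. Narrowing the second-byte range per lead byte is one common way to express these rules; a decoder could equally decode first and then reject surrogate or too-small code point values.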
Received on Monday, 13 January 2003 12:56:04 UTC