- From: Francois Yergeau <FYergeau@alis.com>
- Date: Mon, 13 Jan 2003 10:46:04 -0500
- To: ietf-charsets@iana.org
Keld Jørn Simonsen wrote:

> It is because UTF-8 in the ISO 10646 definition only encodes characters
> defined in 10646. And "surrogates" are not characters. So they "do not
> occur" in UTF-8.

Yes, you're just repeating what the Note in Annex D says. It's not wrong. It's just insufficient: it's a Note (non-normative) and it does not forbid (or even warn against) interpreting encoded surrogates. Or overlong sequences. There is a section that describes certain error cases, but it misses those two, thereby implying that they might not be errors.

The Unicode 3.2 text is just much tighter (at long last!) and therefore should be chosen.

-- 
François
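To make the two cases concrete, here is a minimal Python sketch of what "interpreting" such sequences means. It is not taken from either standard; the names decode_lenient, classify, and MIN_FOR_LENGTH are hypothetical and exist only for this illustration. A decoder that does bit arithmetic alone will happily produce a code point from an overlong sequence or an encoded surrogate, and only an explicit check flags them.

```python
# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_lenient(seq: bytes) -> int:
    """Decode one UTF-8-style sequence by bit arithmetic only.

    Checks the byte structure (lead byte, continuation bytes) but does
    NOT reject overlong forms or surrogate code points.
    """
    first = seq[0]
    if first < 0x80:
        length, cp = 1, first
    elif first & 0xE0 == 0xC0:
        length, cp = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        length, cp = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:
        length, cp = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("sequence length does not match lead byte")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp


def classify(seq: bytes) -> str:
    """Label a structurally valid sequence with the check it would fail."""
    cp = decode_lenient(seq)
    if cp < MIN_FOR_LENGTH[len(seq)]:
        return "overlong"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp > 0x10FFFF:
        return "out of range"
    return "ok"


if __name__ == "__main__":
    for sample in (b"\xC0\xAF", b"\xED\xA0\x80", "é".encode("utf-8")):
        print(sample.hex(), hex(decode_lenient(sample)), classify(sample))
```

The bytes C0 AF decode leniently to U+002F ("/"), and ED A0 80 decodes to the surrogate U+D800. Python's built-in strict decoder, bytes.decode("utf-8"), raises UnicodeDecodeError for both, which is the behaviour the tighter wording calls for.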
Received on Monday, 13 January 2003 10:47:47 UTC