
RE: Proposed changes to UTF-8 draft

From: Francois Yergeau <FYergeau@alis.com>
Date: Fri, 10 Jan 2003 14:47:53 -0500
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB7366A507D2@alis-2k.alis.domain>

Keld Jørn Simonsen wrote:
> I think we should keep ourselves to open standards whenever possible,
> and avoid industry standards like Unicode if we can.

I dispute the characterization of ISO standards as open.  The
standardization process is totally closed (only National Bodies can play)
and the standards themselves, with few exceptions not including 10646, are
available only for money.
 
> 10646 is pretty explicit about not using surrogates in UTF-8,
> as far as I know. Always was.

Please re-read Annex D.  The only mention is this Note:

  NOTE 1 - Values of x in the range 0000 D800 .. 0000 DFFF
  are reserved for the UTF-16 form and do not occur in UCS-4.
  The values 0000 FFFE and 0000 FFFF also do not occur
  (see clause 8). The mappings of these code positions in
  UTF-8 are undefined.
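To see what the Note leaves undefined, it helps to apply the UTF-8 bit layout mechanically to those values. This is a minimal Python sketch (Python 3.8+ for `bytes.hex(sep)`), restricted to the BMP for brevity. `encode_utf8_mechanically` and `python_strict_accepts` are hypothetical helpers written for this illustration. The first deliberately does no validity checking, so it shows what a naive encoder would emit for D800..DFFF, FFFE and FFFF.

```python
def encode_utf8_mechanically(cp: int) -> bytes:
    """Apply the UTF-8 bit layout to a BMP value with no validity checks.

    Hypothetical helper: it deliberately does NOT reject D800..DFFF,
    FFFE or FFFF, so it shows what a naive encoder would emit.
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only covers the BMP")
    if cp < 0x80:
        return bytes([cp])                      # 0xxxxxxx
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),         # 110xxxxx
                      0x80 | (cp & 0x3F)])      # 10xxxxxx
    return bytes([0xE0 | (cp >> 12),           # 1110xxxx
                  0x80 | ((cp >> 6) & 0x3F),    # 10xxxxxx
                  0x80 | (cp & 0x3F)])          # 10xxxxxx


def python_strict_accepts(data: bytes) -> bool:
    """True if Python's default (strict) UTF-8 codec decodes data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Show the byte sequences the Note calls "undefined" and whether
# Python's strict codec accepts each one.
for value in (0xD800, 0xDFFF, 0xFFFE, 0xFFFF):
    raw = encode_utf8_mechanically(value)
    print(f"U+{value:04X} -> {raw.hex(' ')}  strict decode ok: {python_strict_accepts(raw)}")
```

Here U+D800 comes out as ED A0 80. Python's strict codec refuses the two surrogate results but accepts EF BF BE and EF BF BF. Meanwhile `b"\xed\xa0\x80".decode("utf-8", "surrogatepass")` returns U+D800, so even within one implementation the "undefined" mapping can be filled in more than one way.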

There's a later section D.7 "Incorrect sequences of octets: Interpretation
by receiving devices" which is totally silent on decoding surrogates and
overlong sequences.
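For contrast, here is a minimal Python sketch (3.9+ for the `list[int]` annotation) of a decoder that makes the two checks D.7 omits explicit. It follows the well-formedness rules later adopted in RFC 3629: sequences of at most 4 bytes and values up to U+10FFFF. It does not follow the 10646 text quoted above, and the 5- and 6-byte forms are not handled. `decode_utf8_strict` and `is_well_formed` are hypothetical names.

```python
def decode_utf8_strict(data: bytes) -> list[int]:
    """Decode data to code points, rejecting overlong forms and surrogates.

    Sketch following RFC 3629 (sequences of at most 4 bytes, values up to
    U+10FFFF); the 5- and 6-byte forms of ISO/IEC 10646 are not handled.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Lead byte gives sequence length, initial payload bits, and the
        # smallest value that legitimately needs that many bytes.
        if lead < 0x80:
            cp, length, minimum = lead, 1, 0x0
        elif 0xC0 <= lead <= 0xDF:
            cp, length, minimum = lead & 0x1F, 2, 0x80
        elif 0xE0 <= lead <= 0xEF:
            cp, length, minimum = lead & 0x0F, 3, 0x800
        elif 0xF0 <= lead <= 0xF7:
            cp, length, minimum = lead & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for cont in data[i + 1:i + length]:
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte in sequence at offset {i}")
            cp = (cp << 6) | (cont & 0x3F)
        # The two cases D.7 is silent on:
        if cp < minimum:
            raise ValueError(f"overlong form of U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"value 0x{cp:X} beyond U+10FFFF at offset {i}")
        out.append(cp)
        i += length
    return out


def is_well_formed(data: bytes) -> bool:
    """True if decode_utf8_strict accepts the whole buffer."""
    try:
        decode_utf8_strict(data)
        return True
    except ValueError:
        return False
```

The sequence C0 AF is the usual example of why the overlong check matters. It decodes arithmetically to "/" (U+002F), so a decoder that accepts it can let a slash past a filter that only looks for the byte 2F.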

-- 
François
Received on Friday, 10 January 2003 14:48:37 GMT