RE: Proposed changes to UTF-8 draft from Francois Yergeau on 2003-01-10 (ietf-charsets@w3.org from January to March 2003)

From: Francois Yergeau <FYergeau@alis.com>
Date: Fri, 10 Jan 2003 14:47:53 -0500
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB7366A507D2@alis-2k.alis.domain>

Keld Jørn Simonsen wrote:
> I think we should keep ourselves to open standards whenever possible,
> and avoid industry standards like Unicode if we can.

I dispute the characterization of ISO standards as open.  The
standardization process is totally closed (only National Bodies can play)
and the standards themselves, with few exceptions (10646 is not one of
them), are available only for money.

> 10646 is pretty explicit about not using surrogates in UTF-8,
> as far as I know. Always was.

Please re-read Annex D.  The only mention is this Note:

  NOTE 1 - Values of x in the range 0000 D800 .. 0000 DFFF
  are reserved for the UTF-16 form and do not occur in UCS-4.
  The values 0000 FFFE and 0000 FFFF also do not occur
  (see clause 8). The mappings of these code positions in
  UTF-8 are undefined.

There's a later section D.7 "Incorrect sequences of octets: Interpretation
by receiving devices" which is totally silent on decoding surrogates and
overlong sequences.
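
To make the gap concrete, here is a minimal Python sketch of the checks a
receiving device would have to add on its own, since D.7 does not spell
them out.  It is an illustration only, not text from 10646 or from the
draft.  The name decode_strict, the limit to four-octet forms and the
0010 FFFF ceiling are all assumptions of the sketch.  It rejects overlong
forms and D800..DFFF, and it deliberately leaves FFFE and FFFF alone.

```python
def decode_strict(octets: bytes) -> list[int]:
    """Decode UTF-8 octets to code points (hypothetical helper).

    Rejects overlong sequences and surrogates D800..DFFF.
    Assumption of this sketch: at most four octets per character.
    """
    out = []
    i = 0
    n = len(octets)
    while i < n:
        b0 = octets[i]
        # Pick sequence length, payload bits of the lead octet, and the
        # smallest value that legitimately needs that many octets.
        if b0 < 0x80:
            length, cp, minimum = 1, b0, 0x0
        elif 0xC0 <= b0 < 0xE0:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead octet {b0:02X} at offset {i}")

        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")

        # Each trailing octet must look like 10xxxxxx.
        for b in octets[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid trailing octet {b:02X}")
            cp = (cp << 6) | (b & 0x3F)

        if cp < minimum:
            raise ValueError(f"overlong sequence for {cp:04X}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate {cp:04X} encoded in UTF-8")
        if cp > 0x10FFFF:
            raise ValueError(f"value {cp:X} out of range")

        out.append(cp)
        i += length
    return out
```

With this sketch, the octets C3 B8 decode to 00F8, while ED A0 80 (the
three-octet form of D800) and C0 AF (a two-octet form of 002F) both raise
ValueError.  A decoder that follows only the letter of Annex D has no
stated reason to refuse either of them.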

-- 
François

Received on Friday, 10 January 2003 14:48:37 UTC