Re: Registration of new charset: UTF-32 from Markus Scherer on 2001-05-11 (ietf-charsets@w3.org from April to June 2001)

From: Markus Scherer <markus.scherer@jtcsv.com>
Date: Fri, 11 May 2001 11:11:23 -0700
To: charsets <ietf-charsets@iana.org>
Message-id: <3AFC2B4B.422B926B@jtcsv.com>

Small correction:

Mark Davis wrote:
> initial BOM then the byte-orientation must be big-endian. That is, in any
> stream that does not begin with the (hex) byte sequence <00 00 FE FF> all of
> the bytes are interpreted as big-endian.

This must read:
... in any stream that does not begin with the (hex) byte sequence <FF FE 00 00> all of the bytes are interpreted as big-endian.

Explanation: Mark had the BE BOM in his sentence. It must be "everything that does not begin with the LE BOM is big-endian".

One might add a note that UTF-32 with a little-endian BOM could appear to have a UTF-16 LE BOM because that is a subset. The UTF-32 registration might specify that <FF FE 00 00> is UTF-32 (little-endian), and that <FF FE xx xx> with not both xx bytes 00 is UTF-16 (little-endian).

markus

Received on Friday, 11 May 2001 14:10:55 UTC