RE: UTF8/UTF16 from Jon Hanna on 2002-08-20 (w3c-wai-ig@w3.org from July to September 2002)

From: Jon Hanna <jon@spin.ie>
Date: Wed, 21 Aug 2002 00:25:15 +0100
To: <w3c-wai-ig@w3.org>
Message-ID: <NDBBLCBLIMDOPKMOPHLHKECJEFAA.jon@spin.ie>

> UTF8 uses a variable number of bytes, such that American can be
> represented in one byte, British requires two bytes, occasionally,

Some Americans do have good English, you know! Perhaps you have been
over-exposed to American TV and under-exposed to their great tradition of
short-story writing?
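The quoted claim about UTF-8's variable width is easy to check with Python's built-in codecs. This is a minimal sketch; `utf8_length` is a hypothetical helper and the sample characters are arbitrary choices:

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# 'A' and 'z' are ASCII; U+00A3 is the pound sign, U+20AC the euro sign.
# Escapes keep this file and its output pure ASCII.
for ch in ["A", "z", "\u00a3", "\u20ac"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) in UTF-8")

# ASCII text encodes to exactly the same bytes in ASCII and in UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

The loop reports 1, 1, 2 and 3 bytes respectively. The final comparison is why a parser that reads UTF-8 can also read plain ASCII without being told.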

> For HTML, you can only legally use UTF16 if you include the charset
> parameter in the real HTTP headers, as meta elements can't be detected
> unless the character set is ASCII compatible.  I'm not sure about XML;
> it might recognize the Unicode byte order marks, used to signal UTF16.
> Some browsers may sniff out UTF16, even when the HTTP headers don't
> identify it.
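The quoted point about meta elements comes down to byte patterns. A scanner that searches the raw bytes for the ASCII sequence `<meta` finds it in UTF-8 (or any ASCII-compatible encoding), but never in UTF-16, where every ASCII character is paired with a zero byte. This is a deliberately naive sketch, not how any particular browser detects encodings; `find_meta_ascii` and the sample markup are hypothetical:

```python
def find_meta_ascii(data: bytes) -> int:
    """Return the offset of the ASCII bytes b'<meta' in `data`, or -1."""
    return data.find(b"<meta")

html = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-16"></head></html>')

# ASCII-compatible encoding: the tag bytes appear as-is.
print(find_meta_ascii(html.encode("utf-8")))      # 12

# UTF-16 (either byte order): each character is 2 bytes, one of them 0x00,
# so the contiguous ASCII pattern b'<meta' never occurs.
print(find_meta_ascii(html.encode("utf-16-le")))  # -1
print(find_meta_ascii(html.encode("utf-16-be")))  # -1
```

This is why the charset has to come from the HTTP header (or from sniffing the UTF-16 byte pattern itself) rather than from a meta element.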

All XML parsers can understand UTF-8 (and hence ASCII stored in 8-bit bytes, since
that is identical to the UTF-8 encoding of the same characters) and UTF-16. They
can all use the byte order mark (BOM) to determine the byte order of UTF-16
text, and they MAY apply further heuristics to determine the byte order in the
absence of a BOM.
If a parser can't do that, it's not an XML parser; demand your money back if it's
commercial, demand your bandwidth back if it's freeware!
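The BOM check and the "further heuristics" described above can be sketched roughly as follows, loosely modelled on the non-normative autodetection appendix (Appendix F) of the XML 1.0 spec. It is a simplified sketch: `sniff_xml_encoding` is a hypothetical name, it ignores UTF-32 and EBCDIC, and when nothing matches it falls back to "utf-8" as the ASCII-compatible default. A real parser would then go on to read the encoding declaration.

```python
import codecs

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes."""
    # 1. Byte order marks identify the encoding and, for UTF-16, the byte order.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # 2. No BOM: heuristic check for '<?' encoded as UTF-16 in either order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. Otherwise assume an ASCII-compatible encoding (UTF-8 by default).
    return "utf-8"

doc = '<?xml version="1.0"?><a/>'
print(sniff_xml_encoding(codecs.BOM_UTF16_BE + doc.encode("utf-16-be")))  # utf-16-be
print(sniff_xml_encoding(doc.encode("utf-16-le")))  # utf-16-le (no BOM)
print(sniff_xml_encoding(doc.encode("utf-8")))      # utf-8
```

Checking the BOM before the `<?` patterns mirrors the order in the message: the BOM is the reliable signal, and the pattern match is only a fallback heuristic.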

Received on Tuesday, 20 August 2002 19:23:36 UTC