RE: UTF8/UTF16

> UTF8 uses a variable number of bytes, such that American can be
> represented in one byte, British requires two bytes, occasionally,

Some Americans do have good English, you know! Perhaps you have been
overexposed to American TV and underexposed to their great tradition of
short-story writing?

> For HTML, you can only legally use UTF16 if you include the charset
> parameter in the real HTTP headers, as meta elements can't be detected
> unless the character set is ASCII compatible.  I'm not sure about XML;
> it might recognize the Unicode byte order marks, used to signal UTF16.
> Some browsers may sniff out UTF16, even when the HTTP headers don't
> identify it.

All XML parsers can understand UTF-8 (and hence plain 7-bit ASCII, since it
is identical to the UTF-8 encoding of the same characters) and UTF-16. They
can all use the byte order mark to tell the byte order of the UTF-16, and
they MAY carry out further heuristics to determine the byte order in the
absence of a BOM.
If a parser doesn't do that, it's not an XML parser; demand your money back
if it's commercial, demand your bandwidth back if it's freeware!
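
To make the BOM check concrete, here is a rough Python sketch. It is not
taken from any real parser: the names sniff and decode_xml are made up for
illustration, it ignores UTF-32 and the encoding declaration entirely, and
the no-BOM fallback simply assumes the document starts with "<?", which is
only one of the heuristics a parser might try.

```python
import codecs

# (BOM bytes, Python codec name) pairs, checked in order.
# Assumption: only UTF-8 and UTF-16 matter here; UTF-32 is ignored.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff(data: bytes) -> tuple[str, int]:
    """Return (codec name, number of BOM bytes to skip) for an XML document."""
    # A byte order mark is the explicit signal, so look for it first.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: guess the byte order from how "<?" would be laid out in UTF-16.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be", 0
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le", 0
    # Otherwise fall back to UTF-8, which also covers plain ASCII.
    return "utf-8", 0


def decode_xml(data: bytes) -> str:
    """Decode the document to text, dropping any BOM."""
    name, skip = sniff(data)
    return data[skip:].decode(name)
```

For example, decode_xml(codecs.BOM_UTF16_BE + text.encode("utf-16-be"))
gives back text without the BOM, and an ASCII-only document comes out the
same whether you think of it as ASCII or as UTF-8.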

Received on Tuesday, 20 August 2002 19:23:36 UTC