guidelines 2.2 and BOM

Hi all,

here's a very rough draft for an addition to 2.2 in the guidelines re BOM.

Andrew

================================================


2.2 Specifying a page encoding

Use a Byte Order Mark (BOM) for UTF-16 and UTF-32

IE(Win) NNav Opera

The Byte Order Mark (BOM), U+FEFF, should occur at the beginning of
UTF-16 or UTF-32 encoded HTML/XHTML documents. A BOM at the start of a
document unambiguously identifies the byte order used by the document,
that is, whether it is in big-endian or little-endian format.
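
As a rough illustration of why the BOM reveals byte order, the Python
sketch below encodes U+FEFF in both UTF-16 byte orders and compares the
results. It relies only on the standard codecs module; the variable
names are made up for this example, and it is not meant as a
recommendation of any particular tool.

```python
import codecs

BOM = "\ufeff"  # the Byte Order Mark as a character

# Encode the same character explicitly in each UTF-16 byte order.
bom_be = BOM.encode("utf-16-be")
bom_le = BOM.encode("utf-16-le")

print(bom_be.hex())  # feff
print(bom_le.hex())  # fffe

# The two byte sequences are mirror images of each other, which is what
# lets a reader tell the byte order apart.
assert bom_be == codecs.BOM_UTF16_BE
assert bom_le == codecs.BOM_UTF16_LE

# Python's generic "utf-16" codec writes a BOM in the platform's native
# byte order before the text itself.
encoded = "<html>".encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```

Other authoring tools may behave differently, so it is worth checking
the output of whatever editor or converter is actually in use.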

The Byte Order Mark should not begin a UTF-8 document. It is not
required as an encoding signature.

If you look at a UTF-16 or UTF-32 document using a hex editor, the bytes
representing U+FEFF will clearly indicate the byte order of the encoding.

Bytes           Document encoding
FF FE           UTF-16, little-endian
FE FF           UTF-16, big-endian
FF FE 00 00     UTF-32, little-endian
00 00 FE FF     UTF-32, big-endian

Use of the BOM will assist user agents in correctly identifying the
character encoding.
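
The byte signatures listed above could be checked along the following
lines. This is only a minimal sketch of one possible approach, not a
description of how any particular user agent works; the function name
sniff_bom and the label strings are invented for this example.

```python
import codecs

# Test the 4-byte UTF-32 signatures before the 2-byte UTF-16 ones:
# the UTF-32 little-endian BOM (FF FE 00 00) begins with the same two
# bytes as the UTF-16 little-endian BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
]


def sniff_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None.

    Hypothetical helper for illustration only.
    """
    for signature, label in _SIGNATURES:
        if data.startswith(signature):
            return label
    return None


# Usage: the first bytes of a UTF-16 little-endian document starting "<h".
label = sniff_bom(b"\xff\xfe<\x00h\x00")
```

Note that this simple check would report UTF-32 for a UTF-16
little-endian document whose first character after the BOM is U+0000;
that case should not arise in HTML/XHTML, but it is an assumption of the
sketch.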


add to Sources:

Unicode in XML and other Markup Languages
http://www.w3.org/TR/unicode-xml/

Received on Wednesday, 23 July 2003 09:33:51 UTC