guidelines 2.2 and BOM

Hi all,

Here's a very rough draft of an addition to section 2.2 of the guidelines regarding the BOM.



2.2 Specifying a page encoding

Use a Byte Order Mark (BOM) for UTF-16 and UTF-32

IE(Win) NNav Opera

The Byte Order Mark (BOM), U+FEFF, should occur at the beginning of
UTF-16 or UTF-32 encoded HTML/XHTML documents. A BOM at the start of a
document unambiguously identifies the byte order used: it indicates
whether the document is big-endian or little-endian.
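
As an illustration only, here is a minimal Python sketch of writing a
UTF-16 document with an explicit BOM. The function name
encode_utf16_with_bom is hypothetical and not part of the guidelines;
the BOM constants come from Python's standard codecs module.

```python
import codecs

def encode_utf16_with_bom(text: str, big_endian: bool = True) -> bytes:
    """Encode text as UTF-16 and prepend the matching BOM (hypothetical helper)."""
    if big_endian:
        # The "utf-16-be" codec writes no BOM itself, so add FE FF by hand.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # The "utf-16-le" codec writes no BOM either, so add FF FE by hand.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Python's plain "utf-16" codec also prepends a BOM, but in the byte
order of the machine doing the encoding; naming the byte order
explicitly keeps the output the same on every platform.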

A UTF-8 document should not begin with the Byte Order Mark. UTF-8 has no
byte-order ambiguity, and the BOM is not required as an encoding signature.
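
Authors who want to check an existing UTF-8 file could use something
like the following Python sketch. It assumes the file content is
available as bytes; the function name strip_utf8_signature is
hypothetical. In UTF-8, U+FEFF is encoded as the three bytes EF BB BF.

```python
import codecs

def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 encoded U+FEFF (EF BB BF), if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No signature present: hand the bytes back unchanged.
    return data
```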

If you look at a UTF-16 or UTF-32 document using a hex editor, the bytes
representing U+FEFF at the start of the file will indicate the encoding
and its byte order.

Bytes 		Document encoding
FF FE 		UTF-16, little-endian
FE FF 		UTF-16, big-endian
FF FE 00 00 	UTF-32, little-endian
00 00 FE FF 	UTF-32, big-endian
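
The table above can be turned into a simple check. This is only a rough
Python sketch, not a description of how any user agent actually behaves;
the names BOM_TABLE and sniff_bom are hypothetical. Note that the UTF-32
little-endian signature starts with the same two bytes as the UTF-16
little-endian one, so the longer signatures have to be tested first.

```python
import codecs

# Longest signatures first: FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE).
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),     # 00 00 FE FF
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),     # FE FF
]

def sniff_bom(data: bytes):
    """Return the encoding label for a leading BOM, or None if there is none."""
    for bom, label in BOM_TABLE:
        if data.startswith(bom):
            return label
    return None
```

One caveat: a UTF-16 little-endian document whose first character after
the BOM is U+0000 would be reported as UTF-32 by this check. That should
not occur in an HTML/XHTML document, but it is a known limit of sniffing
by BOM alone.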

Use of the BOM will assist user agents in correctly identifying the
character encoding.

add to Sources:

Unicode in XML and other Markup Languages

Received on Wednesday, 23 July 2003 09:33:51 UTC