2.2 additions take 2

2.2 Specifying a page encoding

Ensure that a utf-16 or utf-32 document is saved with a BOM
--------------------------------------------------------------

IE(Win) NNav Opera

Use of the BOM will assist user agents in correctly identifying the 
character encoding.

The Byte Order Mark (BOM), U+FEFF, should occur at the beginning of 
UTF-16 or UTF-32 encoded XHTML/HTML documents. Use of a BOM at the start 
of a document clearly distinguishes the byte order used by the document. 
It indicates whether the document is in either the big or little endian 
format.

Many HTML editors and text editors will insert a BOM by default. Some 
editors allow you to customise this behaviour. It is important when you 
first start using UTF-16 or UTF-32 that you ensure that your editor and 
development tools do place a BOM at the beginning of a document.

If you look at a UTF-16 document using a hex editor, the bytes 
representing U+FEFF will clearly indicate the byte order of the encoding.

Bytes 		Document encoding
FF FE 		UTF-16, little-endian
FE FF 		UTF-16, big-endian
FF FE 00 00 	UTF-32, little-endian
00 00 FE FF 	UTF-32, big-endian



Ensure that a utf-8 document is saved without a BOM
------------------------------------------------------------

IE(Win) NNav Opera

The Byte Order Mark should not begin a UTF-8 document. It is not 
recquired as an encoding signature. In older web browsers it may 
adversely affect the rendering of the web page.

Some editors will insert a BOM at the beginning of a UTF-8 document, 
these editors usualy allow you to customise this behaviour. It is 
important when you first start using UTF-8 that you ensure that your 
editor and development tools do not place a BOM at the beginning of a 
document.



add to Sources:

Unicode in XML and other Markup Languages
http://www.w3.org/TR/unicode-xml/

Received on Wednesday, 30 July 2003 14:49:45 UTC