<?php 
// AUTHORS should fill in these assignments:
$directory = 'questions/'; // the directory path below /International up to but not including the file name: must end in a slash! 
$filename = 'qa-byte-order-mark'; // the file name WITHOUT extensions
$authors = 'Richard Ishida, W3C'; // author(s) and affiliations
$modifiers = ''; // people making substantive changes, and their affiliation
$searchString = 'qa-byte-order-mark'; // blog search string - usually the filename without extensions
$firstPubDate = '2010-08-10  14:52'; // date of the first publication of the document (after review)
$lastSubstUpdate = '2010-08-10  14:52';  // date of latest substantive changes to this document
$pathtophp = '../php'; // authors should check that the following points to /International/php - must be relative path
$outOfDateTranslation = 'no';

// AUTHORS AND TRANSLATORS should fill in these assignments:
$clang = 'en'; // the language extension for articles in this language (use 'en' for English)
$isTranslation = 'no';  // set to 'yes' if this is a translation !
$copyrightYear = '2010'; // this year, but may also be a range, eg. 2002-2006
$thisVersion = '2010-09-09  12:57'; // date of latest edits to this document/translation

// TRANSLATORS should fill in these assignments:
$translators = 'xxxNAME, ORG'; // translator(s) and their affiliation - a elements allowed, but use double quotes for attributes
$enVersion = 'xxxYYYY-MM-DD';  // date of the English original on which the translation is based (see last substantive change date at bottom of file)

include($pathtophp.'/bp3/boilerplate-'.$clang.'.php');

if (! isset($s_articles)) { $s_articles = "Articles"; }
$breadcrumbs = <<<eot
<a href='/International/'>$s_home</a> &gt; <a href='/International/resources'>$s_resources</a> &gt; <a href='/International/articlelist#characters'>$s_articles</a>
eot;

$toc = <<<eot
<ol>
<li><a href="#question">$s_questionLink</a></li>
<li><a href="#answer">$s_answerLink</a>
	<ol>
	<li><a href="#bomwhat">What is a byte-order mark?</a></li>
	<li><a href="#bomhow">What do I need to know about the BOM?</a></li>
	</ol></li>
<li><a href="#bytheway">$s_byTheWayLink</a></li>
<li><a href="#endlinks">$s_furtherReadingLink</a></li>
</ol>
eot;

$additionalLinks = <<<eot
<h2>Quick check</h2>
<form action="http://qa-dev.w3.org/i18n-checker/index" method="get" class="quickcheck"><p>Check for byte-order marks in a page</p><p><input type="text" value="URI of page to check" name="docAddr" onfocus="this.value=''" /></p><p><button type="submit">Check</button></p><p><span class="guide">Look in the "Character encoding" area of the Information table. If the page has non-initial BOMs there will be a warning message lower down.</span></p></form>
eot;
include($pathtophp.'/bp3/structure.php');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html <?php echo "lang='$clang' xml:lang='$clang'";?> xmlns="http://www.w3.org/1999/xhtml">
<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<title>The byte-order mark (BOM) in HTML</title>
		<meta name="keywords"
		 content="i18n internationalisation internationalization localisation localization translation character encoding unicode utf-8 byte-order mark BOM" />
		<meta name="description" content="What is the byte-order mark, and what do I need to know about it when creating HTML?" />
<?php echo $headincludes;?>

<link rel="stylesheet" href="/International/style/article-standards-v2.css" type="text/css" />
</head>

	<body>
		<span id="version-info" style="display: none;"><!-- #BeginDate format:IS1m -->2010-10-07  19:59<!-- #EndDate --></span> <?php echo $topOfPage; ?>

		<h1>The byte-order mark (BOM) in HTML</h1>

		<div class="section"><a id="contentstart" name="contentstart" tabindex="1"></a> 
			<div id="audience"> 
				<p><?php echo $intendedAudience?> XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), CSS coders, Web project managers, and anyone who needs to better understand what the BOM is and how it affects HTML.</p>
			<?php echo $updated; ?>
			</div>


			<h2><?php echo $questionHead?></h2>
			<div class="section2"> 
				<p class="question">What is the byte-order mark, and what do I need to know about it when creating HTML?</p>
			</div>
		</div>
		
	<div class="section"> 

		<h2><?php echo $answerHead?></h2>
		<div class="section2">
			<h3><a id="bomwhat" name="bomwhat" href="#bomwhat"> What is a byte-order mark?</a></h3>
			<p>At the beginning of a Unicode file you may find some bytes that represent the Unicode code point U+FEFF ZERO WIDTH NO-BREAK SPACE (ZWNBSP). This combination of bytes is known as a <dfn>byte-order mark (BOM)</dfn>. </p>
			<p>When a character is encoded in UTF-16, it is stored as one or two 16-bit code units (2 or 4 bytes), and the two bytes of each code unit can be ordered in two different ways (<span class="qterm">little-endian</span> or <span class="qterm">big-endian</span>). The picture below illustrates this. The byte-order mark indicates which order is used, so that applications can immediately decode the content.  UTF-16 content should always begin with the BOM.</p>
			<p><img src="images/bom.png" alt="Bytes representing the BOM." /></p>
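			<p>The two byte orders can be made concrete with a small sketch. The example uses Python, which is an assumption of this illustration rather than anything the article depends on, and the sample characters (U+0041 and U+1D11E) are chosen only for demonstration. It prints the bytes of each character in both orders; U+FEFF itself comes out as FE FF or FF FE.</p>

```python
# Hypothetical sample characters, chosen only for illustration.
SAMPLES = {
    "U+0041": "\u0041",        # one code unit, 2 bytes
    "U+1D11E": "\U0001D11E",   # two code units (surrogate pair), 4 bytes
    "U+FEFF": "\ufeff",        # the character used as the BOM
}

def utf16_bytes(char: str) -> dict:
    """Return the hex bytes of one character in both UTF-16 byte orders."""
    return {
        "big-endian": char.encode("utf-16-be").hex(" "),
        "little-endian": char.encode("utf-16-le").hex(" "),
    }

for name, char in SAMPLES.items():
    print(name, utf16_bytes(char))
```

			<p>For U+1D11E the bytes are swapped inside each code unit, while the order of the two code units stays the same.</p>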
			<p style="">In the UTF-8 encoding, the BOM is not needed to indicate byte order because, unlike in the UTF-16 encodings, the bytes of a character can only appear in one order. The BOM may still occur in UTF-8 encoded text, however, either as a by-product of an encoding
				conversion or because it was added by an editor. In this situation, the BOM is often called the UTF-8 signature.</p>
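			<p>As a rough illustration, the UTF-8 signature is simply U+FEFF encoded in UTF-8, which gives the three bytes EF BB BF. The sketch below assumes Python and its standard "utf-8" and "utf-8-sig" codecs; the sample text is arbitrary.</p>

```python
import codecs

text = "Hello"  # arbitrary sample text

plain = text.encode("utf-8")        # no signature
signed = text.encode("utf-8-sig")   # signature prepended

print(codecs.BOM_UTF8.hex(" "))             # ef bb bf
print(signed == codecs.BOM_UTF8 + plain)    # True

# Decoding with "utf-8" keeps U+FEFF as the first character;
# "utf-8-sig" removes it when it is present.
print(repr(signed.decode("utf-8")))         # '\ufeffHello'
print(repr(signed.decode("utf-8-sig")))     # 'Hello'
```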
		</div>
		<div class="section2">
			<h3><a id="bomhow" name="bomhow" href="#bomhow"> What do I need to know about the BOM?</a></h3>
			<p style="">Web pages, editors and other software that do not expect a BOM at the start of UTF-8 encoded content can sometimes display it as a blank space or as a short sequence of strange-looking characters (such as ï»¿). For this reason, it is usually best for interoperability to omit the BOM, when given a choice, for UTF-8 content.</p>
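			<p>The strange-looking characters are what the three signature bytes look like when they are read as a legacy single-byte encoding. The following Python sketch reproduces the effect; the choice of Windows-1252 (Python's "cp1252" codec) as the mistaken encoding and the sample content are assumptions made for the demonstration.</p>

```python
import codecs

# Hypothetical page content that starts with the UTF-8 signature.
page = codecs.BOM_UTF8 + "Title".encode("utf-8")

# Read as Windows-1252, the bytes EF BB BF become three visible characters.
misread = page.decode("cp1252")     # "ï»¿Title"

# Read as UTF-8 without removing the signature, U+FEFF stays in the text
# as an extra, normally invisible, first character.
kept = page.decode("utf-8")
print(len("Title"), len(kept))      # 5 6
```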
			<p style="">For more information about how to detect and remove a byte-order mark, see <a href="/International/questions/qa-utf8-bom"><cite>Display problems caused by the UTF-8 BOM</cite></a>. You can find out whether a page contains a BOM at the start or further down in the content using the <a href="http://qa-dev.w3.org/i18n-checker/">W3C Internationalization Checker</a>.</p>
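			<p>If you want to run a similar check locally, one possible approach is to search the raw bytes for the UTF-8 form of U+FEFF. The sketch below is written in Python and is not how the checker itself works; the function name <em>bom_offsets</em> is hypothetical. An offset of 0 indicates a BOM at the start, and any other offset indicates a BOM further down in the content.</p>

```python
import codecs
from typing import List

def bom_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every UTF-8 encoded U+FEFF in data."""
    offsets = []
    start = data.find(codecs.BOM_UTF8)
    while start != -1:
        offsets.append(start)
        start = data.find(codecs.BOM_UTF8, start + 1)
    return offsets

# Hypothetical content: two files, each saved with a BOM, joined together.
sample = codecs.BOM_UTF8 + b"head" + codecs.BOM_UTF8 + b"body"
print(bom_offsets(sample))  # [0, 7]
```

			<p>Searching bytes works here because, in well-formed UTF-8, the sequence EF BB BF can only represent U+FEFF.</p>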
			<p style="">If your editor allows you to specify whether you want a BOM while saving content as UTF-8, you should usually say no.</p>
			<p style=""><img src="images/dwprefs-bom.png" alt="BOM preferences on a dialog panel." /></p>
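			<p>If an editor offers no such option, a file can be re-saved without the BOM by a script. The following is a minimal Python sketch under the assumption that the file really is UTF-8; the function name <em>resave_without_bom</em> and the temporary file are hypothetical.</p>

```python
import os
import tempfile

def resave_without_bom(path: str) -> None:
    """Rewrite a UTF-8 file so that it no longer starts with a BOM."""
    # "utf-8-sig" accepts input with or without a signature and drops it.
    with open(path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Plain "utf-8" never writes a signature; newline="" keeps line endings.
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demonstration on a throwaway file that starts with the UTF-8 signature.
fd, demo_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfhello\r\n")
resave_without_bom(demo_path)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
print(result)  # b'hello\r\n'
```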
			<div class="sidenoteGroup">
				<p><b class="leadin">If you use UTF-16.</b> According to the HTML5 specification, if your page is encoded as UTF-16, you must use the byte-order mark in HTML. This is what will be used to indicate the encoding of the page to the browser.</p>
				<div class="sidenote">We recommend using UTF-8, rather than UTF-16, if you use a Unicode encoding. So for most people, this will be academic.
					<p>&nbsp;</p>
				</div>
			</div>
			<p style="">The HTML5 specification currently disallows the use of any other in-document encoding declaration for UTF-16, although this is still under discussion and may change. In effect, this means that the BOM is, itself, the declaration that you have to add.</p>
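			<p>To show how a BOM can act as a declaration, the Python sketch below guesses the encoding from the first bytes of the content. It is a simplified illustration and not the detection algorithm of the HTML5 specification or of any browser; the function name <em>sniff_bom</em> is hypothetical, and UTF-32 is left out.</p>

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading BOM; None when there is no BOM."""
    # UTF-32 is deliberately ignored in this sketch.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None

# Hypothetical UTF-16 content that begins with a little-endian BOM.
page = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
print(sniff_bom(page))        # utf-16-le
print(page.decode("utf-16"))  # Hi
```

			<p>Python's "utf-16" codec behaves in a comparable way when decoding: it reads the BOM, chooses the byte order accordingly, and does not include the BOM in the result.</p>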
			<div class="sidenoteGroup">
				<p>Because HTML5 requires a BOM for UTF-16 encoded content, you should not serve HTML5 documents labeled as &quot;UTF-16BE&quot; or &quot;UTF-16LE&quot;: the Unicode Standard says you should not use a BOM when the text is labeled as one of those encodings. If, therefore, you want to declare the encoding in the HTTP header (which is not disallowed by the HTML5 spec), you should only use the IANA charset name &quot;UTF-16&quot;.</p>
				<div class="sidenote">Note that this is solely about the <em>labeling</em> of the content.  Of course, the bytes that encode the text itself are the same whether you label content as UTF-16 and add a BOM, or label it as UTF-16LE or UTF-16BE without one.
					<p>&nbsp;</p>
				</div>
			</div>
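			<p>The difference between the labels can be seen in a short Python sketch. It assumes Python's standard codecs, whose names follow the same convention: "utf-16" writes a BOM (in the byte order of the machine running the code), while "utf-16-be" and "utf-16-le" do not. The sample text is arbitrary.</p>

```python
import codecs

text = "Hi"  # arbitrary sample text

labeled_utf16 = text.encode("utf-16")   # BOM written, platform byte order
labeled_be = text.encode("utf-16-be")   # no BOM
labeled_le = text.encode("utf-16-le")   # no BOM

has_bom = labeled_utf16[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
print(has_bom)                  # True
print(labeled_be.hex(" "))      # 00 48 00 69
print(labeled_le.hex(" "))      # 48 00 69 00

# Apart from the BOM, the bytes match one of the explicit byte orders.
same_payload = labeled_utf16[2:] in (labeled_be, labeled_le)
print(same_payload)             # True
```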
		</div>
	</div>
	<div class="section">
		<h2><?php echo $btwHead?></h2>
		<p>The byte-order mark is also used for text labeled as UTF-32, and should not be used for text labeled as UTF-32BE or UTF-32LE. The use of UTF-32 for HTML content, however, is strongly discouraged, so we haven't mentioned it until now.</p>
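		<p>For completeness, the UTF-32 forms of the BOM can be inspected with the Python sketch below, which assumes the constants in Python's standard codecs module. It also shows that the little-endian UTF-32 BOM begins with the same two bytes as the little-endian UTF-16 BOM, so code that detects encodings from a BOM needs to test for UTF-32 first.</p>

```python
import codecs

print(codecs.BOM_UTF32_BE.hex(" "))   # 00 00 fe ff
print(codecs.BOM_UTF32_LE.hex(" "))   # ff fe 00 00

# The UTF-32LE BOM starts with the UTF-16LE BOM (FF FE).
ambiguous = codecs.BOM_UTF32_LE.startswith(codecs.BOM_UTF16_LE)
print(ambiguous)                      # True

# As with UTF-16, only the plain "utf-32" label produces a BOM.
no_bom = "H".encode("utf-32-be")
print(no_bom.hex(" "))                # 00 00 00 48
```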
</div>
	<?php echo $survey;?>
	<div class="section noprint">
		<h2><?php echo $readingHead?></h2>
		<ul id="full-links">
			<li>
				<p>Getting started? <a href="/International/getting-started/characters"><cite>Introducing Character Sets and Encodings</cite></a> <span class="uri">http://www.w3.org/International/getting-started/characters</span></p>
			</li>
			<li>
				<p>Tutorial, <a href="/International/tutorials/tutorial-char-enc/"><cite>Handling character encodings in HTML and CSS</cite></a> <span class="uri">http://www.w3.org/International/tutorials/tutorial-char-enc/</span></p>
			</li>
			<li>
				<p>Related links, <cite>Authoring HTML &amp; CSS</cite> – <a href="/International/techniques/authoring-html#charset">Characters</a> <span class="uri">http://www.w3.org/International/techniques/authoring-html#charset</span> – <a href="/International/techniques/authoring-html#bomhandling">Handling the byte-order mark</a> <span class="uri">http://www.w3.org/International/techniques/authoring-html#bomhandling</span></p>
			</li>
		</ul>
	</div>
	<?php echo $bottomOfPage; ?>

	</body>
</html>

