<?php 
// authors should fill in these assignments:
$directory = 'questions/'; // the directory path below /International up to but not including the file name: must end in a slash! 
$filename = 'qa-utf8-bom'; // the file name WITHOUT extensions
$topicIndex[] = 'characters'; // anchor of appropriate place in /International/articlelist
$techIndex[] = 'authoring-html#charset'; // path after /International/techniques/ to the appropriate place in a techniques page
$authors = 'Deborah Cawkwell, BBC World Service'; // author(s) and affiliations
$modifiers = 'Richard Ishida, W3C'; // people making substantive changes, and their affiliation
$searchString = 'qa-utf8-bom'; // blog search string - usually the filename without extensions
$firstPubDate = '2003-11-27'; // date of the first publication of the document (after review)
$lastSubstUpdate = '2007-07-17 18:19';  // date of last substantive changes to this document
$pathtophp = '../php'; // authors should check that the following points to /International/php - must be relative path

// authors AND translators should fill in these assignments:
$clang = 'en'; // the language extension for articles in this language (use 'en' for English)
$isTranslation = 'no';  // set to 'yes' if this is a translation !
$copyrightYear = '2003-2007'; // this year, but may also be a range, eg. 2002-2006
$thisVersion = '2007-07-17 18:19'; // date of latest edits to this document/translation

// translators should fill in these assignments:
$translators = 'xxxNAME, ORG'; // translator(s) and their affiliation - a elements allowed, but use double quotes for attributes
$enVersion = 'xxxYYYY-MM-DD';  // date of the English original on which the translation is based (see last substantive change date at bottom of file)

$additionalLinks = '';
include($pathtophp.'/bp2/structure.php');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="<?php echo $clang;?>" xml:lang="<?php echo $clang;?>" xmlns="http://www.w3.org/1999/xhtml">
<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<title>W3C I18N FAQ: Display problems caused by the UTF-8 BOM</title>
		<meta name="keywords"
		 content="i18n internationalisation internationalization localisation localization translation utf-8 BOM byte order mark unexpected characters blank lines ï»¿" />
		<meta name="description"
		 content="W3C I18N FAQ: When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them?" />
<?php echo $headincludes;?>
<style type="text/css" media="all">
</style>
</head>

	<body bgcolor="white">
		<span id="version-info" style="display: none;"><!-- #BeginDate format:IS1m -->2009-03-05  13:02<!-- #EndDate --></span> <?php echo $topOfPage; ?>

		<h1>Display problems caused by the UTF-8 BOM</h1>
		<div id="navigation"> 
			<p><?php echo $onthispage?><?php echo $questionLink?>&nbsp;- <?php echo $backgroundLink?>&nbsp;- <?php echo $answerLink?>&nbsp;- <?php echo $btwLink?>&nbsp;-
				<?php echo $readingLink?></p>
		</div>
		<div class="section"><a id="contentstart" name="contentstart" tabindex="1"></a> 
			<div id="audience"> 
				<p><?php echo $intendedAudience?> XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), CSS coders, XSLT
					developers, Web project managers, and anyone who is trying to diagnose why blank lines or other strange items are displayed on their UTF-8 page. </p>
			</div>

			<h2><?php echo $questionHead?></h2>
			<div class="section2"> 
				<p class="question">When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web
					page or included file. How do I remove them?</p>
			</div>
		</div>
		<div class="section"> 

			<h2><?php echo $answerHead?></h2>
			<p>If you are dealing with a file encoded in UTF-8, your display problems may be caused by the presence of a UTF-8 signature (BOM) that the
				user agent doesn't recognize.</p>
			<p>The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However,
				you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.</p>
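			<p>The following Python sketch illustrates the included-file case under one simplifying assumption: the include mechanism splices the included file's bytes into the output unchanged. The <code>naive_include</code> function and the sample text are hypothetical and not part of any real templating system.</p>

```python
import codecs

def naive_include(page, included, marker):
    """Hypothetical include step: splice the included file's raw bytes in place
    of `marker`, as a simple server-side include might, leaving any BOM intact."""
    return page.replace(marker, included, 1)

page = b"Header text\nINCLUDE_HERE\nClosing text\n"
fragment = codecs.BOM_UTF8 + b"Included text\n"  # included file saved with a BOM

result = naive_include(page, fragment, b"INCLUDE_HERE")
# The three BOM bytes now sit after "Header text\n", in the middle of the output.
print(result.index(codecs.BOM_UTF8))       # prints 12
print("\ufeff" in result.decode("utf-8"))  # prints True: U+FEFF survives decoding
```

			<p>When this output is served as UTF-8, a user agent that does not expect U+FEFF at that point may render it as a blank line or as stray characters.</p>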
			<p>We have a set of <a href="http://www.w3.org/International/tests/sec-utf8-signature-0">test pages</a> and a
				<a href="http://www.w3.org/International/tests/results/results-utf8-signature">summary of results</a> for various recent browser versions that
				explore this behaviour.</p>
			<p>This article will help you determine whether the UTF-8 signature is causing the problem. If there is no evidence of a UTF-8 signature at the
				beginning of the file, then you will have to look elsewhere for a solution.</p>
			<div class="section2"> 

				<h3><a id="bom" name="bom" href="#bom">What is a UTF-8 signature (BOM)?</a></h3>
				<div class="sidenoteGroup"> 
					<p>Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the
						file is Unicode. This combination of bytes is known as a <strong>signature</strong> or <strong>Byte Order Mark (BOM)</strong>. Some applications,
						such as text editors or browsers, display the BOM as an extra line in the file; others display unexpected characters, such as ï»¿.</p>
					<p>See the side panel for more detailed information about the BOM.</p>
					<div class="sidenote"> 
						<p>The BOM is the Unicode code point U+FEFF, which corresponds to the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (ZWNBSP).</p>
						<p>In the UTF-16 and UTF-32 encodings, unless there is some alternative indicator, the BOM is essential to ensure correct
							interpretation of the file's contents. Each character is stored as one or two 16-bit code units (UTF-16) or as one 32-bit code unit (UTF-32),
							and the order in which the bytes of each code unit are stored in the file is significant; the BOM indicates this order.</p>
						<p>In the UTF-8 encoding, the BOM is not essential because, unlike UTF-16 and UTF-32, UTF-8 is made up of single-byte code units, so there is
							no alternative byte order to indicate. The BOM may still occur in UTF-8 encoded text, however, either as a by-product of an encoding
							conversion or because it was added by an editor.</p>
					</div>
				</div>
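				<p>As a quick illustration, this Python snippet, which uses only the standard <code>codecs</code> module and built-in codecs, prints the byte sequence for U+FEFF in several encodings and shows what a tool that wrongly assumes Latin-1 would display for the UTF-8 signature.</p>

```python
import codecs

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# The same code point serialized in different encodings.
# bytes.hex() with a separator argument needs Python 3.8 or later.
for encoding in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be"):
    print(f"{encoding:10} {BOM.encode(encoding).hex(' ')}")

# A tool that wrongly assumes ISO 8859-1 (Latin-1) decodes the three
# UTF-8 signature bytes as three separate characters.
latin1_view = codecs.BOM_UTF8.decode("latin-1")
print(latin1_view)  # ï»¿
```

				<p>The four encoded forms printed are <code>ef bb bf</code>, <code>fe ff</code>, <code>ff fe</code> and <code>00 00 fe ff</code> respectively.</p>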
			</div>
			<div class="section2"> 

				<h3><a id="detect" name="detect" href="#detect">Detecting the BOM</a></h3>
				<p>First, we need to check whether there is indeed a BOM at the beginning of the file.</p>
				<p>You can try looking for a BOM in your content, but if your editor handles the UTF-8 signature correctly you probably won't be able to
					see it. An editor that does not handle the UTF-8 signature correctly displays the bytes that make up the signature according to its own character
					encoding setting; with Latin-1 (ISO 8859-1), for example, the signature displays as the characters ï»¿. In a binary editor capable of
					displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.</p>
				<p>Alternatively, your editor may tell you in a status bar or a menu what encoding your file is in, including whether the file
					has a UTF-8 signature.</p>
				<p>If not, a script-based test may help. Alternatively, you could try this small
					<a href="http://people.w3.org/rishida/utils/bomtester/">web-based utility</a>. (Note: if you think the problem is caused by a file included by PHP
					or some other mechanism, type in the URI of the <em>included</em> file.)</p>
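				<p>A script-based test can be as simple as reading the first three bytes of a file. The Python sketch below assumes the files are available locally; <code>has_utf8_bom</code> is a hypothetical helper name, and it checks only for the UTF-8 signature, not for UTF-16 or UTF-32 BOMs.</p>

```python
import codecs
import sys

def has_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 signature EF BB BF."""
    with open(path, "rb") as f:  # binary mode, so no decoding step hides the signature
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

if __name__ == "__main__":
    # Report on every file named on the command line.
    for name in sys.argv[1:]:
        status = "UTF-8 signature" if has_utf8_bom(name) else "no UTF-8 signature"
        print(f"{name}: {status}")
```

				<p>For example, <code>python check_bom.py header.php style.css</code> (the script name is hypothetical) prints one line per file. As with the web-based utility, check the included file rather than the page that includes it.</p>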
			</div>
			<div class="section2"> 

				<h3><a id="remove" name="remove" href="#remove">Removing the BOM</a></h3>
				<p>If you have an editor which shows the characters that make up the UTF-8 signature you may be able to delete them by hand. Chances are,
					however, that the BOM is there in the first place because you didn't see it.</p>
				<p>Check whether your editor allows you to specify whether a UTF-8 signature is added or kept
					<a href="http://www.w3.org/International/questions/qa-setting-encoding-in-applications">during a save</a>. Such an editor lets you remove
					the signature simply by opening the file and saving it again with that option turned off. For example, if Dreamweaver detects a BOM, the Save As dialog box will have a
					check mark alongside the text "Include Unicode Signature (BOM)". Just uncheck the box and save.</p>
				<p>One of the benefits of using a script is that you can remove the signature quickly, and from multiple files. In fact, the script could
					be run automatically as part of your content development process. If you use Perl, you could use <a href="http://people.w3.org/rishida/blog/?p=102">a simple script</a>
					created by Martin Dürst.</p>
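				<p>Martin Dürst's script is written in Perl. As an alternative, here is a separate minimal Python sketch (not a port of that script) that strips a leading UTF-8 signature from each file named on the command line. It rewrites files in place, so it assumes you have backups or version control; <code>strip_utf8_bom</code> is a hypothetical helper name.</p>

```python
import codecs
import sys
from pathlib import Path

def strip_utf8_bom(path):
    """Remove a leading UTF-8 signature from the file at `path`, in place.
    Return True if a signature was found and removed, False otherwise."""
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file with everything after the three signature bytes.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True

if __name__ == "__main__":
    for name in sys.argv[1:]:
        if strip_utf8_bom(name):
            print(f"removed UTF-8 signature: {name}")
```

				<p>Only the signature at the very start of each file is removed; a U+FEFF elsewhere in the text is left alone, because there it may be intended content.</p>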
				<p>Note: Before removing the signature, check how doing so will affect your workflow. Some part of your content development process may
					rely on the signature to indicate that a file is in UTF-8. Bear in mind also that pages consisting mostly of basic Latin (ASCII) characters may
					look correct at first glance even though occasional characters outside the ASCII range (U+0000 to U+007F) are incorrectly encoded.</p>
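				<p>One way to act on that advice, sketched in Python under the assumption that the file is meant to be UTF-8: decode the raw bytes strictly, which fails if any byte sequence is not valid UTF-8, and list the non-ASCII characters so you can review them by eye. The <code>non_ascii_report</code> helper is hypothetical.</p>

```python
import codecs

def non_ascii_report(data):
    """Decode `data` (raw file bytes) strictly as UTF-8 and return the sorted
    list of distinct non-ASCII characters it contains, for review by eye.
    Raises UnicodeDecodeError if the bytes are not well-formed UTF-8."""
    text = data.decode("utf-8")  # strict error handling is the default
    if text.startswith("\ufeff"):
        text = text[1:]  # ignore the signature itself
    return sorted({ch for ch in text if ord(ch) > 0x7F})

print(non_ascii_report(codecs.BOM_UTF8 + "Caf\u00e9 menu".encode("utf-8")))  # ['é']

try:
    non_ascii_report("Caf\u00e9".encode("latin-1"))  # é stored as the single byte E9
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```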
			</div>
		</div>
		<div class="section"> 

			<h2><?php echo $btwHead?></h2>
			<p>Some text editors, such as Windows Notepad, automatically add a UTF-8 signature to any file you save as UTF-8.</p>
			<p>A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.</p>
			<p>In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character
				encoding declarations to the contrary.</p>
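			<p>The precedence described above can be sketched as follows. This is a simplified, hypothetical model in Python, similar in spirit to the BOM check that the WHATWG Encoding Standard performs before honouring a declared encoding; it is not any particular browser's code, and <code>sniff_encoding</code> is an illustrative name.</p>

```python
import codecs

# Signatures checked in order: UTF-8 first, then UTF-16 big- and little-endian.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_encoding(data, declared):
    """If `data` starts with a byte order mark, return the encoding it implies,
    ignoring `declared` (e.g. a charset from HTTP or a meta element).
    Otherwise fall back to the declared encoding."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return declared

print(sniff_encoding(codecs.BOM_UTF8 + b"hello", declared="iso-8859-1"))  # utf-8
print(sniff_encoding(b"hello", declared="iso-8859-1"))                    # iso-8859-1
```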
		</div>
<?php echo $survey;?>
		<div class="section noprint"> 

			<h2><?php echo $readingHead?></h2>
			<ul id="full-links">
				<li> 
					<p><a href="http://www.unicode.org/unicode/faq/utf_bom.html">Unicode FAQ about the Byte Order Mark</a> <span
						class="uri">http://www.unicode.org/unicode/faq/utf_bom.html</span></p>
				</li>
				<li> 
					<p><a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_42jv.asp">Microsoft documentation about the
						Byte Order Mark</a> <span class="uri">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_42jv.asp</span></p>
				</li>
				<li> 
					<p><a href="/International/questions/qa-setting-encoding-in-applications">Setting encoding in web authoring applications</a>
						<span class="uri">http://www.w3.org/International/questions/qa-setting-encoding-in-applications</span></p>
				</li>
				<li> 
					<p><a href="/International/tests/sec-utf8-signature-0">Test page for UTF-8 signature effects</a> <span
						class="uri">http://www.w3.org/International/tests/sec-utf8-signature-0</span></p>
				</li>
			</ul>
		</div>
<?php echo $bottomOfPage; ?>

	</body>
</html>

