<?php 
// AUTHORS should fill in these assignments:
$directory = 'questions/'; // the directory path below /International up to but not including the file name: must end in a slash! 
$filename = 'qa-html-css-normalization'; // the file name WITHOUT extensions
$authors = 'Richard Ishida, W3C'; // author(s) and affiliations
$modifiers = ''; // people making substantive changes, and their affiliation
$searchString = 'qa-html-css-normalization'; // blog search string - usually the filename without extensions
$firstPubDate = '2010-08-10  14:48'; // date of the first publication of the document (after review)
$lastSubstUpdate = '2010-08-10  14:48';  // date of latest substantive changes to this document
$pathtophp = '../php'; // authors should check that the following points to /International/php - must be relative path

// AUTHORS AND TRANSLATORS should fill in these assignments:
$clang = 'en'; // the language extension for articles in this language (use 'en' for English)
$isTranslation = 'no';  // set to 'yes' if this is a translation !
$copyrightYear = '2010'; // this year, but may also be a range, eg. 2002-2006
$thisVersion = '2010-09-09  13:40'; // date of latest edits to this document/translation

// TRANSLATORS should fill in these assignments:
$translators = 'xxxNAME, ORG'; // translator(s) and their affiliation - a elements allowed, but use double quotes for attributes
$enVersion = 'xxxYYYY-MM-DD';  // date of the English original on which the translation is based (see last substantive change date at bottom of file)

include($pathtophp.'/bp3/boilerplate-'.$clang.'.php');

if (! isset($s_articles)) { $s_articles = "Articles"; }
$breadcrumbs = <<<eot
<a href='/International/'>$s_home </a> &gt; <a href='/International/resources'>$s_resources</a> &gt; <a href='/International/articlelist#characters'>$s_articles</a>
eot;

$toc = <<<eot
<ol>
<li><a href="#question">$s_questionLink</a></li>
<li><a href="#answer">$s_answerLink</a>
	<ol>
	<li><a href="#n11nwhat">What are normalization forms?</a></li>
	<li><a href="#n11nhow">What do I need to know about normalization?</a></li>
	<li><a href="#checking">How can I check pages for problems?</a></li>
	</ol></li>
<li><a href="#endlinks">$s_furtherReadingLink</a></li>
</ol>
eot;

$additionalLinks = <<<eot
<h2>Quick check</h2>
<form action="http://qa-dev.w3.org/i18n-checker/index" method="get" class="quickcheck"><p>Check for normalization mismatches in id and class names</p><p><input type="text" value="URI of page to check" name="docAddr" onfocus="this.value=''" /></p><p><button type="submit">Check</button></p><p><span class="guide">Look for the "Class &amp; id names" field in the Information table.</span></p></form>
eot;
include($pathtophp.'/bp3/structure.php');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html <?php echo "lang='$clang' xml:lang='$clang'";?> xmlns="http://www.w3.org/1999/xhtml">
<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<title>Normalization in HTML and CSS</title>
		<meta name="keywords"
		 content="i18n internationalisation internationalization localisation localization translation character encoding unicode utf-8 normalization NFC normalization form precomposed decomposed" />
		<meta name="description" content="What are normalization forms, and why do I need to know about them when creating HTML and CSS content?" />
<?php echo $headincludes;?>
<link rel="stylesheet" href="/International/style/article-standards-v2.css" type="text/css" />
</head>

	<body>
		<span id="version-info" style="display: none;"><!-- #BeginDate format:IS1m -->2010-09-09  14:00<!-- #EndDate --></span> <?php echo $topOfPage; ?>

		<h1>Normalization in HTML and CSS</h1>

		<div class="section"><a id="contentstart" name="contentstart"></a> 
			<div id="audience"> 
				<p><?php echo $intendedAudience?>  XHTML/HTML coders (using editors or scripting), script developers (PHP, JSP, etc.), CSS coders,
					 Web project managers, and anyone who  is unfamiliar with Unicode normalization and how it can affect the success of HTML and CSS authoring.</p>
			<?php echo $updated; ?>
			</div>


			<h2><?php echo $questionHead?></h2>
			<div class="section2"> 
				<p class="question">What are normalization forms, and why do I need to know about them when creating HTML and CSS content?</p>
			</div>
		</div>
		
	<div class="section"> 

		<h2><?php echo $answerHead?></h2>
		<p>Normalization is something you need to be aware of if you are authoring HTML pages with CSS style sheets in UTF-8 (or any other Unicode encoding), particularly if you are dealing with text in a  script that uses accents or other diacritics. </p>
		<div class="section2">
			<h3><a id="n11nwhat" name="n11nwhat" href="#n11nwhat"> What are normalization forms?</a></h3>
			<p>In Unicode it is possible to produce the same text with different sequences of characters. For example, take the Hungarian word <span class="qterm">világ</span>. The fourth letter could be stored in memory as a <dfn>precomposed</dfn> U+00E1   LATIN SMALL LETTER A WITH ACUTE (a single character) or as a <dfn>decomposed</dfn> sequence of U+0061   LATIN SMALL LETTER A followed by U+0301   COMBINING ACUTE ACCENT (two characters). </p>
			<p><img src="images/vilag.png" alt=" " /></p>
			<p>The Unicode Standard allows either of these alternatives, but requires that both  be treated as identical. To improve efficiency, an application will usually <dfn>normalize</dfn> text before performing searches or comparisons. Normalization, in this case, means converting the text to use all precomposed or all decomposed characters.</p>
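			<p>The effect of the two storage options on a plain comparison can be sketched as follows. This is only an illustration in Python using the standard <code>unicodedata</code> module; the variable names and the <code>code_points</code> helper are invented for this example.</p>

```python
import unicodedata

# The same word, stored two ways (escapes are used so the difference is visible).
precomposed = "vil\u00e1g"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "vila\u0301g"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT


def code_points(text):
    """Return the code points of text as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A plain comparison works code point by code point, so it fails here.
raw_match = precomposed == decomposed

# Normalizing both sides to the same form first makes the comparison succeed.
normalized_match = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(code_points(precomposed))
print(code_points(decomposed))
print(raw_match, normalized_match)
```

			<p>The two strings look the same when displayed, but the first holds five code points and the second six, which is why the unnormalized comparison reports a difference.</p>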
			<p>There are four <dfn>normalization forms</dfn> specified by the Unicode Standard: NFC, NFD, NFKC and NFKD. The <span class="qchar">C</span> stands for (pre-)composed, and the <span class="qchar">D</span> for decomposed. The <span class="qchar">K</span> stands for compatibility. To improve interoperability, the W3C recommends the use of <strong>NFC</strong> normalized text on the Web.</p>
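			<p>As a rough illustration of how the four forms differ, the Python sketch below applies each one to two characters. The <code>forms_of</code> helper is invented for this example, and the ligature U+FB01 is our own choice of a character with a compatibility decomposition; it is not taken from the text above.</p>

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def forms_of(text):
    """Map each normalization form name to the code points it produces."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in FORMS
    }


# U+00E1 has a canonical decomposition, so only the C/D distinction matters:
# the C forms keep one code point, the D forms produce U+0061 U+0301.
a_acute = forms_of("\u00e1")

# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition,
# so NFC and NFD leave it alone while NFKC and NFKD replace it with "f" + "i".
ligature = forms_of("\ufb01")

for form in FORMS:
    print(form, a_acute[form], ligature[form])
```

			<p>Because the compatibility forms can replace a character with different ones, as with the ligature here, they change more than the composed or decomposed state of accented letters.</p>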
		</div>
		<div class="section2">
			<h3><a id="n11nhow" name="n11nhow" href="#n11nhow"> What do I need to know about normalization?</a></h3>
			<p style="">Unfortunately, normalization doesn't always take place before content is compared. A particularly important case is the use of selectors and class names or ids in HTML and CSS. If the word <span class="qterm">világ</span> is used in precomposed form in the HTML (eg. <code>&lt;span class=&quot;világ&quot;&gt;</code>), but in decomposed form in the CSS (eg. <code>.vila&#x0301;g { font-style: italic; }</code>), then the selector won't match the class name.</p>
			<p style="">What this means is that when producing content you should ensure that selectors and class or id names are character-for-character the same. This is particularly likely to be an issue if the markup and the CSS are authored or maintained by different people.</p>
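			<p>One way to spot this kind of mismatch is to look for names that differ as stored but become equal once both are normalized. The Python sketch below assumes the class names have already been extracted from the markup and the style sheet into two lists; the function names and the sample lists are hypothetical.</p>

```python
import unicodedata


def nfc(text):
    """Return text converted to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def find_normalization_mismatches(html_names, css_names):
    """Return (html_name, css_name) pairs that differ as stored
    but are equal after both are converted to NFC."""
    mismatches = []
    for html_name in html_names:
        for css_name in css_names:
            if html_name != css_name and nfc(html_name) == nfc(css_name):
                mismatches.append((html_name, css_name))
    return mismatches


html_classes = ["vil\u00e1g", "note"]   # precomposed, as in the markup
css_classes = ["vila\u0301g", "note"]   # decomposed, as in the style sheet

mismatches = find_normalization_mismatches(html_classes, css_classes)
print(len(mismatches))
```

			<p>Each pair reported is a selector that was probably meant to match but will not, because the two names are not character-for-character the same.</p>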
			<p style="">The best way to ensure that these match is to use one particular Unicode normalization form for all authored content. As we said above, the W3C recommends NFC.</p>
			<p style="">Most keyboards for European languages already output text in NFC, but this is less likely to be the case for many non-European languages.</p>
			<p style="">In some cases your editor may allow you to save data in a choice of normalization forms. The picture below shows an option for setting a particular normalization form as the default when opening new files in Dreamweaver (NFC is selected). You are shown a similar choice when saving a document.</p>
			<p style=""><img src="images/dwprefs-nfc.png" alt="Unicode normalization form preferences on a dialog panel, showing NFC selected." /></p>
		</div>
		<div class="section2">
			<h3><a id="checking" name="checking" href="#checking"> How can I check pages for problems?</a></h3>
			<p style="">You can find out whether an HTML page contains  class names and id values that are not normalized according to NFC by using the <a href="http://qa-dev.w3.org/i18n-checker/">W3C Internationalization Checker</a>.</p>
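			<p>For a quick local check, the same test can be approximated in a few lines of Python. This is a minimal sketch using the standard <code>html.parser</code> module and does not describe how the W3C checker works; the class and function names are invented for this example.</p>

```python
import unicodedata
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collect class names and id values that are not in NFC."""

    def __init__(self):
        super().__init__()
        self.not_nfc = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None or name not in ("class", "id"):
                continue
            # A class attribute may hold several space-separated names.
            for token in value.split():
                if unicodedata.normalize("NFC", token) != token:
                    self.not_nfc.append((tag, name, token))


def find_non_nfc_names(html_text):
    """Return (tag, attribute, value) for every class or id name not in NFC."""
    collector = NameCollector()
    collector.feed(html_text)
    collector.close()
    return collector.not_nfc


# The class attribute below contains a decomposed name and a plain ASCII one.
sample = '<p id="intro"><span class="vila\u0301g note">text</span></p>'
problems = find_non_nfc_names(sample)
print(problems)
```

			<p>An empty result means every class name and id value in the markup is already in NFC. Selectors in the style sheet would need a separate check.</p>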
			<p style="">If you do have problems, you should find an editor or conversion tool that allows you to specify the normalization form, and use that to re-save your page.</p>
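			<p>If no such tool is to hand, a conversion step can be sketched in Python as below. It assumes the file is encoded in UTF-8, and it normalizes the whole file rather than just the class and id names; the <code>resave_as_nfc</code> function and the temporary file are hypothetical and only serve the demonstration.</p>

```python
import os
import tempfile
import unicodedata


def resave_as_nfc(path, encoding="utf-8"):
    """Rewrite the file at path in NFC; return True if its content changed."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as handle:
        original = handle.read()
    normalized = unicodedata.normalize("NFC", original)
    if normalized == original:
        return False
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(normalized)
    return True


# Demonstration on a temporary file containing a decomposed class name.
with tempfile.TemporaryDirectory() as folder:
    page = os.path.join(folder, "page.html")
    with open(page, "w", encoding="utf-8", newline="") as handle:
        handle.write('<span class="vila\u0301g">text</span>')
    changed = resave_as_nfc(page)
    with open(page, "r", encoding="utf-8", newline="") as handle:
        result = handle.read()
    changed_again = resave_as_nfc(page)

print(changed, changed_again)
```

			<p>Running the same conversion over the style sheet as well keeps selectors and names in the same form. It is sensible to keep a backup, since the file is overwritten in place.</p>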
		</div>
	</div>
	<?php echo $survey;?>
	<div class="section noprint">
		<h2><?php echo $readingHead?></h2>
		<ul id="full-links">
			<li>
				<p>Getting started?  <a href="/International/getting-started/characters"><cite>Introducing Character Sets and Encodings</cite></a> <span class="uri">http://www.w3.org/International/getting-started/characters</span></p>
			</li>
			<li>
				<p>Tutorial,  <a href="/International/tutorials/tutorial-char-enc/"><cite>Handling character encodings in HTML and CSS</cite></a> <span class="uri">http://www.w3.org/International/tutorials/tutorial-char-enc/</span></p>
			</li>
			<li>
				<p>Related links, <cite>Authoring HTML &amp; CSS</cite> – <a href="/International/techniques/authoring-html#charset">Characters</a> <span class="uri">http://www.w3.org/International/techniques/authoring-html#charset</span> – <a href="/International/techniques/authoring-html#normalization">Handling normalization</a> <span class="uri">http://www.w3.org/International/techniques/authoring-html#normalization</span></p>
			</li>
		</ul>
	</div>
	<?php echo $bottomOfPage; ?>

	</body>
</html>

