<?php 
// authors should fill in these assignments:
$directory = 'getting-started/'; // the directory path below /International up to but not including the file name: must end in a slash! 
$filename = 'characters'; // the file name WITHOUT extensions
$authors = 'Richard Ishida, W3C'; // author(s) and affiliations
$modifiers = ''; // people making substantive changes, and their affiliation
$searchString = 'gs-characters'; // blog search string - usually the filename without extensions
$firstPubDate = '2006-01-16'; // date of the first publication of the document (after review)
$lastSubstUpdate = '2009-05-01  9:44';  // date of last substantive changes to this document
$pathtophp = '../php'; // authors should check that the following points to /International/php - must be relative path

// authors AND translators should fill in these assignments:
$clang = 'en'; // the language extension for articles in this language (use 'en' for English)
$thisVersion = '2011-10-10  10:29'; // date of latest edits to this document/translation

// translators should fill in these assignments:
$translators = 'xxxNAME, ORG'; // translator(s) and their affiliation - a elements allowed, but use double quotes for attributes
$translatorContact=""; // please add email. This is not displayed, it allows the translation coordinator to contact you if needed in future.

include($pathtophp.'/bp3/boilerplate-'.$clang.'.php');

$breadcrumbs = <<<eot
<a href='/International/'>$s_home</a> &gt; <a href='/International/resources'>$s_resources</a> &gt; <a href='/International/articlelist#characters'>$s_articles</a>
eot;

$toc = <<<eot
<div id="toclocation"><!-- placeholder --></div>
eot;

$additionalLinks = <<<eot
eot;
include($pathtophp.'/bp3/structure.php');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html <?php echo "lang='$clang' xml:lang='$clang'";?> xmlns="http://www.w3.org/1999/xhtml">
<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<title>Introducing Character Sets and Encodings</title>
		<meta name="keywords"
		 content="i18n internationalisation internationalization localisation localization translation character sets, character encoding, charset, escapes, web addresses" />
		<meta name="description"
		 content="Pointers for the newcomer to useful introductory information on the W3C Internationalization subsite about character sets and encodings on the Web." />
<?php echo $headincludes;?>

<link rel="stylesheet" href="/International/style/article-standards-v2.css" type="text/css" />
<style type="text/css" media="all">
.sidenote { line-height: 110%; padding-bottom: 20px;  }
</style>
</head>

	<body>
		<span id="version-info" style="display: none;"><!-- #BeginDate format:IS1m -->2011-10-10  10:30<!-- #EndDate --></span> <?php echo $topOfPage; ?>

		<h1>Introducing Character Sets and Encodings</h1>

		<div class="section"><a id="contentstart" name="contentstart"></a> 
			<div id="audience"> 
				<p><?php echo $intendedAudience?> anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site. </p>
			<?php echo $updated; ?>
			</div>
			
			<p>This page provides some orientation for newcomers to Web internationalization who don't really know where to start. The aim is to ease you gently into some of the material on the site.</p>
			<p>You can find a selection of more detailed articles using the links to the right. Once you get some ideas from this page, you will probably just use the <a href="/International/resource-index">topic index</a>, the <a href="/International/technique-index">techniques index</a>, or the site search.</p>
</div>
		<div class="section"> 

			<h2><a id="what" name="what" tabindex="1">What's it about?</a></h2>
			<div class="insidenote"> <strong>Learn more...</strong>
<p><a href="/International/questions/qa-what-is-encoding">Character encodings for beginners</a> explains some of the basic concepts about character encodings, and why you should care. </p>
				<p><a href="/International/articles/definitions-characters/">Essential definitions related to character encodings</a> provides explanations of terminology such as Unicode, character sets, coded character sets, character encodings, the document character set, and character escapes.</p>
				</div>
				<p>A character set is a collection of letters and symbols used in a writing system. For example, the ASCII character set covers letters and
				symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and the Unicode character set
				contains characters for most of the living languages and scripts in the world.</p>
				<p>Characters in a character set are stored as one or more bytes in a computer. Each byte or sequence of bytes represents a given character.
					A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.</p>
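				<p>As a rough illustration of the mapping described above, the following Python sketch prints the bytes that store one character in three different encodings. The helper name <code>encoded_bytes</code> is made up for this example; the encoding names are standard Python codec names.</p>
<pre>
```python
def encoded_bytes(text: str, encoding: str) -> str:
    """Return the bytes used to store text in the given encoding, as hex."""
    return text.encode(encoding).hex(" ")


# The same character (U+00E9, e with acute) is stored as different bytes
# depending on the character encoding that is used.
for name in ("iso-8859-1", "utf-8", "utf-16-be"):
    print(name, encoded_bytes("\u00e9", name))
# prints:
# iso-8859-1 e9
# utf-8 c3 a9
# utf-16-be 00 e9
```
</pre>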
				<p>There are many different character encodings. If the wrong encoding is applied to the bytes in memory, the result will be unintelligible
					text. It is therefore important, if people are to read your content, that you correctly label the character encoding used.</p>
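				<p>A minimal Python sketch of what happens when the wrong encoding is applied: text saved as UTF-8 is read back as if it were ISO-8859-1. The function name <code>misread</code> is hypothetical and only serves this example.</p>
<pre>
```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text in one encoding, then decode the bytes using another."""
    return text.encode(saved_as).decode(read_as)


original = "caf\u00e9"  # the word "café"

# Correct label: the text survives the round trip unchanged.
print(misread(original, "utf-8", "utf-8") == original)  # prints True

# Wrong label: the two UTF-8 bytes of the last letter are shown
# as two separate ISO-8859-1 characters.
print(misread(original, "utf-8", "iso-8859-1"))  # prints "cafÃ©"
```
</pre>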
		</div>
	<div class="section"> 

			<h2><a id="choosing" name="choosing">Choosing an encoding</a></h2>
<div class="sidenoteGroup"> 
			<p>Everyone developing content, whether content authors or programmers, must decide what character encoding to use. UTF-8 is a popular
				recommendation these days, but there may still be things you should consider before using it.</p>
			<div class="sidenote"> <strong>Learn more...</strong>
<p>HTML &amp; CSS authors<br />
			<a href="/International/techniques/authoring-html#choosing">Choosing a character encoding</a></p>
			<p>Spec developers<br />
			<a href="/International/techniques/developing-specs#choosing">Choosing character encodings</a></p>
			<p>Server setup<br />
			<a href="/International/techniques/server-setup#choosing">Choosing a character encoding</a></p>
		</div>
		</div>
<br clear="all" />
		</div>
		<div class="section"> 

			<h2><a id="using" name="using">Declaring and applying an encoding</a></h2>
			<div class="sidenoteGroup">
<p>Once it has been decided what encoding to use, content developers and programmers must ensure that it is declared in the right way.</p>
			<p>With a technology such as XHTML, encoding declarations are not always straightforward; they require an understanding of <a href="/International/articles/serving-xhtml/">'standards' vs.
				'quirks' modes</a>, and the impact of the XML declaration. </p>
			<p>You must also ensure that your data is actually saved in the encoding you have chosen; it is not sufficient to just label it.</p>
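			<p>One way to spot a mismatch between the label and the saved bytes is to try decoding the bytes with the declared encoding. The Python sketch below assumes a hypothetical helper, <code>saved_as_declared</code>. The check is mainly useful when the declared encoding is UTF-8, because many single-byte encodings accept almost any byte sequence without error.</p>
<pre>
```python
def saved_as_declared(data: bytes, declared: str) -> bool:
    """Return True if the bytes can be decoded with the declared encoding."""
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


text = "caf\u00e9"

# The document is labelled UTF-8, but the editor saved it as ISO-8859-1.
print(saved_as_declared(text.encode("iso-8859-1"), "utf-8"))  # prints False

# The document is labelled UTF-8 and was saved as UTF-8.
print(saved_as_declared(text.encode("utf-8"), "utf-8"))  # prints True
```
</pre>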
			<p>Content developers and webmasters may also need to ensure that the <em>server</em> delivers content with the correct character encoding
				declarations, since server settings can override in-document declarations.</p>
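			<p>The precedence of the server setting over the in-document declaration can be sketched as follows. This is a simplified model, not a description of how any particular browser works: the function <code>effective_encoding</code> is hypothetical, and real user agents apply further rules that it ignores. It uses Python's standard <code>email.message</code> module only to parse the <code>charset</code> parameter of a <code>Content-Type</code> header value.</p>
<pre>
```python
from email.message import Message
from typing import Optional


def effective_encoding(content_type: Optional[str],
                       in_document: Optional[str]) -> Optional[str]:
    """Prefer the charset from the HTTP Content-Type header, if it has one."""
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        http_charset = header.get_content_charset()  # lowercased, or None
        if http_charset:
            return http_charset
    # No charset sent by the server: fall back to the in-document declaration.
    return in_document.lower() if in_document else None


# The server declares ISO-8859-1, so the in-document UTF-8 label is overridden.
print(effective_encoding("text/html; charset=ISO-8859-1", "utf-8"))  # prints iso-8859-1

# The server sends no charset, so the in-document declaration applies.
print(effective_encoding("text/html", "UTF-8"))  # prints utf-8
```
</pre>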
				<div class="sidenote"> <strong>Learn more...</strong>
<p>HTML &amp; CSS authors<br />
					<a href="/International/techniques/authoring-html#indoc">Declaring the character encoding in an X/HTML document</a><br />
					<a href="/International/techniques/authoring-html#css">Declaring the character encoding in a CSS style sheet</a><br />
				<a href="/International/techniques/authoring-html#server">Declaring the character encoding on the server</a></p>
					<p>Spec developers<br />
					<a href="/International/techniques/developing-specs#identifying">Identifying character encodings</a></p>
					<p>Server setup<br />
					<a href="/International/techniques/server-setup#charset">Setting the HTTP charset parameter</a><br />
					<a href="/International/techniques/server-setup#charset">Setting character encoding information using .htaccess</a></p>
				</div>
			</div>
</div>
		<div class="section"> 

			<h2><a id="escapes" name="escapes">Escapes</a></h2>
			<div class="sidenoteGroup">
				<p><span class="newterm">Escapes</span> are a way of representing a character using only ASCII text. They provide a way of representing
				characters that are not available in the character encoding you are using, or a way of avoiding the use of a character for other reasons (such as
				when it may conflict with syntax). You should be clear on when and how these escapes should be used.</p>
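				<p>As a small illustration, the Python sketch below replaces each non-ASCII character with a decimal numeric character reference (so U+00E9 becomes <code>&amp;#233;</code>) and then converts the result back. The function name <code>escape_non_ascii</code> is made up for this example, and it deliberately does nothing about syntax characters such as the ampersand, which need separate handling.</p>
<pre>
```python
import html


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = escape_non_ascii("caf\u00e9")

# The escaped form contains only ASCII characters.
print(escaped.isascii())  # prints True

# Resolving the escape gives back the original text.
print(html.unescape(escaped) == "caf\u00e9")  # prints True
```
</pre>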
				<div class="sidenote"> <strong>Learn more...</strong>
					<p>HTML &amp; CSS authors<br />
					<a href="/International/techniques/authoring-html#escapes">Using escapes to represent characters</a></p>
					<p>SVG authors<br />
						<a href="/International/techniques/authoring-svg#escapes">Using escapes to represent characters</a></p>
					<p>XML authors<br />
						<a href="/International/techniques/authoring-xml#escapes">Using escapes to represent characters</a></p>
					<p>Spec developers<br />
					<a href="/International/techniques/developing-specs#escapes">Designing character escapes</a></p>
				</div>
<br clear="all" style="clear:both;" />
			</div>
	</div>
		<div class="section"> 

			<h2><a id="address" name="address">Web addresses</a></h2>
			<div class="sidenoteGroup">
				<p>These days web addresses can also include non-ASCII characters. The user does little other than click on the appropriate link or enter
				the text as they see it; the heavy lifting is done by the user agent. Even so, you may be interested to know how this works.</p>
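				<p>The kind of conversion a user agent performs can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function <code>to_ascii_address</code> and the example address are hypothetical, the host name is converted with Python's built-in <code>idna</code> codec (which implements the older IDNA 2003 rules), and the path is percent-encoded as UTF-8 bytes.</p>
<pre>
```python
from urllib.parse import quote


def to_ascii_address(host: str, path: str) -> str:
    """Build an ASCII-only http address from a non-ASCII host and path."""
    ascii_host = host.encode("idna").decode("ascii")  # Punycode form of each label
    ascii_path = quote(path)  # percent-encoded UTF-8 bytes, "/" left as-is
    return "http://" + ascii_host + ascii_path


print(to_ascii_address("b\u00fccher.example", "/caf\u00e9"))
# prints "http://xn--bcher-kva.example/caf%C3%A9"
```
</pre>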
				<p>Specification developers should design their specifications so that non-ASCII web addresses can be used.</p>
				<div class="sidenote"> <strong>Learn more...</strong>
					<p>HTML &amp; CSS authors<br />
					<a href="/International/techniques/authoring-html#iris">Using non-ASCII web addresses</a></p>
					<p>Spec developers<br />
					<a href="/International/techniques/developing-specs#newsyntax">Defining protocol or format elements to be interpreted as URIs</a><br /><a href="/International/techniques/developing-specs#defining">Defining a new syntax for URIs</a></p>
				</div>
			</div>
<br clear="all" />
	</div>
<?php echo $bottomOfPage; ?>

	</body>
</html>

