- From: poot <cvsmail@w3.org>
- Date: Thu, 17 Mar 2011 16:11:35 -0400
- To: public-html-diffs@w3.org
eliot: Edited section 3 per bug 12062, comments 13-14;
http://dev.w3.org/cvsweb/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html?r1=1.68&r2=1.69&f=h
===================================================================
RCS file: /sources/public/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html,v
retrieving revision 1.68
retrieving revision 1.69
diff -u -d -r1.68 -r1.69
--- html-xhtml-authoring-guide.html 14 Mar 2011 21:03:45 -0000 1.68
+++ html-xhtml-authoring-guide.html 17 Mar 2011 20:09:55 -0000 1.69
@@ -14,7 +14,7 @@
<a href="http://www.w3.org/"><img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home"/></a>
</p>
<h1 class="title" id="title">Polyglot Markup: HTML-Compatible XHTML Documents</h1>
- <h2 id="w3c-editor-s-draft-05-january-2011">W3C Editor's Draft 4 March 2011</h2>
+ <h2 id="w3c-editor-s-draft-17-march-2011">W3C Editor's Draft 17 March 2011</h2>
<dl>
<dt>This version:</dt>
<dd><a href="http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html">http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html</a></dd>
@@ -243,38 +243,32 @@
<div id="character-encoding" class="section">
<!--OddPage--><h2><span class="secno">3. </span>Specifying a Document's Character Encoding</h2>
<p>
- <a class="internalDFN" href="#dfn-polyglot-markup" title="polyglot markup">Polyglot markup</a> declares character encoding in the following ways, which may be used separately or in combination
- (if used in combination, each approach contains identical encoding information):
+ Polyglot markup uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support.
+ HTML requires UTF-8 to be explicitly declared to avoid <a href="http://www.w3.org/TR/html5/semantics.html#charset">fallback to a legacy encoding</a> [<cite><a href="#bib-HTML5" rel="biblioentry" class="bibref">HTML5</a></cite>].
+ For XML, UTF-8 is an <a href="http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding">encoding default</a>.
+ As such, character encoding <em title="may" class="rfc2119">may</em> be left undeclared in XML with the result that UTF-8 is still supported [<cite><a href="#bib-XML10" rel="biblioentry" class="bibref">XML10</a></cite>].
+ </p>
+ <p>
+ <a class="internalDFN" href="#dfn-polyglot-markup" title="polyglot markup">Polyglot markup</a> declares the UTF-8 character encoding in the following ways, which may be used separately or in combination:
</p><ul>
<li>Within the document</li>
<ul>
<li>By using the Byte Order Mark (BOM) character (preferred).</li>
- <li>By relying on UTF-8 as the encoding default of XML, used in combination with the HTML <code>&lt;meta charset="UTF-8"/&gt;</code> element.</li>
+ <li>By using <code>&lt;meta charset="UTF-8"/&gt;</code> (the HTML encoding declaration).</li>
</ul>
- <li>In the HTTP header of the response [<cite><a href="#bib-HTTP11" rel="biblioentry" class="bibref">HTTP11</a></cite>], as in the following:
- <p>
- <code>Content-type: text/html; charset=utf-8</code>
- </p>
- Note that <a class="internalDFN" href="#dfn-polyglot-markup">polyglot markup</a> may use either <code>text/html</code> or <code>application/xhtml+xml</code> for the value of the content type.
+ <li>Outside the document
+ <ul>
+ <li>By adding <code>"charset=utf-8"</code> to the MIME/HTTP Content-Type header [<cite><a href="#bib-HTTP11" rel="biblioentry" class="bibref">HTTP11</a></cite>], as the following examples show in HTML and XML, respectively: </li>
+ </ul>
+ <pre class="example"><code>Content-type: text/html; charset=utf-8</code></pre>
+ <pre class="example"><code>Content-type: application/xhtml+xml; charset=utf-8</code></pre>
</li>
</ul>
<p></p>
- <p>
- Using <code>&lt;meta charset="*"/&gt;</code> has no effect in XML.
- Therefore, <a class="internalDFN" href="#dfn-polyglot-markup">polyglot markup</a> may use <code>&lt;meta charset="*"/&gt;</code> provided the document is encoded as UTF-8 and the value of charset is a case-insensitive match for the string "utf-8".
- </p>
- <p>
- <a class="internalDFN" href="#dfn-polyglot-markup" title="polyglot markup">Polyglot markup</a> uses UTF-8 encoding.
- The BOM character <em title="may" class="rfc2119">may</em> be used with the UTF-8 encoding (see <a href="http://dev.w3.org/html5/spec/syntax.html#writing">Writing HTML documents</a> in [<cite><a href="#bib-HTML5" rel="biblioentry" class="bibref">HTML5</a></cite>]),
-
- and using the BOM character is preferred to not using the BOM character.
- Because the construct of the BOM character is the same for XML and HTML (unlike the encoding declaration inside the HTTP Content-Type header),
- and because the BOM character works in both XML and HTML (unlike the <code>&lt;meta charset="UTF-8"/&gt;</code> declaration of HTML and
- the UTF-8 encoding default of XML),
- the BOM character can be said to be the most polyglot encoding declaration.
+ <p class="note">
+ The HTML encoding declaration has no effect in XML.
+ When the HTML encoding declaration is the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
</p>
<p>
The <a href="http://www.w3.org/International/questions/qa-html-encoding-declarations">W3C Internationalization (i18n) Group recommends</a>
@@ -1052,6 +1046,8 @@
</div>
+
+
<!-- Appendix -->
<div id="acknowledgements" class="appendix section">
<h2><span class="secno">A. </span>Acknowledgements</h2>
Received on Thursday, 17 March 2011 20:11:37 UTC
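
The revised section 3 in the diff above lists three ways a polyglot document can declare UTF-8: a BOM, the HTML encoding declaration, and a charset parameter on the Content-Type header. As a rough illustration only, the sketch below reports which of the three are present for a given response. The function name `utf8_declarations` and the regular expressions are assumptions made for this example; they are not part of the draft, and the pattern matching is a simplification rather than a conforming HTML or HTTP parser.

```python
import re
from typing import Optional

# The UTF-8 encoding of the BOM character (U+FEFF).
UTF8_BOM = b"\xef\xbb\xbf"

def utf8_declarations(body: bytes, content_type: Optional[str] = None) -> dict:
    """Report which UTF-8 declarations are present (hypothetical helper)."""
    # 1. BOM at the very start of the document bytes.
    has_bom = body.startswith(UTF8_BOM)

    # 2. HTML encoding declaration, matched case-insensitively.
    #    A rough pattern for illustration, not a real HTML parser.
    meta = re.search(
        rb'<meta\s+charset\s*=\s*["\']?utf-8["\']?\s*/?>', body, re.IGNORECASE
    )

    # 3. charset parameter of the Content-Type header, if one was supplied.
    has_header = False
    if content_type:
        m = re.search(r'charset\s*=\s*"?([^";\s]+)"?', content_type, re.IGNORECASE)
        has_header = m is not None and m.group(1).lower() == "utf-8"

    return {"bom": has_bom, "meta": meta is not None, "http_header": has_header}
```

Because the draft allows the declarations to be used separately or in combination, the helper reports each one independently instead of requiring all three.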