html5/spec Overview.html,1.2941,1.2942

Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv30837

Modified Files:
	Overview.html 
Log Message:
Move the character encoding stuff down to the HTML syntax section since we don't want to override XML here. (whatwg r3772)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.2941
retrieving revision 1.2942
diff -u -d -r1.2941 -r1.2942
--- Overview.html	9 Sep 2009 05:27:43 -0000	1.2941
+++ Overview.html	9 Sep 2009 06:43:02 -0000	1.2942
@@ -363,24 +363,23 @@
      <li><a href="#concept-http-equivalent"><span class="secno">2.6.1 </span>Protocol concepts</a></li>
      <li><a href="#encrypted-http-and-related-security-concerns"><span class="secno">2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
      <li><a href="#content-type-sniffing"><span class="secno">2.6.3 </span>Determining the type of a resource</a></ol></li>
-   <li><a href="#character-encodings-0"><span class="secno">2.7 </span>Character encodings</a></li>
-   <li><a href="#common-dom-interfaces"><span class="secno">2.8 </span>Common DOM interfaces</a>
+   <li><a href="#common-dom-interfaces"><span class="secno">2.7 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href="#reflecting-content-attributes-in-idl-attributes"><span class="secno">2.8.1 </span>Reflecting content attributes in IDL attributes</a></li>
-     <li><a href="#collections-0"><span class="secno">2.8.2 </span>Collections</a>
+     <li><a href="#reflecting-content-attributes-in-idl-attributes"><span class="secno">2.7.1 </span>Reflecting content attributes in IDL attributes</a></li>
+     <li><a href="#collections-0"><span class="secno">2.7.2 </span>Collections</a>
       <ol>
-       <li><a href="#htmlcollection-0"><span class="secno">2.8.2.1 </span>HTMLCollection</a></li>
-       <li><a href="#htmlallcollection-0"><span class="secno">2.8.2.2 </span>HTMLAllCollection</a></li>
-       <li><a href="#htmlformcontrolscollection-0"><span class="secno">2.8.2.3 </span>HTMLFormControlsCollection</a></li>
-       <li><a href="#htmloptionscollection-0"><span class="secno">2.8.2.4 </span>HTMLOptionsCollection</a></li>
-       <li><a href="#htmlpropertycollection-0"><span class="secno">2.8.2.5 </span>HTMLPropertyCollection</a></ol></li>
-     <li><a href="#domtokenlist-0"><span class="secno">2.8.3 </span>DOMTokenList</a></li>
-     <li><a href="#domsettabletokenlist-0"><span class="secno">2.8.4 </span>DOMSettableTokenList</a></li>
-     <li><a href="#safe-passing-of-structured-data"><span class="secno">2.8.5 </span>Safe passing of structured data</a></li>
-     <li><a href="#domstringmap-0"><span class="secno">2.8.6 </span>DOMStringMap</a></li>
-     <li><a href="#dom-feature-strings"><span class="secno">2.8.7 </span>DOM feature strings</a></li>
-     <li><a href="#exceptions"><span class="secno">2.8.8 </span>Exceptions</a></li>
-     <li><a href="#garbage-collection"><span class="secno">2.8.9 </span>Garbage collection</a></ol></ol></li>
+       <li><a href="#htmlcollection-0"><span class="secno">2.7.2.1 </span>HTMLCollection</a></li>
+       <li><a href="#htmlallcollection-0"><span class="secno">2.7.2.2 </span>HTMLAllCollection</a></li>
+       <li><a href="#htmlformcontrolscollection-0"><span class="secno">2.7.2.3 </span>HTMLFormControlsCollection</a></li>
+       <li><a href="#htmloptionscollection-0"><span class="secno">2.7.2.4 </span>HTMLOptionsCollection</a></li>
+       <li><a href="#htmlpropertycollection-0"><span class="secno">2.7.2.5 </span>HTMLPropertyCollection</a></ol></li>
+     <li><a href="#domtokenlist-0"><span class="secno">2.7.3 </span>DOMTokenList</a></li>
+     <li><a href="#domsettabletokenlist-0"><span class="secno">2.7.4 </span>DOMSettableTokenList</a></li>
+     <li><a href="#safe-passing-of-structured-data"><span class="secno">2.7.5 </span>Safe passing of structured data</a></li>
+     <li><a href="#domstringmap-0"><span class="secno">2.7.6 </span>DOMStringMap</a></li>
+     <li><a href="#dom-feature-strings"><span class="secno">2.7.7 </span>DOM feature strings</a></li>
+     <li><a href="#exceptions"><span class="secno">2.7.8 </span>Exceptions</a></li>
+     <li><a href="#garbage-collection"><span class="secno">2.7.9 </span>Garbage collection</a></ol></ol></li>
  <li><a href="#dom"><span class="secno">3 </span>Semantics, structure, and APIs of HTML documents</a>
   <ol>
    <li><a href="#documents"><span class="secno">3.1 </span>Documents</a>
@@ -991,8 +990,9 @@
      <li><a href="#the-input-stream"><span class="secno">9.2.2 </span>The input stream</a>
       <ol>
        <li><a href="#determining-the-character-encoding"><span class="secno">9.2.2.1 </span>Determining the character encoding</a></li>
-       <li><a href="#preprocessing-the-input-stream"><span class="secno">9.2.2.2 </span>Preprocessing the input stream</a></li>
-       <li><a href="#changing-the-encoding-while-parsing"><span class="secno">9.2.2.3 </span>Changing the encoding while parsing</a></ol></li>
+       <li><a href="#character-encodings-0"><span class="secno">9.2.2.2 </span>Character encodings</a></li>
+       <li><a href="#preprocessing-the-input-stream"><span class="secno">9.2.2.3 </span>Preprocessing the input stream</a></li>
+       <li><a href="#changing-the-encoding-while-parsing"><span class="secno">9.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
      <li><a href="#parse-state"><span class="secno">9.2.3 </span>Parse state</a>
       <ol>
        <li><a href="#the-insertion-mode"><span class="secno">9.2.3.1 </span>The insertion mode</a></li>
@@ -4646,111 +4646,7 @@
   occur. For more details, see the Content-Type Processing Model
   specification. <a href="#refsMIMESNIFF">[MIMESNIFF]</a></p>
 
-  </div><div class="impl">
-
-  <h3 id="character-encodings-0"><span class="secno">2.7 </span>Character encodings</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i></p>
-
-  <p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.</p>
-
-  <p class="note">It is not unusual for Web browsers to support dozens
-  if not upwards of a hundred distinct character encodings.</p>
-
-  <p>User agents must support the preferred MIME name of every
-  character encoding they support that has a preferred MIME name, and
-  should support all the IANA-registered aliases of every character
-  encoding they support. <a href="#refsIANACHARSET">[IANACHARSET]</a></p>
-
-  <p>When comparing a string specifying a character encoding with the
-  name or alias of a character encoding to determine if they are
-  equal, user agents must remove any leading or trailing <a href="#space-character" title="space character">space characters</a> in both names, and
-  then perform the comparison in an <a href="#ascii-case-insensitive">ASCII
-  case-insensitive</a> manner.</p>
-
-<!-- this bit will be replaced by actual alias registrations in due course -->
-
-  <p>In addition, user agents must support the aliases given in the
-  following table for every character encoding they support, so that
-  labels from the first column are treated as equivalent to the labels
-  given in the corresponding cell from the second column on the same
-  row.</p>
-
-  <table><caption>Additional character encoding aliases</caption>
-   <thead><tr><th> Alias <th> Corresponding encoding <th> References
-   <tbody><tr><td> x-sjis <td> windows-31J <td>
-         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr><td> windows-932 <td> windows-31J <td>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr><td> x-x-big5 <td> Big5 <td>
-         <a href="#refsBIG5">[BIG5]</a>
-   </table><!-- end of bit that will be replaced by actual alias registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the
-  first column of the following table to either convert content to
-  Unicode characters or convert Unicode characters to bytes, it must
-  instead use the encoding given in the cell in the second column of
-  the same row. When a byte or sequence of bytes is treated
-  differently due to this encoding aliasing, it is said to have been
-  <dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p>
-
-  <table><caption>Character encoding overrides</caption>
-   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
-   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> windows-949 <td>
-         <a href="#refsEUCKR">[EUCKR]</a>
-         <a href="#refsWIN949">[WIN949]</a>
-    <tr><td> GB2312 <td> GBK <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsGBK">[GBK]</a>
-    <tr><td> GB_2312-80 <td> GBK <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsGBK">[GBK]</a>
-    <tr><td> ISO-8859-1 <td> windows-1252 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1252">[WIN1252]</a>
-    <tr><td> ISO-8859-9 <td> windows-1254 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1254">[WIN1254]</a>
-    <tr><td> ISO-8859-11 <td> windows-874 <td>
-         <a href="#refsISO885911">[ISO885911]</a>
-         <a href="#refsWIN874">[WIN874]</a>
-    <tr><td> KS_C_5601-1987 <td> windows-949 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN949">[WIN949]</a>
-    <tr><td> Shift_JIS <td> windows-31J <td>
-         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
-         <a href="#refsWIN31J">[WIN31J]</a>
-    <tr><td> TIS-620 <td> windows-874 <td>
-         <a href="#refsTIS620">[TIS620]</a>
-         <a href="#refsWIN874">[WIN874]</a>
-    <tr><td> US-ASCII <td> windows-1252 <td>
-         <a href="#refsRFC1345">[RFC1345]</a>
-         <a href="#refsWIN1252">[WIN1252]</a>
-   </table><p class="note">The requirement to treat certain encodings as other
-  encodings according to the table above is a <a href="#willful-violation">willful
-  violation</a> of the W3C Character Model specification, motivated
-  by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
-
-  <p>When a user agent is to use the UTF-16 encoding but no BOM has
-  been found, user agents must default to UTF-16LE.</p>
-
-  <p class="note">The requirement to default UTF-16 to LE rather than
-  BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a
-  desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
-
-  <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p>
-
-  <p>Support for encodings based on EBCDIC is not recommended. This
-  encoding is rarely used for publicly-facing Web content.</p>
-
-  <p>Support for UTF-32 is not recommended. This encoding is rarely
-  used, and frequently implemented incorrectly.</p>
-
-  <p class="note">This specification does not make any attempt to
-  support EBCDIC-based encodings and UTF-32 in its algorithms; support
-  and use of these encodings can thus lead to unexpected behavior in
-  implementations of this specification.</p>
-
-  </div><h3 id="common-dom-interfaces"><span class="secno">2.8 </span>Common DOM interfaces</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i><h4 id="reflecting-content-attributes-in-idl-attributes"><span class="secno">2.8.1 </span>Reflecting content attributes in IDL attributes</h4><p>Some <span title="IDL attribute">IDL attributes</span> are
+  </div><h3 id="common-dom-interfaces"><span class="secno">2.7 </span>Common DOM interfaces</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i><h4 id="reflecting-content-attributes-in-idl-attributes"><span class="secno">2.7.1 </span>Reflecting content attributes in IDL attributes</h4><p>Some <span title="IDL attribute">IDL attributes</span> are
   defined to <dfn id="reflect">reflect</dfn> a particular <span>content
   attribute</span>. This means that on getting, the IDL attribute
   returns the current value of the content attribute, and on setting,
@@ -4921,7 +4817,7 @@
   attribute. Otherwise, the IDL attribute must be set to the empty
   string.</p>
 
-  </div><h4 id="collections-0"><span class="secno">2.8.2 </span>Collections</h4><p>The <code><a href="#htmlcollection">HTMLCollection</a></code>, <code><a href="#htmlallcollection">HTMLAllCollection</a></code>,
+  </div><h4 id="collections-0"><span class="secno">2.7.2 </span>Collections</h4><p>The <code><a href="#htmlcollection">HTMLCollection</a></code>, <code><a href="#htmlallcollection">HTMLAllCollection</a></code>,
   <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code>,
   <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code>, and
   <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interfaces represent various
@@ -4944,7 +4840,7 @@
   <p>An attribute that returns a collection must return the same
   object every time it is retrieved.</p>
 
-  </div><h5 id="htmlcollection-0"><span class="secno">2.8.2.1 </span>HTMLCollection</h5><p>The <code><a href="#htmlcollection">HTMLCollection</a></code> interface represents a generic
+  </div><h5 id="htmlcollection-0"><span class="secno">2.7.2.1 </span>HTMLCollection</h5><p>The <code><a href="#htmlcollection">HTMLCollection</a></code> interface represents a generic
   <a href="#collections" title="collections">collection</a> of elements.<pre class="idl">interface <dfn id="htmlcollection">HTMLCollection</dfn> {
   readonly attribute unsigned long <a href="#dom-htmlcollection-length" title="dom-HTMLCollection-length">length</a>;
   caller getter Element <a href="#dom-htmlcollection-item" title="dom-HTMLCollection-item">item</a>(in unsigned long index);
@@ -5030,7 +4926,7 @@
   the method was invoked. In <a href="#html-documents">HTML documents</a>, the argument
   must first be <a href="#converted-to-ascii-lowercase">converted to ASCII lowercase</a>.</p>
 
-  </div><h5 id="htmlallcollection-0"><span class="secno">2.8.2.2 </span>HTMLAllCollection</h5><p>The <code><a href="#htmlallcollection">HTMLAllCollection</a></code> interface represents a generic
+  </div><h5 id="htmlallcollection-0"><span class="secno">2.7.2.2 </span>HTMLAllCollection</h5><p>The <code><a href="#htmlallcollection">HTMLAllCollection</a></code> interface represents a generic
   <a href="#collections" title="collections">collection</a> of elements just like
   <code><a href="#htmlcollection">HTMLCollection</a></code>, with the exception that its <code title="dom-HTMLAllCollection-namedItem"><a href="#dom-htmlallcollection-nameditem">namedItem()</a></code> method
   returns an <code><a href="#htmlcollection">HTMLCollection</a></code> object when there are
@@ -5138,7 +5034,7 @@
   documents</a>, the argument must first be <a href="#converted-to-ascii-lowercase">converted to
   ASCII lowercase</a>.</p>
 
-  </div><h5 id="htmlformcontrolscollection-0"><span class="secno">2.8.2.3 </span>HTMLFormControlsCollection</h5><p>The <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code> interface represents
+  </div><h5 id="htmlformcontrolscollection-0"><span class="secno">2.7.2.3 </span>HTMLFormControlsCollection</h5><p>The <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code> interface represents
   a <a href="#collections" title="collections">collection</a> of <a href="#category-listed" title="category-listed">listed</a> elements in <code><a href="#the-form-element">form</a></code>
   and <code><a href="#the-fieldset-element">fieldset</a></code> elements.<pre class="idl">interface <dfn id="htmlformcontrolscollection">HTMLFormControlsCollection</dfn> {
   readonly attribute unsigned long <a href="#dom-htmlformcontrolscollection-length" title="dom-HTMLFormControlsCollection-length">length</a>;
@@ -5262,7 +5158,7 @@
 
   </ol><!--
 http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E...%0A%3Cform%20name%3D%22a%22%3E%3Cinput%20id%3D%22x%22%20name%3D%22y%22%3E%3Cinput%20name%3D%22x%22%20id%3D%22y%22%3E%3C/form%3E%0A%3Cscript%3E%0A%20%20var%20x%3B%0A%20%20w%28x%20%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20x%5B0%5D.parentNode.removeChild%28x%5B0%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20w%28x%20%3D%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%3C/script%3E%0A
---></div><h5 id="htmloptionscollection-0"><span class="secno">2.8.2.4 </span>HTMLOptionsCollection</h5><p>The <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code> interface represents a
+--></div><h5 id="htmloptionscollection-0"><span class="secno">2.7.2.4 </span>HTMLOptionsCollection</h5><p>The <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href="#the-option-element">option</a></code> elements. It is always rooted on a
   <code><a href="#the-select-element">select</a></code> element and has attributes and methods that
   manipulate that element's descendants.<pre class="idl">interface <dfn id="htmloptionscollection">HTMLOptionsCollection</dfn> {
@@ -5416,7 +5312,7 @@
    <li><p>Remove <var title="">element</var> from its parent
    node.</li>
 
-  </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div><h5 id="htmlpropertycollection-0"><span class="secno">2.8.2.5 </span>HTMLPropertyCollection</h5><p>The <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interface represents a
+  </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div><h5 id="htmlpropertycollection-0"><span class="secno">2.7.2.5 </span>HTMLPropertyCollection</h5><p>The <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interface represents a
   <a href="#collections" title="collections">collection</a> of elements that add
   name-value pairs to a particular <a href="#concept-item" title="concept-item">item</a> in the <a href="#microdata">microdata</a>
   model.<pre class="idl">interface <dfn id="htmlpropertycollection">HTMLPropertyCollection</dfn> {
@@ -5508,7 +5404,7 @@
   DOM property of each of the elements represented by the object, in
   <a href="#tree-order">tree order</a>.</p>
 
-  </div><h4 id="domtokenlist-0"><span class="secno">2.8.3 </span>DOMTokenList</h4><p>The <code><a href="#domtokenlist">DOMTokenList</a></code> interface represents an interface
+  </div><h4 id="domtokenlist-0"><span class="secno">2.7.3 </span>DOMTokenList</h4><p>The <code><a href="#domtokenlist">DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of a <a href="#set-of-space-separated-tokens">set of
   space-separated tokens</a>.<p class="note"><code><a href="#domtokenlist">DOMTokenList</a></code> objects are always
   <a href="#case-sensitive">case-sensitive</a>, even when the underlying string might
@@ -5680,7 +5576,7 @@
   <dfn id="dom-tokenlist-tostring" title="dom-tokenlist-toString">stringify</dfn> to the object's
   underlying string representation.</p>
 
-  </div><h4 id="domsettabletokenlist-0"><span class="secno">2.8.4 </span>DOMSettableTokenList</h4><p>The <code><a href="#domsettabletokenlist">DOMSettableTokenList</a></code> interface is the same as the
+  </div><h4 id="domsettabletokenlist-0"><span class="secno">2.7.4 </span>DOMSettableTokenList</h4><p>The <code><a href="#domsettabletokenlist">DOMSettableTokenList</a></code> interface is the same as the
   <code><a href="#domtokenlist">DOMTokenList</a></code> interface, except that it allows the
   underlying string to be directly changed.<pre class="idl">interface <dfn id="domsettabletokenlist">DOMSettableTokenList</dfn> : <a href="#domtokenlist">DOMTokenList</a> {
             attribute DOMString <a href="#dom-domsettabletokenlist-value" title="dom-DOMSettableTokenList-value">value</a>;
@@ -5703,7 +5599,7 @@
 
   </div><div class="impl">
 
-  <h4 id="safe-passing-of-structured-data"><span class="secno">2.8.5 </span>Safe passing of structured data</h4>
+  <h4 id="safe-passing-of-structured-data"><span class="secno">2.7.5 </span>Safe passing of structured data</h4>
 
   <p>When a user agent is required to obtain a <dfn id="structured-clone">structured
   clone</dfn> of an object, it must run the following algorithm, which
@@ -5827,7 +5723,7 @@
 
    <dd><p>Return the null value.</dd>
 
-  </dl></div><h4 id="domstringmap-0"><span class="secno">2.8.6 </span>DOMStringMap</h4><p>The <code><a href="#domstringmap">DOMStringMap</a></code> interface represents a set of
+  </dl></div><h4 id="domstringmap-0"><span class="secno">2.7.6 </span>DOMStringMap</h4><p>The <code><a href="#domstringmap">DOMStringMap</a></code> interface represents a set of
   name-value pairs. It exposes these using the scripting language's
   native mechanisms for property access.<div class="impl">
 
@@ -5901,7 +5797,7 @@
   }
 }</pre>
 
-  </div><h4 id="dom-feature-strings"><span class="secno">2.8.7 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support,
+  </div><h4 id="dom-feature-strings"><span class="secno">2.7.7 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures">feature
   strings</a>. <a href="#refsDOMCORE">[DOMCORE]</a><p>Authors are strongly discouraged from using these, as they are
   notoriously unreliable and imprecise. Authors are encouraged to rely
@@ -5914,7 +5810,7 @@
   with <var title="">feature</var> set to either "<code title="">HTML</code>" or "<code title="">XHTML</code>" and <var title="">version</var> set to either "<code>1.0</code>" or
   "<code>2.0</code>".</p>
 
-  </div><h4 id="exceptions"><span class="secno">2.8.8 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM
+  </div><h4 id="exceptions"><span class="secno">2.7.8 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM
   Core. <a href="#refsDOMCORE">[DOMCORE]</a><ol class="brief"><li value="1"><dfn id="index_size_err"><code>INDEX_SIZE_ERR</code></dfn></li>
    <li value="2"><dfn id="domstring_size_err"><code>DOMSTRING_SIZE_ERR</code></dfn></li>
    <li value="3"><dfn id="hierarchy_request_err"><code>HIERARCHY_REQUEST_ERR</code></dfn></li>
@@ -5942,7 +5838,7 @@
    <li value="82"><dfn id="serialize_err"><code>SERIALIZE_ERR</code></dfn></li> <!-- actually defined in dom3ls -->
   </ol><div class="impl">
 
-  <h4 id="garbage-collection"><span class="secno">2.8.9 </span>Garbage collection</h4>
+  <h4 id="garbage-collection"><span class="secno">2.7.9 </span>Garbage collection</h4>
 
   <p>There is an <dfn id="implied-strong-reference">implied strong reference</dfn> from any IDL
   attribute that returns a pre-existing object to that object.</p>
@@ -54664,8 +54560,111 @@
   use for the input stream.</p>
 
 
+  <h5 id="character-encodings-0"><span class="secno">9.2.2.2 </span>Character encodings</h5><p class="XXX annotation"><b>Status: </b><i>Working draft</i></p>
 
-  <h5 id="preprocessing-the-input-stream"><span class="secno">9.2.2.2 </span>Preprocessing the input stream</h5>
+  <p>User agents must at a minimum support the UTF-8 and Windows-1252
+  encodings, but may support more.</p>
+
+  <p class="note">It is not unusual for Web browsers to support dozens
+  if not upwards of a hundred distinct character encodings.</p>
+
+  <p>User agents must support the preferred MIME name of every
+  character encoding they support that has a preferred MIME name, and
+  should support all the IANA-registered aliases of every character
+  encoding they support. <a href="#refsIANACHARSET">[IANACHARSET]</a></p>
+
+  <p>When comparing a string specifying a character encoding with the
+  name or alias of a character encoding to determine if they are
+  equal, user agents must remove any leading or trailing <a href="#space-character" title="space character">space characters</a> in both names, and
+  then perform the comparison in an <a href="#ascii-case-insensitive">ASCII
+  case-insensitive</a> manner.</p>
+
+<!-- this bit will be replaced by actual alias registrations in due course -->
+
+  <p>In addition, user agents must support the aliases given in the
+  following table for every character encoding they support, so that
+  labels from the first column are treated as equivalent to the labels
+  given in the corresponding cell from the second column on the same
+  row.</p>
+
+  <table><caption>Additional character encoding aliases</caption>
+   <thead><tr><th> Alias <th> Corresponding encoding <th> References
+   <tbody><tr><td> x-sjis <td> windows-31J <td>
+         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr><td> windows-932 <td> windows-31J <td>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr><td> x-x-big5 <td> Big5 <td>
+         <a href="#refsBIG5">[BIG5]</a>
+   </table><!-- end of bit that will be replaced by actual alias registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the
+  first column of the following table to either convert content to
+  Unicode characters or convert Unicode characters to bytes, it must
+  instead use the encoding given in the cell in the second column of
+  the same row. When a byte or sequence of bytes is treated
+  differently due to this encoding aliasing, it is said to have been
+  <dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p>
+
+  <table><caption>Character encoding overrides</caption>
+   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
+   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> windows-949 <td>
+         <a href="#refsEUCKR">[EUCKR]</a>
+         <a href="#refsWIN949">[WIN949]</a>
+    <tr><td> GB2312 <td> GBK <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsGBK">[GBK]</a>
+    <tr><td> GB_2312-80 <td> GBK <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsGBK">[GBK]</a>
+    <tr><td> ISO-8859-1 <td> windows-1252 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1252">[WIN1252]</a>
+    <tr><td> ISO-8859-9 <td> windows-1254 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1254">[WIN1254]</a>
+    <tr><td> ISO-8859-11 <td> windows-874 <td>
+         <a href="#refsISO885911">[ISO885911]</a>
+         <a href="#refsWIN874">[WIN874]</a>
+    <tr><td> KS_C_5601-1987 <td> windows-949 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN949">[WIN949]</a>
+    <tr><td> Shift_JIS <td> windows-31J <td>
+         <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
+         <a href="#refsWIN31J">[WIN31J]</a>
+    <tr><td> TIS-620 <td> windows-874 <td>
+         <a href="#refsTIS620">[TIS620]</a>
+         <a href="#refsWIN874">[WIN874]</a>
+    <tr><td> US-ASCII <td> windows-1252 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWIN1252">[WIN1252]</a>
+   </table><p class="note">The requirement to treat certain encodings as other
+  encodings according to the table above is a <a href="#willful-violation">willful
+  violation</a> of the W3C Character Model specification, motivated
+  by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
+
+  <p>When a user agent is to use the UTF-16 encoding but no BOM has
+  been found, user agents must default to UTF-16LE.</p>
+
+  <p class="note">The requirement to default UTF-16 to LE rather than
+  BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a
+  desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p>
+
+  <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p>
+
+  <p>Support for encodings based on EBCDIC is not recommended. This
+  encoding is rarely used for publicly-facing Web content.</p>
+
+  <p>Support for UTF-32 is not recommended. This encoding is rarely
+  used, and frequently implemented incorrectly.</p>
+
+  <p class="note">This specification does not make any attempt to
+  support EBCDIC-based encodings and UTF-32 in its algorithms; support
+  and use of these encodings can thus lead to unexpected behavior in
+  implementations of this specification.</p>
+
+
+
+  <h5 id="preprocessing-the-input-stream"><span class="secno">9.2.2.3 </span>Preprocessing the input stream</h5>
 
   <p>Given an encoding, the bytes in the input stream must be
   converted to Unicode characters for the tokenizer, as described by
@@ -54740,7 +54739,7 @@
   the stream, but rather the lack of any further characters.</p>
 
 
-  <h5 id="changing-the-encoding-while-parsing"><span class="secno">9.2.2.3 </span>Changing the encoding while parsing</h5>
+  <h5 id="changing-the-encoding-while-parsing"><span class="secno">9.2.2.4 </span>Changing the encoding while parsing</h5>
 
   <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
   encoding</dfn>, it must run the following steps. This might happen
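
Reader's note on the relocated "Character encodings" section: the label-comparison rule and the two tables it moves can be sketched roughly as below. This is an illustration, not part of the commit and not the spec's own algorithm. The helper names (`normalize_label`, `labels_match`, `effective_encoding`) are hypothetical, the set of space characters is assumed from the draft's definition elsewhere, and IANA-registered aliases are deliberately left out.

```python
# Illustrative sketch only; all function names here are hypothetical.

# Assumed definition of "space characters" (taken from elsewhere in the
# draft, not from this diff): space, tab, line feed, form feed, CR.
SPACE_CHARACTERS = " \t\n\f\r"

# Lowercasing restricted to ASCII A-Z, so that non-ASCII characters such as
# U+212A KELVIN SIGN are never folded onto ASCII letters.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def normalize_label(label: str) -> str:
    """Strip leading/trailing space characters, then ASCII-lowercase."""
    return label.strip(SPACE_CHARACTERS).translate(_ASCII_LOWER)


def labels_match(a: str, b: str) -> bool:
    """Compare two encoding labels as the relocated section describes."""
    return normalize_label(a) == normalize_label(b)


# "Additional character encoding aliases" table from the diff.
ADDITIONAL_ALIASES = {
    "x-sjis": "windows-31J",
    "windows-932": "windows-31J",
    "x-x-big5": "Big5",
}

# "Character encoding overrides" table from the diff.
ENCODING_OVERRIDES = {
    "EUC-KR": "windows-949",
    "GB2312": "GBK",
    "GB_2312-80": "GBK",
    "ISO-8859-1": "windows-1252",
    "ISO-8859-9": "windows-1254",
    "ISO-8859-11": "windows-874",
    "KS_C_5601-1987": "windows-949",
    "Shift_JIS": "windows-31J",
    "TIS-620": "windows-874",
    "US-ASCII": "windows-1252",
}


def _lookup(table, label):
    """Return the table value whose key matches label, or None."""
    for key, value in table.items():
        if labels_match(key, label):
            return value
    return None


def effective_encoding(label: str) -> str:
    """Resolve the extra aliases first, then apply the override table."""
    name = _lookup(ADDITIONAL_ALIASES, label) or label.strip(SPACE_CHARACTERS)
    return _lookup(ENCODING_OVERRIDES, name) or name
```

Under these assumptions, `effective_encoding(" iso-8859-1 ")` yields `"windows-1252"` and `effective_encoding("X-SJIS")` yields `"windows-31J"`, while a label that appears in neither table (for example `"UTF-8"`) is returned with only its surrounding space characters removed.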

Received on Wednesday, 9 September 2009 06:43:16 UTC