spec/Overview.html 1.2013 2842 Abstract out the encoding stuff from the

Abstract out the encoding stuff from the parser to the infrastructure
section so that it also affects form submission (whatwg r2842)
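
The commit moves the rule for comparing encoding labels into the shared infrastructure section, so form submission now uses it as well. That rule is the Charset Alias Matching procedure of UTS #22. Below is a minimal Python sketch of that comparison, assuming our reading of the procedure: delete everything except ASCII letters and digits, lowercase what remains, then, left to right, drop each "0" that is not preceded by a digit. The function names are hypothetical.

```python
import re


def charset_alias_key(name: str) -> str:
    """Normalize an encoding label per (our reading of) UTS #22 Charset Alias Matching."""
    # Step 1: keep only ASCII letters and digits.
    kept = re.sub(r"[^A-Za-z0-9]", "", name)
    # Step 2: fold letters to lowercase (only ASCII remains at this point).
    kept = kept.lower()
    # Step 3: left to right, drop each '0' not preceded by a digit in the
    # output built so far, i.e. the leading zeros of every numeric run.
    out = []
    for ch in kept:
        if ch == "0" and (not out or not out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def charsets_match(a: str, b: str) -> bool:
    """True if two encoding labels are equal under Charset Alias Matching."""
    return charset_alias_key(a) == charset_alias_key(b)
```

With this sketch, "GB_2312-80" and "g.b.2312(80)" both reduce to "gb231280". That matches the equivalence given as an example in the new section 2.8.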

DOMStringMap
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#domstringmap-0
HTMLFormControlsCollection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#htmlformcontrolscollection-0
2.9.2.1 HTMLCollection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#htmlcollection
2.9.4 Safe passing of structured data
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#safe-passing-of-structured-data
PARSE_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#parse_err
misinterpreted for compatibility
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#misinterpreted-for-compatibility
hasFeature(feature, version)
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#hasfeature
SERIALISE_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#serialise_err
8.2.2.2 Preprocessing the input stream
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#preprocessing-the-input-stream
2.9.8 Garbage collection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#garbage-collection
2.9.2.3 HTMLOptionsCollection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#htmloptionscollection
represents
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#represented-by-the-collection
2.9.3 DOMTokenList
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#domtokenlist
DOMSTRING_SIZE_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#domstring_size_err
2.9.2 Collections
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#collections
2.9.5 DOMStringMap
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#domstringmap
INDEX_SIZE_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#index_size_err
get an attribute
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#concept-get-attributes-when-sniffing
HTMLCollection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#htmlcollection-0
namedItem(key)
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-htmlcollection-nameditem
explicit "EOF" character
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#explicit-eof-character
limited to only positive non-zero numbers
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#limited-to-only-positive-non-zero-numbers
2.9.6 DOM feature strings
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-feature-strings
reflect
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#reflect
UNAVAILABLE_SCRIPT_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#unavailable_script_err
2.8 Character encodings
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#character-encodings-0
stringify
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-tokenlist-tostring
2.9.2.2 HTMLFormControlsCollection
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#htmlformcontrolscollection
internal structured cloning algorithm
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#internal-structured-cloning-algorithm
remove(index)
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-htmloptionscollection-remove
HIERARCHY_REQUEST_ERR
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#hierarchy_request_err
2.9 Common DOM interfaces
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#common-dom-interfaces
2.7.6 Content-Type sniffing: feed or HTML
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#content-type-sniffing:-feed-or-html
toggle(token)
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-tokenlist-toggle
2.9.7 Exceptions
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#exceptions
8.2.2.3 Changing the encoding while parsing
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#changing-the-encoding-while-parsing
namedItem(name)
http://people.w3.org/mike/diffs/html5/spec/Overview.1.2013.html#dom-htmlformcontrolscollection-nameditem
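
The "Character encoding overrides" table in the new section 2.8 works as a lookup from the encoding a user agent would otherwise use to the replacement it must use instead. Below is a rough Python sketch of that lookup. The names `effective_encoding`, `_alias_key`, `_OVERRIDES` and `_OVERRIDE_BY_KEY` are hypothetical.

The sketch matches labels only against the table's own names, modulo the alias-matching normalization. It has no alias registry, so an IANA alias such as "latin1" is returned unchanged rather than first being resolved to ISO-8859-1.

```python
import re


def _alias_key(name: str) -> str:
    # Compact UTS #22-style normalization (assumed reading): drop non-alphanumerics,
    # lowercase, and drop the leading zeros of each digit run.
    out = []
    for ch in re.sub(r"[^A-Za-z0-9]", "", name).lower():
        if ch == "0" and (not out or not out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


# Rows of the "Character encoding overrides" table: input -> replacement.
_OVERRIDES = {
    "EUC-KR": "Windows-949",
    "GB2312": "GBK",
    "GB_2312-80": "GBK",
    "ISO-8859-1": "Windows-1252",
    "ISO-8859-9": "Windows-1254",
    "ISO-8859-11": "Windows-874",
    "KS_C_5601-1987": "Windows-949",
    "TIS-620": "Windows-874",
    "US-ASCII": "Windows-1252",
    "x-x-big5": "Big5",
}

# Index the table by normalized name so lookups follow the alias-matching rules.
_OVERRIDE_BY_KEY = {_alias_key(k): v for k, v in _OVERRIDES.items()}


def effective_encoding(label: str) -> str:
    """Return the replacement encoding for `label`, or `label` itself if none applies."""
    return _OVERRIDE_BY_KEY.get(_alias_key(label), label)
```

For example, "iso_8859-1" normalizes to the same key as "ISO-8859-1" and yields "Windows-1252". "UTF-8" has no row in the table and is returned as-is.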

http://people.w3.org/mike/diffs/html5/spec/Overview.diff.html
http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.2012&r2=1.2013&f=h
http://html5.org/tools/web-apps-tracker?from=2841&to=2842
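
The diff also changes how the parse-error rule is stated. The old text said that bytes treated differently because of the aliasing are parse errors. The new text defines the term "misinterpreted for compatibility" and uses it from "Preprocessing the input stream". For single-byte encodings, such bytes can be found by decoding each byte under both encodings and comparing the results.

Below is a Python sketch of that check. It assumes Python's "latin-1" and "cp1252" codecs stand in for ISO-8859-1 and Windows-1252. `misinterpreted_offsets` is a hypothetical name.

```python
def misinterpreted_offsets(data: bytes, nominal: str, replacement: str) -> list[int]:
    """Offsets of bytes that decode differently under `nominal` and `replacement`.

    Assumes both codecs are single-byte encodings, so each byte decodes on its own.
    """
    offsets = []
    for i in range(len(data)):
        byte = data[i:i + 1]
        # errors="replace" turns bytes a codec leaves undefined into U+FFFD
        # instead of raising, so such bytes also count as treated differently.
        if byte.decode(nominal, errors="replace") != byte.decode(replacement, errors="replace"):
            offsets.append(i)
    return offsets
```

For the ISO-8859-1 / Windows-1252 pair, the two codecs agree everywhere except bytes 0x80 to 0x9F, so only offsets of those bytes can be reported. Multi-byte pairs such as EUC-KR and Windows-949 would need a decoder-level comparison instead of per-byte decoding.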

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.2012
retrieving revision 1.2013
diff -u -d -r1.2012 -r1.2013
--- Overview.html 19 Feb 2009 10:20:10 -0000 1.2012
+++ Overview.html 19 Feb 2009 11:04:59 -0000 1.2013
@@ -201,20 +201,21 @@
      <li><a href=#content-type-sniffing:-unknown-type><span class=secno>2.7.4 </span>Content-Type sniffing: unknown type</a></li>
      <li><a href=#content-type-sniffing:-image><span class=secno>2.7.5 </span>Content-Type sniffing: image</a></li>
      <li><a href=#content-type-sniffing:-feed-or-html><span class=secno>2.7.6 </span>Content-Type sniffing: feed or HTML</a></ol></li>
-   <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
+   <li><a href=#character-encodings-0><span class=secno>2.8 </span>Character encodings</a></li>
+   <li><a href=#common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</a>
     <ol>
-     <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</a></li>
-     <li><a href=#collections><span class=secno>2.8.2 </span>Collections</a>
+     <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</a></li>
+     <li><a href=#collections><span class=secno>2.9.2 </span>Collections</a>
       <ol>
-       <li><a href=#htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
-       <li><a href=#htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</a></li>
-       <li><a href=#htmloptionscollection><span class=secno>2.8.2.3 </span>HTMLOptionsCollection</a></ol></li>
-     <li><a href=#domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</a></li>
-     <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.4 </span>Safe passing of structured data</a></li>
-     <li><a href=#domstringmap><span class=secno>2.8.5 </span>DOMStringMap</a></li>
-     <li><a href=#dom-feature-strings><span class=secno>2.8.6 </span>DOM feature strings</a></li>
-     <li><a href=#exceptions><span class=secno>2.8.7 </span>Exceptions</a></li>
-     <li><a href=#garbage-collection><span class=secno>2.8.8 </span>Garbage collection</a></ol></ol></li>
+       <li><a href=#htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</a></li>
+       <li><a href=#htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</a></li>
+       <li><a href=#htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</a></ol></li>
+     <li><a href=#domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</a></li>
+     <li><a href=#safe-passing-of-structured-data><span class=secno>2.9.4 </span>Safe passing of structured data</a></li>
+     <li><a href=#domstringmap><span class=secno>2.9.5 </span>DOMStringMap</a></li>
+     <li><a href=#dom-feature-strings><span class=secno>2.9.6 </span>DOM feature strings</a></li>
+     <li><a href=#exceptions><span class=secno>2.9.7 </span>Exceptions</a></li>
+     <li><a href=#garbage-collection><span class=secno>2.9.8 </span>Garbage collection</a></ol></ol></li>
  <li><a href=#dom><span class=secno>3 </span>Semantics and structure of HTML documents</a>
   <ol>
    <li><a href=#semantics-intro><span class=secno>3.1 </span>Introduction</a></li>
@@ -864,9 +865,8 @@
      <li><a href=#the-input-stream><span class=secno>8.2.2 </span>The input stream</a>
       <ol>
        <li><a href=#determining-the-character-encoding><span class=secno>8.2.2.1 </span>Determining the character encoding</a></li>
-       <li><a href=#character-encoding-requirements><span class=secno>8.2.2.2 </span>Character encoding requirements</a></li>
-       <li><a href=#preprocessing-the-input-stream><span class=secno>8.2.2.3 </span>Preprocessing the input stream</a></li>
-       <li><a href=#changing-the-encoding-while-parsing><span class=secno>8.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+       <li><a href=#preprocessing-the-input-stream><span class=secno>8.2.2.2 </span>Preprocessing the input stream</a></li>
+       <li><a href=#changing-the-encoding-while-parsing><span class=secno>8.2.2.3 </span>Changing the encoding while parsing</a></ol></li>
      <li><a href=#parse-state><span class=secno>8.2.3 </span>Parse state</a>
       <ol>
        <li><a href=#the-insertion-mode><span class=secno>8.2.3.1 </span>The insertion mode</a></li>
@@ -4711,7 +4711,61 @@
 
   </ol><p class=note>For efficiency reasons, implementations may wish to
   implement this algorithm and the algorithm for detecting the
-  character encoding of HTML documents in parallel.<h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3><h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</h4><p>Some <span title="DOM attribute">DOM attributes</span> are
+  character encoding of HTML documents in parallel.<h3 id=character-encodings-0><span class=secno>2.8 </span>Character encodings</h3><p>User agents must at a minimum support the UTF-8 and Windows-1252
+  encodings, but may support more.<p class=note>It is not unusual for Web browsers to support dozens
+  if not upwards of a hundred distinct character encodings.<p>User agents must support the preferred MIME name of every
+  character encoding they support that has a preferred MIME name, and
+  should support all the IANA-registered aliases. <a href=#references>[IANACHARSET]</a><p>When comparing a string specifying a character encoding with the
+  name or alias of a character encoding to determine if they are
+  equal, user agents must use the Charset Alias Matching rules defined
+  in Unicode Technical Standard #22. <a href=#references>[UTS22]</a></p><!-- XXXrefs
+  http://unicode.org/reports/tr22/#Charset_Alias_Matching --><p class=example>For instance, "GB_2312-80" and "g.b.2312(80)" are
+  considered equivalent names.</p><hr><p>When a user agent would otherwise use an encoding given in the
+  first column of the following table to either convert content to
+  Unicode characters or convert Unicode characters to bytes, it must
+  instead use the encoding given in the cell in the second column of
+  the same row. When a byte or sequence of bytes is treated
+  differently due to this encoding aliasing, it is said to have been
+  <dfn id=misinterpreted-for-compatibility>misinterpreted for compatibility</dfn>.<table><caption>Character encoding overrides</caption>
+   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
+   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> Windows-949 <td>
+         <a href=#references>[EUCKR]</a> <!-- see reference for [EUC-KR] in RFC1557 -->
+         <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
+    <tr><td> GB2312 <td> GBK <td>
+         <a href=#references>[GB2312]</a><!-- XXX ? -->
+         <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
+    <tr><td> GB_2312-80 <td> GBK <td>
+         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
+         <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
+    <tr><td> ISO-8859-1 <td> Windows-1252 <td>
+         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
+         <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm -->
+    <tr><td> ISO-8859-9 <td> Windows-1254 <td>
+         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
+         <a href=#references>[WIN1254]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1254.htm -->
+    <tr><td> ISO-8859-11 <td> Windows-874 <td>
+         <a href=#references>[ISO885911]</a><!-- get reference from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=28263 -->
+         <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx -->
+    <tr><td> KS_C_5601-1987 <td> Windows-949 <td>
+         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
+         <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
+    <tr><td> TIS-620 <td> Windows-874 <td>
+         <a href=#references>[TIS620]</a> <!-- http://www.nectec.or.th/it-standards/std620/std620.htm -->
+         <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx -->
+    <tr><td> US-ASCII <td> Windows-1252 <td>
+         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
+         <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm -->
+    <tr><td> x-x-big5 <td> Big5 <td>
+         <a href=#references>[BIG5]</a> <!-- XXX ? -->
+   </table><p class=note>The requirement to treat certain encodings as other
+  encodings according to the table above is a willful violation of the
+  W3C Character Model specification. <a href=#references>[CHARMOD]</a></p><hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings. <a href=#references>[CESU8]</a> <a href=#references>[UTF7]</a> <a href=#references>[BOCU1]</a> <a href=#references>[SCSU]</a><p>Support for encodings based on EBCDIC is not recommended. This
+  encoding is rarely used for publicly-facing Web content.<p>Support for UTF-32 is not recommended. This encoding is rarely
+  used, and frequently misimplemented.<p class=note>This specification does not make any attempt to
+  support EBCDIC-based encodings and UTF-32 in its algorithms; support
+  and use of these encodings can thus lead to unexpected behavior in
+  implementations of this specification.<h3 id=common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</h3><h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</h4><p>Some <span title="DOM attribute">DOM attributes</span> are
   defined to <dfn id=reflect>reflect</dfn> a particular <span>content
   attribute</span>. This means that on getting, the DOM attribute
   returns the current value of the content attribute, and on setting,
@@ -4839,7 +4893,7 @@
   </ol><p>On setting, if the given element has an <code title=attr-id><a href=#the-id-attribute>id</a></code> attribute, then the content attribute must
   be set to the value of that <code title=attr-id><a href=#the-id-attribute>id</a></code>
   attribute. Otherwise, the DOM attribute must be set to the empty
-  string.</p><!-- XXX or raise an exception? --><h4 id=collections><span class=secno>2.8.2 </span>Collections</h4><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code>,
+  string.</p><!-- XXX or raise an exception? --><h4 id=collections><span class=secno>2.9.2 </span>Collections</h4><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code>,
   <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code>, and
   <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interfaces represent various
   lists of DOM nodes. Collectively, objects implementing these
@@ -4855,7 +4909,7 @@
   nodes within the collection must be sorted in <a href=#tree-order>tree
   order</a>.<p class=note>The <code title=dom-table-rows><a href=#dom-table-rows>rows</a></code> list is
   not in tree order.<p>An attribute that returns a collection must return the same
-  object every time it is retrieved.<h5 id=htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</h5><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code> interface represents a generic
+  object every time it is retrieved.<h5 id=htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</h5><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code> interface represents a generic
   <a href=#collections-0 title=collections>collection</a> of elements.<pre class=idl>[Callable=<a href=#dom-htmlcollection-nameditem title=dom-HTMLCollection-namedItem>namedItem</a>]
 interface <dfn id=htmlcollection-0>HTMLCollection</dfn> {
   readonly attribute unsigned long <a href=#dom-htmlcollection-length title=dom-HTMLCollection-length>length</a>;
@@ -4886,7 +4940,7 @@
    <li>It is an element with an ID <var title="">key</var>.</li>
 
   </ul><p>If no such elements are found, then the method must return
-  null.<h5 id=htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</h5><p>The <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code> interface represents
+  null.<h5 id=htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</h5><p>The <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code> interface represents
   a <a href=#collections-0 title=collections>collection</a> of <a href=#category-listed title=category-listed>listed</a> elements in <code><a href=#the-form-element>form</a></code>
   and <code><a href=#the-fieldset-element>fieldset</a></code> elements.<pre class=idl>[Callable=<a href=#dom-htmlformcontrolscollection-nameditem title=dom-HTMLFormControlsCollection-namedItem>namedItem</a>]
 interface <dfn id=htmlformcontrolscollection-0>HTMLFormControlsCollection</dfn> {
@@ -4923,7 +4977,7 @@
 
   </ol><!--
 http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E...%0A%3Cform%20name%3D%22a%22%3E%3Cinput%20id%3D%22x%22%20name%3D%22y%22%3E%3Cinput%20name%3D%22x%22%20id%3D%22y%22%3E%3C/form%3E%0A%3Cscript%3E%0A%20%20var%20x%3B%0A%20%20w%28x%20%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20x%5B0%5D.parentNode.removeChild%28x%5B0%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20w%28x%20%3D%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%3C/script%3E%0A
---><h5 id=htmloptionscollection><span class=secno>2.8.2.3 </span>HTMLOptionsCollection</h5><p>The <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interface represents a
+--><h5 id=htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</h5><p>The <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interface represents a
   list of <code><a href=#the-option-element>option</a></code> elements. It is always rooted on a
   <code><a href=#the-select-element>select</a></code> element and has attributes and methods that
   manipulate that element's descendants.<pre class=idl>[Callable=<a href=#dom-htmloptionscollection-nameditem title=dom-HTMLOptionsCollection-namedItem>namedItem</a>]
@@ -5019,7 +5073,7 @@
    <li><p>Remove <var title="">element</var> from its parent
    node.</li>
 
-  </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --><h4 id=domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</h4><p>The <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface represents an interface
+  </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --><h4 id=domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</h4><p>The <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface represents an interface
   to an underlying string that consists of an <a href=#unordered-set-of-unique-space-separated-tokens>unordered set of
   unique space-separated tokens</a>.<p>Which string underlies a particular <code><a href=#domtokenlist-0>DOMTokenList</a></code>
   object is defined when the object is created. It might be a content
@@ -5116,7 +5170,7 @@
 
   </ol><p>Objects implementing the <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface must
   <dfn id=dom-tokenlist-tostring title=dom-tokenlist-toString>stringify</dfn> to the object's
-  underlying string representation.<h4 id=safe-passing-of-structured-data><span class=secno>2.8.4 </span>Safe passing of structured data</h4><p>When a user agent is required to obtain a <dfn id=structured-clone>structured
+  underlying string representation.<h4 id=safe-passing-of-structured-data><span class=secno>2.9.4 </span>Safe passing of structured data</h4><p>When a user agent is required to obtain a <dfn id=structured-clone>structured
   clone</dfn> of an object, it must run the following algorithm, which
   either returns a separate object, or throws an exception.<ol><li><p>Let <var title="">input</var> be the object being
    cloned.</li>
@@ -5193,7 +5247,7 @@
 
     </ol></dd>
 
-  </dl><h4 id=domstringmap><span class=secno>2.8.5 </span>DOMStringMap</h4><p>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface represents a set of
+  </dl><h4 id=domstringmap><span class=secno>2.9.5 </span>DOMStringMap</h4><p>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface represents a set of
   name-value pairs. When a <code><a href=#domstringmap-0>DOMStringMap</a></code> object is
   instantiated, it is associated with three algorithms, one for
   getting getting the list of name-value pairs, one for setting names
@@ -5215,7 +5269,7 @@
   name.<p class=note>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface definition
   here is only intended for JavaScript environments. Other language
   bindings will need to define how <code><a href=#domstringmap-0>DOMStringMap</a></code> is to be
-  implemented for those languages.<h4 id=dom-feature-strings><span class=secno>2.8.6 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support,
+  implemented for those languages.<h4 id=dom-feature-strings><span class=secno>2.9.6 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support,
   and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature
   strings</a>. <a href=#references>[DOM3CORE]</a><p>A DOM application can use the <dfn id=hasfeature title=hasFeature><code>hasFeature(<var title="">feature</var>,
   <var title="">version</var>)</code></dfn> method of the
@@ -5235,7 +5289,7 @@
   always supersets of the interfaces defined in DOM2 HTML; some
   features that were formerly deprecated, poorly supported, rarely
   used or considered unnecessary have been removed. Therefore it is
-  not guaranteed that an implementation that supports "<code title="">HTML</code>" "<code>5.0</code>" also supports "<code title="">HTML</code>" "<code>2.0</code>".<h4 id=exceptions><span class=secno>2.8.7 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM
+  not guaranteed that an implementation that supports "<code title="">HTML</code>" "<code>5.0</code>" also supports "<code title="">HTML</code>" "<code>2.0</code>".<h4 id=exceptions><span class=secno>2.9.7 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM
   Core. <a href=#references>[DOMCORE]</a></p><!-- XXX xref all these exceptions to DOM3CORE --><ol class=brief><li value=1><dfn id=index_size_err><code>INDEX_SIZE_ERR</code></dfn></li>
    <li value=2><dfn id=domstring_size_err><code>DOMSTRING_SIZE_ERR</code></dfn></li>
    <li value=3><dfn id=hierarchy_request_err><code>HIERARCHY_REQUEST_ERR</code></dfn></li>
@@ -5261,7 +5315,7 @@
    <li value=23><dfn id=unavailable_script_err><code>UNAVAILABLE_SCRIPT_ERR</code></dfn></li> <!-- actually defined right here for now -->
    <li value=81><dfn id=parse_err><code>PARSE_ERR</code></dfn></li> <!-- actually defined in dom3ls -->
    <li value=82><dfn id=serialise_err><code>SERIALISE_ERR</code></dfn></li> <!-- actually defined in dom3ls -->
-  </ol><h4 id=garbage-collection><span class=secno>2.8.8 </span>Garbage collection</h4><p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any DOM
+  </ol><h4 id=garbage-collection><span class=secno>2.9.8 </span>Garbage collection</h4><p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any DOM
   attribute that returns a pre-existing object to that object.<div class=example>
 
    <p>For example, the <code>document.location</code> attribute means
@@ -39555,63 +39609,7 @@
   </ol><p>The <a href=#document-s-character-encoding>document's character encoding</a> must immediately
   be set to the value returned from this algorithm, at the same time
   as the user agent uses the returned value to select the decoder to
-  use for the input stream.<h5 id=character-encoding-requirements><span class=secno>8.2.2.2 </span>Character encoding requirements</h5><p>User agents must at a minimum support the UTF-8 and Windows-1252
-  encodings, but may support more.<p class=note>It is not unusual for Web browsers to support dozens
-  if not upwards of a hundred distinct character encodings.<p>User agents must support the preferred MIME name of every
-  character encoding they support that has a preferred MIME name, and
-  should support all the IANA-registered aliases. <a href=#references>[IANACHARSET]</a></p><!-- XXX should all this be abstracted out so it can be used for
-  <script charset=""> and <form accept-charset="">? Maybe move this
-  stuff and the 'character encodings' section of the terminology
-  section into its own infrastructure subsection? --><p>When comparing a string specifying a character encoding with the
-  name or alias of a character encoding to determine if they are
-  equal, user agents must use the Charset Alias Matching rules defined
-  in Unicode Technical Standard #22. <a href=#references>[UTS22]</a></p><!-- XXXrefs
-  http://unicode.org/reports/tr22/#Charset_Alias_Matching --><p class=example>For instance, "GB_2312-80" and "g.b.2312(80)" are
-  considered equivalent names.<p>When a user agent would otherwise use an encoding given in the
-  first column of the following table, it must instead use the
-  encoding given in the cell in the second column of the same row. Any
-  bytes that are treated differently due to this encoding aliasing
-  must be considered <a href=#parse-error title="parse error">parse
-  errors</a>.<table><caption>Character encoding overrides</caption>
-   <thead><tr><th> Input encoding <th> Replacement encoding <th> References
-   <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> Windows-949 <td>
-         <a href=#references>[EUCKR]</a> <!-- see reference for [EUC-KR] in RFC1557 -->
-         <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
-    <tr><td> GB2312 <td> GBK <td>
-         <a href=#references>[GB2312]</a><!-- XXX ? -->
-         <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
-    <tr><td> GB_2312-80 <td> GBK <td>
-         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
-         <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
-    <tr><td> ISO-8859-1 <td> Windows-1252 <td>
-         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
-         <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm -->
-    <tr><td> ISO-8859-9 <td> Windows-1254 <td>
-         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
-         <a href=#references>[WIN1254]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1254.htm -->
-    <tr><td> ISO-8859-11 <td> Windows-874 <td>
-         <a href=#references>[ISO885911]</a><!-- get reference from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=28263 -->
-         <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx -->
-    <tr><td> KS_C_5601-1987 <td> Windows-949 <td>
-         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
-         <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
-    <tr><td> TIS-620 <td> Windows-874 <td>
-         <a href=#references>[TIS620]</a> <!-- http://www.nectec.or.th/it-standards/std620/std620.htm -->
-         <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx -->
-    <tr><td> US-ASCII <td> Windows-1252 <td>
-         <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? -->
-         <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm -->
-    <tr><td> x-x-big5 <td> Big5 <td>
-         <a href=#references>[BIG5]</a> <!-- XXX ? -->
-   </table><p class=note>The requirement to treat certain encodings as other
-  encodings according to the table above is a willful violation of the
-  W3C Character Model specification. <a href=#references>[CHARMOD]</a><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
-  encodings. <a href=#references>[CESU8]</a> <a href=#references>[UTF7]</a> <a href=#references>[BOCU1]</a> <a href=#references>[SCSU]</a><p>Support for encodings based on EBCDIC is not recommended. This
-  encoding is rarely used for publicly-facing Web content.<p>Support for UTF-32 is not recommended. This encoding is rarely
-  used, and frequently misimplemented.<p class=note>This specification does not make any attempt to
-  support EBCDIC-based encodings and UTF-32 in its algorithms; support
-  and use of these encodings can thus lead to unexpected behavior in
-  implementations of this specification.<h5 id=preprocessing-the-input-stream><span class=secno>8.2.2.3 </span>Preprocessing the input stream</h5><p>Given an encoding, the bytes in the input stream must be
+  use for the input stream.<h5 id=preprocessing-the-input-stream><span class=secno>8.2.2.2 </span>Preprocessing the input stream</h5><p>Given an encoding, the bytes in the input stream must be
   converted to Unicode characters for the tokeniser, as described by
   the rules for that encoding, except that the leading U+FEFF BYTE
   ORDER MARK character, if any, must not be stripped by the encoding
@@ -39622,7 +39620,9 @@
   U+FFFD REPLACEMENT CHARACTER code points.<p class=note>Bytes or sequences of bytes in the original byte
   stream that did not conform to the encoding specification
   (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are
-  errors that conformance checkers are expected to report.<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
+  errors that conformance checkers are expected to report.<p>Any byte or sequences of bytes in the original byte stream that
+  is <a href=#misinterpreted-for-compatibility>misinterpreted for compatibility</a> is a <a href=#parse-error>parse
+  error</a>.<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
   any are present.<p>All U+0000 NULL characters in the input must be replaced by
   U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
   a <a href=#parse-error>parse error</a>.<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
@@ -39659,7 +39659,7 @@
   <a href=#the-input-stream>input stream</a> is reached when an <dfn id=explicit-eof-character>explicit "EOF"
   character</dfn> (inserted by the <code title=dom-document-close><a href=#dom-document-close>document.close()</a></code> method) is
   consumed. Otherwise, the "EOF" character is not a real character in
-  the stream, but rather the lack of any further characters.<h5 id=changing-the-encoding-while-parsing><span class=secno>8.2.2.4 </span>Changing the encoding while parsing</h5><p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
+  the stream, but rather the lack of any further characters.<h5 id=changing-the-encoding-while-parsing><span class=secno>8.2.2.3 </span>Changing the encoding while parsing</h5><p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
   encoding</dfn>, it must run the following steps. This might happen
   if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
   failed to find an encoding, or if it found an encoding that was not

Received on Thursday, 19 February 2009 11:09:05 UTC