- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 09 Sep 2009 06:43:05 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv30837

Modified Files:
	Overview.html
Log Message:
Move the character encoding stuff down to the HTML syntax section since we don't want to override XML here. (whatwg r3772)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.2941
retrieving revision 1.2942
diff -u -d -r1.2941 -r1.2942
--- Overview.html 9 Sep 2009 05:27:43 -0000 1.2941
+++ Overview.html 9 Sep 2009 06:43:02 -0000 1.2942
@@ -363,24 +363,23 @@
  <li><a href="#concept-http-equivalent"><span class="secno">2.6.1 </span>Protocol concepts</a></li>
  <li><a href="#encrypted-http-and-related-security-concerns"><span class="secno">2.6.2 </span>Encrypted HTTP and related security concerns</a></li>
  <li><a href="#content-type-sniffing"><span class="secno">2.6.3 </span>Determining the type of a resource</a></ol></li>
- <li><a href="#character-encodings-0"><span class="secno">2.7 </span>Character encodings</a></li>
- <li><a href="#common-dom-interfaces"><span class="secno">2.8 </span>Common DOM interfaces</a>
+ <li><a href="#common-dom-interfaces"><span class="secno">2.7 </span>Common DOM interfaces</a>
  <ol>
- <li><a href="#reflecting-content-attributes-in-idl-attributes"><span class="secno">2.8.1 </span>Reflecting content attributes in IDL attributes</a></li>
- <li><a href="#collections-0"><span class="secno">2.8.2 </span>Collections</a>
+ <li><a href="#reflecting-content-attributes-in-idl-attributes"><span class="secno">2.7.1 </span>Reflecting content attributes in IDL attributes</a></li>
+ <li><a href="#collections-0"><span class="secno">2.7.2 </span>Collections</a>
  <ol>
- <li><a href="#htmlcollection-0"><span class="secno">2.8.2.1 </span>HTMLCollection</a></li>
- <li><a href="#htmlallcollection-0"><span class="secno">2.8.2.2 </span>HTMLAllCollection</a></li>
- <li><a href="#htmlformcontrolscollection-0"><span class="secno">2.8.2.3 
</span>HTMLFormControlsCollection</a></li> - <li><a href="#htmloptionscollection-0"><span class="secno">2.8.2.4 </span>HTMLOptionsCollection</a></li> - <li><a href="#htmlpropertycollection-0"><span class="secno">2.8.2.5 </span>HTMLPropertyCollection</a></ol></li> - <li><a href="#domtokenlist-0"><span class="secno">2.8.3 </span>DOMTokenList</a></li> - <li><a href="#domsettabletokenlist-0"><span class="secno">2.8.4 </span>DOMSettableTokenList</a></li> - <li><a href="#safe-passing-of-structured-data"><span class="secno">2.8.5 </span>Safe passing of structured data</a></li> - <li><a href="#domstringmap-0"><span class="secno">2.8.6 </span>DOMStringMap</a></li> - <li><a href="#dom-feature-strings"><span class="secno">2.8.7 </span>DOM feature strings</a></li> - <li><a href="#exceptions"><span class="secno">2.8.8 </span>Exceptions</a></li> - <li><a href="#garbage-collection"><span class="secno">2.8.9 </span>Garbage collection</a></ol></ol></li> + <li><a href="#htmlcollection-0"><span class="secno">2.7.2.1 </span>HTMLCollection</a></li> + <li><a href="#htmlallcollection-0"><span class="secno">2.7.2.2 </span>HTMLAllCollection</a></li> + <li><a href="#htmlformcontrolscollection-0"><span class="secno">2.7.2.3 </span>HTMLFormControlsCollection</a></li> + <li><a href="#htmloptionscollection-0"><span class="secno">2.7.2.4 </span>HTMLOptionsCollection</a></li> + <li><a href="#htmlpropertycollection-0"><span class="secno">2.7.2.5 </span>HTMLPropertyCollection</a></ol></li> + <li><a href="#domtokenlist-0"><span class="secno">2.7.3 </span>DOMTokenList</a></li> + <li><a href="#domsettabletokenlist-0"><span class="secno">2.7.4 </span>DOMSettableTokenList</a></li> + <li><a href="#safe-passing-of-structured-data"><span class="secno">2.7.5 </span>Safe passing of structured data</a></li> + <li><a href="#domstringmap-0"><span class="secno">2.7.6 </span>DOMStringMap</a></li> + <li><a href="#dom-feature-strings"><span class="secno">2.7.7 </span>DOM feature strings</a></li> + <li><a 
href="#exceptions"><span class="secno">2.7.8 </span>Exceptions</a></li> + <li><a href="#garbage-collection"><span class="secno">2.7.9 </span>Garbage collection</a></ol></ol></li> <li><a href="#dom"><span class="secno">3 </span>Semantics, structure, and APIs of HTML documents</a> <ol> <li><a href="#documents"><span class="secno">3.1 </span>Documents</a> @@ -991,8 +990,9 @@ <li><a href="#the-input-stream"><span class="secno">9.2.2 </span>The input stream</a> <ol> <li><a href="#determining-the-character-encoding"><span class="secno">9.2.2.1 </span>Determining the character encoding</a></li> - <li><a href="#preprocessing-the-input-stream"><span class="secno">9.2.2.2 </span>Preprocessing the input stream</a></li> - <li><a href="#changing-the-encoding-while-parsing"><span class="secno">9.2.2.3 </span>Changing the encoding while parsing</a></ol></li> + <li><a href="#character-encodings-0"><span class="secno">9.2.2.2 </span>Character encodings</a></li> + <li><a href="#preprocessing-the-input-stream"><span class="secno">9.2.2.3 </span>Preprocessing the input stream</a></li> + <li><a href="#changing-the-encoding-while-parsing"><span class="secno">9.2.2.4 </span>Changing the encoding while parsing</a></ol></li> <li><a href="#parse-state"><span class="secno">9.2.3 </span>Parse state</a> <ol> <li><a href="#the-insertion-mode"><span class="secno">9.2.3.1 </span>The insertion mode</a></li> @@ -4646,111 +4646,7 @@ occur. For more details, see the Content-Type Processing Model specification. 
<a href="#refsMIMESNIFF">[MIMESNIFF]</a></p> - </div><div class="impl"> - - <h3 id="character-encodings-0"><span class="secno">2.7 </span>Character encodings</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i></p> - - <p>User agents must at a minimum support the UTF-8 and Windows-1252 - encodings, but may support more.</p> - - <p class="note">It is not unusual for Web browsers to support dozens - if not upwards of a hundred distinct character encodings.</p> - - <p>User agents must support the preferred MIME name of every - character encoding they support that has a preferred MIME name, and - should support all the IANA-registered aliases of every character - encoding they support. <a href="#refsIANACHARSET">[IANACHARSET]</a></p> - - <p>When comparing a string specifying a character encoding with the - name or alias of a character encoding to determine if they are - equal, user agents must remove any leading or trailing <a href="#space-character" title="space character">space characters</a> in both names, and - then perform the comparison in an <a href="#ascii-case-insensitive">ASCII - case-insensitive</a> manner.</p> - -<!-- this bit will be replaced by actual alias registrations in due course --> - - <p>In addition, user agents must support the aliases given in the - following table for every character encoding they support, so that - labels from the first column are treated as equivalent to the labels - given in the corresponding cell from the second column on the same - row.</p> - - <table><caption>Additional character encoding aliases</caption> - <thead><tr><th> Alias <th> Corresponding encoding <th> References - <tbody><tr><td> x-sjis <td> windows-31J <td> - <a href="#refsSHIFTJIS">[SHIFTJIS]</a> - <a href="#refsWIN31J">[WIN31J]</a> - <tr><td> windows-932 <td> windows-31J <td> - <a href="#refsWIN31J">[WIN31J]</a> - <tr><td> x-x-big5 <td> Big5 <td> - <a href="#refsBIG5">[BIG5]</a> - </table><!-- end of bit that will be replaced by actual alias 
registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the - first column of the following table to either convert content to - Unicode characters or convert Unicode characters to bytes, it must - instead use the encoding given in the cell in the second column of - the same row. When a byte or sequence of bytes is treated - differently due to this encoding aliasing, it is said to have been - <dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p> - - <table><caption>Character encoding overrides</caption> - <thead><tr><th> Input encoding <th> Replacement encoding <th> References - <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> windows-949 <td> - <a href="#refsEUCKR">[EUCKR]</a> - <a href="#refsWIN949">[WIN949]</a> - <tr><td> GB2312 <td> GBK <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsGBK">[GBK]</a> - <tr><td> GB_2312-80 <td> GBK <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsGBK">[GBK]</a> - <tr><td> ISO-8859-1 <td> windows-1252 <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsWIN1252">[WIN1252]</a> - <tr><td> ISO-8859-9 <td> windows-1254 <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsWIN1254">[WIN1254]</a> - <tr><td> ISO-8859-11 <td> windows-874 <td> - <a href="#refsISO885911">[ISO885911]</a> - <a href="#refsWIN874">[WIN874]</a> - <tr><td> KS_C_5601-1987 <td> windows-949 <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsWIN949">[WIN949]</a> - <tr><td> Shift_JIS <td> windows-31J <td> - <a href="#refsSHIFTJIS">[SHIFTJIS]</a> - <a href="#refsWIN31J">[WIN31J]</a> - <tr><td> TIS-620 <td> windows-874 <td> - <a href="#refsTIS620">[TIS620]</a> - <a href="#refsWIN874">[WIN874]</a> - <tr><td> US-ASCII <td> windows-1252 <td> - <a href="#refsRFC1345">[RFC1345]</a> - <a href="#refsWIN1252">[WIN1252]</a> - </table><p class="note">The requirement to treat certain encodings as other - encodings according to the table above is a <a 
href="#willful-violation">willful - violation</a> of the W3C Character Model specification, motivated - by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p> - - <p>When a user agent is to use the UTF-16 encoding but no BOM has - been found, user agents must default to UTF-16LE.</p> - - <p class="note">The requirement to default UTF-16 to LE rather than - BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a - desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p> - - <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU - encodings. <a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p> - - <p>Support for encodings based on EBCDIC is not recommended. This - encoding is rarely used for publicly-facing Web content.</p> - - <p>Support for UTF-32 is not recommended. This encoding is rarely - used, and frequently implemented incorrectly.</p> - - <p class="note">This specification does not make any attempt to - support EBCDIC-based encodings and UTF-32 in its algorithms; support - and use of these encodings can thus lead to unexpected behavior in - implementations of this specification.</p> - - </div><h3 id="common-dom-interfaces"><span class="secno">2.8 </span>Common DOM interfaces</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i><h4 id="reflecting-content-attributes-in-idl-attributes"><span class="secno">2.8.1 </span>Reflecting content attributes in IDL attributes</h4><p>Some <span title="IDL attribute">IDL attributes</span> are + </div><h3 id="common-dom-interfaces"><span class="secno">2.7 </span>Common DOM interfaces</h3><p class="XXX annotation"><b>Status: </b><i>Working draft</i><h4 id="reflecting-content-attributes-in-idl-attributes"><span class="secno">2.7.1 </span>Reflecting content attributes in IDL attributes</h4><p>Some <span title="IDL 
attribute">IDL attributes</span> are defined to <dfn id="reflect">reflect</dfn> a particular <span>content attribute</span>. This means that on getting, the IDL attribute returns the current value of the content attribute, and on setting, @@ -4921,7 +4817,7 @@ attribute. Otherwise, the IDL attribute must be set to the empty string.</p> - </div><h4 id="collections-0"><span class="secno">2.8.2 </span>Collections</h4><p>The <code><a href="#htmlcollection">HTMLCollection</a></code>, <code><a href="#htmlallcollection">HTMLAllCollection</a></code>, + </div><h4 id="collections-0"><span class="secno">2.7.2 </span>Collections</h4><p>The <code><a href="#htmlcollection">HTMLCollection</a></code>, <code><a href="#htmlallcollection">HTMLAllCollection</a></code>, <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code>, <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code>, and <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interfaces represent various @@ -4944,7 +4840,7 @@ <p>An attribute that returns a collection must return the same object every time it is retrieved.</p> - </div><h5 id="htmlcollection-0"><span class="secno">2.8.2.1 </span>HTMLCollection</h5><p>The <code><a href="#htmlcollection">HTMLCollection</a></code> interface represents a generic + </div><h5 id="htmlcollection-0"><span class="secno">2.7.2.1 </span>HTMLCollection</h5><p>The <code><a href="#htmlcollection">HTMLCollection</a></code> interface represents a generic <a href="#collections" title="collections">collection</a> of elements.<pre class="idl">interface <dfn id="htmlcollection">HTMLCollection</dfn> { readonly attribute unsigned long <a href="#dom-htmlcollection-length" title="dom-HTMLCollection-length">length</a>; caller getter Element <a href="#dom-htmlcollection-item" title="dom-HTMLCollection-item">item</a>(in unsigned long index); @@ -5030,7 +4926,7 @@ the method was invoked. 
In <a href="#html-documents">HTML documents</a>, the argument must first be <a href="#converted-to-ascii-lowercase">converted to ASCII lowercase</a>.</p> - </div><h5 id="htmlallcollection-0"><span class="secno">2.8.2.2 </span>HTMLAllCollection</h5><p>The <code><a href="#htmlallcollection">HTMLAllCollection</a></code> interface represents a generic + </div><h5 id="htmlallcollection-0"><span class="secno">2.7.2.2 </span>HTMLAllCollection</h5><p>The <code><a href="#htmlallcollection">HTMLAllCollection</a></code> interface represents a generic <a href="#collections" title="collections">collection</a> of elements just like <code><a href="#htmlcollection">HTMLCollection</a></code>, with the exception that its <code title="dom-HTMLAllCollection-namedItem"><a href="#dom-htmlallcollection-nameditem">namedItem()</a></code> method returns an <code><a href="#htmlcollection">HTMLCollection</a></code> object when there are @@ -5138,7 +5034,7 @@ documents</a>, the argument must first be <a href="#converted-to-ascii-lowercase">converted to ASCII lowercase</a>.</p> - </div><h5 id="htmlformcontrolscollection-0"><span class="secno">2.8.2.3 </span>HTMLFormControlsCollection</h5><p>The <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code> interface represents + </div><h5 id="htmlformcontrolscollection-0"><span class="secno">2.7.2.3 </span>HTMLFormControlsCollection</h5><p>The <code><a href="#htmlformcontrolscollection">HTMLFormControlsCollection</a></code> interface represents a <a href="#collections" title="collections">collection</a> of <a href="#category-listed" title="category-listed">listed</a> elements in <code><a href="#the-form-element">form</a></code> and <code><a href="#the-fieldset-element">fieldset</a></code> elements.<pre class="idl">interface <dfn id="htmlformcontrolscollection">HTMLFormControlsCollection</dfn> { readonly attribute unsigned long <a href="#dom-htmlformcontrolscollection-length" 
title="dom-HTMLFormControlsCollection-length">length</a>; @@ -5262,7 +5158,7 @@ </ol><!-- http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E...%0A%3Cform%20name%3D%22a%22%3E%3Cinput%20id%3D%22x%22%20name%3D%22y%22%3E%3Cinput%20name%3D%22x%22%20id%3D%22y%22%3E%3C/form%3E%0A%3Cscript%3E%0A%20%20var%20x%3B%0A%20%20w%28x%20%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20x%5B0%5D.parentNode.removeChild%28x%5B0%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20w%28x%20%3D%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%3C/script%3E%0A ---></div><h5 id="htmloptionscollection-0"><span class="secno">2.8.2.4 </span>HTMLOptionsCollection</h5><p>The <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code> interface represents a +--></div><h5 id="htmloptionscollection-0"><span class="secno">2.7.2.4 </span>HTMLOptionsCollection</h5><p>The <code><a href="#htmloptionscollection">HTMLOptionsCollection</a></code> interface represents a list of <code><a href="#the-option-element">option</a></code> elements. 
It is always rooted on a <code><a href="#the-select-element">select</a></code> element and has attributes and methods that manipulate that element's descendants.<pre class="idl">interface <dfn id="htmloptionscollection">HTMLOptionsCollection</dfn> { @@ -5416,7 +5312,7 @@ <li><p>Remove <var title="">element</var> from its parent node.</li> - </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div><h5 id="htmlpropertycollection-0"><span class="secno">2.8.2.5 </span>HTMLPropertyCollection</h5><p>The <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interface represents a + </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --></div><h5 id="htmlpropertycollection-0"><span class="secno">2.7.2.5 </span>HTMLPropertyCollection</h5><p>The <code><a href="#htmlpropertycollection">HTMLPropertyCollection</a></code> interface represents a <a href="#collections" title="collections">collection</a> of elements that add name-value pairs to a particular <a href="#concept-item" title="concept-item">item</a> in the <a href="#microdata">microdata</a> model.<pre class="idl">interface <dfn id="htmlpropertycollection">HTMLPropertyCollection</dfn> { @@ -5508,7 +5404,7 @@ DOM property of each of the elements represented by the object, in <a href="#tree-order">tree order</a>.</p> - </div><h4 id="domtokenlist-0"><span class="secno">2.8.3 </span>DOMTokenList</h4><p>The <code><a href="#domtokenlist">DOMTokenList</a></code> interface represents an interface + </div><h4 id="domtokenlist-0"><span class="secno">2.7.3 </span>DOMTokenList</h4><p>The <code><a href="#domtokenlist">DOMTokenList</a></code> interface represents an interface to an underlying string that consists of a <a href="#set-of-space-separated-tokens">set of space-separated tokens</a>.<p class="note"><code><a href="#domtokenlist">DOMTokenList</a></code> objects are always <a href="#case-sensitive">case-sensitive</a>, even when the underlying string might @@ -5680,7 +5576,7 @@ <dfn 
id="dom-tokenlist-tostring" title="dom-tokenlist-toString">stringify</dfn> to the object's underlying string representation.</p> - </div><h4 id="domsettabletokenlist-0"><span class="secno">2.8.4 </span>DOMSettableTokenList</h4><p>The <code><a href="#domsettabletokenlist">DOMSettableTokenList</a></code> interface is the same as the + </div><h4 id="domsettabletokenlist-0"><span class="secno">2.7.4 </span>DOMSettableTokenList</h4><p>The <code><a href="#domsettabletokenlist">DOMSettableTokenList</a></code> interface is the same as the <code><a href="#domtokenlist">DOMTokenList</a></code> interface, except that it allows the underlying string to be directly changed.<pre class="idl">interface <dfn id="domsettabletokenlist">DOMSettableTokenList</dfn> : <a href="#domtokenlist">DOMTokenList</a> { attribute DOMString <a href="#dom-domsettabletokenlist-value" title="dom-DOMSettableTokenList-value">value</a>; @@ -5703,7 +5599,7 @@ </div><div class="impl"> - <h4 id="safe-passing-of-structured-data"><span class="secno">2.8.5 </span>Safe passing of structured data</h4> + <h4 id="safe-passing-of-structured-data"><span class="secno">2.7.5 </span>Safe passing of structured data</h4> <p>When a user agent is required to obtain a <dfn id="structured-clone">structured clone</dfn> of an object, it must run the following algorithm, which @@ -5827,7 +5723,7 @@ <dd><p>Return the null value.</dd> - </dl></div><h4 id="domstringmap-0"><span class="secno">2.8.6 </span>DOMStringMap</h4><p>The <code><a href="#domstringmap">DOMStringMap</a></code> interface represents a set of + </dl></div><h4 id="domstringmap-0"><span class="secno">2.7.6 </span>DOMStringMap</h4><p>The <code><a href="#domstringmap">DOMStringMap</a></code> interface represents a set of name-value pairs. 
It exposes these using the scripting language's native mechanisms for property access.<div class="impl"> @@ -5901,7 +5797,7 @@ } }</pre> - </div><h4 id="dom-feature-strings"><span class="secno">2.8.7 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support, + </div><h4 id="dom-feature-strings"><span class="secno">2.7.7 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support, and for obtaining implementations of interfaces, using <a href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures">feature strings</a>. <a href="#refsDOMCORE">[DOMCORE]</a><p>Authors are strongly discouraged from using these, as they are notoriously unreliable and imprecise. Authors are encouraged to rely @@ -5914,7 +5810,7 @@ with <var title="">feature</var> set to either "<code title="">HTML</code>" or "<code title="">XHTML</code>" and <var title="">version</var> set to either "<code>1.0</code>" or "<code>2.0</code>".</p> - </div><h4 id="exceptions"><span class="secno">2.8.8 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM + </div><h4 id="exceptions"><span class="secno">2.7.8 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM Core. 
<a href="#refsDOMCORE">[DOMCORE]</a><ol class="brief"><li value="1"><dfn id="index_size_err"><code>INDEX_SIZE_ERR</code></dfn></li> <li value="2"><dfn id="domstring_size_err"><code>DOMSTRING_SIZE_ERR</code></dfn></li> <li value="3"><dfn id="hierarchy_request_err"><code>HIERARCHY_REQUEST_ERR</code></dfn></li> @@ -5942,7 +5838,7 @@ <li value="82"><dfn id="serialize_err"><code>SERIALIZE_ERR</code></dfn></li> <!-- actually defined in dom3ls --> </ol><div class="impl"> - <h4 id="garbage-collection"><span class="secno">2.8.9 </span>Garbage collection</h4> + <h4 id="garbage-collection"><span class="secno">2.7.9 </span>Garbage collection</h4> <p>There is an <dfn id="implied-strong-reference">implied strong reference</dfn> from any IDL attribute that returns a pre-existing object to that object.</p> @@ -54664,8 +54560,111 @@ use for the input stream.</p> + <h5 id="character-encodings-0"><span class="secno">9.2.2.2 </span>Character encodings</h5><p class="XXX annotation"><b>Status: </b><i>Working draft</i></p> - <h5 id="preprocessing-the-input-stream"><span class="secno">9.2.2.2 </span>Preprocessing the input stream</h5> + <p>User agents must at a minimum support the UTF-8 and Windows-1252 + encodings, but may support more.</p> + + <p class="note">It is not unusual for Web browsers to support dozens + if not upwards of a hundred distinct character encodings.</p> + + <p>User agents must support the preferred MIME name of every + character encoding they support that has a preferred MIME name, and + should support all the IANA-registered aliases of every character + encoding they support. 
<a href="#refsIANACHARSET">[IANACHARSET]</a></p> + + <p>When comparing a string specifying a character encoding with the + name or alias of a character encoding to determine if they are + equal, user agents must remove any leading or trailing <a href="#space-character" title="space character">space characters</a> in both names, and + then perform the comparison in an <a href="#ascii-case-insensitive">ASCII + case-insensitive</a> manner.</p> + +<!-- this bit will be replaced by actual alias registrations in due course --> + + <p>In addition, user agents must support the aliases given in the + following table for every character encoding they support, so that + labels from the first column are treated as equivalent to the labels + given in the corresponding cell from the second column on the same + row.</p> + + <table><caption>Additional character encoding aliases</caption> + <thead><tr><th> Alias <th> Corresponding encoding <th> References + <tbody><tr><td> x-sjis <td> windows-31J <td> + <a href="#refsSHIFTJIS">[SHIFTJIS]</a> + <a href="#refsWIN31J">[WIN31J]</a> + <tr><td> windows-932 <td> windows-31J <td> + <a href="#refsWIN31J">[WIN31J]</a> + <tr><td> x-x-big5 <td> Big5 <td> + <a href="#refsBIG5">[BIG5]</a> + </table><!-- end of bit that will be replaced by actual alias registrations in due course --><hr><p>When a user agent would otherwise use an encoding given in the + first column of the following table to either convert content to + Unicode characters or convert Unicode characters to bytes, it must + instead use the encoding given in the cell in the second column of + the same row. When a byte or sequence of bytes is treated + differently due to this encoding aliasing, it is said to have been + <dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p> + + <table><caption>Character encoding overrides</caption> + <thead><tr><th> Input encoding <th> Replacement encoding <th> References + <tbody><!-- how about EUC-JP? 
--><tr><td> EUC-KR <td> windows-949 <td> + <a href="#refsEUCKR">[EUCKR]</a> + <a href="#refsWIN949">[WIN949]</a> + <tr><td> GB2312 <td> GBK <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsGBK">[GBK]</a> + <tr><td> GB_2312-80 <td> GBK <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsGBK">[GBK]</a> + <tr><td> ISO-8859-1 <td> windows-1252 <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsWIN1252">[WIN1252]</a> + <tr><td> ISO-8859-9 <td> windows-1254 <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsWIN1254">[WIN1254]</a> + <tr><td> ISO-8859-11 <td> windows-874 <td> + <a href="#refsISO885911">[ISO885911]</a> + <a href="#refsWIN874">[WIN874]</a> + <tr><td> KS_C_5601-1987 <td> windows-949 <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsWIN949">[WIN949]</a> + <tr><td> Shift_JIS <td> windows-31J <td> + <a href="#refsSHIFTJIS">[SHIFTJIS]</a> + <a href="#refsWIN31J">[WIN31J]</a> + <tr><td> TIS-620 <td> windows-874 <td> + <a href="#refsTIS620">[TIS620]</a> + <a href="#refsWIN874">[WIN874]</a> + <tr><td> US-ASCII <td> windows-1252 <td> + <a href="#refsRFC1345">[RFC1345]</a> + <a href="#refsWIN1252">[WIN1252]</a> + </table><p class="note">The requirement to treat certain encodings as other + encodings according to the table above is a <a href="#willful-violation">willful + violation</a> of the W3C Character Model specification, motivated + by a desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p> + + <p>When a user agent is to use the UTF-16 encoding but no BOM has + been found, user agents must default to UTF-16LE.</p> + + <p class="note">The requirement to default UTF-16 to LE rather than + BE is a <a href="#willful-violation">willful violation</a> of RFC 2781, motivated by a + desire for compatibility with legacy content. <a href="#refsCHARMOD">[CHARMOD]</a></p> + + <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU + encodings. 
<a href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a href="#refsSCSU">[SCSU]</a></p> + + <p>Support for encodings based on EBCDIC is not recommended. This + encoding is rarely used for publicly-facing Web content.</p> + + <p>Support for UTF-32 is not recommended. This encoding is rarely + used, and frequently implemented incorrectly.</p> + + <p class="note">This specification does not make any attempt to + support EBCDIC-based encodings and UTF-32 in its algorithms; support + and use of these encodings can thus lead to unexpected behavior in + implementations of this specification.</p> + + + + <h5 id="preprocessing-the-input-stream"><span class="secno">9.2.2.3 </span>Preprocessing the input stream</h5> <p>Given an encoding, the bytes in the input stream must be converted to Unicode characters for the tokenizer, as described by @@ -54740,7 +54739,7 @@ the stream, but rather the lack of any further characters.</p> - <h5 id="changing-the-encoding-while-parsing"><span class="secno">9.2.2.3 </span>Changing the encoding while parsing</h5> + <h5 id="changing-the-encoding-while-parsing"><span class="secno">9.2.2.4 </span>Changing the encoding while parsing</h5> <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the encoding</dfn>, it must run the following steps. This might happen
Received on Wednesday, 9 September 2009 06:43:16 UTC
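
The relocated "Character encodings" section describes three mechanical rules: how encoding labels are compared, which labels are aliased or overridden, and what to do with UTF-16 when no BOM is present. The following is a minimal sketch of those rules, not code from the specification. All function and variable names are hypothetical, and the set of "space characters" (space, tab, LF, FF, CR) is taken from the HTML5 definition that the section links to rather than from this diff. Applying the alias table before the override table is also an assumption of this sketch.

```python
# Illustrative sketch only; every name here is hypothetical.

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
SPACE_CHARACTERS = " \t\n\f\r"

# Rows of the "Additional character encoding aliases" table in the diff.
ADDITIONAL_ALIASES = {
    "x-sjis": "windows-31J",
    "windows-932": "windows-31J",
    "x-x-big5": "Big5",
}

# Rows of the "Character encoding overrides" table in the diff.
ENCODING_OVERRIDES = {
    "EUC-KR": "windows-949",
    "GB2312": "GBK",
    "GB_2312-80": "GBK",
    "ISO-8859-1": "windows-1252",
    "ISO-8859-9": "windows-1254",
    "ISO-8859-11": "windows-874",
    "KS_C_5601-1987": "windows-949",
    "Shift_JIS": "windows-31J",
    "TIS-620": "windows-874",
    "US-ASCII": "windows-1252",
}


def ascii_lower(text):
    """Lowercase only A-Z, leaving all non-ASCII characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def normalize_label(label):
    """Strip leading/trailing space characters, then ASCII-lowercase."""
    return ascii_lower(label.strip(SPACE_CHARACTERS))


def labels_match(a, b):
    """Compare two encoding labels the way the section requires."""
    return normalize_label(a) == normalize_label(b)


def lookup(table, label):
    """Return the table entry whose key matches label, or None."""
    for name, target in table.items():
        if labels_match(name, label):
            return target
    return None


def effective_encoding(label):
    """Apply the alias table, then the override table (assumed order)."""
    name = lookup(ADDITIONAL_ALIASES, label) or label.strip(SPACE_CHARACTERS)
    return lookup(ENCODING_OVERRIDES, name) or name


def resolve_utf16(data):
    """Pick a UTF-16 byte order; with no BOM, default to little-endian."""
    if data[:2] == b"\xfe\xff":
        return "UTF-16BE"
    return "UTF-16LE"  # covers both the FF FE BOM and the no-BOM default
```

For instance, `effective_encoding(" iso-8859-1 ")` yields `windows-1252`, which is the case the diff calls "misinterpreted for compatibility". A label that appears in neither table is returned with only its surrounding space characters removed.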