- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Thu, 19 Feb 2009 11:05:02 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv10049

Modified Files:
	Overview.html
Log Message:
Abstract out the encoding stuff from the parser to the infrastructure section so that it also affects form submission (whatwg r2842)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.2012
retrieving revision 1.2013
diff -u -d -r1.2012 -r1.2013
--- Overview.html 19 Feb 2009 10:20:10 -0000 1.2012
+++ Overview.html 19 Feb 2009 11:04:59 -0000 1.2013
@@ -201,20 +201,21 @@
 <li><a href=#content-type-sniffing:-unknown-type><span class=secno>2.7.4 </span>Content-Type sniffing: unknown type</a></li>
 <li><a href=#content-type-sniffing:-image><span class=secno>2.7.5 </span>Content-Type sniffing: image</a></li>
 <li><a href=#content-type-sniffing:-feed-or-html><span class=secno>2.7.6 </span>Content-Type sniffing: feed or HTML</a></ol></li>
- <li><a href=#common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</a>
+ <li><a href=#character-encodings-0><span class=secno>2.8 </span>Character encodings</a></li>
+ <li><a href=#common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</a>
 <ol>
- <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</a></li>
- <li><a href=#collections><span class=secno>2.8.2 </span>Collections</a>
+ <li><a href=#reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</a></li>
+ <li><a href=#collections><span class=secno>2.9.2 </span>Collections</a>
 <ol>
- <li><a href=#htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</a></li>
- <li><a href=#htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</a></li>
- <li><a href=#htmloptionscollection><span class=secno>2.8.2.3 
</span>HTMLOptionsCollection</a></ol></li>
- <li><a href=#domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</a></li>
- <li><a href=#safe-passing-of-structured-data><span class=secno>2.8.4 </span>Safe passing of structured data</a></li>
- <li><a href=#domstringmap><span class=secno>2.8.5 </span>DOMStringMap</a></li>
- <li><a href=#dom-feature-strings><span class=secno>2.8.6 </span>DOM feature strings</a></li>
- <li><a href=#exceptions><span class=secno>2.8.7 </span>Exceptions</a></li>
- <li><a href=#garbage-collection><span class=secno>2.8.8 </span>Garbage collection</a></ol></ol></li>
+ <li><a href=#htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</a></li>
+ <li><a href=#htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</a></li>
+ <li><a href=#htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</a></ol></li>
+ <li><a href=#domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</a></li>
+ <li><a href=#safe-passing-of-structured-data><span class=secno>2.9.4 </span>Safe passing of structured data</a></li>
+ <li><a href=#domstringmap><span class=secno>2.9.5 </span>DOMStringMap</a></li>
+ <li><a href=#dom-feature-strings><span class=secno>2.9.6 </span>DOM feature strings</a></li>
+ <li><a href=#exceptions><span class=secno>2.9.7 </span>Exceptions</a></li>
+ <li><a href=#garbage-collection><span class=secno>2.9.8 </span>Garbage collection</a></ol></ol></li>
 <li><a href=#dom><span class=secno>3 </span>Semantics and structure of HTML documents</a>
 <ol>
 <li><a href=#semantics-intro><span class=secno>3.1 </span>Introduction</a></li>
@@ -864,9 +865,8 @@
 <li><a href=#the-input-stream><span class=secno>8.2.2 </span>The input stream</a>
 <ol>
 <li><a href=#determining-the-character-encoding><span class=secno>8.2.2.1 </span>Determining the character encoding</a></li>
- <li><a href=#character-encoding-requirements><span class=secno>8.2.2.2 </span>Character encoding requirements</a></li>
- <li><a 
href=#preprocessing-the-input-stream><span class=secno>8.2.2.3 </span>Preprocessing the input stream</a></li>
- <li><a href=#changing-the-encoding-while-parsing><span class=secno>8.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+ <li><a href=#preprocessing-the-input-stream><span class=secno>8.2.2.2 </span>Preprocessing the input stream</a></li>
+ <li><a href=#changing-the-encoding-while-parsing><span class=secno>8.2.2.3 </span>Changing the encoding while parsing</a></ol></li>
 <li><a href=#parse-state><span class=secno>8.2.3 </span>Parse state</a>
 <ol>
 <li><a href=#the-insertion-mode><span class=secno>8.2.3.1 </span>The insertion mode</a></li>
@@ -4711,7 +4711,61 @@
 </ol><p class=note>For efficiency reasons, implementations may wish to implement this algorithm and the algorithm for detecting the
- character encoding of HTML documents in parallel.<h3 id=common-dom-interfaces><span class=secno>2.8 </span>Common DOM interfaces</h3><h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.8.1 </span>Reflecting content attributes in DOM attributes</h4><p>Some <span title="DOM attribute">DOM attributes</span> are
+ character encoding of HTML documents in parallel.<h3 id=character-encodings-0><span class=secno>2.8 </span>Character encodings</h3><p>User agents must at a minimum support the UTF-8 and Windows-1252
+ encodings, but may support more.<p class=note>It is not unusual for Web browsers to support dozens
+ if not upwards of a hundred distinct character encodings.<p>User agents must support the preferred MIME name of every
+ character encoding they support that has a preferred MIME name, and
+ should support all the IANA-registered aliases. <a href=#references>[IANACHARSET]</a><p>When comparing a string specifying a character encoding with the
+ name or alias of a character encoding to determine if they are
+ equal, user agents must use the Charset Alias Matching rules defined
+ in Unicode Technical Standard #22. 
<a href=#references>[UTS22]</a></p><!-- XXXrefs + http://unicode.org/reports/tr22/#Charset_Alias_Matching --><p class=example>For instance, "GB_2312-80" and "g.b.2312(80)" are + considered equivalent names.</p><hr><p>When a user agent would otherwise use an encoding given in the + first column of the following table to either convert content to + Unicode characters or convert Unicode characters to bytes, it must + instead use the encoding given in the cell in the second column of + the same row. When a byte or sequence of bytes is treated + differently due to this encoding aliasing, it is said to have been + <dfn id=misinterpreted-for-compatibility>misinterpreted for compatibility</dfn>.<table><caption>Character encoding overrides</caption> + <thead><tr><th> Input encoding <th> Replacement encoding <th> References + <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> Windows-949 <td> + <a href=#references>[EUCKR]</a> <!-- see reference for [EUC-KR] in RFC1557 --> + <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx --> + <tr><td> GB2312 <td> GBK <td> + <a href=#references>[GB2312]</a><!-- XXX ? --> + <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK --> + <tr><td> GB_2312-80 <td> GBK <td> + <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> + <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK --> + <tr><td> ISO-8859-1 <td> Windows-1252 <td> + <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> + <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm --> + <tr><td> ISO-8859-9 <td> Windows-1254 <td> + <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? 
--> + <a href=#references>[WIN1254]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1254.htm --> + <tr><td> ISO-8859-11 <td> Windows-874 <td> + <a href=#references>[ISO885911]</a><!-- get reference from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=28263 --> + <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx --> + <tr><td> KS_C_5601-1987 <td> Windows-949 <td> + <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> + <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx --> + <tr><td> TIS-620 <td> Windows-874 <td> + <a href=#references>[TIS620]</a> <!-- http://www.nectec.or.th/it-standards/std620/std620.htm --> + <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx --> + <tr><td> US-ASCII <td> Windows-1252 <td> + <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> + <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm --> + <tr><td> x-x-big5 <td> Big5 <td> + <a href=#references>[BIG5]</a> <!-- XXX ? --> + </table><p class=note>The requirement to treat certain encodings as other + encodings according to the table above is a willful violation of the + W3C Character Model specification. <a href=#references>[CHARMOD]</a></p><hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU + encodings. <a href=#references>[CESU8]</a> <a href=#references>[UTF7]</a> <a href=#references>[BOCU1]</a> <a href=#references>[SCSU]</a><p>Support for encodings based on EBCDIC is not recommended. This + encoding is rarely used for publicly-facing Web content.<p>Support for UTF-32 is not recommended. 
This encoding is rarely + used, and frequently misimplemented.<p class=note>This specification does not make any attempt to + support EBCDIC-based encodings and UTF-32 in its algorithms; support + and use of these encodings can thus lead to unexpected behavior in + implementations of this specification.<h3 id=common-dom-interfaces><span class=secno>2.9 </span>Common DOM interfaces</h3><h4 id=reflecting-content-attributes-in-dom-attributes><span class=secno>2.9.1 </span>Reflecting content attributes in DOM attributes</h4><p>Some <span title="DOM attribute">DOM attributes</span> are defined to <dfn id=reflect>reflect</dfn> a particular <span>content attribute</span>. This means that on getting, the DOM attribute returns the current value of the content attribute, and on setting, @@ -4839,7 +4893,7 @@ </ol><p>On setting, if the given element has an <code title=attr-id><a href=#the-id-attribute>id</a></code> attribute, then the content attribute must be set to the value of that <code title=attr-id><a href=#the-id-attribute>id</a></code> attribute. Otherwise, the DOM attribute must be set to the empty - string.</p><!-- XXX or raise an exception? --><h4 id=collections><span class=secno>2.8.2 </span>Collections</h4><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code>, + string.</p><!-- XXX or raise an exception? --><h4 id=collections><span class=secno>2.9.2 </span>Collections</h4><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code>, <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code>, and <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interfaces represent various lists of DOM nodes. 
Collectively, objects implementing these @@ -4855,7 +4909,7 @@ nodes within the collection must be sorted in <a href=#tree-order>tree order</a>.<p class=note>The <code title=dom-table-rows><a href=#dom-table-rows>rows</a></code> list is not in tree order.<p>An attribute that returns a collection must return the same - object every time it is retrieved.<h5 id=htmlcollection><span class=secno>2.8.2.1 </span>HTMLCollection</h5><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code> interface represents a generic + object every time it is retrieved.<h5 id=htmlcollection><span class=secno>2.9.2.1 </span>HTMLCollection</h5><p>The <code><a href=#htmlcollection-0>HTMLCollection</a></code> interface represents a generic <a href=#collections-0 title=collections>collection</a> of elements.<pre class=idl>[Callable=<a href=#dom-htmlcollection-nameditem title=dom-HTMLCollection-namedItem>namedItem</a>] interface <dfn id=htmlcollection-0>HTMLCollection</dfn> { readonly attribute unsigned long <a href=#dom-htmlcollection-length title=dom-HTMLCollection-length>length</a>; @@ -4886,7 +4940,7 @@ <li>It is an element with an ID <var title="">key</var>.</li> </ul><p>If no such elements are found, then the method must return - null.<h5 id=htmlformcontrolscollection><span class=secno>2.8.2.2 </span>HTMLFormControlsCollection</h5><p>The <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code> interface represents + null.<h5 id=htmlformcontrolscollection><span class=secno>2.9.2.2 </span>HTMLFormControlsCollection</h5><p>The <code><a href=#htmlformcontrolscollection-0>HTMLFormControlsCollection</a></code> interface represents a <a href=#collections-0 title=collections>collection</a> of <a href=#category-listed title=category-listed>listed</a> elements in <code><a href=#the-form-element>form</a></code> and <code><a href=#the-fieldset-element>fieldset</a></code> elements.<pre class=idl>[Callable=<a href=#dom-htmlformcontrolscollection-nameditem 
title=dom-HTMLFormControlsCollection-namedItem>namedItem</a>] interface <dfn id=htmlformcontrolscollection-0>HTMLFormControlsCollection</dfn> { @@ -4923,7 +4977,7 @@ </ol><!-- http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E...%0A%3Cform%20name%3D%22a%22%3E%3Cinput%20id%3D%22x%22%20name%3D%22y%22%3E%3Cinput%20name%3D%22x%22%20id%3D%22y%22%3E%3C/form%3E%0A%3Cscript%3E%0A%20%20var%20x%3B%0A%20%20w%28x%20%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20x%5B0%5D.parentNode.removeChild%28x%5B0%5D%29%3B%0A%20%20w%28x.length%29%3B%0A%20%20w%28x%20%3D%3D%20document.forms%5B%27a%27%5D%5B%27x%27%5D%29%3B%0A%3C/script%3E%0A ---><h5 id=htmloptionscollection><span class=secno>2.8.2.3 </span>HTMLOptionsCollection</h5><p>The <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interface represents a +--><h5 id=htmloptionscollection><span class=secno>2.9.2.3 </span>HTMLOptionsCollection</h5><p>The <code><a href=#htmloptionscollection-0>HTMLOptionsCollection</a></code> interface represents a list of <code><a href=#the-option-element>option</a></code> elements. 
It is always rooted on a <code><a href=#the-select-element>select</a></code> element and has attributes and methods that manipulate that element's descendants.<pre class=idl>[Callable=<a href=#dom-htmloptionscollection-nameditem title=dom-HTMLOptionsCollection-namedItem>namedItem</a>] @@ -5019,7 +5073,7 @@ <li><p>Remove <var title="">element</var> from its parent node.</li> - </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --><h4 id=domtokenlist><span class=secno>2.8.3 </span>DOMTokenList</h4><p>The <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface represents an interface + </ol><!-- see also http://ln.hixie.ch/?start=1161042744&count=1 --><h4 id=domtokenlist><span class=secno>2.9.3 </span>DOMTokenList</h4><p>The <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface represents an interface to an underlying string that consists of an <a href=#unordered-set-of-unique-space-separated-tokens>unordered set of unique space-separated tokens</a>.<p>Which string underlies a particular <code><a href=#domtokenlist-0>DOMTokenList</a></code> object is defined when the object is created. 
It might be a content @@ -5116,7 +5170,7 @@ </ol><p>Objects implementing the <code><a href=#domtokenlist-0>DOMTokenList</a></code> interface must <dfn id=dom-tokenlist-tostring title=dom-tokenlist-toString>stringify</dfn> to the object's - underlying string representation.<h4 id=safe-passing-of-structured-data><span class=secno>2.8.4 </span>Safe passing of structured data</h4><p>When a user agent is required to obtain a <dfn id=structured-clone>structured + underlying string representation.<h4 id=safe-passing-of-structured-data><span class=secno>2.9.4 </span>Safe passing of structured data</h4><p>When a user agent is required to obtain a <dfn id=structured-clone>structured clone</dfn> of an object, it must run the following algorithm, which either returns a separate object, or throws an exception.<ol><li><p>Let <var title="">input</var> be the object being cloned.</li> @@ -5193,7 +5247,7 @@ </ol></dd> - </dl><h4 id=domstringmap><span class=secno>2.8.5 </span>DOMStringMap</h4><p>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface represents a set of + </dl><h4 id=domstringmap><span class=secno>2.9.5 </span>DOMStringMap</h4><p>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface represents a set of name-value pairs. When a <code><a href=#domstringmap-0>DOMStringMap</a></code> object is instantiated, it is associated with three algorithms, one for getting getting the list of name-value pairs, one for setting names @@ -5215,7 +5269,7 @@ name.<p class=note>The <code><a href=#domstringmap-0>DOMStringMap</a></code> interface definition here is only intended for JavaScript environments. 
Other language bindings will need to define how <code><a href=#domstringmap-0>DOMStringMap</a></code> is to be - implemented for those languages.<h4 id=dom-feature-strings><span class=secno>2.8.6 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support, + implemented for those languages.<h4 id=dom-feature-strings><span class=secno>2.9.6 </span>DOM feature strings</h4><p>DOM3 Core defines mechanisms for checking for interface support, and for obtaining implementations of interfaces, using <a href=http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMFeatures>feature strings</a>. <a href=#references>[DOM3CORE]</a><p>A DOM application can use the <dfn id=hasfeature title=hasFeature><code>hasFeature(<var title="">feature</var>, <var title="">version</var>)</code></dfn> method of the @@ -5235,7 +5289,7 @@ always supersets of the interfaces defined in DOM2 HTML; some features that were formerly deprecated, poorly supported, rarely used or considered unnecessary have been removed. Therefore it is - not guaranteed that an implementation that supports "<code title="">HTML</code>" "<code>5.0</code>" also supports "<code title="">HTML</code>" "<code>2.0</code>".<h4 id=exceptions><span class=secno>2.8.7 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM + not guaranteed that an implementation that supports "<code title="">HTML</code>" "<code>5.0</code>" also supports "<code title="">HTML</code>" "<code>2.0</code>".<h4 id=exceptions><span class=secno>2.9.7 </span>Exceptions</h4><p>The following <code>DOMException</code> codes are defined in DOM Core. 
<a href=#references>[DOMCORE]</a></p><!-- XXX xref all these exceptions to DOM3CORE --><ol class=brief><li value=1><dfn id=index_size_err><code>INDEX_SIZE_ERR</code></dfn></li> <li value=2><dfn id=domstring_size_err><code>DOMSTRING_SIZE_ERR</code></dfn></li> <li value=3><dfn id=hierarchy_request_err><code>HIERARCHY_REQUEST_ERR</code></dfn></li> @@ -5261,7 +5315,7 @@ <li value=23><dfn id=unavailable_script_err><code>UNAVAILABLE_SCRIPT_ERR</code></dfn></li> <!-- actually defined right here for now --> <li value=81><dfn id=parse_err><code>PARSE_ERR</code></dfn></li> <!-- actually defined in dom3ls --> <li value=82><dfn id=serialise_err><code>SERIALISE_ERR</code></dfn></li> <!-- actually defined in dom3ls --> - </ol><h4 id=garbage-collection><span class=secno>2.8.8 </span>Garbage collection</h4><p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any DOM + </ol><h4 id=garbage-collection><span class=secno>2.9.8 </span>Garbage collection</h4><p>There is an <dfn id=implied-strong-reference>implied strong reference</dfn> from any DOM attribute that returns a pre-existing object to that object.<div class=example> <p>For example, the <code>document.location</code> attribute means @@ -39555,63 +39609,7 @@ </ol><p>The <a href=#document-s-character-encoding>document's character encoding</a> must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to - use for the input stream.<h5 id=character-encoding-requirements><span class=secno>8.2.2.2 </span>Character encoding requirements</h5><p>User agents must at a minimum support the UTF-8 and Windows-1252 - encodings, but may support more.<p class=note>It is not unusual for Web browsers to support dozens - if not upwards of a hundred distinct character encodings.<p>User agents must support the preferred MIME name of every - character encoding they support that has a preferred MIME name, and - should support all the 
IANA-registered aliases. <a href=#references>[IANACHARSET]</a></p><!-- XXX should all this be abstracted out so it can be used for - <script charset=""> and <form accept-charset="">? Maybe move this - stuff and the 'character encodings' section of the terminology - section into its own infrastructure subsection? --><p>When comparing a string specifying a character encoding with the - name or alias of a character encoding to determine if they are - equal, user agents must use the Charset Alias Matching rules defined - in Unicode Technical Standard #22. <a href=#references>[UTS22]</a></p><!-- XXXrefs - http://unicode.org/reports/tr22/#Charset_Alias_Matching --><p class=example>For instance, "GB_2312-80" and "g.b.2312(80)" are - considered equivalent names.<p>When a user agent would otherwise use an encoding given in the - first column of the following table, it must instead use the - encoding given in the cell in the second column of the same row. Any - bytes that are treated differently due to this encoding aliasing - must be considered <a href=#parse-error title="parse error">parse - errors</a>.<table><caption>Character encoding overrides</caption> - <thead><tr><th> Input encoding <th> Replacement encoding <th> References - <tbody><!-- how about EUC-JP? --><tr><td> EUC-KR <td> Windows-949 <td> - <a href=#references>[EUCKR]</a> <!-- see reference for [EUC-KR] in RFC1557 --> - <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx --> - <tr><td> GB2312 <td> GBK <td> - <a href=#references>[GB2312]</a><!-- XXX ? --> - <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK --> - <tr><td> GB_2312-80 <td> GBK <td> - <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> - <a href=#references>[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK --> - <tr><td> ISO-8859-1 <td> Windows-1252 <td> - <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? 
--> - <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm --> - <tr><td> ISO-8859-9 <td> Windows-1254 <td> - <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> - <a href=#references>[WIN1254]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1254.htm --> - <tr><td> ISO-8859-11 <td> Windows-874 <td> - <a href=#references>[ISO885911]</a><!-- get reference from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=28263 --> - <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx --> - <tr><td> KS_C_5601-1987 <td> Windows-949 <td> - <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> - <a href=#references>[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx --> - <tr><td> TIS-620 <td> Windows-874 <td> - <a href=#references>[TIS620]</a> <!-- http://www.nectec.or.th/it-standards/std620/std620.htm --> - <a href=#references>[WIN874]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/874.mspx --> - <tr><td> US-ASCII <td> Windows-1252 <td> - <a href=#references>[RFC1345]</a><!-- XXX consider more direct reference? --> - <a href=#references>[WIN1252]</a><!-- http://www.microsoft.com/globaldev/reference/sbcs/1252.htm --> - <tr><td> x-x-big5 <td> Big5 <td> - <a href=#references>[BIG5]</a> <!-- XXX ? --> - </table><p class=note>The requirement to treat certain encodings as other - encodings according to the table above is a willful violation of the - W3C Character Model specification. <a href=#references>[CHARMOD]</a><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU - encodings. <a href=#references>[CESU8]</a> <a href=#references>[UTF7]</a> <a href=#references>[BOCU1]</a> <a href=#references>[SCSU]</a><p>Support for encodings based on EBCDIC is not recommended. 
This - encoding is rarely used for publicly-facing Web content.<p>Support for UTF-32 is not recommended. This encoding is rarely - used, and frequently misimplemented.<p class=note>This specification does not make any attempt to - support EBCDIC-based encodings and UTF-32 in its algorithms; support - and use of these encodings can thus lead to unexpected behavior in - implementations of this specification.<h5 id=preprocessing-the-input-stream><span class=secno>8.2.2.3 </span>Preprocessing the input stream</h5><p>Given an encoding, the bytes in the input stream must be + use for the input stream.<h5 id=preprocessing-the-input-stream><span class=secno>8.2.2.2 </span>Preprocessing the input stream</h5><p>Given an encoding, the bytes in the input stream must be converted to Unicode characters for the tokeniser, as described by the rules for that encoding, except that the leading U+FEFF BYTE ORDER MARK character, if any, must not be stripped by the encoding @@ -39622,7 +39620,9 @@ U+FFFD REPLACEMENT CHARACTER code points.<p class=note>Bytes or sequences of bytes in the original byte stream that did not conform to the encoding specification (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are - errors that conformance checkers are expected to report.<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if + errors that conformance checkers are expected to report.<p>Any byte or sequences of bytes in the original byte stream that + is <a href=#misinterpreted-for-compatibility>misinterpreted for compatibility</a> is a <a href=#parse-error>parse + error</a>.<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if any are present.<p>All U+0000 NULL characters in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. 
Any occurrences of such characters is a <a href=#parse-error>parse error</a>.<p>Any occurrences of any characters in the ranges U+0001 to U+0008, @@ -39659,7 +39659,7 @@ <a href=#the-input-stream>input stream</a> is reached when an <dfn id=explicit-eof-character>explicit "EOF" character</dfn> (inserted by the <code title=dom-document-close><a href=#dom-document-close>document.close()</a></code> method) is consumed. Otherwise, the "EOF" character is not a real character in - the stream, but rather the lack of any further characters.<h5 id=changing-the-encoding-while-parsing><span class=secno>8.2.2.4 </span>Changing the encoding while parsing</h5><p>When the parser requires the user agent to <dfn id=change-the-encoding>change the + the stream, but rather the lack of any further characters.<h5 id=changing-the-encoding-while-parsing><span class=secno>8.2.2.3 </span>Changing the encoding while parsing</h5><p>When the parser requires the user agent to <dfn id=change-the-encoding>change the encoding</dfn>, it must run the following steps. This might happen if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above failed to find an encoding, or if it found an encoding that was not
Received on Thursday, 19 February 2009 11:05:19 UTC