- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Mon, 13 Feb 2012 22:50:21 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv16754

Modified Files:
 Overview.html
Log Message:
Move a section so that the character encoding requirements are closer together. (whatwg r6992)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5584
retrieving revision 1.5585
diff -u -d -r1.5584 -r1.5585
--- Overview.html 13 Feb 2012 22:48:18 -0000 1.5584
+++ Overview.html 13 Feb 2012 22:50:16 -0000 1.5585
@@ -1157,8 +1157,8 @@
 <ol>
 <li><a href="#determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</a></li>
 <li><a href="#character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</a></li>
- <li><a href="#preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</a></li>
- <li><a href="#changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+ <li><a href="#changing-the-encoding-while-parsing"><span class="secno">8.2.2.3 </span>Changing the encoding while parsing</a></li>
+ <li><a href="#preprocessing-the-input-stream"><span class="secno">8.2.2.4 </span>Preprocessing the input stream</a></ol></li>
 <li><a href="#parse-state"><span class="secno">8.2.3 </span>Parse state</a>
 <ol>
 <li><a href="#the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</a></li>
@@ -58895,7 +58895,59 @@
- <h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</h5>
+ <h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.3 </span>Changing the encoding while parsing</h5>
+
+ <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
+ encoding</dfn>, it must run the following steps. This might happen
+ if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
+ failed to find an encoding, or if it found an encoding that was not
+ the actual encoding of the file.</p>
+
+ <ol><li>If the encoding that is already being used to interpret the
+ input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and abort these steps. The new encoding is ignored;
+ if it was anything but the same encoding, then it would be clearly
+ incorrect.</li>
+
+ <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
+
+ <li>If the new encoding is identical or equivalent to the encoding
+ that is already being used to interpret the input stream, then set
+ the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and abort these steps. This happens when the
+ encoding information found in the file matches what the
+ <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
+ encoding, and in the second pass through the parser if the first
+ pass found that the encoding sniffing algorithm described in the
+ earlier section failed to find the right encoding.</li>
+
+ <li>If all the bytes up to the last byte converted by the current
+ decoder have the same Unicode interpretations in both the current
+ encoding and the new encoding, and if the user agent supports
+ changing the converter on the fly, then the user agent may change
+ to the new converter for the encoding on the fly. Set the
+ <a href="#document-s-character-encoding">document's character encoding</a> and the encoding used to
+ convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i>, and abort these steps.</li>
+
+ <li>Otherwise, <a href="#navigate">navigate</a> to the
+ document again, with <a href="#replacement-enabled">replacement enabled</a>, and using
+ the same <a href="#source-browsing-context">source browsing context</a>, but this time skip
+ the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
+ the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i>. Whenever possible, this should be done without
+ actually contacting the network layer (the bytes should be
+ re-parsed from memory), even if, e.g., the document is marked as
+ not being cacheable. If this is not possible and contacting the
+ network layer would involve repeating a request that uses a method
+ other than HTTP GET (<a href="#concept-http-equivalent-get" title="concept-http-equivalent-get">or
+ equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and ignore the new encoding. The resource will be
+ misinterpreted. User agents may notify the user of the situation,
+ to aid in application development.</li>
+
+ </ol><h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.4 </span>Preprocessing the input stream</h5>
 <p>The <dfn id="input-stream">input stream</dfn> consists of the characters pushed into it as the <a href="#the-input-byte-stream">input byte stream</a> is decoded or from the
@@ -58952,60 +59004,7 @@
 consumed. Otherwise, the "EOF" character is not a real character in the stream, but rather the lack of any further characters.</p>
-
- <h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</h5>
-
- <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
- failed to find an encoding, or if it found an encoding that was not
- the actual encoding of the file.</p>
-
- <ol><li>If the encoding that is already being used to interpret the
- input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and abort these steps. The new encoding is ignored;
- if it was anything but the same encoding, then it would be clearly
- incorrect.</li>
-
- <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
- it to UTF-8.</li>
-
- <li>If the new encoding is identical or equivalent to the encoding
- that is already being used to interpret the input stream, then set
- the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and abort these steps. This happens when the
- encoding information found in the file matches what the
- <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
- encoding, and in the second pass through the parser if the first
- pass found that the encoding sniffing algorithm described in the
- earlier section failed to find the right encoding.</li>
-
- <li>If all the bytes up to the last byte converted by the current
- decoder have the same Unicode interpretations in both the current
- encoding and the new encoding, and if the user agent supports
- changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the
- <a href="#document-s-character-encoding">document's character encoding</a> and the encoding used to
- convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i>, and abort these steps.</li>
-
- <li>Otherwise, <a href="#navigate">navigate</a> to the
- document again, with <a href="#replacement-enabled">replacement enabled</a>, and using
- the same <a href="#source-browsing-context">source browsing context</a>, but this time skip
- the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
- the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i>. Whenever possible, this should be done without
- actually contacting the network layer (the bytes should be
- re-parsed from memory), even if, e.g., the document is marked as
- not being cacheable. If this is not possible and contacting the
- network layer would involve repeating a request that uses a method
- other than HTTP GET (<a href="#concept-http-equivalent-get" title="concept-http-equivalent-get">or
- equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and ignore the new encoding. The resource will be
- misinterpreted. User agents may notify the user of the situation,
- to aid in application development.</li>
-
- </ol></div><div class="impl">
+ </div><div class="impl">
 <h4 id="parse-state"><span class="secno">8.2.3 </span>Parse state</h4>
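
The five "change the encoding" steps in the moved section can be read as a small decision procedure. The following is only an illustrative sketch of that reading, not code from the spec or from any user agent. Every name in it (`ParserState`, `Outcome`, `is_utf16`, `change_the_encoding`, and the boolean parameters) is hypothetical, "identical or equivalent" is simplified to a case-insensitive label comparison, and the byte-level check in step 4 is assumed to have been done by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    KEEP_CURRENT = "keep current encoding"
    SWITCH_ON_THE_FLY = "switch decoder on the fly"
    RENAVIGATE = "navigate again with the new encoding"
    IGNORE_NEW = "ignore the new encoding"


@dataclass
class ParserState:
    # Hypothetical container for the two pieces of state the steps touch.
    encoding: str
    confidence: str = "tentative"


def is_utf16(label: str) -> bool:
    # Simplifying assumption: only these three labels count as "a UTF-16 encoding".
    return label.lower() in {"utf-16", "utf-16le", "utf-16be"}


def change_the_encoding(
    state: ParserState,
    new_encoding: str,
    *,
    prefix_decodes_identically: bool,
    can_switch_on_the_fly: bool,
    can_reparse_from_memory: bool = True,
    request_is_get: bool = True,
) -> Outcome:
    # Step 1: already decoding as UTF-16, so the new encoding is ignored.
    if is_utf16(state.encoding):
        state.confidence = "certain"
        return Outcome.KEEP_CURRENT

    # Step 2: a declared UTF-16 encoding is treated as UTF-8.
    if is_utf16(new_encoding):
        new_encoding = "utf-8"

    # Step 3: nothing to change if the encodings match (label comparison only).
    if new_encoding.lower() == state.encoding.lower():
        state.confidence = "certain"
        return Outcome.KEEP_CURRENT

    # Step 4: optional on-the-fly switch when the bytes decoded so far
    # have the same interpretation in both encodings.
    if prefix_decodes_identically and can_switch_on_the_fly:
        state.encoding = new_encoding
        state.confidence = "certain"
        return Outcome.SWITCH_ON_THE_FLY

    # Step 5: navigate again, unless that would repeat a non-GET request
    # over the network, in which case the new encoding is ignored.
    if not can_reparse_from_memory and not request_is_get:
        state.confidence = "certain"
        return Outcome.IGNORE_NEW
    state.encoding = new_encoding
    state.confidence = "certain"
    return Outcome.RENAVIGATE
```

For example, calling `change_the_encoding(ParserState("windows-1252"), "UTF-16", prefix_decodes_identically=True, can_switch_on_the_fly=True)` takes the step 2 substitution and then the step 4 branch, leaving the state's encoding as `"utf-8"`. In every branch the confidence ends up as certain, which matches the text: once these steps have run, the parser does not revisit the encoding.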
Received on Monday, 13 February 2012 22:50:23 UTC