html5/spec Overview.html,1.4532,1.4533 from Ian Hickson via cvs-syncmail on 2010-11-02 (public-html-commits@w3.org from November 2010)

From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
Date: Tue, 02 Nov 2010 02:09:01 +0000
To: public-html-commits@w3.org
Message-Id: <E1PD6J4-0003yz-31@lionel-hutz.w3.org>
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv15288

Modified Files:
	Overview.html 
Log Message:
Parser: don't convert 0000 to FFFD in the input stream processor, instead do it (mostly) in the tokenizer, so that we can instead swallow 0000s in body. (whatwg r5666)
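
Illustrative note (not part of the original commit message): a minimal Python sketch of the split described in the log message, assuming a much simplified tokenizer that only sees plain text. The names `preprocess` and `tokenize_text` and the state labels are hypothetical and do not come from the spec. The sketch only mirrors what the diff below shows: the input stream step now replaces surrogates only, the data state passes U+0000 through as a character token, and the RCDATA, RAWTEXT and script data states emit U+FFFD instead.

```python
from typing import List, Tuple

REPLACEMENT = "\uFFFD"

def preprocess(text: str) -> str:
    """Input stream step after this change: only code points in the
    range U+D800 to U+DFFF become U+FFFD; U+0000 is left alone."""
    return "".join(
        REPLACEMENT if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text
    )

def tokenize_text(text: str, state: str) -> Tuple[List[str], int]:
    """Emit one character token per input character.

    `state` is a hypothetical label: "data" keeps U+0000 as-is (the tree
    constructor decides what to do with it), any other label stands for
    RCDATA, RAWTEXT or script data, where U+0000 becomes U+FFFD.
    Returns (tokens, number of parse errors)."""
    tokens: List[str] = []
    errors = 0
    for ch in text:
        if ch == "\u0000":
            errors += 1  # U+0000 is a parse error in every state shown
            tokens.append(ch if state == "data" else REPLACEMENT)
        else:
            tokens.append(ch)
    return tokens, errors
```

Under these assumptions, tokenize_text("a\u0000", "data") keeps the NULL in the token stream, while the same input in the "rcdata" state yields "a" followed by U+FFFD; both report one parse error.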

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.4532
retrieving revision 1.4533
diff -u -d -r1.4532 -r1.4533
--- Overview.html	2 Nov 2010 01:06:09 -0000	1.4532
+++ Overview.html	2 Nov 2010 02:08:58 -0000	1.4533
@@ -54457,12 +54457,12 @@
   motivated by a desire to increase the resilience of user agents in
   the face of na&iuml;ve transcoders.</p>
 
-  <p>All U+0000 NULL characters and code points in the range U+D800 to
-  U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
-  them to suddenly turn into codepoints when they go through a UTF-16
-  pipe --> in the input must be replaced by U+FFFD REPLACEMENT
-  CHARACTERs. Any occurrences of such characters and code points are
-  <a href="#parse-error" title="parse error">parse errors</a>.</p>
+  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
+  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
+  codepoints when they go through a UTF-16 pipe --> in the input must
+  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
+  such characters and code points are <a href="#parse-error" title="parse error">parse
+  errors</a>.</p>
 
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
@@ -55095,6 +55095,10 @@
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
    <dd>Switch to the <a href="#tag-open-state">tag open state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit the <a href="#current-input-character">current input
+   character</a> as a character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -55126,6 +55130,10 @@
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
    <dd>Switch to the <a href="#rcdata-less-than-sign-state">RCDATA less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -55153,6 +55161,10 @@
   <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
    <dd>Switch to the <a href="#rawtext-less-than-sign-state">RAWTEXT less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -55167,6 +55179,10 @@
   <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
    <dd>Switch to the <a href="#script-data-less-than-sign-state">script data less-than sign state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
@@ -55178,7 +55194,11 @@
 
   <p>Consume the <a href="#next-input-character">next input character</a>:</p>
 
-  <dl class="switch"><dt>EOF</dt>
+  <dl class="switch"><dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
+   <dt>EOF</dt>
    <dd>Emit an end-of-file token.</dd>
 
    <dt>Anything else</dt>
@@ -55270,6 +55290,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current tag token's tag name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current tag token's tag name.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55576,6 +55600,10 @@
    <dd><p>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55596,6 +55624,11 @@
    <dd><p>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55619,6 +55652,11 @@
    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
+   escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
+   token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55769,6 +55807,10 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55790,6 +55832,11 @@
    sign state</a>. Emit a U+003C LESS-THAN SIGN character
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55815,6 +55862,11 @@
    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
    GREATER-THAN SIGN character token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
+   double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
+   character token.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -55893,6 +55945,12 @@
    value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href="#attribute-name-state">attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
@@ -55906,8 +55964,8 @@
 
    <dt>Anything else</dt>
    <dd>Start a new attribute in the current tag token. Set that
-   attribute's name to the <a href="#current-input-character">current input character</a>, and its value to
-   the empty string. Switch to the <a href="#attribute-name-state">attribute name
+   attribute's name to the <a href="#current-input-character">current input character</a>, and
+   its value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
    state</a>.</dd>
 
   </dl><h5 id="attribute-name-state"><span class="secno">8.2.4.35 </span><dfn>Attribute name state</dfn></h5>
@@ -55936,6 +55994,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current attribute's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's name.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
@@ -55987,6 +56049,12 @@
    and its value to the empty string. Switch to the <a href="#attribute-name-state">attribute
    name state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Start a new attribute in the current
+   tag token. Set that attribute's name to a U+FFFD REPLACEMENT
+   CHARACTER character, and its value to the empty string. Switch to
+   the <a href="#attribute-name-state">attribute name state</a>.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
@@ -56024,6 +56092,11 @@
    <dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href="#attribute-value-single-quoted-state">attribute value (single-quoted) state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value. Switch to the
+   <a href="#attribute-value-unquoted-state">attribute value (unquoted) state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
    state</a>. Emit the current tag token.</dd>
@@ -56056,6 +56129,10 @@
    state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
    being U+0022 QUOTATION MARK (").</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Reconsume the EOF character in the
    <a href="#data-state">data state</a>.</dd>
@@ -56105,6 +56182,10 @@
    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current attribute's value.</dd>
+
    <dt>U+0022 QUOTATION MARK (")</dt>
    <dt>U+0027 APOSTROPHE (')</dt>
    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
@@ -56183,12 +56264,13 @@
   <p>Consume every character up to and including the first U+003E
   GREATER-THAN SIGN character (&gt;) or the end of the file (EOF),
   whichever comes first. Emit a comment token whose data is the
-  concatenation of all the characters starting from and including
-  the character that caused the state machine to switch into the
-  bogus comment state, up to and including the character immediately
-  before the last consumed character (i.e. up to the character just
-  before the U+003E or EOF character). (If the comment was started
-  by the end of the file (EOF), the token is empty.)</p>
+  concatenation of all the characters starting from and including the
+  character that caused the state machine to switch into the bogus
+  comment state, up to and including the character immediately before
+  the last consumed character (i.e. up to the character just before
+  the U+003E or EOF character), but with any U+0000 NULL characters
+  replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
+  was started by the end of the file (EOF), the token is empty.)</p>
 
   <p>Switch to the <a href="#data-state">data state</a>.</p>
 
@@ -56228,6 +56310,11 @@
   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href="#comment-start-dash-state">comment start dash state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data. Switch to the <a href="#comment-state">comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
    state</a>. Emit the comment token.</dd> <!-- see comment in
@@ -56248,6 +56335,12 @@
   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href="#comment-end-state">comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href="#comment-state">comment
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
    state</a>. Emit the comment token.</dd>
@@ -56269,6 +56362,10 @@
   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href="#comment-end-dash-state">comment end dash state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the comment token's data.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href="#data-state">data state</a>.</dd> <!-- see comment
@@ -56285,6 +56382,12 @@
   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
    <dd>Switch to the <a href="#comment-end-state">comment end state</a></dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
+   character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href="#comment-state">comment
+   state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Emit the comment token. Reconsume the
    EOF character in the <a href="#data-state">data state</a>.</dd> <!-- see comment
@@ -56303,6 +56406,12 @@
    <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
+   comment token's data. Switch to the <a href="#comment-state">comment
+   state</a>.</dd>
+
    <dt>U+0021 EXCLAMATION MARK (!)</dt>
    <dd><a href="#parse-error">Parse error</a>. Switch to the <a href="#comment-end-bang-state">comment end bang
    state</a>.</dd>
@@ -56338,6 +56447,12 @@
    <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
    token.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
+   characters (-), a U+0021 EXCLAMATION MARK character (!), and a
+   U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
+   Switch to the <a href="#comment-state">comment state</a>.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Emit the comment token. Reconsume
    the EOF character in the <a href="#data-state">data state</a>.</dd> <!-- see
@@ -56386,6 +56501,11 @@
    character's code point). Switch to the <a href="#doctype-name-state">DOCTYPE name
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Set the token's name to a U+FFFD
+   REPLACEMENT CHARACTER character. Switch to the <a href="#doctype-name-state">DOCTYPE name
+   state</a>.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Create a new DOCTYPE token. Set its
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
@@ -56421,6 +56541,10 @@
    character</a> (add 0x0020 to the character's code point) to the
    current DOCTYPE token's name.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's name.</dd>
+
    <dt>EOF</dt>
    <dd><a href="#parse-error">Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Emit that DOCTYPE token.
@@ -56550,6 +56674,10 @@
   <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
    <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
@@ -56561,8 +56689,8 @@
    Reconsume the EOF character in the <a href="#data-state">data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href="#current-input-character">current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href="#current-input-character">current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id="doctype-public-identifier-single-quoted-state"><span class="secno">8.2.4.59 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
 
@@ -56571,6 +56699,10 @@
   <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
    <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's public identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
@@ -56582,8 +56714,8 @@
    Reconsume the EOF character in the <a href="#data-state">data state</a>.</dd>
 
    <dt>Anything else</dt>
-   <dd>Append the <a href="#current-input-character">current input character</a> to the current DOCTYPE
-   token's public identifier.</dd>
+   <dd>Append the <a href="#current-input-character">current input character</a> to the current
+   DOCTYPE token's public identifier.</dd>
 
   </dl><h5 id="after-doctype-public-identifier-state"><span class="secno">8.2.4.60 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
 
@@ -56737,6 +56869,10 @@
    <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
@@ -56759,6 +56895,10 @@
    <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
    state</a>.</dd>
 
+   <dt>U+0000 NULL</dt>
+   <dd><a href="#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
+   character to the current DOCTYPE token's system identifier.</dd>
+
    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
    <dd><a href="#parse-error">Parse error</a>. Set the DOCTYPE token's
    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
@@ -56821,7 +56961,9 @@
   end of the file (EOF), whichever comes first. Emit a series of
   character tokens consisting of all the characters consumed except
   the matching three character sequence at the end (if one was found
-  before the end of the file).</p>
+  before the end of the file)<!--(not needed; taken care of by the
+  tree constructor), but with any U+0000 NULL characters replaced by
+  U+FFFD REPLACEMENT CHARACTER characters-->.</p>
 
   <p>Switch to the <a href="#data-state">data state</a>.</p>
 
@@ -58013,7 +58155,23 @@
   <p>When the <a href="#insertion-mode">insertion mode</a> is "<a href="#parsing-main-inbody" title="insertion
   mode: in body">in body</a>", tokens must be handled as follows:</p>
 
-  <dl class="switch"><dt>A character token</dt>
+  <dl class="switch"><dt>A character token that is U+0000 NULL</dt>
+   <dd>
+
+    <p><a href="#parse-error">Parse error</a>. Ignore the token.</p>
+
+    <!-- The D-Link DSL-G604T ADSL router has a zero byte in its
+         configuration UI before a <frameset>, which is why U+0000 is
+         special-cased here.
+         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
+               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
+    -->
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
    <dd>
 
     <p><a href="#reconstruct-the-active-formatting-elements">Reconstruct the active formatting elements</a>, if
@@ -58022,19 +58180,18 @@
     <p><a href="#insert-a-character" title="insert a character">Insert the token's
     character</a> into the <a href="#current-node">current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), U+0020 SPACE, or U+FFFD REPLACEMENT CHARACTER, then set the
-    <a href="#frameset-ok-flag">frameset-ok flag</a> to "not ok".</p>
+   </dd>
 
-    <!-- U+FFFD REPLACEMENT CHARACTER is in this list because the
-         D-Link DSL-G604T ADSL router has a zero byte in its
-         configuration UI before a <frameset>. Zero bytes get
-         converted to U+FFFD, which (without that character in this
-         list) would mean the <frameset> would be ignored.
-         refs: https://bugzilla.mozilla.org/show_bug.cgi?id=563526
-               http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
-    -->
+   <dt>Any other character token</dt>
+   <dd>
+
+    <p><a href="#reconstruct-the-active-formatting-elements">Reconstruct the active formatting elements</a>, if
+    any.</p>
+
+    <p><a href="#insert-a-character" title="insert a character">Insert the token's
+    character</a> into the <a href="#current-node">current node</a>.</p>
+
+    <p>Set the <a href="#frameset-ok-flag">frameset-ok flag</a> to "not ok".</p>
 
    </dd>
 
@@ -59257,6 +59414,10 @@
     <p><a href="#insert-a-character" title="insert a character">Insert the token's
     character</a> into the <a href="#current-node">current node</a>.</p>
 
+    <p class="note">This can never be a U+0000 NULL character; the
+    tokenizer converts those to U+FFFD REPLACEMENT CHARACTER
+    characters.</p>
+
    </dd>
 
    <dt>An end-of-file token</dt>
@@ -60053,7 +60214,12 @@
   <p>When the <a href="#insertion-mode">insertion mode</a> is "<a href="#parsing-main-inselect" title="insertion
   mode: in select">in select</a>", tokens must be handled as follows:</p>
 
-  <dl class="switch"><dt>A character token</dt>
+  <dl class="switch"><dt>A character token that is U+0000 NULL</dt>
+   <dd>
+    <p><a href="#parse-error">Parse error</a>. Ignore the token.</p>
+   </dd>
+
+   <dt>Any other character token</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the token's
     character</a> into the <a href="#current-node">current node</a>.</p>
@@ -60254,16 +60420,32 @@
 
     </ol></dd>
 
-   <dt>A character token</dt>
+   <dt>A character token that is U+0000 NULL</dt>
+   <dd>
+
+    <p><a href="#parse-error">Parse error</a>. <a href="#insert-a-character" title="insert a
+    character">Insert a U+FFFD REPLACEMENT CHARACTER character</a>
+    into the <a href="#current-node">current node</a>.</p>
+
+   </dd>
+
+   <dt>A character token that is one of U+0009 CHARACTER TABULATION,
+   U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
+   RETURN (CR), or U+0020 SPACE</dt>
    <dd>
 
     <p><a href="#insert-a-character" title="insert a character">Insert the token's
     character</a> into the <a href="#current-node">current node</a>.</p>
 
-    <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
-    (CR), or U+0020 SPACE, then set the <a href="#frameset-ok-flag">frameset-ok
-    flag</a> to "not ok".</p>
+   </dd>
+
+   <dt>Any other character token</dt>
+   <dd>
+
+    <p><a href="#insert-a-character" title="insert a character">Insert the token's
+    character</a> into the <a href="#current-node">current node</a>.</p>
+
+    <p>Set the <a href="#frameset-ok-flag">frameset-ok flag</a> to "not ok".</p>
 
    </dd>
Received on Tuesday, 2 November 2010 02:09:05 UTC