hixie: Make &#13; map to U+000D and not U+000A. This has ramifications throughout the parser. (whatwg r4933)

From: poot <cvsmail@w3.org>
Date: Thu, 1 Apr 2010 14:34:34 +0900 (JST)
To: public-html-diffs@w3.org
Message-Id: <20100401053434.B67E12BC3D@toro.w3.mag.keio.ac.jp>
hixie: Make &#13; map to U+000D and not U+000A. This has ramifications
throughout the parser. (whatwg r4933)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3953&r2=1.3954&f=h
http://html5.org/tools/web-apps-tracker?from=4932&to=4933
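
As a reading aid for the table hunk in the diff below, the change to numeric character reference handling can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the spec's algorithm: the names COMMON_OVERRIDES, OVERRIDES_BEFORE, OVERRIDES_AFTER and resolve_numeric_reference are hypothetical, only the table rows visible in this diff are included, and out-of-range or otherwise disallowed code points are not handled.

```python
# Rows of the override table that are visible in this diff.
# The full table in the spec has more rows; they are omitted here.
COMMON_OVERRIDES = {
    0x00: "\uFFFD",  # REPLACEMENT CHARACTER
    0x80: "\u20AC",  # EURO SIGN
    0x81: "\u0081",  # <control>
    0x82: "\u201A",  # SINGLE LOW-9 QUOTATION MARK
}

# Before r4933: number 0x0D was remapped to U+000A LINE FEED (LF).
OVERRIDES_BEFORE = {**COMMON_OVERRIDES, 0x0D: "\u000A"}

# After r4933: number 0x0D yields U+000D CARRIAGE RETURN (CR).
OVERRIDES_AFTER = {**COMMON_OVERRIDES, 0x0D: "\u000D"}


def resolve_numeric_reference(number, overrides):
    """Return the character for a numeric reference, e.g. number=13 for &#13;.

    Hypothetical helper: looks the number up in the override table first
    and otherwise falls back to the code point with that number.
    """
    if number in overrides:
        return overrides[number]
    return chr(number)
```

With these tables, resolve_numeric_reference(13, OVERRIDES_BEFORE) gives a line feed, while resolve_numeric_reference(13, OVERRIDES_AFTER) gives a carriage return, which is why the later hunks have to start treating U+000D as a whitespace character token.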

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3953
retrieving revision 1.3954
diff -u -d -r1.3953 -r1.3954
--- Overview.html	1 Apr 2010 01:00:43 -0000	1.3953
+++ Overview.html	1 Apr 2010 01:21:40 -0000	1.3954
@@ -51637,7 +51637,10 @@
   to be put, as described in the other sections.<h5 id="newlines"><span class="secno">8.1.3.1 </span>Newlines</h5><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><dfn id="syntax-newlines" title="syntax-newlines">Newlines</dfn> in HTML may be
   represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A
   LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR),
-  U+000A LINE FEED (LF) characters in that order.<h4 id="character-references"><span class="secno">8.1.4 </span>Character references</h4><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p>In certain cases described in other sections, <a href="#syntax-text" title="syntax-text">text</a> may be mixed with <dfn id="syntax-charref" title="syntax-charref">character references</dfn>. These can be used
+  U+000A LINE FEED (LF) characters in that order.<p>Where <a href="#syntax-charref" title="syntax-charref">character references</a>
+  are allowed, a character reference of a U+000A LINE FEED (LF)
+  character (but not a U+000D CARRIAGE RETURN (CR) character) also
+  represents a <a href="#syntax-newlines" title="syntax-newlines">newline</a>.<h4 id="character-references"><span class="secno">8.1.4 </span>Character references</h4><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p>In certain cases described in other sections, <a href="#syntax-text" title="syntax-text">text</a> may be mixed with <dfn id="syntax-charref" title="syntax-charref">character references</dfn>. These can be used
   to escape characters that couldn't otherwise legally be included in
   <a href="#syntax-text" title="syntax-text">text</a>.<p>Character references must start with a U+0026 AMPERSAND character
   (&amp;). Following this, there are three possible kinds of character
@@ -51674,9 +51677,9 @@
    (;).</dd>
 
   </dl><p>The numeric character reference forms described above are allowed
-  to reference any Unicode code point other than U+0000, permanently
-  undefined Unicode characters (noncharacters), and control characters
-  other than <a href="#space-character" title="space character">space
+  to reference any Unicode code point other than U+0000, U+000D,
+  permanently undefined Unicode characters (noncharacters), and
+  control characters other than <a href="#space-character" title="space character">space
   characters</a>.<p>An <dfn id="syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous
   ampersand</dfn> is a U+0026 AMPERSAND character (&amp;) that is
   followed by some <a href="#syntax-text" title="syntax-text">text</a> other than a
@@ -54978,7 +54981,7 @@
 
     <table><thead><tr><th>Number <th colspan="2">Unicode character
      <tbody><tr><td>0x00 <td>U+FFFD <td>REPLACEMENT CHARACTER
-      <tr><td>0x0D <td>U+000A <td>LINE FEED (LF)
+      <tr><td>0x0D <td>U+000D <td>CARRIAGE RETURN (CR)
       <tr><td>0x80 <td>U+20AC <td>EURO SIGN (&euro;)
       <tr><td>0x81 <td>U+0081 <td>&lt;control&gt;
       <tr><td>0x82 <td>U+201A <td>SINGLE LOW-9 QUOTATION MARK (&sbquo;)
@@ -55400,7 +55403,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p>Ignore the token.</p>
    </dd>
@@ -55606,7 +55609,7 @@
 
    <dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p>Ignore the token.</p>
    </dd>
@@ -55678,7 +55681,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p>Ignore the token.</p> <!-- :-( -->
    </dd>
@@ -55744,7 +55747,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the character</a> into
     the <a href="#current-node">current node</a>.</p>
@@ -55929,7 +55932,7 @@
 
    <dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dt>A comment token</dt>
    <dt>A start tag whose tag name is one of: "link", "meta", "noframes", "style"</dt>
    <dd>
@@ -55966,7 +55969,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the character</a> into
     the <a href="#current-node">current node</a>.</p>
@@ -56064,8 +56067,8 @@
     character</a> into the <a href="#current-node">current node</a>.</p>
 
     <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN
-    (CR),--> or U+0020 SPACE, then set the <a href="#frameset-ok-flag">frameset-ok
+    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
+    (CR), or U+0020 SPACE, then set the <a href="#frameset-ok-flag">frameset-ok
     flag</a> to "not ok".</p>
 
    </dd>
@@ -56261,6 +56264,9 @@
     one. (Newlines at the start of <code><a href="#the-pre-element">pre</a></code> blocks are
     ignored as an authoring convenience.)</p>
 
+    <!-- <pre>[CR]X will eat the [CR], <pre>&#10;X will eat the
+    &#10;, but <pre>&#13;X will not eat the &#13;. -->
+
     <p>Set the <a href="#frameset-ok-flag">frameset-ok flag</a> to "not ok".</p>
 
    </dd>
@@ -56997,6 +57003,8 @@
      token, then ignore that token and move on to the next
      one. (Newlines at the start of <code><a href="#the-textarea-element">textarea</a></code> elements are
      ignored as an authoring convenience.)</li>
+     
+     <!-- see comment in <pre> start tag bit -->
 
      <li><p>Switch the tokenizer to the <a href="#rcdata-state">RCDATA
      state</a>.</li>
@@ -57624,7 +57632,7 @@
     <p>If any of the tokens in the <var><a href="#pending-table-character-tokens">pending table character
     tokens</a></var> list are character tokens that are not one of U+0009
     CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED
-    (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE, then
+    (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE, then
     reprocess those character tokens using the rules given in the
     "anything else" entry in the <a href="#parsing-main-intable" title="insertion mode: in
     table">in table</a>" insertion mode.</p>
@@ -57703,7 +57711,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the character</a> into
     the <a href="#current-node">current node</a>.</p>
@@ -58249,8 +58257,8 @@
     character</a> into the <a href="#current-node">current node</a>.</p>
 
     <p>If the token is not one of U+0009 CHARACTER TABULATION, U+000A
-    LINE FEED (LF), U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN
-    (CR),--> or U+0020 SPACE, then set the <a href="#frameset-ok-flag">frameset-ok
+    LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
+    (CR), or U+0020 SPACE, then set the <a href="#frameset-ok-flag">frameset-ok
     flag</a> to "not ok".</p>
 
    </dd>
@@ -58470,7 +58478,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p>Process the token <a href="#using-the-rules-for">using the rules for</a> the "<a href="#parsing-main-inbody" title="insertion mode: in body">in body</a>" <a href="#insertion-mode">insertion
     mode</a>.</p>
@@ -58528,7 +58536,7 @@
 
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the character</a> into
     the <a href="#current-node">current node</a>.</p>
@@ -58622,7 +58630,7 @@
   <!-- due to rules in the "in frameset" mode, this can't be entered in the fragment case -->
   <dl class="switch"><dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dd>
     <p><a href="#insert-a-character" title="insert a character">Insert the character</a> into
     the <a href="#current-node">current node</a>.</p>
@@ -58683,7 +58691,7 @@
    <dt>A DOCTYPE token</dt>
    <dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dt>A start tag whose tag name is "html"</dt>
    <dd>
     <p>Process the token <a href="#using-the-rules-for">using the rules for</a> the "<a href="#parsing-main-inbody" title="insertion mode: in body">in body</a>" <a href="#insertion-mode">insertion
@@ -58717,7 +58725,7 @@
    <dt>A DOCTYPE token</dt>
    <dt>A character token that is one of U+0009 CHARACTER
    TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-   <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+   U+000D CARRIAGE RETURN (CR), or U+0020 SPACE</dt>
    <dt>A start tag whose tag name is "html"</dt>
    <dd>
     <p>Process the token <a href="#using-the-rules-for">using the rules for</a> the "<a href="#parsing-main-inbody" title="insertion mode: in body">in body</a>" <a href="#insertion-mode">insertion
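
The one-line change repeated through the tree construction hunks above amounts to adding U+000D to the set of characters that count as whitespace character tokens. The sketch below illustrates that set, together with the rule that a newline at the start of a pre or textarea element is skipped. It is an illustration only: TREE_BUILDER_WHITESPACE, is_whitespace_token and skip_leading_newline are hypothetical names, and the sketch assumes that a literal CR in the input has already been turned into LF before tokenization, so a U+000D character token can only come from a character reference such as &#13;.

```python
# Whitespace character tokens as listed in the updated <dt> entries:
# TAB, LF, FF, CR (newly included by this change), and SPACE.
TREE_BUILDER_WHITESPACE = frozenset("\t\n\f\r ")


def is_whitespace_token(ch):
    """Return True if a one-character token is in the whitespace set."""
    return ch in TREE_BUILDER_WHITESPACE


def skip_leading_newline(text):
    """Model the <pre>/<textarea> start tag rule on already-tokenized text.

    Only a leading U+000A LINE FEED is dropped. A leading U+000D, which
    under the stated assumption came from a character reference, is kept.
    """
    if text.startswith("\n"):
        return text[1:]
    return text
```

Under these assumptions, skip_leading_newline("\nX") drops the line feed, whereas skip_leading_newline("\rX") returns its input unchanged, matching the behaviour described in the comment added to the pre start tag handling.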
Received on Thursday, 1 April 2010 05:35:04 GMT
