hixie: Allow a few more unescaped &s. (whatwg r4960) from poot on 2010-04-02 (public-html-diffs@w3.org from April 2010)

From: poot <cvsmail@w3.org>
Date: Sat, 3 Apr 2010 08:18:26 +0900 (JST)
To: public-html-diffs@w3.org
Message-Id: <20100402231826.8830B2BC5D@toro.w3.mag.keio.ac.jp>
hixie: Allow a few more unescaped &s. (whatwg r4960)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3982&r2=1.3983&f=h
http://html5.org/tools/web-apps-tracker?from=4959&to=4960

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3982
retrieving revision 1.3983
diff -u -d -r1.3982 -r1.3983
--- Overview.html 2 Apr 2010 22:39:37 -0000 1.3982
+++ Overview.html 2 Apr 2010 23:18:10 -0000 1.3983
@@ -1752,14 +1752,14 @@
      <pre class="bad">&lt;a href="?original=1&amp;copy=2"&gt;Compare&lt;/a&gt;</pre>
 
      <p>To avoid this problem, all named character references are
-     required to end with a semicolon, and any ampersands followed by
-     letters are required to be escaped.</p>
+     required to end with a semicolon, and uses of named character
+     references without a semicolon are flagged as errors.</p>
 
      <p>Thus, the correct way to express the above cases is as
      follows:</p>
 
-     <pre>&lt;a href="?hello=1&amp;amp;world=2"&gt;Demo&lt;/a&gt;</pre>
-     <pre>&lt;a href="?original=1&amp;amp;copy=2"&gt;Compare&lt;/a&gt;</pre>
+     <pre>&lt;a href="?hello=1&amp;world=2"&gt;Demo&lt;/a&gt; &lt;!-- &amp;world is ok, since it's not a named character reference --&gt;</pre>
+     <pre>&lt;a href="?original=1&amp;amp;copy=2"&gt;Compare&lt;/a&gt; &lt;!-- the &amp; has to be escaped, since &amp;copy <em>is</em> a named character reference --&gt;</pre>
 
     </div>
 
@@ -51805,9 +51805,12 @@
   control characters other than <a href="#space-character" title="space character">space
   characters</a>.<p>An <dfn id="syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous
   ampersand</dfn> is a U+0026 AMPERSAND character (&amp;) that is
-  followed by some <a href="#syntax-text" title="syntax-text">text</a> other than a
-  <a href="#space-character">space character</a>, a U+003C LESS-THAN SIGN character
-  (&lt;), or another U+0026 AMPERSAND character (&amp;).<h4 id="cdata-sections"><span class="secno">8.1.5 </span>CDATA sections</h4><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><dfn id="syntax-cdata" title="syntax-cdata">CDATA sections</dfn> must start with
+  followed by one or more characters in the range U+0030 DIGIT ZERO
+  (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A
+  LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A
+  LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character
+  (;), where these characters do not match any of the names given in
+  the <a href="#named-character-references">named character references</a> section.<h4 id="cdata-sections"><span class="secno">8.1.5 </span>CDATA sections</h4><p class="XXX annotation"><b>Status: </b><i>Last call for comments</i><p><dfn id="syntax-cdata" title="syntax-cdata">CDATA sections</dfn> must start with
   the character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION
   MARK, U+005B LEFT SQUARE BRACKET, U+0043 LATIN CAPITAL LETTER C,
   U+0044 LATIN CAPITAL LETTER D, U+0041 LATIN CAPITAL LETTER A, U+0054
@@ -55172,12 +55175,14 @@
     column of the <a href="#named-character-references">named character references</a> table (in a
     <a href="#case-sensitive">case-sensitive</a> manner).</p>
 
-    <p>If no match can be made, then this is a <a href="#parse-error">parse
-    error</a>. No characters are consumed, and nothing is
-    returned.</p>
-
-    <p>If the last character matched is not a U+003B SEMICOLON
-    character (;), there is a <a href="#parse-error">parse error</a>.</p>
+    <p>If no match can be made, then no characters are consumed, and
+    nothing is returned. In this case, if the characters after the
+    U+0026 AMPERSAND character (&amp;) consist of a sequence of one or
+    more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
+    NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
+    Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
+    LETTER Z, followed by a U+003B SEMICOLON character (;), then this
+    is a <a href="#parse-error">parse error</a>.</p>
 
     <p>If the character reference is being consumed <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value state">as part of an
     attribute</a>, and the last character matched is not a U+003B
@@ -55190,19 +55195,23 @@
     (&amp;) must be unconsumed, and nothing is returned.</p>
     <!-- "=" added because of http://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->
 
-    <p>Otherwise, return a character token for the character
-    corresponding to the character reference name (as given by the
-    second column of the <a href="#named-character-references">named character references</a>
-    table).</p>
+    <p>Otherwise, a character reference is parsed. If the last
+    character matched is not a U+003B SEMICOLON character (;), there
+    is a <a href="#parse-error">parse error</a>.</p>
+
+    <p>Return a character token for the character corresponding to the
+    character reference name (as given by the second column of the
+    <a href="#named-character-references">named character references</a> table).</p>
 
     <div class="example">
 
-     <p>If the markup contains <code title="">I'm &amp;notit; I tell
-     you</code>, the character reference is parsed as "not", as in,
-     <code title="">I'm &not;it; I tell you</code>. But if the markup
+     <p>If the markup contains (not in an attribute) the string <code title="">I'm &amp;notit; I tell you</code>, the character
+     reference is parsed as "not", as in, <code title="">I'm &not;it;
+     I tell you</code> (and this is a parse error). But if the markup
      was <code title="">I'm &amp;notin; I tell you</code>, the
      character reference would be parsed as "notin;", resulting in
-     <code title="">I'm &notin; I tell you</code>.</p>
+     <code title="">I'm &notin; I tell you</code> (and no parse
+     error).</p>
 
     </div>
Received on Friday, 2 April 2010 23:18:55 UTC
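
The revised definition of an ambiguous ampersand in the second hunk can be illustrated with a minimal sketch. It is not part of the commit. It assumes that Python's `html.entities.html5` table is an acceptable stand-in for the spec's named character references table (it tracks the current HTML standard, which may differ from the 2010 draft), and it reads "do not match any of the names" as an exact comparison of the whole name including the semicolon. The function name `ambiguous_ampersands` is hypothetical.

```python
import re
from html.entities import html5  # keys are names such as "amp;" or "copy"

# An "&", then one or more ASCII digits or letters, then ";".
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def ambiguous_ampersands(text):
    """Return the offsets of every "&" that starts an ambiguous ampersand.

    Assumption: a candidate is ambiguous when its name plus the trailing
    semicolon is not a key in html5 (the stand-in reference table).
    """
    return [
        m.start()
        for m in _CANDIDATE.finditer(text)
        if m.group(1) + ";" not in html5
    ]
```

Under this reading, `?hello=1&world=2` contains no ambiguous ampersand because `&world` is not followed by a semicolon, `&copy;` is not ambiguous because it names a real reference, and `&foo;` is reported.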