hixie: Be more compatible with what browsers do with multibyte characters in submissions. (whatwg r4970)

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3992&r2=1.3993&f=h
http://html5.org/tools/web-apps-tracker?from=4969&to=4970

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3992
retrieving revision 1.3993
diff -u -d -r1.3992 -r1.3993
--- Overview.html 4 Apr 2010 22:43:21 -0000 1.3992
+++ Overview.html 5 Apr 2010 04:36:56 -0000 1.3993
@@ -285,7 +285,7 @@
    <h1>HTML5</h1>
    <h2 class="no-num no-toc" id="a-vocabulary-and-associated-apis-for-html-and-xhtml">A vocabulary and associated APIs for HTML and XHTML</h2>
 
-   <h2 class="no-num no-toc" id="editor-s-draft-4-april-2010">Editor's Draft 4 April 2010</h2>
+   <h2 class="no-num no-toc" id="editor-s-draft-5-april-2010">Editor's Draft 5 April 2010</h2>
    <dl><dt>Latest Published Version:</dt>
     <dd><a href="http://www.w3.org/TR/html5/">http://www.w3.org/TR/html5/</a></dd>
     <dt>Latest Editor's Draft:</dt>
@@ -392,7 +392,7 @@
   specification's progress along the W3C Recommendation
   track.
 
-  This specification is the 4 April 2010 Editor's Draft.
+  This specification is the 5 April 2010 Editor's Draft.
   </p><!-- UNDER NO CIRCUMSTANCES IS THE PRECEDING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><!-- relationship to other work (required) --><p>The contents of this specification are also part of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/">a
   specification</a> published by the <a href="http://www.whatwg.org/">WHATWG</a>, which is available under a
   license that permits reuse of the specification text.</p><!-- UNDER NO CIRCUMSTANCES IS THE FOLLOWING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><!-- required patent boilerplate --><p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5
@@ -34362,24 +34362,56 @@
      <li>
 
       <p>For each character in the entry's name and value, apply the
-      following subsubsteps:</p>
+      appropriate subsubsteps from the following list:</p>
 
-      <ol><!-- * - . _ 0-9 a-z A-Z --><li><p>If the character isn't in the range U+0020, U+002A,
+      <dl class="switch"><dt>The character is a U+0020 SPACE character</dt>
+
+       <dd>Replace the character with a single U+002B PLUS SIGN
+       character (+).</dd>
+
+
+       <!-- * - . _ 0-9 a-z A-Z -->
+
+       <dt>If the character isn't in the range U+0020, U+002A,
        U+002D, U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005F,
-       U+0061 to U+007A then replace the character with a string
-       formed as follows: Start with the empty string, and then,
-       taking each byte of the character when expressed in the
-       selected character encoding in turn, append to the string a
-       U+0025 PERCENT SIGN character (%) followed by two characters in
-       the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) and
-       U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F
-       representing the hexadecimal value of the byte (zero-padded if
-       necessary).</li>
+       U+0061 to U+007A</dt>
 
-       <li><p>If the character is a U+0020 SPACE character, replace it
-       with a single U+002B PLUS SIGN character (+).</li>
+       <dd>
 
-      </ol></li>
+        <p>Replace the character with a string formed as follows:</p>
+
+        <ol><li><p>Let <var title="">s</var> be an empty string.</li>
+
+         <li>
+
+          <p>For each byte <var title="">b</var> of the character when
+          expressed in the selected character encoding in turn, run
+          the appropriate subsubsubstep from the list below:</p>
+
+          <dl class="switch"><dt>If the byte is in the range 0x20, 0x2A, 0x2D, 0x2E,
+           0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>
+
+           <dd><p>Append to <var title="">s</var> the Unicode
+           character with the codepoint equal to the byte.</dd>
+
+           <dt>Otherwise</dt>
+
+           <dd><p>Append to the string a U+0025 PERCENT SIGN character
+           (%) followed by two characters in the ranges U+0030 DIGIT
+           ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
+           LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
+           hexadecimal value of the byte (zero-padded if
+           necessary).</dd>
+
+          </dl></li>
+
+        </ol></dd>
+
+       <dt>Otherwise</dt>
+
+       <dd><p>Leave the character as is.</dd>
+
+      </dl></li>
 
      <li><p>If the entry's name is "<code title="">isindex</code>",
      its type is "<code title="">text</code>", and this is the first
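
For readers who want to see the effect of the revised steps, here is a minimal Python sketch of the per-character encoding as the added text in the hunk above describes it. It is an illustration, not the spec's or any browser's implementation: the names `SAFE`, `SAFE_BYTES` and `encode_component` are hypothetical, and characters that the selected encoding cannot represent are assumed to be handled elsewhere (here `str.encode` simply raises `UnicodeEncodeError`).

```python
# Hypothetical helper names; the safe set is taken from the diff:
# * - . _ 0-9 A-Z a-z
SAFE = frozenset(
    b"*-._"
    b"0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
)
# The byte-level list in the diff also names 0x20, so it is included here.
SAFE_BYTES = SAFE | {0x20}


def encode_component(text, encoding="utf-8"):
    """Encode one name or value following the steps in the hunk above."""
    out = []
    for ch in text:
        if ch == " ":
            # U+0020 SPACE becomes a single U+002B PLUS SIGN.
            out.append("+")
        elif ord(ch) in SAFE:
            # "Otherwise": leave the character as is.
            out.append(ch)
        else:
            # Build s byte by byte from the character's encoded form.
            s = []
            for b in ch.encode(encoding):
                if b in SAFE_BYTES:
                    # Safe byte: append the character with that code point.
                    s.append(chr(b))
                else:
                    # Any other byte: "%" plus two uppercase hex digits.
                    s.append(f"%{b:02X}")
            out.append("".join(s))
    return "".join(out)


print(encode_component("a b"))              # a+b
print(encode_component("ア", "shift_jis"))   # %83A
print(encode_component("é"))                # %C3%A9
```

The Shift_JIS case shows what changed: U+30A2 encodes to the bytes 0x83 0x41, and because 0x41 is in the safe set it is emitted as a literal "A". Under the removed text every byte of the character was escaped, which would have given "%83%41".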

Received on Monday, 5 April 2010 04:37:40 UTC