
hixie: Remove the requirement that the parser deal with raw surrogates, since they can't make it this far. (whatwg r5862)

From: poot <cvsmail@w3.org>
Date: Tue, 08 Feb 2011 19:30:30 -0500
To: public-html-diffs@w3.org
Message-Id: <E1Pmxx0-0004Mi-HH@jay.w3.org>

http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.4704&r2=1.4705&f=h
http://html5.org/tools/web-apps-tracker?from=5861&to=5862

===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.4704
retrieving revision 1.4705
diff -u -d -r1.4704 -r1.4705
--- Overview.html	9 Feb 2011 00:06:20 -0000	1.4704
+++ Overview.html	9 Feb 2011 00:29:17 -0000	1.4705
@@ -55384,13 +55384,6 @@
   motivated by a desire to increase the resilience of user agents in
   the face of na&iuml;ve transcoders.</p>
 
-  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
-  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
-  code points when they go through a UTF-16 pipe --> in the input must
-  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
-  such characters and code points are <a href="#parse-error" title="parse error">parse
-  errors</a>.</p>
-
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
   CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
@@ -58026,10 +58019,9 @@
       <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON (&#382;)
       <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS (&Yuml;)
     </table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
-    surrogates not allowed; see the comment in the "preprocessing the
-    input stream" section for details --> or is greater than 0x10FFFF,
-    then this is a <a href="#parse-error">parse error</a>. Return a U+FFFD
-    REPLACEMENT CHARACTER.</p>
+    surrogates --> or is greater than 0x10FFFF, then this is a
+    <a href="#parse-error">parse error</a>. Return a U+FFFD REPLACEMENT
+    CHARACTER.</p>
 
     <p>Otherwise, return a character token for the Unicode character
     whose code point is that number.
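
For readers following the second hunk, here is a minimal sketch of the range check it describes for numeric character references. It covers only the two branches visible in the hunk; the preceding table lookup and any other steps of the algorithm are omitted. The function name resolve_numeric_reference and the (character, parse_error) return shape are assumptions made for illustration and do not come from the spec.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def resolve_numeric_reference(number):
    """Sketch of the range check in the hunk above (hypothetical helper).

    Returns (character, parse_error). The flag reflects only this check;
    other steps of the real algorithm are not modelled here.
    """
    # Surrogates (0xD800 to 0xDFFF) and numbers above 0x10FFFF are parse
    # errors and yield U+FFFD REPLACEMENT CHARACTER.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return REPLACEMENT_CHARACTER, True
    # Otherwise, return the character whose code point is that number.
    return chr(number), False


if __name__ == "__main__":
    for n in (0x41, 0xD800, 0xDFFF, 0x10FFFF, 0x110000):
        print(hex(n), resolve_numeric_reference(n))
```

The check sits in the character reference step because a reference such as &#xD800; is written in plain ASCII, so it can still name a surrogate even when the input stream itself cannot contain one.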
Received on Wednesday, 9 February 2011 00:30:32 GMT
