- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 09 Feb 2011 00:29:22 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv24583

Modified Files:
	Overview.html
Log Message:
Remove the requirement that the parser deal with raw surrogates, since they can't make it this far. (whatwg r5862)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.4704
retrieving revision 1.4705
diff -u -d -r1.4704 -r1.4705
--- Overview.html	9 Feb 2011 00:06:20 -0000	1.4704
+++ Overview.html	9 Feb 2011 00:29:17 -0000	1.4705
@@ -55384,13 +55384,6 @@
   motivated by a desire to increase the resilience of user agents in
   the face of naïve transcoders.</p>
 
-  <p>Code points in the range U+D800 to U+DFFF<!-- surrogates are not
-  allowed e.g. in UTF-8, and we don't want them to suddenly turn into
-  code points when they go through a UTF-16 pipe --> in the input must
-  be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of
-  such characters and code points are <a href="#parse-error" title="parse error">parse
-  errors</a>.</p>
-
   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF, CR allowed -->
   U+000E to U+001F, <!-- ASCII allowed --> U+007F
@@ -58026,10 +58019,9 @@
    <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON (ž)
    <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ)
   </table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
-  surrogates not allowed; see the comment in the "preprocessing the
-  input stream" section for details --> or is greater than 0x10FFFF,
-  then this is a <a href="#parse-error">parse error</a>. Return a U+FFFD
-  REPLACEMENT CHARACTER.</p>
+  surrogates --> or is greater than 0x10FFFF, then this is a
+  <a href="#parse-error">parse error</a>. Return a U+FFFD REPLACEMENT
+  CHARACTER.</p>
 
   <p>Otherwise, return a character token for the Unicode character
   whose code point is that number.
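For readers who want to see what the second hunk's wording amounts to, here is a minimal Python sketch of the numeric character reference step as it reads after this change. It is an illustration only, not code from the spec or from any parser: the names `resolve_numeric_reference`, `TABLE_OVERRIDES`, and `REPLACEMENT_CHARACTER` are hypothetical, the table holds only the two rows visible in the diff context, treating a table hit as a parse error is carried over from the full spec text rather than from this diff, and steps of the algorithm not shown in the diff are omitted.

```python
REPLACEMENT_CHARACTER = "\uFFFD"

# Hypothetical partial table: only the two rows visible in the diff context.
# The table in the spec has more entries than these.
TABLE_OVERRIDES = {
    0x9E: "\u017E",  # LATIN SMALL LETTER Z WITH CARON
    0x9F: "\u0178",  # LATIN CAPITAL LETTER Y WITH DIAERESIS
}


def resolve_numeric_reference(number):
    """Return (character, is_parse_error) for a numeric character reference.

    Sketch of the branches visible in the diff; other steps are not modelled.
    """
    # Table lookup comes first; assumed here to be a parse error as well.
    if number in TABLE_OVERRIDES:
        return TABLE_OVERRIDES[number], True

    # Surrogates (0xD800 to 0xDFFF) and values above 0x10FFFF are parse
    # errors and yield U+FFFD REPLACEMENT CHARACTER.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return REPLACEMENT_CHARACTER, True

    # Otherwise, return the character whose code point is that number.
    return chr(number), False
```

Under these assumptions, `resolve_numeric_reference(0xD800)` and `resolve_numeric_reference(0x110000)` both take the replacement branch, while `resolve_numeric_reference(0x41)` falls through to the final step. The surrogate rule removed by the first hunk applied to the raw input stream, not to this step, so nothing here changes in behaviour; only the comment in the spec source was shortened.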
Received on Wednesday, 9 February 2011 00:29:24 UTC