- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 16 Sep 2009 09:16:23 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv30723

Modified Files:
    Overview.html
Log Message:
Make surrogates in UTF-8 and character references turn into U+FFFD to
prevent UTF-16 environments having hard-to-handle bugs. (whatwg r3871)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3035
retrieving revision 1.3036
diff -u -d -r1.3035 -r1.3036
--- Overview.html 16 Sep 2009 08:07:24 -0000 1.3035
+++ Overview.html 16 Sep 2009 09:16:20 -0000 1.3036
@@ -55883,23 +55883,25 @@
   motivated by a desire to increase the resilience of user agents in
   the face of naïve transcoders.</p>

-  <p>All U+0000 NULL characters in the input must be replaced by
-  U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
-  a <a href="#parse-error">parse error</a>.</p>
+  <p>All U+0000 NULL characters and characters in the range U+D800 to
+  U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
+  them to suddenly turn into codepoints when they go through a UTF-16
+  pipe --> in the input must be replaced by U+FFFD REPLACEMENT
+  CHARACTERs. Any occurrences of such characters is a <a href="#parse-error">parse
+  error</a>.</p>

   <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
   <!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
   CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
-  <!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+D800
-  to U+DFFF<!-- surrogates not allowed -->, U+FDD0 to U+FDEF, and
-  characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE,
-  U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF,
-  U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE,
-  U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF,
-  U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and
-  U+10FFFF are <a href="#parse-error" title="parse error">parse errors</a>. (These
-  are all control characters or permanently undefined Unicode
-  characters.)</p>
+  <!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+FDD0
+  to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
+  U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE,
+  U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF,
+  U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
+  U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
+  U+10FFFE, and U+10FFFF are <a href="#parse-error" title="parse error">parse
+  errors</a>. (These are all control characters or permanently
+  undefined Unicode characters.)</p>

   <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
   characters are treated specially. Any CR characters that are
@@ -57734,9 +57736,11 @@
   <tr><td>0x9D <td>U+009D <td><control>
   <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON ('ž')
   <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
-  </table><p>Otherwise, if the number is greater than 0x10FFFF, then this is
-  a <a href="#parse-error">parse error</a>. Return a U+FFFD REPLACEMENT
-  CHARACTER.</p>
+  </table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
+  surrogates not allowed; see the comment in the "preprocessing the
+  input stream" section for details --> or is greater than 0x10FFFF,
+  then this is a <a href="#parse-error">parse error</a>. Return a U+FFFD
+  REPLACEMENT CHARACTER.</p>

   <p>Otherwise, return a character token for the Unicode character
   whose code point is that number.
@@ -57746,14 +57750,14 @@
   If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
   allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
   allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
-  0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
-  0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
-  of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF,
-  0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,
-  0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF,
-  0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
-  0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
-  0x10FFFF, then this is a <a href="#parse-error">parse error</a>.</p>
+  0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xFDD0 to
+  0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
+  0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE,
+  0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
+  0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE,
+  0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
+  0x10FFFE, or 0x10FFFF, then this is a <a href="#parse-error">parse
+  error</a>.</p>
   </dd>
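
For readers who want to see the effect of this change in code, the following is a rough Python sketch of the two rules as the new revision words them: U+0000 and surrogates (U+D800 to U+DFFF) are replaced by U+FFFD during preprocessing, and a numeric character reference in the surrogate range or above 0x10FFFF yields U+FFFD. The names `preprocess`, `resolve_numeric_reference`, `is_surrogate` and `is_flagged` are made up for illustration and do not come from the spec. The sketch also assumes the number has already been checked against the Windows-1252 replacement table that precedes this step, and it leaves out CR/LF handling entirely.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(cp: int) -> bool:
    """True for code points in the range U+D800 to U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_flagged(cp: int) -> bool:
    """Code points that are a parse error but are still passed through."""
    return (
        0x0001 <= cp <= 0x0008
        or cp == 0x000B
        or 0x000E <= cp <= 0x001F
        or 0x007F <= cp <= 0x009F
        or 0xFDD0 <= cp <= 0xFDEF
        # U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, ... U+10FFFE/U+10FFFF
        or (cp & 0xFFFE) == 0xFFFE
    )


def preprocess(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of parse errors seen."""
    out = []
    errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0x0000 or is_surrogate(cp):
            # Replaced, and counted as a parse error.
            out.append(REPLACEMENT)
            errors += 1
        else:
            if is_flagged(cp):
                # Parse error, but the character itself is kept.
                errors += 1
            out.append(ch)
    return "".join(out), errors


def resolve_numeric_reference(number: int) -> tuple[str, bool]:
    """Return (character, is_parse_error) for a numeric reference.

    Assumption: `number` is non-negative and was not matched by the
    replacement table that comes before this step in the spec text.
    """
    if is_surrogate(number) or number > 0x10FFFF:
        return REPLACEMENT, True
    return chr(number), is_flagged(number)
```

With this sketch, `preprocess("a\x00b\ud800c")` gives `("a\ufffdb\ufffdc", 2)`, and `resolve_numeric_reference(0xD800)` gives `("\ufffd", True)`, so a lone surrogate never reaches a UTF-16 consumer as a code unit that could pair up with a neighbour.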
Received on Wednesday, 16 September 2009 09:16:33 UTC