- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 13 Sep 2013 21:18:55 +0000 (UTC)
- To: Geoffrey Sneddon <foolistbar@googlemail.com>
- Cc: WHAT Working Group <whatwg@whatwg.org>
- Message-ID: <alpine.DEB.2.00.1309131921540.12199@ps20323.dreamhostps.com>
On Thu, 5 Sep 2013, Geoffrey Sneddon wrote:
>
> The phrasing content section states:
>
> > Text nodes and attribute values must consist of Unicode characters,
> > must not contain U+0000 characters, must not contain permanently
> > undefined Unicode characters (noncharacters), and must not contain
> > control characters other than space characters.
>
> And the pre-processing the input-stream section states:
>
> > Any occurrences of any characters in the ranges U+0001 to U+0008,
> > U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
> > U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
> > U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
> > U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE,
> > U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF,
> > U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse
> > errors. These are all control characters or permanently undefined
> > Unicode characters (noncharacters).
>
> Note the first uses "Unicode characters", the second "characters"; the
> former excludes surrogates as a conformance requirement.
>
> Note that every disallowed non-surrogate character is a parse error.
>
> Therefore, it would make sense to make surrogates parse errors.

Done.

> It should be noted that they can only occur in the input stream if they
> come from script (as they cannot be decoded from the input byte stream
> as the decoders will never emit a surrogate).

Done.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
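
For readers who want to experiment with the character set discussed in this message, here is a minimal sketch in Python. It is not taken from the specification or from any parser implementation. The function name is_preprocessing_parse_error is hypothetical, and the surrogate range U+D800 to U+DFFF is added on the assumption that the change acknowledged in the reply covers all surrogate code points. U+0000 is left out because the quoted list does not include it.

```python
def is_preprocessing_parse_error(cp: int) -> bool:
    """Return True if code point `cp` is in the parse-error set quoted in
    the message, extended with surrogates (an assumption, see lead-in)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))

    # Surrogates (assumed addition; they can only reach the input stream
    # from script, since decoders never emit them).
    if 0xD800 <= cp <= 0xDFFF:
        return True

    # Control characters other than the space characters
    # (U+0009, U+000A, U+000C, U+000D are deliberately absent).
    if 0x0001 <= cp <= 0x0008 or cp == 0x000B:
        return True
    if 0x000E <= cp <= 0x001F or 0x007F <= cp <= 0x009F:
        return True

    # Noncharacters: the block U+FDD0 to U+FDEF ...
    if 0xFDD0 <= cp <= 0xFDEF:
        return True

    # ... and the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    for cp in (0x000B, 0x000A, 0xD800, 0xFDD0, 0x1047E, 0x10FFFF):
        print("U+%04X" % cp, is_preprocessing_parse_error(cp))
```

The mask test in the last line relies on each plane's final two code points ending in FFFE or FFFF, which avoids spelling out all 34 values from the quoted list.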
Received on Friday, 13 September 2013 21:19:20 UTC