Re: Parse error characters from Ian Hickson on 2008-05-22 (public-html@w3.org from May 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 22 May 2008 11:48:39 +0000 (UTC)
To: Henri Sivonen <hsivonen@iki.fi>
Cc: HTMLWG Tracking WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0805221141580.12911@hixie.dreamhostps.com>

On Fri, 14 Mar 2008, Henri Sivonen wrote:
> 
> Consuming an entity says:
> > Otherwise, if the number is zero, if the number is higher than 0x10FFFF, or
> > if it's one of the surrogate characters (characters in the range 0xD800 to
> > 0xDFFF), then this is a parse error; return a character token for the U+FFFD
> > REPLACEMENT CHARACTER character instead.
> 
> Preprocessing the input stream says:
> > Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to
> > U+001F, U+007F to U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF, and
> > characters U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
> > U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE,
> > U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE,
> > U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
> > U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. (These are all control
> > characters or permanently undefined Unicode characters.)
> 
> I suggest making characters that are parse errors in the input stream 
> parse errors also when expanded from an NCR.

Done.
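
For illustration only, here is a rough Python sketch of the combined check, built from the two passages quoted above. The function names and the (character, parse_error) return shape are hypothetical, not taken from the spec. It also assumes that a numeric character reference in one of the "preprocessing" ranges is flagged as a parse error while the character itself is still returned; the spec text should be checked for the exact behaviour.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# Inclusive ranges copied from the quoted "Preprocessing the input stream" text.
PARSE_ERROR_RANGES = [
    (0x0001, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xDFFF),
    (0xFDD0, 0xFDDF),
]


def is_input_stream_parse_error(cp):
    """True if the code point is in the quoted list of parse-error characters."""
    if any(lo <= cp <= hi for lo, hi in PARSE_ERROR_RANGES):
        return True
    # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF:
    # the last two code points of every plane.
    return cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def consume_numeric_reference(number):
    """Hypothetical helper: return (character, parse_error) for an NCR value."""
    # Rule quoted from "Consuming an entity": zero, out of range, or surrogate.
    if number == 0 or number > 0x10FFFF or 0xD800 <= number <= 0xDFFF:
        return REPLACEMENT, True
    # Assumption: other parse-error characters are flagged but passed through.
    return chr(number), is_input_stream_parse_error(number)
```

With this sketch, consume_numeric_reference(0x41) yields "A" with no error, while 0x1 and 0xFFFE come back unchanged but flagged, and 0, 0xD800 and 0x110000 come back as U+FFFD.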

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 22 May 2008 11:49:27 UTC