[whatwg] Parsing entities from Ian Hickson on 2006-08-14 (public-whatwg-archive@w3.org from August 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 14 Aug 2006 20:24:54 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0608142023580.5340@dhalsim.dreamhost.com>

On Mon, 14 Aug 2006, Simon Pieters wrote:
> 
> I guess that for compat with IE and the Web[1] we have to treat 
> "R&eacutesum&eacute" as if it were "R&eacute;sum&eacute;". So how do we 
> handle "&noti;"? When the parser has come as far as "&not" it can't 
> return U+00AC yet because it could well be "&notin;". But when it has 
> reached "&noti;" then it can't be "&notin;", thus it returns U+00AC, but 
> then you also have to reparse the "i;", right? Unless I'm mistaken the 
> spec doesn't say anything about that.

Section 8.2.3.1 "Tokenising entities", under "Anything else", covers this: 
"Consume the maximum number of characters possible, with the consumed 
characters case-sensitively matching one of the identifiers in the first 
column of the entities table".
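
A minimal sketch of that longest-match rule, for illustration only. The 
three-entry table is a hypothetical subset of the entities table, the 
helper names consume_entity and decode are made up for this example, and 
swallowing a semicolon that directly follows the matched name is an 
assumption rather than something quoted from the spec text above.

```python
# Hypothetical three-entry subset of the entities table (name -> character).
ENTITIES = {
    "eacute": "\u00e9",
    "not": "\u00ac",
    "notin": "\u2209",
}
MAX_NAME_LEN = max(len(name) for name in ENTITIES)


def consume_entity(text, pos):
    """Try to consume an entity name starting at text[pos], just after '&'.

    Returns (character, next_pos), or (None, pos) if no name matches.
    """
    # Try the longest candidate first so "notin" wins over "not" when both fit.
    for length in range(min(MAX_NAME_LEN, len(text) - pos), 0, -1):
        name = text[pos:pos + length]
        if name in ENTITIES:  # dict lookup is case-sensitive
            end = pos + length
            # Assumption: a semicolon directly after the name is consumed too.
            if text.startswith(";", end):
                end += 1
            return ENTITIES[name], end
    return None, pos


def decode(text):
    """Replace named entities in text using the longest-match rule."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            char, nxt = consume_entity(text, i + 1)
            if char is not None:
                out.append(char)
                i = nxt
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

With this table, decode("&noti;") matches "not" as the longest name, emits 
U+00AC, and leaves "i;" to be handled as ordinary text, so nothing has to 
be reparsed. decode("&notin;") matches the longer "notin" and emits U+2209.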

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 14 August 2006 13:24:54 UTC