[whatwg] Possible bug in the character encoding detection algorithm from Ian Hickson on 2007-03-03 (public-whatwg-archive@w3.org from March 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 3 Mar 2007 01:28:33 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0703030020320.20826@dhalsim.dreamhost.com>

On Fri, 2 Mar 2007, James Graham wrote:
>
> Given the following line of input:
> <a b='c'>
> 012345678  - byte numbers for reference
> 
> Jump to step labeled "value"
> (Presumably at this point we want to advance to position 5; this is not
> mentioned)

Fixed.
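
For reference, the byte positions in James's example can be checked with a 
few lines of Python. This is only an illustrative sketch, not text from the 
spec; the names eq_pos and value_start are made up for the example.

```python
# James's example input, as raw bytes.
data = b"<a b='c'>"

# Print each zero-based byte position next to its character.
for pos, byte in enumerate(data):
    print(pos, chr(byte))

# The attribute name "b" ends at the '=' sign.
eq_pos = data.index(b"=")

# The "value" step has to start one byte further on, at the opening quote.
value_start = eq_pos + 1
```

Here the '=' sits at position 4 and the opening quote at position 5, which 
matches the position James expected the algorithm to advance to.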


> this seems to lead to an infinite loop (IIRC the same thing happens for
> unquoted values). html5lib currently sidesteps the issue by not moving the
> position back one after finding an attribute.

Yeah, that was an error in the spec. Fixed. Let me know if you still get 
an error when implementing the algorithm exactly as it is now written.


> This fails to locate the character encoding in, e.g.:
> <meta http-equiv="Content-Type<meta charset="utf-8">
> Obviously one possibility is to get all attributes and then, if the
> current byte is an ASCII "<", move the position back one.

You shouldn't get the character encoding in that case.
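
To see why, here is a rough Python sketch of an attribute scanner. It is a 
simplification written for this example, not the spec's algorithm: the 
function name scan_attributes and its exact rules (what counts as 
whitespace, how names end) are assumptions. The point is only that the 
quoted value swallows everything up to the next double quote, so no 
attribute named "charset" is ever seen.

```python
SPACE = b"\t\n\x0c\r "  # bytes treated as whitespace in this sketch


def scan_attributes(data, pos):
    """Return a list of (name, value) byte pairs found from pos up to '>'."""
    attrs = []
    n = len(data)
    while True:
        # Skip whitespace and slashes before an attribute name.
        while pos < n and (data[pos] in SPACE or data[pos] == ord("/")):
            pos += 1
        if pos >= n or data[pos] == ord(">"):
            return attrs
        # Attribute name: runs until whitespace, '=' or '>'.
        start = pos
        while pos < n and data[pos] not in SPACE and data[pos] not in b"=>":
            pos += 1
        name = data[start:pos].lower()
        while pos < n and data[pos] in SPACE:
            pos += 1
        value = b""
        if pos < n and data[pos] == ord("="):
            pos += 1  # step past '=' so the value starts at the next byte
            while pos < n and data[pos] in SPACE:
                pos += 1
            if pos < n and data[pos] in b"\"'":
                # Quoted value: everything up to the matching quote.
                quote = data[pos]
                pos += 1
                start = pos
                while pos < n and data[pos] != quote:
                    pos += 1
                value = data[start:pos]
                pos += 1  # step past the closing quote
            else:
                # Unquoted value: runs until whitespace or '>'.
                start = pos
                while pos < n and data[pos] not in SPACE and data[pos] != ord(">"):
                    pos += 1
                value = data[start:pos]
        attrs.append((name, value))


broken = b'<meta http-equiv="Content-Type<meta charset="utf-8">'
attrs = scan_attributes(broken, 5)  # start just after "<meta"
names = [name for name, _ in attrs]
```

With this input the first attribute is http-equiv with the value 
Content-Type<meta charset= and the leftover utf-8" is read as a second, 
valueless attribute name, so names contains no b"charset".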

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 2 March 2007 17:28:33 UTC