Re: IE compat with coords parsing

I looked at the parsing algorithm that Gecko uses, and it seems
significantly different from what the spec proposes. We treat the
string as a set of tokens separated by \s+|\s*,\s*. We then parse each
token using stdlib atoi, meaning that any digits are read, optionally
preceded by a '+' or a '-', and parsing stops at the first character
that isn't a digit. In other words, we don't give special treatment to
';' or any other character. However, our definition of whitespace in
this algorithm is 0x09, 0x0A, 0x0B, 0x0C, 0x0D and 0x20.
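
To make the description above concrete, here is a minimal sketch in
Python. It is not Gecko's actual code. The names parse_coords and
atoi_like are made up for illustration, and three details are
assumptions that the description doesn't settle: the comma alternative
of the separator is tried first, leading and trailing whitespace is
trimmed before splitting, and an empty token parses as 0 (which is what
atoi returns when it finds no digits).

```python
import re

# Whitespace as listed above: 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20.
WS_CHARS = "\t\n\x0b\x0c\r "
WS = "[" + WS_CHARS + "]"

# Separator: a comma with optional surrounding whitespace, or a run of
# whitespace. Assumption: the comma alternative is tried first, so that
# "1 , 2" is treated as a single separator rather than two.
SEPARATOR = re.compile(WS + "*," + WS + "*|" + WS + "+")

# atoi-like prefix: an optional sign followed by ASCII digits.
INT_PREFIX = re.compile(r"[+-]?[0-9]+")


def atoi_like(token):
    """Read a leading signed integer; return 0 if there are no digits."""
    match = INT_PREFIX.match(token)
    return int(match.group()) if match else 0


def parse_coords(value):
    """Split on the separator and parse every token atoi-style."""
    # Assumption: surrounding whitespace is trimmed before splitting.
    tokens = SEPARATOR.split(value.strip(WS_CHARS))
    return [atoi_like(token) for token in tokens]
```

Under those assumptions, parse_coords("1,2:3,4") gives [1, 2, 4],
because the token "2:3" is read as 2 and everything from the ':' on is
dropped. Similarly, parse_coords("2,20', 87,38'") gives [2, 20, 87, 38].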

It looks like Gecko would parse all of the cases below the same way IE does.

I looked through Bugzilla to see if we had any bugs filed on us
related to coords parsing, and found these:

https://bugzilla.mozilla.org/show_bug.cgi?id=322370
coords=",1,2,3,4". IE seems to ignore the initial comma.
This bug also mentions that IE pads with extra '0's if not enough
coordinates are found.

https://bugzilla.mozilla.org/show_bug.cgi?id=440437
Just says that image maps don't work; it doesn't contain any other information.

There are also other bugs about other aspects of image maps not
working, such as wanting unknown values for the 'shape' attribute to be
treated as 'rect', but I couldn't find anything else regarding the
coords attribute.

/ Jonas



On Fri, Jan 9, 2009 at 7:08 AM, Simon Pieters <simonp@opera.com> wrote:
>
> We implemented HTML5's parsing rules for coords=''. Hence I was interested
> in checking if the algorithm is compatible with IE, so I implemented it in
> javascript:
>
> http://simon.html5.org/tools/js/coords-parsing.html
>
> I found that HTML5 didn't quite match IE. For instance "1,2:3,4" gives "1"
> in HTML5 but "1,2,4" in IE (i.e. IE gives ':' the same treatment that the
> spec gives to '.').
>
> So I wondered which characters other than ':' IE treated this way:
>
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0D%0A%3Cmap%3E%3Carea%20shape%3D%22circle%22%3E%3C%2Fmap%3E%0D%0A%3Cscript%3E%0D%0Avar%20a%20%3D%20document.getElementsByTagName('area')%5B0%5D%3B%0D%0Afor%20(var%20i%20%3D%201%3B%20i%20%3C%200x3ff%3B%20%2B%2Bi)%20%7B%0D%0A%20%20a.coords%20%3D%20'1'%20%2B%20String.fromCharCode(i)%20%2B%20'2%2C3'%3B%0D%0A%20%20w('0x'%20%2B%20i.toString(16)%20%2B%20'%20('%20%2B%20String.fromCharCode(i)%20%2B%20')%3A%20'%20%2B%20a.coords)%3B%0D%0A%7D%0D%0A%3C%2Fscript%3E
>
> ...and it turns out to be quite a few (all lines that say "1,3,0").
>
> Then I asked Philip` for some data, and he kindly gave a list:
>
>   http://philip.html5.org/data/coords-with-unusual-chars.txt
>
> ...from which I found three pages that would break with HTML5:
>
>   http://www.psu.edu/ur/GSpanier/gallery/
>   coords="157,5,233,20' href=" (apostrophe)
>
>   http://www.motorsportforbundet.no/
>   coords="615, 0, 768, 40      " (tab)
>
>   http://www.kipwmi.com/
>   coords="2,20', 87,38'" (apostrophe)
>
> It's possible that there are similar pages that contain coords='1,2",3,4"'
> or coords=1,2',3,4', but Philip` only looked at double-quoted attributes.
>
>
> On one hand it seems reasonable to say that 3 pages out of 130k isn't a big
> deal, but OTOH those pages work in today's browsers and changing the
> algorithm to fix them is simple.
>
> Doing exactly what IE does might not be necessary for Web compat. IE gives
> different treatment to various non-ASCII characters (such as 0x3f3 and
> 0x3f4).
>
> For the time being we'll keep the HTML5 implementation but we might change
> to be more compatible.
>
>
> (I note that Philip`'s data has an instance of
> COORDS="438,110,496,1ß2,496,156,438,159" which HTML5 and IE agree on,
> although the result makes the link not work. If ß were to be treated as '.' then the
> page would work slightly better.)
>
> (See http://krijnhoetmer.nl/irc-logs/whatwg/20090109#l-352 for discussion.)
>
> --
> Simon Pieters
> Opera Software
>
>

Received on Friday, 9 January 2009 21:50:55 UTC