- From: Maciej Stachowiak <mjs@apple.com>
- Date: Tue, 29 Jun 2010 12:03:15 -0700
- To: Adam Barth <w3c@adambarth.com>
- Cc: Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>
On Jun 29, 2010, at 11:29 AM, Adam Barth wrote:

> Hi Henri,
>
> WebKit received a bug report [1] about URLs in attributes akin to the following:
>
> http://example.com/foo?bar=baz&prod_id=qux
>
> Recall that &prod; is an HTML entity. According to my reading of
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references>,
> we ought to consume the "&prod" in an attribute as a named entity.
> However, Minefield does not appear to process the input stream this
> way:
>
> https://bug-41345-attachments.webkit.org/attachment.cgi?id=60037
>
> Notice, however, that when we use &pound in a similar way, Minefield
> does consume "&pound" as a named entity:
>
> http://example.com/foo?bar=baz&pound_id=qux
>
> Is Minefield's behavior here intentional? Should we update the spec
> to explain why these two entities are treated differently? Is my
> interpretation of the spec incorrect? (Note that the legacy WebKit
> parser acts the same way as Minefield on these test cases.)

I believe the spec matches Minefield and the WebKit behavior is a bug. The algorithm you cited has this constraint:

"If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next character is either a U+003D EQUALS SIGN character (=) or in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned."

Regards,
Maciej
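For illustration, the constraint quoted above can be sketched as a small predicate. This is only a reading of the quoted text, not code from any browser or from the spec itself; the function name should_unconsume and its parameters are hypothetical, and the lookup of the matched name in the entity table is assumed to have happened already.

```python
def should_unconsume(matched: str, next_char: str, in_attribute: bool) -> bool:
    """Return True if the characters matched after '&' must be unconsumed.

    Hypothetical helper: `matched` is the text matched after the ampersand
    (e.g. "pound" or "pound;"), and `next_char` is the character following
    the match ("" at end of input).
    """
    if not in_attribute:
        return False  # the quoted rule applies only inside attribute values
    if matched.endswith(";"):
        return False  # a terminating semicolon always completes the reference
    if next_char == "=":
        return True
    # ASCII digits and letters only, matching the ranges in the quoted text.
    return (
        "0" <= next_char <= "9"
        or "A" <= next_char <= "Z"
        or "a" <= next_char <= "z"
    ) if next_char else False


# A match followed by "=" or an ASCII alphanumeric is left as literal text.
print(should_unconsume("pound", "=", True))
print(should_unconsume("pound", "s", True))
```

The range checks are written out explicitly rather than using str.isalnum(), which would also accept non-ASCII letters and digits that the quoted text does not list.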
Received on Tuesday, 29 June 2010 19:03:51 UTC