Re: Processing of &prod_id= in attributes from Maciej Stachowiak on 2010-06-29 (public-html@w3.org from June 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 29 Jun 2010 12:03:15 -0700
To: Adam Barth <w3c@adambarth.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>
Message-id: <F4DFC4F5-4598-4899-895A-2EB6E52B79F6@apple.com>

On Jun 29, 2010, at 11:29 AM, Adam Barth wrote:

> Hi Henri,
> 
> WebKit received a bug report [1] about URLs in attributes akin to the following:
> 
> http://example.com/foo?bar=baz&prod_id=qux
> 
> Recall that &prod; is an HTML entity.  According to my reading of
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references>,
> we ought to consume the "&prod" in an attribute as a named entity.
> However, Minefield does not appear to process the input stream this
> way:
> 
> https://bug-41345-attachments.webkit.org/attachment.cgi?id=60037
> 
> Notice, however, that when we use &pound; in a similar way, Minefield
> does consume "&pound" as a named entity:
> 
> http://example.com/foo?bar=baz&pound_id=qux
> 
> Is Minefield's behavior here intentional?  Should we update the spec
> to explain why these two entities are treated differently?  Is my
> interpretation of the spec incorrect?  (Note that the legacy WebKit
> parser acts the same way as Minefield on these test cases.)

I believe the spec matches Minefield, and the WebKit behavior is a bug.

The algorithm you cited has this constraint:

"If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next character is either a U+003D EQUALS SIGN character (=) or in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned."
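
To make the constraint concrete, here is a rough Python sketch of how it could be applied to the two URLs quoted above. It is only an illustration, not the spec's algorithm in full: the function name consume_char_ref and the tiny NAMED_REFS table are assumptions made up for this example (the real table is far larger), and numeric references and parse errors are ignored. The table reflects that "pound" is listed both with and without a trailing semicolon, while "prod" is only listed as "prod;".

```python
import string

# Illustrative subset of the named character reference table (an assumption
# for this sketch). Legacy names such as "pound" and "amp" appear both with
# and without the trailing semicolon; "prod" appears only as "prod;".
NAMED_REFS = {
    "pound;": "\u00a3",
    "pound": "\u00a3",
    "prod;": "\u220f",
    "amp;": "&",
    "amp": "&",
}

ASCII_ALNUM = set(string.ascii_letters + string.digits)


def consume_char_ref(text, pos, in_attribute):
    """Try to consume a named reference starting at text[pos] == '&'.

    Returns (replacement, new_pos), or (None, pos) if nothing is consumed.
    """
    assert text[pos] == "&"
    rest = text[pos + 1:]
    # Longest name in the table that the input starts with, if any.
    match = max((n for n in NAMED_REFS if rest.startswith(n)),
                key=len, default=None)
    if match is None:
        return None, pos
    end = pos + 1 + len(match)
    # The quoted constraint: inside an attribute, a match without ';' that is
    # followed by '=' or an ASCII alphanumeric is unconsumed.
    if in_attribute and not match.endswith(";"):
        next_char = text[end:end + 1]
        if next_char == "=" or next_char in ASCII_ALNUM:
            return None, pos
    return NAMED_REFS[match], end


for url in ("http://example.com/foo?bar=baz&prod_id=qux",
            "http://example.com/foo?bar=baz&pound_id=qux"):
    amp = url.index("&")
    print(url, "->", consume_char_ref(url, amp, in_attribute=True))
```

In this sketch "&pound_id" is consumed because the character after "&pound" is "_", which is neither "=" nor an ASCII alphanumeric, so the constraint does not trigger. "&prod_id" is left alone because nothing in the sketch table matches "prod" without the semicolon.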

Regards,
Maciej

Received on Tuesday, 29 June 2010 19:03:51 UTC