Re: Processing of &prod_id= in attributes from Adam Barth on 2010-06-29 (public-html@w3.org from June 2010)

From: Adam Barth <w3c@adambarth.com>
Date: Tue, 29 Jun 2010 13:39:22 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Maciej Stachowiak <mjs@apple.com>, Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>
Message-ID: <AANLkTikrjR8BPnmARf0D1WQ_sHctKOTlWvw26C3-2qTC@mail.gmail.com>

On Tue, Jun 29, 2010 at 12:30 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 29.06.2010 21:03, Maciej Stachowiak wrote:
>> ...
>> I believe the spec matches Minefield and the WebKit behavior is a bug.
>>
>> The algorithm you cited has this constraint:
>>
>> "If the character reference is being consumed as part of an attribute, and
>> the last character matched is not a U+003B SEMICOLON character (;), and the
>> next character is either a U+003D EQUALS SIGN character (=) or in the range
>> U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER
>> A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A
>> LATIN SMALL LETTER Z, then, for historical reasons, all the characters that
>> were matched after the U+0026 AMPERSAND character (&) must be unconsumed,
>> and nothing is returned."
>> ...
>
> Yikes.
>
> Can somebody translate this into English for me? :-)

Sure.

It's an unfortunate accident of the world that (1) & begins the
escape sequence for HTML entities, (2) & is a common URL parameter
delimiter, and (3) HTML attributes decode HTML entities.  Consequently,
many authors copy and paste & characters into HTML attributes as part
of URLs and don't expect the parser to decode HTML entities in their
URLs.  This algorithm in the spec catches those cases: inside an
attribute, an entity that isn't terminated by a semicolon is left
undecoded if the character after it looks like it's more likely to be
part of a URL parameter name (a letter or digit) or is the
parameter/value delimiter, "=".
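
To make that concrete, here is a rough Python sketch of the rule as
quoted above.  It is not how any browser implements it: the ENTITIES
table is a tiny hand-picked subset, the function name
decode_attribute_value is made up for illustration, and numeric
references (&#38; and friends) are ignored entirely.

```python
import string

# Tiny illustrative subset of the named entity table (an assumption,
# not the real list). Names without ";" are legacy forms that can
# match even when the author left the semicolon off.
ENTITIES = {
    "amp;": "&", "amp": "&",
    "lt;": "<", "lt": "<",
    "gt;": ">", "gt": ">",
    "copy;": "\u00a9", "copy": "\u00a9",
}
MAX_NAME = max(len(name) for name in ENTITIES)
ALNUM = set(string.ascii_letters + string.digits)


def decode_attribute_value(value: str) -> str:
    """Decode named entities in an attribute value (hypothetical helper)."""
    out = []
    i = 0
    while i < len(value):
        if value[i] != "&":
            out.append(value[i])
            i += 1
            continue

        # Look for the longest entity name that follows the "&".
        match = None
        for length in range(MAX_NAME, 0, -1):
            candidate = value[i + 1 : i + 1 + length]
            if candidate in ENTITIES:
                match = candidate
                break

        if match is None:
            out.append("&")  # not an entity at all, keep the "&"
            i += 1
            continue

        end = i + 1 + len(match)
        next_char = value[end : end + 1]  # "" at end of input

        # The "historical reasons" rule: no semicolon, and the next
        # character is "=" or an ASCII letter or digit, so this is
        # probably a URL parameter. Unconsume and keep the literal text.
        if not match.endswith(";") and (next_char == "=" or next_char in ALNUM):
            out.append("&")
            i += 1
            continue

        out.append(ENTITIES[match])
        i = end
    return "".join(out)
```

With this sketch, "page?a=1&copy=2" comes back unchanged because
"&copy" is followed by "=", while "&copy 2010" and "&copy;=2" both
decode the entity.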

Adam

Received on Tuesday, 29 June 2010 20:40:18 UTC