[whatwg] More prohibited characters for unquoted attributes are needed

Ian Hickson wrote:
> On Mon, 7 Sep 2009, Aryeh Gregor wrote:
>> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
>> <foolistbar at googlemail.com> wrote:
>>> Apparently Hixie had previously said he didn't want to change this as it
>>> will become a non-issue over time. I think it does matter due to the
>>> security issues it presents in existing UAs. Conforming markup (using
>>> elements/attributes allowed in HTML 4.01) should not cause JS to execute in
>>> one browser but not in another.
>> I agree with you as an author.  I wrote an HTML output function in 
>> MediaWiki assuming that what the standard says is known to be 
>> interoperable, which is apparently wrong.  If I hadn't been keeping up 
>> with HTML 5, I would have introduced an XSS vulnerability because of 
>> some browsers' handling of `.
>> If the problem will go away with time, then perhaps a later version of 
>> the standard could make such unquoted attributes conforming, once 
>> there's no more problem with them.
> As far as I can tell, this is an IE bug; treating "`" as an attribute 
> quoting character is non-conforming in any version of HTML so far, it 
> seems. I'm certainly not going to make it non-conforming to stumble into 
> any IE bug or difference in parsing between IE and previous specs or other 
> browsers; we'd just end up with an asinine set of conformance 
> requirements.

I agree that it's pointless to make it non-conforming to hit any parsing 
bug, but I would argue that we should make non-conforming as many cases 
as is sensible when they open up security holes in 
websites on legacy UAs, given that website uses a HTML 5 

> For example, should this be non-conforming?
>    <!DOCTYPE html>
>    <title>Test</title>
>    <form>
>     <label>Search: <input type=text></label>
>     <input type=submit>
>    </form>
> This perfectly innocent piece of HTML content (HTML2-compliant except for 
> the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
> non-conforming?

No: that markup does not open up any security hole.

> Similarly, IE conditional comments make it trivial to trigger scripts in 
> IE but not another UA; indeed people do this on purpose. Should we make 
> those non-conforming also?

They are a harder issue, but I think it is probably fair enough to 
assume that most sanitizers drop comments for such reasons, hence making 
them fine to leave as conforming also.

> As I understand it, the attack here is a site that allows the user to 
> input text that is used verbatim in two attributes, such that the user can 
> set the first attribute's value to:
>    `
> ...and the second to:
>    ` onload='...payload...' end=x
> ...with the assumption that the site is going to not quote the first one, 
> and quote the second one with double quotes:

(This is the default behaviour of Python html5lib, FWIW: the first value 
is not quoted as it does not contain any whitespace characters or U+003E 
(>); the second is quoted because it does contain whitespace.)

>    <body title=` class="` onload='...payload...' end=x">
> ...which in IE, for some reason, gets treated as:
>    <body title=' class="'
>          onload='...payload...'
>          end='x"'>

Indeed, this is the attack I (and others) am concerned about.
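
To make the failure mode concrete, here is a minimal Python sketch of a 
serializer that follows the quoting rule described above (quote only on 
whitespace or ">"), next to a stricter variant that also quotes on `, ", 
', = and <. This is not html5lib's actual code: the names serialize_attr, 
serialize_body, LEGACY_UNSAFE and STRICT_UNSAFE are hypothetical and 
exist only for illustration.

```python
# Characters that force quoting under the "minimal quoting" rule described
# in this thread: whitespace and ">". (Assumed rule, for illustration only.)
LEGACY_UNSAFE = set("\t\n\x0c\r >")

# Stricter, hypothetical rule: also quote when the value contains a
# backtick, either quote character, "=" or "<".
STRICT_UNSAFE = LEGACY_UNSAFE | set("`\"'=<")


def serialize_attr(name, value, unsafe):
    """Serialize one attribute, leaving the value unquoted when 'safe'."""
    escaped = value.replace("&", "&amp;")
    if value and not (unsafe & set(value)):
        return f"{name}={escaped}"
    # Fall back to double quotes, escaping any embedded double quote.
    return '{}="{}"'.format(name, escaped.replace('"', "&quot;"))


def serialize_body(title, cls, unsafe):
    """Build a <body> start tag from two user-controlled attribute values."""
    return "<body {} {}>".format(
        serialize_attr("title", title, unsafe),
        serialize_attr("class", cls, unsafe),
    )


# The two attacker-supplied values from the example in this thread.
first = "`"
second = "` onload='...payload...' end=x"

legacy = serialize_body(first, second, LEGACY_UNSAFE)
strict = serialize_body(first, second, STRICT_UNSAFE)

print(legacy)  # <body title=` class="` onload='...payload...' end=x">
print(strict)  # <body title="`" class="` onload='...payload...' end=x">
```

With the legacy rule the lone ` is emitted unquoted, which reproduces 
the markup quoted below; with the stricter rule both values end up 
inside double quotes, so the backticks no longer pair up as delimiters 
in the affected versions of IE.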

> I've disallowed ` in unquoted attribute values for now, but I think we 
> should revert this once IE has fixed this bug for a few years.

Right, once versions of IE with this bug have faded out of existence I 
think this will become a non-issue. I also expect that'll be a while 
yet, though, and I highly doubt that time will have come even by the 
time when HTML 5 goes to REC. Furthermore, if there are similar attacks 
to this, I think they should similarly be made non-conforming.

Geoffrey Sneddon - Opera Software

Received on Tuesday, 13 October 2009 07:02:24 UTC