[whatwg] More prohibited characters for unquoted attributes are needed

On Mon, 7 Sep 2009, Aryeh Gregor wrote:
> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon
> <foolistbar at googlemail.com> wrote:
> > Apparently Hixie had previously said he didn't want to change this as it
> > will become a non-issue over time. I think it does matter due to the
> > security issues it presents in existing UAs. Conforming markup (using
> > elements/attributes allowed in HTML 4.01) should not cause JS to execute in
> > one browser but not in another.
> 
> I agree with you as an author.  I wrote an HTML output function in 
> MediaWiki assuming that what the standard says is known to be 
> interoperable, which is apparently wrong.  If I hadn't been keeping up 
> with HTML 5, I would have introduced an XSS vulnerability because of 
> some browsers' handling of `.
> 
> If the problem will go away with time, then perhaps a later version of 
> the standard could make such unquoted attributes conforming, once 
> there's no more problem with them.

As far as I can tell, this is an IE bug; treating "`" as an attribute 
quoting character is non-conforming in any version of HTML so far, it 
seems. I'm certainly not going to make it non-conforming to stumble into 
any IE bug or difference in parsing between IE and previous specs or other 
browsers; we'd just end up with an asinine set of conformance 
requirements. For example, should this be non-conforming?

   <!DOCTYPE html>
   <title>Test</title>
   <form>
    <label>Search: <input type=text></label>
    <input type=submit>
   </form>

This perfectly innocent piece of HTML content (HTML2-compliant except for 
the DOCTYPE) results in a non-tree DOM in IE8. Should we make it 
non-conforming?

Similarly, IE conditional comments make it trivial to trigger scripts in 
IE but not another UA; indeed people do this on purpose. Should we make 
those non-conforming also?


As I understand it, the attack here is a site that allows the user to 
input text that is used verbatim in two attributes, such that the user can 
set the first attribute's value to:

   `

...and the second to:

   ` onload='...payload...' end=x

...on the assumption that the site leaves the first one unquoted and 
wraps the second one in double quotes:

   <body title=` class="` onload='...payload...' end=x">

...which in IE, for some reason, gets treated as:

   <body title=' class="'
         onload='...payload...'
         end='x"'>
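
A minimal Python sketch of the scenario described above may make the 
mechanics easier to follow. The function name render_body and the template 
are hypothetical, chosen only to reproduce the markup shown earlier; the 
sketch assumes a site that emits the first value unquoted and the second 
in double quotes:

```python
def render_body(title: str, css_class: str) -> str:
    # Hypothetical vulnerable template: `title` is emitted unquoted and
    # `css_class` is wrapped in double quotes. Neither attacker value
    # contains a double quote, so escaping only '"' would not alter them.
    return "<body title=" + title + ' class="' + css_class + '">'


# The two attacker-controlled values from the example in this message.
first_value = "`"
second_value = "` onload='...payload...' end=x"

markup = render_body(first_value, second_value)
print(markup)
```

A parser that does not treat ` as a quote character reads the output as a 
title attribute and a class attribute; the problem arises only in a parser 
that pairs the two ` characters as delimiters.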


I've disallowed ` in unquoted attribute values for now, but I think we 
should revert this once IE's fix for this bug has been deployed for a few 
years.
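
For authors writing their own output functions, the restriction can be 
sketched roughly as follows in Python. This is an illustration under 
stated assumptions, not a reference implementation: the names 
UNSAFE_UNQUOTED, can_leave_unquoted and serialize_attribute are 
hypothetical, and treating "&" as unsafe is an extra precaution of this 
sketch rather than something required by the discussion above.

```python
import html

# Characters that force quoting: whitespace, both quote characters, "=",
# "<", ">" and "`". The "&" is an additional conservative choice made by
# this sketch so that character references never appear unquoted.
UNSAFE_UNQUOTED = set(" \t\n\f\r\"'=<>`&")


def can_leave_unquoted(value: str) -> bool:
    """Return True if value is non-empty and has no unsafe characters."""
    return value != "" and not any(ch in UNSAFE_UNQUOTED for ch in value)


def serialize_attribute(name: str, value: str) -> str:
    """Emit name=value, falling back to a double-quoted, escaped value."""
    if can_leave_unquoted(value):
        return f"{name}={value}"
    # html.escape with quote=True escapes &, <, >, " and '.
    return f'{name}="{html.escape(value, quote=True)}"'


print(serialize_attribute("type", "text"))
print(serialize_attribute("title", "`"))
```

With a check like this, a lone ` is always emitted inside double quotes, 
so it cannot begin an unquoted value in the generated markup.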

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 4 October 2009 19:32:12 UTC