Re: E4H and constructing DOMs

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 11 Mar 2013 19:38:01 +0000 (UTC)
To: Mike Samuel <mikesamuel@gmail.com>
cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <Pine.LNX.4.64.1303111929210.15713@ps20323.dreamhostps.com>

On Mon, 11 Mar 2013, Mike Samuel wrote:
> >
> > E4H is much simpler than E4X, actually:
> >
> >    http://www.hixie.ch/specs/e4h/strawman
> >
> > It's just a small syntax extension to JS. (It doesn't involve an HTML 
> > parser, in fact it doesn't involve any parser at all other than the JS 
> > parser, which is why it gives compile-time syntax checking.)
> How does it deal with XSS via CSS, URIs, VBScript, etc. without 
> involving parsers for those languages?
> What happens with
>     <><a href="{data}">Hello, World!</a></>
> when data is "javascript:doEvil()"?

Exactly what you expect, you get a JS link.
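
As a rough illustration of what being explicit could look like for the href case, here is a minimal sketch in plain JavaScript. The helper name safeHref, the scheme allowlist, and the about:blank fallback are all assumptions made for this example; none of them come from the E4H strawman.

```javascript
// Hypothetical helper, not part of E4H: accept only absolute URLs whose
// scheme is on an explicit allowlist, and fall back to a harmless value.
const ALLOWED_SCHEMES = new Set(["http:", "https:", "mailto:"]);

function safeHref(data, fallback = "about:blank") {
  let url;
  try {
    // The URL constructor throws on anything that is not an absolute URL.
    url = new URL(data);
  } catch {
    return fallback;
  }
  // url.protocol is already lowercased and includes the trailing colon.
  return ALLOWED_SCHEMES.has(url.protocol) ? url.href : fallback;
}
```

The author would then interpolate safeHref(data) rather than data, so the policy is visible at the point of use.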

> What happens with
>     <><style>color: {data}</style></>
> when data is "expression(doEvil())"?

You get some invalid CSS.

> What happens with injection into a script?
>     <><script>var s = "{data}", re = /{data}/, x = {data};</script></>
> ?

Same as with an eval and string concatenation.
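
To make the comparison with string concatenation concrete, here is a small sketch for the string-literal slot only. The variable names are illustrative, and using JSON.stringify as the escaper for a JS string literal is an assumption of this example, not something E4H does for you.

```javascript
// Attacker-controlled value that closes the string literal early.
const data = '"; doEvil(); "';

// Naive concatenation: the value breaks out of the string.
const naive = 'var s = "' + data + '";';
// naive is now: var s = ""; doEvil(); "";

// Explicit escaping for the JS string-literal context.
// JSON.stringify emits a quoted literal with quotes and backslashes escaped.
const explicit = "var s = " + JSON.stringify(data) + ";";
```

The regexp and bare-expression slots in the same script would each need their own, different treatment.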

It's not magic. Magic is bad, especially around security features. Authors 
need to be able to understand the model precisely, and therefore it needs 
to be a simple model that they can easily reason about.

Autoescaping mechanisms are a disaster. Simple changes to the source code 
that look like no-ops end up introducing security vulnerabilities or 
breaking the logic because suddenly the autoescaper has different context. 
Authors end up not thinking about exactly what it is they're doing, 
leading to overconfidence and injection vulnerabilities where the 
autoescaper has no idea what's going on. Backwards-compatibility means 
that mistakes in the first release of the autoescaper can't be fixed 
without opt-in, which leads to a series of "yes I really want this to be 
secure" boilerplate after a few revisions. It's just way safer to be 
explicit and have a simple model.

If you want to inject a string into a regular expression, you know you 
have to escape the string for regexps and then insert it. If you want to 
insert a string A into a regular expression and then insert the regular 
expression into a CSS string and then insert the CSS string into the 
query part of a URL, you know you have to escape the string A for regular 
expressions, then escape the regular expression for CSS strings, then 
escape the CSS string for the query part of URLs. No autoescaping 
mechanism can magically know what you're doing in cases like this.
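
The layering described above might be sketched as follows. The helpers escapeRegExp and escapeCssString are hypothetical and deliberately simplified, and the example.com URL is a placeholder; only encodeURIComponent is a built-in.

```javascript
// Hypothetical helper: escape regular expression metacharacters.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical, simplified helper: escape for a double-quoted CSS string
// (quotes, backslashes, and newlines only).
function escapeCssString(s) {
  return s.replace(/["\\]/g, "\\$&").replace(/\n/g, "\\a ");
}

const a = "1+1=2";

// Step 1: escape string A for regular expressions.
const regexSource = "^" + escapeRegExp(a) + "$";

// Step 2: escape the regular expression for a CSS string.
const cssString = '"' + escapeCssString(regexSource) + '"';

// Step 3: escape the CSS string for the query part of a URL.
const url = "https://example.com/search?q=" + encodeURIComponent(cssString);
```

Each step knows only about its own target context, which is why the order of the calls has to mirror the order of nesting.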

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 11 March 2013 19:38:24 UTC
