Re: E4H and constructing DOMs

From: Mike Samuel <mikesamuel@gmail.com>
Date: Fri, 8 Mar 2013 20:40:31 -0500
Message-ID: <CACod6Gu3rqYt0gz0hL0JYYL4tGM-0=EYzZWRN5Jm0NMnOKxGDA@mail.gmail.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>

2013/3/8 Bjoern Hoehrmann <derhoermi@gmx.net>:
> * Mike Samuel wrote:
>>I'm proposing a design that allows library authors (eventually grammar
>>maintainers) to write contextual auto-escaping systems instead of
>>requiring template system authors to write thousands of lines of AST
>>code that doesn't solve the problem because the DOM is wedded to
>>DOMstring for attribute values.
>
> I think it is critical that people are able to tell from looking at some
> code whether that code will perform as intended under all circumstances.
> Contextual auto-escaping systems do not seem to deliver that. Consider:
> On http://wiki.ecmascript.org/doku.php?id=harmony:quasis the example is
>
>   safehtml`<a href="${url}?q=${query}" ...
>
> which would generate with
>
>   url = "http://example.com/",
>   query = "Hello & Goodbye",
>
> the equivalent of
>
>   <a href="http://example.com/?q=Hello%20%26%20Goodbye" ...
>
> In other words, `safehtml` would implement some kind of "do what I mean"
> escaping system. I have no idea how `safehtml` would do that. The escape
> mode for the two variables is different, but based on what? Perhaps `=`
> causes the mode switch? Or maybe the `?` does? Perhaps the `?` does it
> only because `url` does not include a `?` itself? How can the `safehtml`
> tag know the `href` attribute takes a URI to begin with? Is it
> based on the name `href`? Or maybe it knows the combination of `a` and
> `href`? So what if you have
>
>   safehtml`<a href="${url}?q=${query}" ...
>   safehtml`<x href="${url}?q=${query}" ...
>
> Same result? Might the result change over time, for instance, if "HTML"
> adds an `x` element with a `href` attribute that is a URI, so right now
> I get different results, but when `safehtml` is updated this changes? If
> the browser implements `safehtml` but not `x` but I use a "polyfill" to
> add some fallback support for the element, ... then what happens? And if
> browsers have built-in support for `safehtml` and I also use ecmascript
> in the server side to generate code and also use `safehtml` there, can I
> rely on `safehtml` working the same, while still expecting `safehtml` to
> do what I mean?
>
> If the code was something like
>
>   safehtml`<a href="${url:literal}?q=${query:uri_escape}" ...
>   safehtml`<x href="${url:literal}?q=${query:uri_escape}" ...
>
> I could be reasonably confident that I understand what it does, I might
> think that `safehtml` implements some HTML-like language and understands
> that `"` characters in `${url:literal}` need to be replaced by &...;
> references, and I can see how a single organisation like Google might be
> able to address some of the problems I've mentioned through deployment
> and other policies, but in the end I cannot tell whether `safehtml`
> templates actually produce "safe" and "correct" results, without a lot of
> external data.

> A year or two ago I learned that Yair Amit reported an XSS vulnerability
> on google.com to Google in 2005. That was quite interesting because I'd
> not known that when I reported another XSS vulnerability on the same
> page a couple of weeks later (initially no character encoding declared,
> then encoding set to US-ASCII while echoing non-7-bit user input; see
> http://www.websitedev.de/temp/google-utf7-xss.txt). I am still not sure
> what to make of that, but given people screwing up like that, this
> contextual auto-escaping idea seems to be aiming too high, outside tight
> organizational boundaries.

We've got a decent amount of experience with converting projects to
this now (mostly within the boundaries of Google) and it hasn't led to
floods of confusing queries.  The URL example you gave does turn some
heads every once in a while and seems to be the most confusing of the
bunch, so I may have chosen the wrong example there.

Yes, it does key off "?", as it is a goal to protect CGI parameter
boundaries even though CGI parameter injection is probably
high-hanging fruit, but encoding "?" within a path or authority part
would not break the web.

Outside organizational boundaries, the Go html/template package seems
to do what people want without causing too much confusion.  The most
frequently asked question seems to be how to embed IE conditional
comments to conditionally include stylesheets, which is something that
would be even more difficult with an AST approach since conditional
comments essentially fork the grammar.
Received on Saturday, 9 March 2013 01:41:03 UTC
