Re: E4H and constructing DOMs from Bjoern Hoehrmann on 2013-03-09 (public-script-coord@w3.org from January to March 2013)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 09 Mar 2013 01:32:15 +0100
To: mikesamuel@gmail.com
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <f2tkj8dlhenunoq14ipov3vn4p7gabpr5s@hive.bjoern.hoehrmann.de>

* Mike Samuel wrote:
>I'm proposing a design that allows library authors (eventually grammar
>maintainers) to write contextual auto-escaping systems instead of
>requiring template system authors to write thousands of lines of AST
>code that doesn't solve the problem because the DOM is wedded to
>DOMstring for attribute values.

I think it is critical that people are able to tell from looking at some
code whether that code will perform as intended under all circumstances.
Contextual auto-escaping systems do not seem to deliver that. Consider
the example on http://wiki.ecmascript.org/doku.php?id=harmony:quasis:

  safehtml`<a href="${url}?q=${query}" ...

which, with

  url = "http://example.com/",
  query = "Hello & Goodbye",

would generate the equivalent of

  <a href="http://example.com/?q=Hello%20%26%20Goodbye" ...

In other words, `safehtml` would implement some kind of "do what I mean"
escaping system. I have no idea how `safehtml` would do that. The escape
mode for the two variables is different, but based on what? Perhaps `=`
causes the mode switch? Or maybe the `?` does? Perhaps the `?` does it
only because `url` does not include a `?` itself? How can the `safehtml`
tag know that the `href` attribute takes a URI to begin with? Is it
based on the name `href`? Or maybe it knows the combination of `a` and
`href`? So what if you have

  safehtml`<a href="${url}?q=${query}" ...
  safehtml`<x href="${url}?q=${query}" ...

Same result? Might the result change over time, for instance, if "HTML"
adds an `x` element with a `href` attribute that is a URI, so right now
I get different results, but when `safehtml` is updated this changes? If
the browser implements `safehtml` but not `x` but I use a "polyfill" to
add some fallback support for the element, ... then what happens? And if
browsers have built-in support for `safehtml` and I also use ECMAScript
on the server side to generate code and also use `safehtml` there, can I
rely on `safehtml` working the same, while still expecting `safehtml` to
do what I mean?
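
To make these questions concrete, here is a minimal sketch of a tag that
guesses the escape mode from context. It is not how `safehtml` works; the
name `guessingHtml`, the `URI_ATTRIBUTES` table, and the "a literal `?`
starts the query" rule are all assumptions made up for illustration.

```javascript
// Hypothetical table of "element attribute" pairs treated as URI-valued.
// What belongs in it, and who updates it, is an assumption of this sketch.
const URI_ATTRIBUTES = new Set(["a href"]);

// Minimal HTML attribute-value escaping for double-quoted attributes.
function escapeHtmlAttr(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Guess an escaper for a hole from the literal template text before it.
// Earlier holes appear in that text as "\u0000" placeholders.
function guessEscaper(literalBefore) {
  const m = /<([a-zA-Z][\w-]*)\b[^<>]*?\s([\w-]+)="([^"]*)$/.exec(literalBefore);
  if (!m) return escapeHtmlAttr;
  const key = `${m[1].toLowerCase()} ${m[2].toLowerCase()}`;
  // Heuristic: in a known URI attribute, a literal "?" before the hole
  // means the hole is a query component and gets percent-encoded.
  if (URI_ATTRIBUTES.has(key) && m[3].includes("?")) {
    return (v) => escapeHtmlAttr(encodeURIComponent(String(v)));
  }
  return escapeHtmlAttr;
}

function guessingHtml(strings, ...values) {
  let out = strings[0];
  let literalText = strings[0];
  values.forEach((value, i) => {
    out += guessEscaper(literalText)(value) + strings[i + 1];
    literalText += "\u0000" + strings[i + 1];
  });
  return out;
}

const url = "http://example.com/";
const query = "Hello & Goodbye";
console.log(guessingHtml`<a href="${url}?q=${query}">`);
// <a href="http://example.com/?q=Hello%20%26%20Goodbye">
console.log(guessingHtml`<x href="${url}?q=${query}">`);
// <x href="http://example.com/?q=Hello &amp; Goodbye">
```

In this sketch the same template text yields different output for `a`
and `x`, and adding "x href" to the table would silently change the
second result.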

If the code was something like

  safehtml`<a href="${url:literal}?q=${query:uri_escape}" ...
  safehtml`<x href="${url:literal}?q=${query:uri_escape}" ...

I could be reasonably confident that I understand what it does. I might
think that `safehtml` implements some HTML-like language and understands
that `"` characters in `${url:literal}` need to be replaced by &...;
references, and I can see how a single organisation like Google might be
able to address some of the problems I've mentioned through deployment
and other policies. But in the end I cannot tell whether `safehtml`
templates actually produce "safe" and "correct" results without a lot of
external data.
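
The explicit variant can be approximated in plain ECMAScript, where
`${url:literal}` is not valid syntax, by wrapping each value in a
function that names its escape mode. This is only a sketch under that
assumption: `explicitHtml`, `literal`, `uriEscape`, and `Marked` are
hypothetical names, and the attribute escaping is deliberately minimal.

```javascript
// Hypothetical wrapper recording that a value was given an explicit mode.
class Marked {
  constructor(text) {
    this.text = text;
  }
}

// Stand-ins for the `:literal` and `:uri_escape` annotations.
const literal = (v) => new Marked(String(v));
const uriEscape = (v) => new Marked(encodeURIComponent(String(v)));

// Minimal HTML attribute-value escaping for double-quoted attributes.
function escapeAttr(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Every hole must carry an explicit mode; the element and attribute
// names play no role in how a value is escaped.
function explicitHtml(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    if (!(value instanceof Marked)) {
      throw new TypeError(`hole ${i} has no explicit escape mode`);
    }
    out += escapeAttr(value.text) + strings[i + 1];
  });
  return out;
}

const pageUrl = 'http://example.com/"';
const searchQuery = "Hello & Goodbye";
console.log(explicitHtml`<x href="${literal(pageUrl)}?q=${uriEscape(searchQuery)}">`);
// <x href="http://example.com/&quot;?q=Hello%20%26%20Goodbye">
```

Here the output is the same for `a` and `x`, and a reader can predict it
from the template alone, though whether the result is "safe" still
depends on what `literal` is allowed to let through.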

A year or two ago I learned that Yair Amit reported an XSS vulnerability
on google.com to Google in 2005. That was quite interesting because I had
not known that when I reported another XSS vulnerability on the same page
a couple of weeks later (initially no character encoding declared, then
encoding set to US-ASCII while echoing non-7-bit user input; see
http://www.websitedev.de/temp/google-utf7-xss.txt). I am still not sure
what to make of that, but given people screwing up like that, this
contextual auto-escaping idea seems to be aiming too high outside tight
organizational boundaries.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Saturday, 9 March 2013 00:32:45 UTC