
Re: E4H and constructing DOMs

From: Mike Samuel <mikesamuel@gmail.com>
Date: Tue, 12 Mar 2013 19:11:02 -0400
Message-ID: <CACod6GtN1TivcWWSji-0d=EYm-eLf=8c7n+Euugh_22bntLM0g@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
2013/3/12 Ian Hickson <ian@hixie.ch>:
> On Mon, 11 Mar 2013, Mike Samuel wrote:
>> 2013/3/11 Ian Hickson <ian@hixie.ch>:
>> > On Mon, 11 Mar 2013, Mike Samuel wrote:
>> >>
>> >> Ok.  So it's not a goal of E4H to be safe against XSS by default
>> >> then.
>> >
>> > Autoescaping isn't safe by default either, by that definition.
>>
>> URLs are kind of a large hole, and, yes, contextual auto-escaping is
>> safe by that definition.
>
> What would be autoescaped in something like:
>
>    h`<img src="${scheme}://${host}:${port}/${path}/${file}.${ext}"
>          srcset="${file1} ${w1}w ${file2} ${w2}w"
>          alt="${alt}"
>          data-logger-url="logger?id=${id}&key=1234">`
>
> ...? (where h`` is your autoescaper; obviously pretend that part is
> done however your syntax would really work, and strip newlines if
> necessary.)
>
> Or this:
>
>    x`<div style="color: ${colorModeA}"
>           data-style-mode-a="color: ${colorModeA}"
>           data-style-mode-b="color: ${colorModeB}"
>           data-style-mode-c="color: ${colorModeC}"></div>`
>
> ...where script switches in the new style="" attribute values dynamically
> based on e.g. some game state?
>
> How about this:
>
>    x`<img width="${width}"
>           src="profile.cgi?username=${username}&size=${width}">
>      <script>
>       var x = new Image(${width});
>       x.src = 'profile.cgi?username=${username}&size=${width}';
>      </script>`;
>
> How about:
>
>    x`<p>Paste this WLAML command: AB=2%\*2*11*22;GA=${GADATA}*41</p>`
>
> The utter lack of escaping in the cases above should set off alarm bells,
> but to authors who have been desensitised due to autoescaping, it'll look
> perfectly safe and we'll have a bunch of XSS (or other injection bugs, as
> in the last case) on our hands.

https://developers.google.com/closure/templates/docs/security#in_urls
gives, in tabular form, examples of contexts and the escaping
conventions used.
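
To make the idea of context-dependent escaping concrete, here is a
minimal sketch written as a template tag. It is not the Closure
Templates implementation, nor the code behind any of the testbeds: the
names html, escapeHtml, filterUrl and URL_ATTRS and the placeholder
about:invalid#filtered are all hypothetical. It assumes only two
contexts, HTML text and double-quoted attribute values, and ignores
script, style and CSS contexts entirely.

```javascript
// Hypothetical sketch of a contextual escaper; all names are illustrative.
const URL_ATTRS = new Set(['href', 'src', 'action']);

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Allow http, https, mailto and scheme-less URLs; replace anything else.
function filterUrl(s) {
  const url = String(s);
  const ok = /^(?:(?:https?|mailto):|[^&:\/?#]*(?:[\/?#]|$))/i.test(url);
  return ok ? url : 'about:invalid#filtered';
}

function html(strings, ...values) {
  let state = 'text';        // 'text' | 'tag' | 'attr' (double-quoted value)
  let attrName = '';         // name of the attribute currently being scanned
  let atValueStart = false;  // true if nothing precedes the hole in the value
  let out = '';
  strings.forEach((chunk, i) => {
    // Scan the static text to track which context the next hole is in.
    for (const ch of chunk) {
      if (state === 'text') {
        if (ch === '<') { state = 'tag'; attrName = ''; }
      } else if (state === 'tag') {
        if (ch === '>') state = 'text';
        else if (ch === '"') { state = 'attr'; atValueStart = true; }
        else if (/\s/.test(ch)) attrName = '';
        else if (ch !== '=') attrName += ch.toLowerCase();
      } else if (ch === '"') {
        state = 'tag'; attrName = '';
      } else {
        atValueStart = false;
      }
    }
    out += chunk;
    if (i < values.length) {
      if (state === 'tag') {
        throw new Error('holes in tag/attribute names or unquoted values are not supported');
      }
      let v = String(values[i]);
      if (state === 'attr' && URL_ATTRS.has(attrName)) {
        // A hole at the start of a URL controls the scheme, so filter it;
        // a hole later in the URL is treated as a single URL component.
        v = atValueStart ? filterUrl(v) : encodeURIComponent(v);
      }
      out += escapeHtml(v);
      atValueStart = false;
    }
  });
  return out;
}
```

With this sketch, the same value is treated differently depending on
where it lands: a javascript: URL at the start of an href is replaced by
the placeholder, a value in the middle of a URL is percent-encoded, and
a value in text or in a non-URL attribute is only HTML-escaped.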

You can try out examples in one of the testbeds:

https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/index.html
for the proof-of-concept JS implementation,
http://go-htmltemplate.appspot.com/ for the Go implementation, and
http://java-html-escaper.appspot.com/ for the Java implementation.



>> What do you and Adam mean by "safe" when you say "safe by default"?
>
> I was just using it in the way that you used it. I would be fine with not
> using the term at all.

It's quite possible that we have been talking past one another because
we have very different attack models in mind.
If you think we're using the same definition, what do you think I mean
by "safe"?


>> > E4H's design goals were:
>> >
>> >  - to provide compile-time syntax checking for in-script DOM tree creation
>>
>> A laudable goal.
>> Contextual auto-escapers provide some level of this.
>
> I haven't seen any proposal that requires browsers to fail to compile code
> that contains syntactically incorrect fragments. Do you have an example of
> what you mean? Which proposal does that?

I'm not proposing a contextual autoescaper specification.  I am merely
proposing string templates which enable, among other things, easy
integration of contextually auto-escaped applications.

I think specifying a blessed template language is premature, so I think
enabling experimentation is the best course for now.
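
As a rough illustration of what "enabling experimentation" could look
like, assume the tag`...${x}...` syntax already used in the quoted
examples, where the tag function receives the literal chunks and the
substituted values separately. The makeTag helper and the two example
tags below are hypothetical names, not part of any proposal; the point
is only that the escaping policy lives in library code and can be
swapped without changing the language.

```javascript
// Hypothetical helper: build a template tag from a per-hole policy function.
function makeTag(policy) {
  return function tag(strings, ...values) {
    // strings always has exactly one more element than values.
    let out = strings[0];
    values.forEach((value, i) => {
      out += policy(value) + strings[i + 1];
    });
    return out;
  };
}

// Two interchangeable policies over the same template syntax.
const plain = makeTag((v) => String(v));                          // no escaping
const urlComponent = makeTag((v) => encodeURIComponent(String(v)));

const query = 'a&b=c';
const unsafe = plain`/search?q=${query}`;        // '/search?q=a&b=c'
const safer = urlComponent`/search?q=${query}`;  // '/search?q=a%26b%3Dc'
```

A contextual auto-escaper would be one more such library, with a policy
that also inspects the literal chunks around each hole.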

>> > [...]
>> >     * avoid using the HTML parser
>>
>> I understand the first two goals.  The last seems to be confusing a
>> design choice with a design goal since not using an available tool is
>> rarely something of direct benefit to the end user.
>
> The HTML parser is an utter disaster. It's slow, it's big, it's
> ridiculously complicated. It does stuff you'd never guess at without an
> intimate knowledge of the requirements. Using it is not a feature.

One of the things that I've been building into my grammar-driven
approach is explicit marking of ways to coerce content written using
grammatical corner cases into the subset of the grammar that is
consistently well-handled -- so the template language automatically
quotes unquoted attribute values, adds missing end tags, etc.

It should be possible for a template language to allow (but warn
about) messy input while still producing strings that are both valid
XML (modulo doctypes, raw text content, and HTML character references)
and valid HTML.  I don't yet have good test coverage or experience
with this.
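
A small sketch of the "allow but warn" idea, limited to attribute
quoting in a single start tag. This is not my grammar-driven
implementation; normalizeStartTag and its regular expression are
hypothetical and deliberately simplistic (no end-tag insertion, no
handling of comments or raw-text elements).

```javascript
// Hypothetical sketch: rewrite every attribute in one start tag into the
// name="value" form and collect a warning for each one that needed fixing.
function normalizeStartTag(tag) {
  const warnings = [];
  // whitespace, attribute name, then an optional double-quoted,
  // single-quoted, or unquoted value.
  const ATTR =
    /(\s+)([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>=`]+)))?/g;
  const normalized = tag.replace(ATTR, (match, space, name, dq, sq, bare) => {
    if (dq === undefined) {
      warnings.push(`attribute "${name}" was not double-quoted`);
    }
    const value = dq ?? sq ?? bare ?? '';
    // Only a single-quoted value can contain a literal double quote.
    return `${space}${name}="${value.replace(/"/g, '&quot;')}"`;
  });
  return { normalized, warnings };
}

const result = normalizeStartTag(`<a href=/x title='say "hi"' hidden>`);
// result.normalized is '<a href="/x" title="say &quot;hi&quot;" hidden="">'
// result.warnings has one entry each for href, title and hidden
```

Because quoted values are consumed whole, text such as b=c inside an
already quoted value is left alone rather than being mistaken for an
attribute.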


>> >  - to have good security characteristics:
>> >     * provide a model that is conceptually simple
>> >     * allow arbitrary strings to be embedded in DOM trees in a way that
>> >       does not allow arbitrary elements or attributes to be created
>>
>> If even
>>     <a href="{...}">
>> is a foot gun then I think it fails at this goal.
>
> Which goal does it fail? The model is simple, and you can't create
> arbitrary elements or attributes. Obviously if you're inserting a string
> into a context where it will be parsed, you have to make sure it's valid
> data, but whitelisting like that is elementary, and applies in all cases,
> including many where there's just no way you could autoescape because the
> data/syntax you're inserting into is app-specific.

It fails to preserve the "code-effect property" as outlined at
https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#code_effect_property

As a security person, it's my job to understand the bad parts of
language grammars, but I haven't succeeded as a tool/library author
unless I present an interface that lets my clients write correct,
secure programs without understanding the problem in as much detail as
I do.

Requiring them to understand what is "safe" JavaScript/CSS/URI and all
the minor contextual details that affect the choice of
escapers/sanitizers/filters fails to present a simple interface.
Received on Tuesday, 12 March 2013 23:11:33 UTC
