Re: E4H and constructing DOMs from Mike Samuel on 2013-03-13 (public-script-coord@w3.org from January to March 2013)

From: Mike Samuel <mikesamuel@gmail.com>
Date: Tue, 12 Mar 2013 20:07:19 -0400
To: Ian Hickson <ian@hixie.ch>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CACod6GvbHVp+1qdgGj4RAS5Tb68zxReyrVoy4U0M5KFVwnybyg@mail.gmail.com>
2013/3/12 Ian Hickson <ian@hixie.ch>:
> On Tue, 12 Mar 2013, Mike Samuel wrote:
>> >
>> > What would be autoescaped in something like:
>> >
>> >    h`<img src="${scheme}://${host}:${port}/${path}/${file}.${ext}"
>> >          srcset="${file1} ${w1}w ${file2} ${w2}w"
>> >          alt="${alt}"
>> >          data-logger-url="logger?id=${id}&key=1234">`
>> >
>> > Or this:
>> >
>> >    x`<div style="color: ${colorModeA}"
>> >           data-style-mode-a="color: ${colorModeA}"
>> >           data-style-mode-b="color: ${colorModeB}"
>> >           data-style-mode-c="color: ${colorModeC}"></div>`
>> >
>> > ...where script switches in the new style="" attribute values dynamically
>> > based on e.g. some game state?
>> >
>> > How about this:
>> >
>> >    x`<img width="${width}"
>> >           src="${profile.cgi?username=${username}&size=${width}}">
>> >      <script>
>> >       var x = new Image(${width});
>> >       x.src = 'profile.cgi?username=${username}&size=${width}';
>> >      </script>`;
>> >
>> > How about:
>> >
>> >    x`<p>Paste this WLAML command: AB=2%\*2*11*22;GA=${GADATA}*41</p>`
>>
>> https://developers.google.com/closure/templates/docs/security#in_urls
>> gives, in tabular form, examples of contexts and the escaping
>> conventions used.
>>
>> You can try out examples in one of the testbeds.
>
> The answer seems to be "they are all a disaster". They paper over some of
> the mistakes, corrupt the results for some of the others, and mislead
> authors into thinking the remainder are safe.

Some specifics would be nice instead of more vague FUD.


>> >> What do you and Adam mean by "safe" when you say "safe by default"?
>> >
>> > I was just using it in the way that you used it. I would be fine with
>> > not using the term at all.
>>
>> It's quite possible that we have been talking past one another because
>> we have very different attack models in mind. If you think we're using
>> the same definition, what do you think I mean by "safe"?
>
> I'll let you define your own terms.
>
> What I care about is having APIs that are predictable, understandable, and
> simple, where it is straight-forward to use them in a manner that exhibits
> good coding practices.

You're confusing simplicity of interface with simplicity of
implementation.  Even if you weren't, E4H fails at this because it
does not extend the simplicity and predictability it provides for
HTML to embedded languages.  Its users still need a deep
understanding of the security consequences of interpolating into
URIs, scripts, and styles.  It also provides a false sense of
security, both because it handles HTML but not embedded languages
and because you advertise it as "safe" without defining terms.
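
A minimal sketch of the URI case, assuming two hypothetical helpers
(escapeHtml and filterUrl are illustrative names, not part of E4H or
of any proposal in this thread).  HTML escaping alone leaves a
javascript: URL untouched, so a second, context-specific step is
needed for the href value:

```javascript
// Hypothetical helper: escapes only HTML metacharacters.
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, (c) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  }[c]));
}

// Hypothetical URL filter: allows http, https, mailto, and URLs with
// no scheme at all; anything else is replaced with a harmless value.
function filterUrl(s) {
  const url = String(s);
  const m = /^([^:\/?#]*):/.exec(url);
  if (m && !/^(https?|mailto)$/i.test(m[1])) {
    return 'about:invalid#blocked';
  }
  return url;
}

const untrusted = 'javascript:alert(1)';

// HTML escaping only: the value contains no HTML metacharacters,
// so it passes through unchanged.
const htmlOnly = `<a href="${escapeHtml(untrusted)}">link</a>`;

// URL filtering first, then HTML escaping for the attribute value.
const contextual = `<a href="${escapeHtml(filterUrl(untrusted))}">link</a>`;

console.log(htmlOnly);
console.log(contextual);
```

The allowlist here is an assumption chosen for illustration; a real
filter would need to be designed against the application's needs.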

>> >>> provide compile-time syntax checking for in-script DOM tree creation
>> >>
>> >> Contextual auto-escapers provide some level of this.
>> >
>> > I haven't seen any proposal that requires browsers to fail to compile
>> > code that contains syntactically incorrect fragments. Do you have an
>> > example of what you mean? Which proposal does that?
>>
>> I'm not proposing a contextual autoescaper specification.
>
> Then your solution doesn't provide this.

>> I am merely proposing string templates which enable, among other things,
>> easy integration of contextually auto-escaped applications.
>
> Not compile-time checked ones, right?

No.  EcmaScript isn't compiled, so I don't know where compile-time
checks would go.
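
The closest thing available is a check the first time a template is
evaluated, since a tag function receives the literal portions
separately from the substituted values.  A rough sketch, assuming a
hypothetical tag named `checked` and a deliberately naive check
(balanced double quotes in the static markup).  This is run-time
checking, not compile-time, and the tag does no escaping at all:

```javascript
// Call sites whose static parts have already been validated.
// The strings array is the same object on every evaluation of a
// given call site, so it can serve as a cache key.
const validated = new WeakSet();

// Hypothetical tag: validates the literal parts once per call site,
// then concatenates literal parts and values WITHOUT escaping.
function checked(strings, ...values) {
  if (!validated.has(strings)) {
    const staticMarkup = strings.join('');
    const quotes = (staticMarkup.match(/"/g) || []).length;
    if (quotes % 2 !== 0) {
      throw new SyntaxError(
        'unbalanced double quote in template: ' + staticMarkup);
    }
    validated.add(strings);
  }
  return strings.reduce(
    (out, s, i) => out + s + (i < values.length ? String(values[i]) : ''),
    '');
}

const ok = checked`<img alt="${'a cat'}">`;

let error = null;
try {
  checked`<img alt="${'a cat'}>`; // missing closing quote
} catch (e) {
  error = e;
}

console.log(ok);
console.log(error && error.name);
```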


>> I think specifying a blessed template language is premature so I think
>> enabling experimentation is the best course for now.
>
> I think we've done a lot of experimentation already.

Yet we can't even agree on whether we agree on the proper security posture.


>> One of the things that I've been building into my grammar driven
>> approach is to allow explicit marking of ways to coerce content written
>> using grammatical corner cases to the subset of the grammar that is
>> consistently well-handled -- so the template language automatically adds
>> quotes around unquoted attributes, end tags, etc.
>>
>> It should be possible for a template language to allow (but warn) on
>> messy input but to produce strings that are both valid XML (modulo
>> doctypes, raw text content, and HTML character references) and valid
>> HTML.  I don't yet have good test-coverage or experience with this.
>
> I don't understand what you're saying here. The "corner cases" in HTML
> aren't "unquoted attributes" and so forth. They're crazy things like
> "<image>" creates an "img" element, "</br><br>" creates two "br" elements,
> "</p><p>" creates more "p" elements than "<p></p>", "<table><input>"
> creates two siblings while "<table><input type=hidden>" creates a
> parent/child relationship, "<isindex prompt>" creates half a dozen nodes
> including multiple text nodes and elements, though none of them with the
> tag name "isindex", and the output even has an attribute, though it's not
> called "prompt", "<script>" having all kinds of wacky interactions with
> the event loop, crazy things happening with association of form controls
> to form elements, elements being literally moved in the DOM as the DOM is
> created... the list of crazy behaviours is long and esoteric.

HTML attribute quoting is a source of subtle XSS vulnerabilities, so
unquoted and backtick-quoted attributes are a corner case.
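
To illustrate the unquoted case, here is a small sketch assuming a
hypothetical escapeHtml helper.  The value is HTML-escaped in both
templates, but in the unquoted one the space in the value ends the
attribute, so an HTML parser reads the rest as a second attribute:

```javascript
// Hypothetical helper: escapes only HTML metacharacters.
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, (c) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  }[c]));
}

const untrusted = 'x onmouseover=alert(1)';

// Unquoted attribute: escaping changes nothing, and the markup now
// carries an onmouseover attribute supplied by the attacker.
const unquoted = `<input value=${escapeHtml(untrusted)}>`;

// Quoted attribute: the same escaped value stays inside value="...".
const quoted = `<input value="${escapeHtml(untrusted)}">`;

console.log(unquoted);
console.log(quoted);
```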

If E4H is advertised as embedded HTML, but <><isindex></> compiles and
behaves markedly differently from the equivalent HTML fragment, then
doesn't it fail your understandability requirement?


>> >> >  - to have good security characteristics:
>> >> >     * provide a model that is conceptually simple
>> >> >     * allow arbitrary strings to be embedded in DOM trees in a way that
>> >> >       does not allow arbitrary elements or attributes to be created
>> >>
>> >> If even
>> >>     <a href="{...}">
>> >> is a foot gun then I think it fails at this goal.
>> >
>> > Which goal does it fail? The model is simple, and you can't create
>> > arbitrary elements or attributes. Obviously if you're inserting a string
>> > into a context where it will be parsed, you have to make sure it's valid
>> > data, but whitelisting like that is elementary, and applies in all cases,
>> > including many where there's just no way you could autoescape because the
>> > data/syntax you're inserting into is app-specific.
>>
>> It fails to preserve the "code-effect property" as outlined at
>> https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#code_effect_property
>
> You said it fails at "this" goal, citing the goals above. They don't
> include preserving the code-effect property. I don't think it's a good
> idea to preserve that, as that essentially boils down to magic.

Again confusing simplicity of interface and simplicity of implementation.

> The way you get secure code is by having authors understand exactly what
> is happening with their data, and having the authors make sure that they
> follow a discipline where they think of their data as being in specific

Not true.   Human discipline works when the severity of a breach
depends on the number of failures of discipline, but when one failure
is markedly more severe than zero and roughly the same as two, then
you need to make lapses in discipline impossible.

> types, and when they use the data, they first convert it to be a valid
> value to insert into the type of whatever they are inserting the data
> into. E4H makes this easy: the context of any substitution is "text of the
> type that is appropriate in this place in the DOM", which is a net
> improvement over string concatenation where the context is "markup at this
> place in HTML syntax", a significantly more complicated type to reason
> about. This is why it is a better solution to creating DOMs than solutions
> that rely on string concatenation followed by application of the HTML
> parser, which is the only way these days to do DOM creation tersely.



> (Essentially, auto-escaping almost by definition fails what I think is one
> of the most important requirements of any API, in terms of security, which
> is the "Least Surprise Property" as you call it on that page above.)

I'm not just engaging in armchair philosophy about something that I've
specced and have yet to deploy.  I have real-world experience with
converting large projects to use this that proves you wrong.
Received on Wednesday, 13 March 2013 00:07:50 UTC