Re: Contextual auto-escaping corner cases from Mike Samuel on 2013-03-15 (public-script-coord@w3.org from January to March 2013)

From: Mike Samuel <mikesamuel@gmail.com>
Date: Fri, 15 Mar 2013 08:55:55 -0400
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>, Ian Hickson <ian@hixie.ch>
Message-ID: <CACod6Gvr6OHyxLnzLJSq-NwYU1O1-W5ot2_GHy2nPHm4sqW3Dg@mail.gmail.com>
2013/3/14 Tab Atkins Jr. <jackalmage@gmail.com>:
> On Thu, Mar 14, 2013 at 1:46 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>> 2013/3/12 Tab Atkins Jr. <jackalmage@gmail.com>:
>>> Ian provided several examples of code where it seems like it would be
>>> impossible to auto-escape properly, and an author relying on
>>> auto-escaping because they've been trained that it works elsewhere
>>> could be easily misled and inadvertently cause an XSS vulnerability.
>>> Could you go over those and answer how you think your ideas for
>>> auto-escaping would address the problems he raised?
>>
>> 2013/3/12 Ian Hickson <ian@hixie.ch>:
>>> What would be autoescaped in something like:
>>>
>>>    h`<img src="${scheme}://${host}:${port}/${path}/${file}.${ext}"
>>>          srcset="${file1} ${w1}w ${file2} ${w2}w"
>>>          alt="${alt}"
>>>          data-logger-url="logger?id=${id}&key=1234">
>>>
>>> ...? (where h`` is your autoescaper; obviously pretend that part is
>>> done however your syntax would really work, and strip newlines if
>>> necessary.)
>>
>> The parts in the src are all URI encoded.  Any parts that appear after
>> a literal '?' or '#' are encoded so as to prevent parameter splitting.
>
> That implies that it's impossible to put in a url with ? or # in it, right?

Nope.
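
One way to read the two statements together is sketched below. It assumes the escaper picks its mode from the literal text that precedes each hole: before any literal '?' or '#' a value is only normalized, so a complete URL keeps its own '?' and '#'; after one, the value is encoded so it cannot add parameters. The names url, normalizeUrlPart and escapeQueryPart are hypothetical, and this is not a description of the Closure Templates or Go implementations.

```javascript
// Hypothetical sketch. It does no scheme filtering and no HTML attribute
// escaping; a real contextual escaper would need both.

// Percent-encode unsafe characters but keep URL structure such as / ? # & =.
function normalizeUrlPart(value) {
  return encodeURI(String(value));
}

// Encode everything that could split parameters or start a fragment.
function escapeQueryPart(value) {
  return encodeURIComponent(String(value));
}

// Tag function: the mode for each hole depends only on the literal text
// the template author wrote before it, never on earlier interpolated values.
function url(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    const literalSoFar = strings.slice(0, i + 1).join('');
    const inQueryOrFragment = /[?#]/.test(literalSoFar);
    out += inQueryOrFragment
      ? escapeQueryPart(values[i])
      : normalizeUrlPart(values[i]);
    out += strings[i + 1];
  }
  return out;
}

const whole = '/search?q=cats#top';
const id = '1&key=evil';
console.log(url`${whole}`);                 // the value's own ? and # survive
console.log(url`logger?id=${id}&key=1234`); // the & and = in id are encoded
```

Note that encodeURI also re-encodes '%', so a value that is already percent-encoded would be double-encoded in this sketch.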

> It doesn't help the srcset at all, even though the browser knows that
> it accepts urls.

I wasn't aware that srcset was in the schema.  I'll have to update to
take that into account.


> Are you claiming that literal ? or # in the data-logger-url case cause
> parameter encoding?  Or were you referring solely to the src part, and
> the rest are completely unescaped?

Yes, they do, given the heuristic that recognizes data-logger-url as
URL content.

>> In the closure-templates and Go versions, we have heuristics to let us
>> determine if custom attributes or data-* attributes are URL content.
>> This was based on an inspection of template code prior to the
>> introduction of contextual auto-escaping, and since Closure templates
>> are compiled statically it allows our pen-testers to keep a list of
>> known attributes that pass the heuristic and flush out new
>> non-standard attributes that don't.
>
> I doubt we want to put in heuristics for a standard escaper that looks
> for attribute values where the literal part "looks like" a url.  That
> sounds extremely scary, since a relatively small change in what parts
> of the url are contained in the literal segment could potentially make
> it stop recognizing.

Again, I'm not proposing standardizing anything, so I don't know who
"we" are.  Library authors can provide naming-convention heuristics as
a per-project option, and projects with a high security profile can
use pre-submit checks that flag custom elements or attribute names
that are outside their naming conventions.
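
A minimal sketch of such a per-project option, under stated assumptions: the convention here (a data-* name holds a URL if it contains "url", "uri" or "src"), the reviewed list, and the function names are all hypothetical, and the regular-expression scan is deliberately crude rather than a real HTML parser.

```javascript
// Hypothetical naming convention: strip a leading "data-" and treat the
// attribute as URL content if the rest contains "url", "uri" or "src".
function looksLikeUrlAttr(name) {
  const bare = name.toLowerCase().replace(/^data-/, '');
  return /url|uri|src/.test(bare);
}

// Attributes the project has reviewed and decided do not hold URLs
// (an assumed list for illustration).
const REVIEWED_NON_URL_ATTRS = new Set(['data-count', 'data-label']);

// Pre-submit check: list data-* attribute names that neither match the
// convention nor appear on the reviewed list, so a human can look at them.
function flagUnknownAttrs(templateSource) {
  const flagged = new Set();
  const attrPattern = /\s(data-[a-z0-9-]+)\s*=/gi;
  for (const match of templateSource.matchAll(attrPattern)) {
    const name = match[1].toLowerCase();
    if (!looksLikeUrlAttr(name) && !REVIEWED_NON_URL_ATTRS.has(name)) {
      flagged.add(name);
    }
  }
  return [...flagged];
}

const source =
  '<img data-logger-url="logger?id=${id}" data-count="3" data-target="${t}">';
console.log(flagUnknownAttrs(source)); // only data-target is flagged
```

Because the check runs over template source rather than rendered output, it can run before submission and keeps the list of accepted attribute names explicit.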


>>> How about this:
>>>
>>>    x`<img width="${width}"
>>>           src="${profile.cgi?username=${username}&size=${width}}">
>>>      <script>
>>>       var x = new Image(${width});
>>>       x.src = 'profile.cgi?username=${username}&size=${width}';
>>>      </script>`;
>>
>> Quite.  We really need an intercession layer for the DOM that lets us
>> intercept assignments to sensitive properties and do late-binding of
>> escaper to templates.  Yay proxies.
>
> I don't think you understand this example properly.  The template
> creates the img *and* the script.  There's nothing there to late-bind.

Ah.  I thought the quotes around the x.src value were `...`.  In that
case, the proxy could default reject.
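
As a rough illustration of "default reject", the sketch below wraps an object in a Proxy whose set trap refuses plain strings for sensitive properties and accepts only values marked as escaper output. SafeUrl, SENSITIVE_PROPS and guard are hypothetical names, and a plain object stands in for a DOM node; real DOM objects are not this easy to intercept.

```javascript
// Marker class for values produced by a trusted escaper (hypothetical).
class SafeUrl {
  constructor(value) {
    this.value = value;
  }
  toString() {
    return this.value;
  }
}

// Properties treated as sensitive in this sketch (assumed list).
const SENSITIVE_PROPS = new Set(['src', 'href']);

// Wrap an object so that writes of unmarked values to sensitive
// properties are rejected by default.
function guard(target) {
  return new Proxy(target, {
    set(obj, prop, value) {
      if (SENSITIVE_PROPS.has(prop) && !(value instanceof SafeUrl)) {
        throw new TypeError(`rejected untrusted assignment to ${String(prop)}`);
      }
      obj[prop] = value instanceof SafeUrl ? value.toString() : value;
      return true;
    },
  });
}

const x = guard({});
x.width = 10;                                 // not sensitive, allowed
x.src = new SafeUrl('profile.cgi?size=10');   // marked as escaped, allowed
try {
  x.src = 'profile.cgi?username=' + 'mallory&size=1'; // plain string
} catch (e) {
  console.log(e.message);
}
```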


>>> How about:
>>>
>>>    x`<p>Paste this WLAML command: AB=2%\*2*11*22;GA=${GADATA}*41</p>`
>>
>> Social engineering will affect all technical solutions as shown in
>> this E4H template
>>
>> <>{x}</>
>>
>> with
>>
>> x = "Paste this into your URL bar : javascript:pwnMe()"
>
> I believe the point here was not social engineering, but to point out
> something that is thematically similar to a URL, and that thus might
> be expected by engineers to be as "safe" as a url is (not needing
> manual escaping), when that is actually insecure.  The "paste" part is
> irrelevant - just filler text in the example to introduce why there
> might be such a command put into page text.

I don't follow.  How does this lead to unintended side-effects or
data-leakage without user intervention?
Received on Friday, 15 March 2013 12:56:22 UTC