Re: Whitelisting external resources by hash (was Re: Finalizing the shape of CSP ‘unsafe-dynamic’)

On Tue, Jun 7, 2016 at 9:59 PM, Mike West <mkwst@google.com> wrote:

> On Tue, Jun 7, 2016 at 8:27 PM, Artur Janc <aaj@google.com> wrote:
>
>> - You could whitelist specific URLs for script-src without risking
>> redirect-based whitelist bypasses. For example `script-src 'self'
>> ajax.googleapis.com/totally/safe.js` is an ineffective policy if there
>> is an open redirect in 'self' due to the ability to load other scripts from
>> ajax.googleapis.com caused by CSP's path-dropping behavior. A hash would
>> avoid this problem.
>>
>
> I think you might have something in mind other than just hashing the URL?
> It's not clear to me how a different spelling of the URL would mitigate the
> issues that lead to the path-dropping-after-redirect behavior. Denying
> redirects entirely, perhaps?
>

Surprisingly, allowing hashes to bless script URLs would actually solve this
problem. (For clarity, the strawman proposal I'm talking about is to allow
the loading of an external script if the digest of its `src' attribute value
is present as a hash in the policy: <script src="https://example.org/foo.js"></script>
would be permitted by a CSP of script-src
'sha256-8wKZoJZ5SgqL4cU079oehMJ9lwrGSV9gBLjuY30aM3Q='.)
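
To make the strawman a bit more concrete, here is a minimal sketch of how
the policy token could be derived. The exact canonicalization of the URL is
something the spec would have to pin down; hashing the literal src attribute
value, as below, is just my assumption:

    import base64, hashlib

    def csp_token_for_src(src):
        # Assumption: digest the literal src attribute value exactly as it
        # appears in the markup; a real spec would need to define
        # canonicalization rules.
        digest = hashlib.sha256(src.encode('utf-8')).digest()
        return "'sha256-" + base64.b64encode(digest).decode('ascii') + "'"

    # csp_token_for_src("https://example.org/foo.js") gives the token to put
    # in the script-src directive for that external script.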

The direct reason is that in the hash case it is acceptable to follow any
redirects when fetching the script, whereas for a host-source the UA needs
to check the location of each redirect against the whitelist -- and that
check is what reveals the targets of redirects and creates the privacy leak.
Why is following redirects okay in the hash case but not for host-source?
Because by definition a hash allows only a single URL -- one trusted by the
developer and exactly matching the digest in the policy -- so it is very
unlikely to redirect to attacker-controlled data; for regular whitelists we
couldn't do this, because any open redirect in a whitelisted host-source
would bypass CSP.
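
To illustrate the distinction (rough Python sketch, not spec text; it again
assumes the literal src value is what gets hashed, and uses a crude prefix
match as a stand-in for real host-source matching):

    import base64, hashlib

    def src_hash(url):
        return base64.b64encode(hashlib.sha256(url.encode()).digest()).decode()

    def hash_source_allows(policy_hashes, requested_src, redirect_chain):
        # Decided once, against the URL the document asked for; redirect hops
        # are never compared to the policy, so nothing about them is revealed.
        return src_hash(requested_src) in policy_hashes

    def host_source_allows(whitelist, requested_src, redirect_chain):
        # Every hop has to match the whitelist, which is what lets a page
        # probe where a cross-origin redirect went.
        return all(any(url.startswith(source) for source in whitelist)
                   for url in [requested_src] + redirect_chain)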

Using hashes to whitelist specific script#src values sacrifices some
flexibility (you'd have to explicitly hash every script URL on your page,
rather than cover them with a single host or path as whitelists do), but in
exchange it solves the path-dropping problem. Since we're talking about this
in the context of static content that might want to use
{unsafe,allow}-dynamic, and where we can't use nonces, it would allow only
the expected scripts to be loaded, without any whitelist-related concerns.

(An alternative, inferior answer to your question is that we could just ban
redirects when fetching external scripts whitelisted by a hash because it's
a new mechanism and we could enforce this without breaking existing users.
I don't like it, but it's another way to handle this.)


>
>> - It would allow more flexibility in whitelisting exact script URLs.
>> Using a traditional URL whitelist it's not possible to have a safe policy
>> in an application which uses JSONP (script-src /api/jsonp can be abused by
>> loading /api/jsonp?callback=evilFunction). With hashes you could allow
>> SHA256("/api/jsonp?callback=goodFunction") but an attacker could not use
>> such an interface to execute any other functions.
>>
>
> Is hashing important here? Would extending the source expression syntax to
> include query strings be enough?
>

Possibly, but then you'd have all the original concerns about revealing
redirect targets, potentially with more worrying consequences if we support
query strings; and if we keep using source expressions for this, the
path-dropping behavior would remain a problem unless we handled it somehow.


> - It would work with a policy based on 'unsafe-dynamic' / 'drop-whitelist'
>> -- even if the host-source is dropped, the hash would offer a way to
>> include specific external scripts.
>>
>> For CSP to become a useful XSS protection we will almost certainly have
>> to move away from the whitelist-based model.
>>
>
> I think we agree that Google will certainly need to move away from the
> whitelist-based model. Though I agree with you that a nonce-based model is
> simpler to deploy for many sites, GitHub seems to be a reasonable
> counter-example to general necessity.
>

Without picking on GitHub, I would disagree with your counter-claim ;-) It
is certainly possible to build CSP whitelists that will allow an
application to function properly, but the overwhelming majority of such
policies offer no benefit against XSS; I shared some data on the parallel
thread and would be happy to share more if you'd like ;) The problem,
however, is that this is fairly difficult to see because the failure mode
of CSP with an unsafe whitelist is hidden -- only when someone attempts to
exploit an XSS does it turn out that a policy was ineffective.

It's not that I dislike the concept of whitelists; they made sense when they
were proposed. It's just that we have convincing data -- both from Google
applications and from a large survey of policies used in the wild -- that in
the current state of the Web they just aren't effective in practice. (In the
comment above I meant "we" as in web application developers, rather than
just Google.)


>
>> Dynamic applications can often use nonces instead, but for static
>> content, or situations where using nonces would be difficult, I think
>> hashes are a good solution -- one of their main benefits is that they're
>> already in the spec and any expansion of their capabilities would be a
>> relatively small change. (Another upside is that they can be used in a
>> backwards-compatible way alongside a whitelist.)
>>
>
> I still don't understand why hashing a URL is useful. :(
>

Here are the benefits I see:
- We could handle the static content case with the current shape of
'unsafe-dynamic', without splitting it into separate keywords. A developer
could set a script-src policy composed only of hashes (for inline and
external scripts) and 'unsafe-dynamic', and the page would have a working,
safe policy.
- We would have a way to allow the loading of specific external URLs
without nonces, and without risking path-dropping CSP bypasses; my guess is
that this is simpler than adding query parameters to host-source.
- It would not require any changes to the document, and a policy could be
built just by statically inspecting the markup -- a tool could parse the
page and calculate digests of inline scripts and of the URLs of external
ones (see the rough sketch after this list).
- Pages would be less likely to break than with the current SRI-based
approach because this would allow the returned content to change; it would
also handle the case of static pages loading non-static JS (various JS
widgets and APIs). Of course it could still be used in combination with SRI
if the developer so desires.
- It is consistent with the behavior of nonces (which can whitelist both
inline and external scripts), and conceptually it's easy to understand if
you're used to the idea of hashes (calculate the digest of the script#src
just like you'd do it for an inline script).
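
To expand on the tooling point above, here is a rough sketch of such a
build-time tool, assuming the strawman: it hashes inline script bodies the
way CSP already does, plus the literal src values of external scripts (the
canonicalization is again my assumption), and emits a policy:

    import base64, hashlib
    from html.parser import HTMLParser

    def b64_sha256(data):
        return base64.b64encode(hashlib.sha256(data).digest()).decode()

    class ScriptCollector(HTMLParser):
        # Collects a CSP hash token for every <script> on the page: the
        # digest of the src attribute for external scripts, the digest of
        # the script body for inline scripts.
        def __init__(self):
            super().__init__()
            self.tokens = []
            self._inline = None

        def handle_starttag(self, tag, attrs):
            if tag != 'script':
                return
            src = dict(attrs).get('src')
            if src:
                self.tokens.append("'sha256-%s'" % b64_sha256(src.encode()))
            else:
                self._inline = []

        def handle_data(self, data):
            if self._inline is not None:
                self._inline.append(data)

        def handle_endtag(self, tag):
            if tag == 'script' and self._inline is not None:
                body = ''.join(self._inline)
                self.tokens.append("'sha256-%s'" % b64_sha256(body.encode()))
                self._inline = None

    def policy_for(html):
        collector = ScriptCollector()
        collector.feed(html)
        return "script-src %s 'unsafe-dynamic'" % ' '.join(collector.tokens)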

An obvious drawback is that this is ugly and overlaps with functionality the
existing whitelist already provides; there are also the performance concerns
Dan raised about the proliferation of policies with hashes. The question is
about the cost/benefit ratio; I think in this case the benefits can be quite
compelling...

Cheers,
-A

Received on Wednesday, 8 June 2016 13:15:02 UTC