Re: Whitelisting external resources by hash (was Re: Finalizing the shape of CSP ‘unsafe-dynamic’) from Artur Janc on 2016-06-08 (public-webappsec@w3.org from June 2016)

From: Artur Janc <aaj@google.com>
Date: Wed, 8 Jun 2016 22:58:44 +0200
To: Devdatta Akhawe <dev.akhawe@gmail.com>
Cc: Mike West <mkwst@google.com>, Brad Hill <hillbrad@gmail.com>, WebAppSec WG <public-webappsec@w3.org>, Christoph Kerschbaumer <ckerschbaumer@mozilla.com>, Daniel Bates <dabates@apple.com>, Devdatta Akhawe <dev@dropbox.com>
Message-ID: <CAPYVjqpvXcC18G-sWbjcdatt2HDiMx=TuPYJeVL32hwFFgz80A@mail.gmail.com>
On Wed, Jun 8, 2016 at 10:12 PM, Devdatta Akhawe <dev.akhawe@gmail.com>
wrote:

> Hi
>
> I am curious: what's the concern with using a hash to whitelist an inline
> script that loads the external URI that we want? Combined with
> allow-dynamic, wouldn't that mean the code works as expected? What am I
> missing?
>

There are two things that make this difficult:
1. It requires rewriting the page in a fairly non-trivial way to replace
<script src="foo.js"></script> with a wrapper inline script that loads
foo.js -- this is hard to automate in the general case.
2. Changing the way a script is loaded can affect its behavior; for
example, if the external script uses document.write() and we load it
asynchronously, the script might execute after the document is closed and
wipe out the DOM. Similarly, on a page with several scripts, we'd have to
guarantee execution in the original order, which is probably doable but
also not easy. It also likely breaks prefetching, because the script URL is
no longer visible to the browser in the markup.

I think #2 might be solvable with some clever JS, but #1 is the real
obstacle if you're thinking of doing this at scale. If we didn't have to
rewrite the page, it would be much easier for us to write tools that
automate building a safe policy with minimal developer cost.

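For illustration, a minimal Python sketch of the rewrite Dev describes, assuming a tool that replaces each `<script src>` with an inline loader and hashes that loader. `make_loader` and `csp_hash` are hypothetical helper names, and the sketch assumes 'unsafe-dynamic' trusts scripts inserted by a hashed inline script, as proposed in this thread. Setting `async = false` keeps dynamically inserted scripts running in insertion order, but nothing here helps scripts that rely on document.write(), and the URL is assumed not to contain a `</script>` sequence.

```python
import base64
import hashlib
import json


def csp_hash(script_text: str) -> str:
    # CSP hash-source for inline content: base64 of SHA-256 over the UTF-8 text.
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


def make_loader(src: str) -> str:
    # Hypothetical inline wrapper that loads `src` dynamically. async = false
    # preserves order among dynamically inserted scripts that also set it;
    # document.write()-based scripts are NOT handled by this rewrite.
    return (
        "var s = document.createElement('script');"
        f"s.src = {json.dumps(src)};"
        "s.async = false;"
        "document.head.appendChild(s);"
    )


loader = make_loader("https://example.org/foo.js")
rewritten = f"<script>{loader}</script>"  # stands in for <script src="...">
policy = f"script-src {csp_hash(loader)} 'unsafe-dynamic'"
print(rewritten)
print(policy)
```

The hash must be computed over exactly the text the tool emits between the tags, which is why the page rewrite and the policy have to be generated together.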

> cheers
> Dev
>
>
>
> On 8 June 2016 at 07:04, Artur Janc <aaj@google.com> wrote:
>
>> On Wed, Jun 8, 2016 at 3:14 PM, Artur Janc <aaj@google.com> wrote:
>>
>>> On Tue, Jun 7, 2016 at 9:59 PM, Mike West <mkwst@google.com> wrote:
>>>
>>>> On Tue, Jun 7, 2016 at 8:27 PM, Artur Janc <aaj@google.com> wrote:
>>>>
>>>>> - You could whitelist specific URLs for script-src without risking
>>>>> redirect-based whitelist bypasses. For example, `script-src 'self'
>>>>> ajax.googleapis.com/totally/safe.js` is an ineffective policy if
>>>>> there is an open redirect in 'self': because CSP ignores the path of a
>>>>> source expression after a redirect, the open redirect can be used to
>>>>> load any other script from ajax.googleapis.com. A hash would avoid
>>>>> this problem.
>>>>>
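A simplified sketch of the path-dropping rule, assuming scheme-less `host/path` source expressions only (real CSP matching also handles schemes, ports and wildcards). The `redirected` flag models CSP's rule that the path of a source expression is ignored once the request has been redirected; the `evil.js` URL is hypothetical.

```python
from urllib.parse import urlsplit


def path_matches(expr_path: str, url_path: str) -> bool:
    # A path ending in "/" matches as a prefix; otherwise it must match exactly.
    if expr_path.endswith("/"):
        return url_path.startswith(expr_path)
    return url_path == expr_path


def host_source_matches(expr: str, url: str, redirected: bool) -> bool:
    """Match a scheme-less 'host[/path]' expression against an absolute URL."""
    host, sep, rest = expr.partition("/")
    parts = urlsplit(url)
    if parts.hostname != host.lower():
        return False
    if not sep or redirected:
        # No path in the expression, or the path is dropped after a redirect.
        return True
    return path_matches("/" + rest, parts.path)


expr = "ajax.googleapis.com/totally/safe.js"
print(host_source_matches(expr, "https://ajax.googleapis.com/totally/safe.js", False))  # True
print(host_source_matches(expr, "https://ajax.googleapis.com/evil.js", False))  # False
print(host_source_matches(expr, "https://ajax.googleapis.com/evil.js", True))  # True
```

The last call is the bypass: a request that starts at an open redirect on 'self' and lands on any ajax.googleapis.com script is allowed, because only the host is checked after the redirect.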
>>>>
>>>> I think you might have something in mind other than just hashing the
>>>> URL? It's not clear to me how a different spelling of the URL would
>>>> mitigate the issues that lead to the path-dropping-after-redirect behavior.
>>>> Denying redirects entirely, perhaps?
>>>>
>>>
>>> Surprisingly, allowing hashes to bless script URLs would actually solve
>>> this problem. (For clarity, the strawman proposal I'm talking about is to
>>> allow the loading of external scripts if the digest of the `src`
>>> attribute is present as a hash in the policy:
>>> <script src="https://example.org/foo.js"></script> would be permitted by
>>> a CSP of script-src 'sha256-8wKZoJZ5SgqL4cU079oehMJ9lwrGSV9gBLjuY30aM3Q=').
>>>
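Assuming the digest is taken over the literal `src` attribute value encoded as UTF-8 (the strawman doesn't say whether relative URLs would be resolved first), the hash-source could be produced as in this sketch. `src_hash_source` is a hypothetical helper, and the sketch doesn't claim to reproduce the exact digest quoted in the example above.

```python
import base64
import hashlib


def src_hash_source(src: str) -> str:
    # Strawman: SHA-256 over the src attribute value, base64-encoded,
    # formatted like an ordinary CSP hash-source.
    digest = hashlib.sha256(src.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


policy = "script-src " + src_hash_source("https://example.org/foo.js")
print(policy)
```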
>>
>> Something that I neglected to mention in the summary above is that this
>> would be a sufficient condition for loading the external script. That is,
>> no matter what redirects happen when https://example.org/foo.js is
>> requested, the script would be allowed to load if the digest of the
>> original URL (before any redirects) is present in the policy.
>>
>>
>>> The direct reason is that in the hash case it will be acceptable to
>>> follow any redirects when fetching the script, whereas for host-source the
>>> UA needs to check the location of any redirect to make sure it's present in
>>> the whitelist -- which is what enables revealing targets of redirects and
>>> leads to the privacy leak. Why is following redirects okay in the hash
>>> case, but not for host-source, you ask? Because by definition a hash allows
>>> only a single URL trusted by the developer and exactly matching the digest
>>> in the policy, so it is very unlikely to redirect to attacker-controlled
>>> data; for regular whitelists we couldn't do this because any open redirect
>>> in a whitelisted host-source would bypass CSP.
>>>
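The difference in redirect handling can be sketched as a single decision function, assuming the hash of the original `src` is a sufficient condition as described above. `script_allowed`, `sha256_source` and the `host_allowed` callback are hypothetical names, and the host-source branch is simplified to "every URL in the redirect chain must be allowed".

```python
import base64
import hashlib
from typing import Callable, Sequence, Set


def sha256_source(value: str) -> str:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def script_allowed(
    src: str,
    redirect_chain: Sequence[str],
    policy_hashes: Set[str],
    host_allowed: Callable[[str], bool],
) -> bool:
    # Hash case: the digest of the ORIGINAL src is enough, and redirects are
    # followed without checking where they lead.
    if sha256_source(src) in policy_hashes:
        return True
    # Host-source case: each hop has to be checked against the whitelist,
    # which is what lets a page learn where a redirect went.
    return all(host_allowed(url) for url in [src, *redirect_chain])
```

For example, with `policy_hashes = {sha256_source("https://example.org/foo.js")}`, a fetch of that URL is allowed even if it redirects to a CDN the policy never mentions, while a host-based policy would reject any hop outside its whitelist.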
>>> Using hashes to whitelist specific script#src values sacrifices some
>>> flexibility (i.e. you'd have to explicitly hash every script URL on your
>>> page, rather than do it with a single host or path as with whitelists), but
>>> in exchange it solves the path-dropping problem. Since we're talking about
>>> this in the context of static content which might want to use
>>> {unsafe,allow}-dynamic, and where we can't use nonces, it would allow only
>>> expected scripts to be loaded without whitelist-related concerns.
>>>
>>> (An alternative, inferior answer to your question is that we could just
>>> ban redirects when fetching external scripts whitelisted by a hash because
>>> it's a new mechanism and we could enforce this without breaking existing
>>> users. I don't like it, but it's another way to handle this.)
>>>
>>>
>>>>
>>>>> - It would allow more flexibility in whitelisting exact script URLs.
>>>>> Using a traditional URL whitelist it's not possible to have a safe policy
>>>>> in an application which uses JSONP (script-src /api/jsonp can be abused by
>>>>> loading /api/jsonp?callback=evilFunction). With hashes you could allow
>>>>> SHA256("/api/jsonp?callback=goodFunction") but an attacker could not use
>>>>> such an interface to execute any other functions.
>>>>>
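A sketch contrasting the two approaches for the JSONP case, assuming path matching that ignores the query string (as CSP source expressions do) and a hash taken over the literal `src` value. `path_whitelisted` and `hash_whitelisted` are hypothetical helpers.

```python
import base64
import hashlib
from typing import Set
from urllib.parse import urlsplit


def sha256_source(value: str) -> str:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def path_whitelisted(src: str, allowed_path: str) -> bool:
    # Source-expression style: only the path is compared, not the query.
    return urlsplit(src).path == allowed_path


def hash_whitelisted(src: str, allowed_hashes: Set[str]) -> bool:
    # Strawman style: the whole src value, query string included, is hashed.
    return sha256_source(src) in allowed_hashes


good = "/api/jsonp?callback=goodFunction"
evil = "/api/jsonp?callback=evilFunction"
hashes = {sha256_source(good)}

print(path_whitelisted(good, "/api/jsonp"), path_whitelisted(evil, "/api/jsonp"))  # True True
print(hash_whitelisted(good, hashes), hash_whitelisted(evil, hashes))  # True False
```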
>>>>
>>>> Is hashing important here? Would extending the source expression syntax
>>>> to include query strings be enough?
>>>>
>>>
>>> Possibly, but now you'd have all the original concerns about revealing
>>> redirects, potentially with more worrying consequences if we support query
>>> strings; and if we keep using source expressions for this then the
>>> path-dropping behavior would remain a problem unless we handled that
>>> somehow.
>>>
>>>
>>>>> - It would work with a policy based on 'unsafe-dynamic' /
>>>>> 'drop-whitelist' -- even if the host-source is dropped, the hash would
>>>>> offer a way to include specific external scripts.
>>>>>
>>>>> For CSP to become a useful XSS protection we will almost certainly
>>>>> have to move away from the whitelist-based model.
>>>>>
>>>>
>>>> I think we agree that Google will certainly need to move away from the
>>>> whitelist-based model. Though I agree with you that a nonce-based model is
>>>> simpler to deploy for many sites, GitHub seems to be a reasonable
>>>> counter-example to the claim that this is necessary in general.
>>>>
>>>
>>> Without picking on GitHub, I would disagree with your counter-claim ;-)
>>> It is certainly possible to build CSP whitelists that will allow an
>>> application to function properly, but the overwhelming majority of such
>>> policies offer no benefit against XSS; I shared some data on the parallel
>>> thread and would be happy to share more if you'd like ;) The problem,
>>> however, is that this is fairly difficult to see because the failure mode
>>> of CSP with an unsafe whitelist is hidden -- only when someone attempts to
>>> exploit an XSS does it turn out that a policy was ineffective.
>>>
>>> It's not that I dislike the concept of whitelists; they made sense when
>>> they were proposed. It's just that we have convincing data -- both from
>>> Google applications and from a large survey of policies used in the wild --
>>> that in the current state of the Web they just aren't effective in
>>> practice. (In the comment above I meant "we" as in web application
>>> developers, rather than just Google.)
>>>
>>>
>>>>
>>>>> Dynamic applications can often use nonces instead, but for static
>>>>> content, or situations where using nonces would be difficult, I think
>>>>> hashes are a good solution -- one of their main benefits is that they're
>>>>> already in the spec and any expansion of their capabilities would be a
>>>>> relatively small change. (Another upside is that they can be used in a
>>>>> backwards-compatible way alongside a whitelist.)
>>>>>
>>>>
>>>> I still don't understand why hashing a URL is useful. :(
>>>>
>>>
>>> Here are the benefits I see:
>>> - We could handle the static content case with the current shape of
>>> 'unsafe-dynamic', without splitting it into separate keywords. A developer
>>> could set a script-src policy composed only of hashes (for inline and
>>> external scripts) and 'unsafe-dynamic', and the page would have a working,
>>> safe policy.
>>> - We would have a way to allow the loading of specific external URLs
>>> without nonces, and without risking path-dropping CSP bypasses; my guess is
>>> that this is simpler than adding query parameters to host-source.
>>> - It would not require any changes to the document, and a policy could be
>>> built just by statically inspecting the markup -- a tool could parse the
>>> page and calculate digests of inline scripts and of the URLs of external
>>> ones.
>>> - Pages would be less likely to break than with the current SRI-based
>>> approach because this would allow the returned content to change; it would
>>> also handle the case of static pages loading non-static JS (various JS
>>> widgets and APIs). Of course it could still be used in combination with SRI
>>> if the developer so desires.
>>> - It is consistent with the behavior of nonces (which can whitelist both
>>> inline and external scripts), and conceptually it's easy to understand if
>>> you're used to the idea of hashes (calculate the digest of the script#src
>>> just like you'd do it for an inline script).
>>>
>>> An obvious drawback is that this is ugly and offers functionality that
>>> is similar to the existing whitelist; there are also some performance
>>> concerns raised by Dan related to the proliferation of policies with
>>> hashes. The question is about the cost/benefit ratio; I think in this case
>>> the benefits can be quite compelling...
>>>
>>> Cheers,
>>> -A
>>>
>>
>>
>
Received on Wednesday, 8 June 2016 20:59:33 UTC