Re: [Integrity] Some comments on Cross-Origin leakage and content types from Brad Hill on 2014-09-23 (public-webappsec@w3.org from September 2014)

From: Brad Hill <hillbrad@gmail.com>
Date: Tue, 23 Sep 2014 14:19:48 -0700
To: Arjan Veenstra <arjan@veenstra.cx>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAEeYn8gZYct4GF-GGeMD499tyqPwtu4pG32t4yFz+XQnhAncpA@mail.gmail.com>

Arjan,

 I think our first principle here, especially as a security group, is
"do no harm".  We want to be conservative about experimenting with
hash-based caching because it has the potential to create security
issues with the current web security model.  So our first priority is
to take baby steps and be very safe in doing so.  If we find that SRI
is actually usable and that content-addressable-storage makes sense
and can be done securely even with a very limited set of use cases,
such as content that does not vary, and content that is served with
Access-Control-Allow-Origin: *, then we can proceed in subsequent
versions of the spec to explore how we can improve it to cover more
use cases.  Starting with something that's more complex and risky than
necessary makes the spec less likely to succeed, even if it seems like
it makes it more useful.
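
To illustrate how narrow that first set of use cases is, here is a minimal
sketch in Python. It is not the spec's algorithm: the function name
eligible_for_hash_cache and the rule "any Vary header disqualifies the
response" are assumptions made for illustration only.

```python
from typing import Mapping


def eligible_for_hash_cache(response_headers: Mapping[str, str]) -> bool:
    """Hypothetical check (not taken from the SRI spec): may a response be
    stored in a cache keyed by its integrity hash?"""
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}

    # Only content readable by every origin qualifies (wildcard CORS header).
    if headers.get("access-control-allow-origin") != "*":
        return False

    # Assumption: a non-empty Vary header means the content may vary,
    # so the conservative choice is to keep it out of the shared cache.
    if headers.get("vary"):
        return False

    return True
```

Under these assumptions a response with no CORS header, or with CORS limited
to specific origins, never enters the hash-based cache.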

-Brad

On Mon, Sep 22, 2014 at 12:59 PM, Arjan Veenstra <arjan@veenstra.cx> wrote:
> On 2014-09-22 06:24, Devdatta Akhawe wrote:
>>
>> Hi Arjan
>>
>> thanks for taking a look at the spec!
>> Forgive me, but I can't help but wonder: did you take a look at
>> Section 3.3.2
>>
>> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
>> [1]
>>
>> It defines restrictions on what SRI applies to and I think it is a
>> reasonably secure way of handling the issues you raise. Or are you
>> concerned despite the limits in the spec? Can you give a concrete
>> threat as an example, to help me understand better?
>
>
> What I'm referring to are the issues which may arise when using the integrity
> metadata as a cache key as described in Section 4. Suppose there is a script
> at https://bank.com/onlinebanking/accountinfo.js which is only included in
> pages which are only displayed to logged-in customers. This script may well
> be eligible for validation, as such it might also be added into the browser
> cache. Knowing this an attacker can create a webpage which attempts to load
> a script with the same hash and measure the time that takes to determine if
> the script existed in the victim's cache. This will tell him which of his
> victims do online banking at bank.com. Allowing bank.com to basically
> opt-out from the proposed caching will solve this while keeping all other
> benefits of resource validation.
>
> Section 4.2 states that integrity metadata cannot be used as a cache
> identifier unless the resource is delivered with a * CORS header. This would
> mean that bank.com would be safe if they don't set CORS, or limit CORS to
> specific domains. However, this would basically mean only resources fetched
> from a CDN will ever enter the cache, which rather limits the benefits which
> could be gained from a hash based cache. A commonly used webapp (e.g.
> wordpress) could simply keep installing its subresources locally but still
> get a very high cache probability simply by adding integrity metadata. On
> top of that, commonly used scripts such as jquery won't be added to cache
> when they are not loaded through a CDN first. This means a page which uses a
> CDN may still need to load the script even though the browser has fetched
> the exact same script before.
>
> Regards,
> Arjan Veenstra
>
>
>> On 20 September 2014 01:19, Arjan Veenstra <arjan@veenstra.cx> wrote:
>>
>>> Hi,
>>>
>>> I've been looking at this proposal mostly out of interest in the
>>> improved caching of common resources it might provide, so my mindset
>>> might be tainted somewhat. But looking at section 6.3 it
>>> occurred to me that most risks mentioned there could be mitigated if
>>> the document author could specify the intended usage of the
>>> resource. A simple 'private' marker which tells the UA it's not
>>> allowed to add the resource to its hash-based cache could protect
>>> sensitive resources from these types of attacks. Or perhaps the spec
>>> should err on the safe side and only allow caching when resources
>>> are marked public.
>>>
>>> Behind this is the assumption there are two types of resources you
>>> want integrity checks on. The first being common public resources
>>> such as javascript libraries, the second being resources specific to
>>> your application which are hosted elsewhere. Wider caching is mostly
>>> useful for resources in the first category, but the presence of
>>> those resources is unlikely to leak any usable information. As a
>>> common resource it could have entered the cache from lots of places.
>>> Resources in the second category generally won't benefit from
>>> caching beyond the currently available caching mechanism, marking
>>> those private won't hurt performance but does effectively remove any
>>> new attack surface introduced by hash based caching.
>>>
>>> Of course there are edge cases, for instance a library which is
>>> 'public' but not commonly used. Finding a cache hit might still give
>>> a lower certainty indication a user visited a specific site. But an
>>> additional flag would allow document authors to act according to
>>> their own assessment of the risks.
>>>
>>> I'm in doubt if a request for a resource marked private should be
>>> allowed to be fulfilled from cache. I'm guessing that if the hash is
>>> secure that shouldn't be an issue.
>>>
>>> I'm also missing a description of how to handle scenarios where a
>>> resource might be available in multiple content types. For instance,
>>> a server might prefer to serve an image as svg but fall back to
>>> serving a png file when the accept header doesn't include svg. I
>>> could see something similar happening in the future with alternative
>>> scripting (e.g. Dart, Coffeescript, Typescript) languages where a
>>> server might serve either the original script or the
>>> compiled-to-javascript version based on the accept header. In more
>>> abstract terms I'd say that since a URL points to a resource which
>>> might be represented in different ways you'll always have to account
>>> for the possibility a resource has different representations.
>>> I guess the obvious solution would be to allow different hashes
>>> with different content types to be specified. The spec doesn't seem
>>> to forbid this, but it doesn't explicitly allow it either and tends
>>> to speak about the content type in singular form.
>>>
>>> The same applies to localized resources, as the content of an
>>> Accept-Language header might cause different content to be served as
>>> well. Perhaps a language attribute needs to be added as well.
>>>
>>> Regards,
>>> Arjan Veenstra
>>
>>
>>
>>
>> Links:
>> ------
>> [1]
>>
>> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
>
>
Received on Tuesday, 23 September 2014 21:20:17 UTC