Re: [Integrity] Some comments on Cross-Origin leakage and content types from Arjan Veenstra on 2014-09-22 (public-webappsec@w3.org from September 2014)

From: Arjan Veenstra <arjan@veenstra.cx>
Date: Mon, 22 Sep 2014 21:59:02 +0200
To: public-webappsec@w3.org
Message-ID: <5d7d590f53d1ef2f29d0c501b84ba523@d6.nl>
On 2014-09-22 06:24, Devdatta Akhawe wrote:
> Hi Arjan
> 
> thanks for taking a look at the spec!
> Forgive me, but I can't help but wonder: did you take a look at
> Section 3.3.2
> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
> [1]
> 
> It defines restrictions on what SRI applies to and I think it is a
> reasonably secure way of handling the issues you raise. Or are you
> concerned despite the limits in the spec? Can you give a concrete
> threat as an example, to help me understand better?

What I'm referring to are the issues which may arise when using the
integrity metadata as a cache key, as described in Section 4. Suppose
there is a script at https://bank.com/onlinebanking/accountinfo.js which
is included only in pages displayed to logged-in customers. This script
may well be eligible for validation, and as such it might also be added
to the browser cache. Knowing this, an attacker can create a webpage
which attempts to load a script with the same hash and measure the time
that takes, to determine whether the script existed in the victim's
cache. This will tell him which of his victims do online banking at
bank.com. Allowing bank.com to opt out of the proposed caching will
solve this while keeping all other benefits of resource validation.
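
To make the scenario concrete, here is a minimal sketch in Python. It
is a toy model and not how any browser implements caching: the
HashKeyedCache class, the fixed load costs and the script body are all
assumptions made up for illustration, and the "timing" is simulated
rather than measured. It also assumes the attacker already knows the
script's content (for instance because he is a customer himself).

```python
import hashlib

NETWORK_COST_MS = 200  # assumed cost of fetching over the network
CACHE_COST_MS = 1      # assumed cost of a hit in the hash-keyed cache


def digest(body: bytes) -> str:
    """Stand-in for the integrity metadata: a SHA-256 hex digest."""
    return hashlib.sha256(body).hexdigest()


class HashKeyedCache:
    """Hypothetical per-browser cache keyed only by integrity digest."""

    def __init__(self):
        self._entries = {}

    def load(self, integrity, fetch):
        """Return (body, simulated load time in ms)."""
        if integrity in self._entries:
            return self._entries[integrity], CACHE_COST_MS
        body = fetch()  # stands in for a network request
        if digest(body) != integrity:
            raise ValueError("integrity mismatch")
        self._entries[integrity] = body
        return body, NETWORK_COST_MS


# Assumed content of accountinfo.js, identical for every customer.
ACCOUNT_JS = b"/* accountinfo.js as served by bank.com */"


def attacker_probe(cache):
    """Attacker page: request the same hash from its own URL and time it."""
    _, elapsed_ms = cache.load(digest(ACCOUNT_JS), lambda: ACCOUNT_JS)
    return elapsed_ms < NETWORK_COST_MS  # fast load => was already cached


customer = HashKeyedCache()
customer.load(digest(ACCOUNT_JS), lambda: ACCOUNT_JS)  # logged in at bank.com
stranger = HashKeyedCache()

print(attacker_probe(customer))  # True
print(attacker_probe(stranger))  # False
```

Note that in this model the probe itself populates the cache, so the
attacker gets one reliable measurement per victim.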

Section 4.2 states that integrity metadata cannot be used as a cache
identifier unless the resource is delivered with a * CORS header. This
would mean that bank.com would be safe if they don't set CORS, or
limit CORS to specific domains. However, this would basically mean only
resources fetched from a CDN will ever enter the cache, which rather
limits the benefits which could be gained from a hash-based cache.
Without that restriction, a commonly used webapp (e.g. WordPress) could
keep installing its subresources locally and still get a very high
cache hit probability simply by adding integrity metadata. On top of
that, commonly used scripts such as jQuery won't be added to the cache
unless they are loaded through a CDN first. This means a page which
uses a CDN may still need to load the script even though the browser
has fetched the exact same script before.
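
The difference between the two approaches can be sketched as two
admission rules. This is only an illustration under assumptions: the
function names, the 'private' flag and the three example resources are
hypothetical, and the first rule is my reading of Section 4.2 rather
than spec text.

```python
def cors_star_policy(headers: dict) -> bool:
    """Admit to the hash cache only if served with a * CORS header."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("access-control-allow-origin") == "*"


def opt_out_policy(marked_private: bool) -> bool:
    """Admit everything the author has not marked private (hypothetical flag)."""
    return not marked_private


# (response headers, marked private by the author) -- all assumed values
resources = {
    "jquery via CDN": ({"Access-Control-Allow-Origin": "*"}, False),
    "jquery installed locally": ({}, False),
    "bank.com accountinfo.js": ({}, True),
}

for name, (headers, private) in resources.items():
    print(name, cors_star_policy(headers), opt_out_policy(private))
```

Under the first rule only the CDN copy is admitted; under the opt-out
rule the locally installed copy is admitted as well, while the bank
script stays out in both cases.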

Regards,
Arjan Veenstra


> On 20 September 2014 01:19, Arjan Veenstra <arjan@veenstra.cx> wrote:
> 
>> Hi,
>> 
>> I've been looking at this proposal mostly out of interest in the
>> improved caching of common resources it might provide, so my mindset
>> might be tainted somewhat. But looking at Section 6.3 it
>> occurred to me that most risks mentioned there could be mitigated if
>> the document author could specify the intended usage of the
>> resource. A simple 'private' marker which tells the UA it's not
>> allowed to add the resource to its hash-based cache could protect
>> sensitive resources from these types of attacks. Or perhaps the spec
>> should err on the safe side and only allow caching when resources
>> are marked public.
>> 
>> Behind this is the assumption there are two types of resources you
>> want integrity checks on. The first being common public resources
>> such as javascript libraries, the second being resources specific to
>> your application which are hosted elsewhere. Wider caching is mostly
>> useful for resources in the first category, but the presence of
>> those resources is unlikely to leak any usable information. As a
>> common resource it could have entered the cache from lots of places.
>> Resources in the second category generally won't benefit from
>> caching beyond the currently available caching mechanisms; marking
>> those private won't hurt performance but does effectively remove any
>> new attack surface introduced by hash-based caching.
>> 
>> Of course there are edge cases, for instance a library which is
>> 'public' but not commonly used. Finding a cache hit might still give
>> a lower certainty indication a user visited a specific site. But an
>> additional flag would allow document authors to act according to
>> their own assessment of the risks.
>> 
>> I'm in doubt if a request for a resource marked private should be
>> allowed to be fulfilled from cache. I'm guessing that if the hash is
>> secure that shouldn't be an issue.
>> 
>> I'm also missing a description of how to handle scenarios where a
>> resource might be available in multiple content types. For instance,
>> a server might prefer to serve an image as svg but fall back to
>> serving a png file when the accept header doesn't include svg. I
>> could see something similar happening in the future with alternative
>> scripting languages (e.g. Dart, CoffeeScript, TypeScript) where a
>> server might serve either the original script or the
>> compiled-to-javascript version based on the Accept header. In more
>> abstract terms I'd say that since a URL points to a resource which
>> might be represented in different ways you'll always have to account
>> for the possibility a resource has different representations.
>> I guess the obvious solution would be to allow different hashes
>> with different content types to be specified. The spec doesn't seem
>> to forbid this, but it doesn't explicitly allow it either and tends
>> to speak about the content type in singular form.
>> 
>> The same applies to localized resources, as the content of an
>> Accept-Language header might cause different content to be served as
>> well. Perhaps a language attribute needs to be added as well.
>> 
>> Regards,
>> Arjan Veenstra
> 
> 
> 
> Links:
> ------
> [1]
> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
Received on Monday, 22 September 2014 19:59:27 UTC