
Re: [Integrity] Some comments on Cross-Origin leakage and content types

From: Devdatta Akhawe <dev.akhawe@gmail.com>
Date: Tue, 23 Sep 2014 15:15:05 -0700
Message-ID: <CAPfop_0OE2QPs5adn6yRzY+8Q5TQh56E5T=D2fhSZSd-wAV+zg@mail.gmail.com>
To: Arjan Veenstra <arjan@veenstra.cx>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Aah -- so are you saying that

When bank.com uses SRI to fetch a same-origin resource with integrity
metadata, attacker.com should not be able to come along tomorrow and
brute-force the value because it is already in the integrity-based
cache? I don't think that's the intent of the spec, but before I get
into that, can you confirm this is your concern?

On 22 September 2014 12:59, Arjan Veenstra <arjan@veenstra.cx> wrote:
> On 2014-09-22 06:24, Devdatta Akhawe wrote:
>> Hi Arjan
>> thanks for taking a look at the spec!
>> Forgive me, but I can't help but wonder: did you take a look at
>> Section 3.3.2
>> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
>> [1]
>> It defines restrictions on what SRI applies to and I think it is a
>> reasonably secure way of handling the issues you raise. Or are you
>> concerned despite the limits in the spec? Can you give a concrete
>> threat as an example, to help me understand better?
> What I'm referring to are the issues that may arise when integrity metadata is used as a cache key, as described in Section 4. Suppose there is a script at https://bank.com/onlinebanking/accountinfo.js which is only included in pages shown to logged-in customers. This script may well be eligible for validation, so it might also be added to the browser cache. Knowing this, an attacker can create a web page that attempts to load a script with the same hash and measures how long the load takes, to determine whether the script was already in the victim's cache. This tells the attacker which victims do online banking at bank.com. Allowing bank.com to opt out of the proposed caching would solve this while keeping all other benefits of resource validation.
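The probe described above can be sketched with a toy cache keyed only by integrity metadata. This is a minimal illustration under stated assumptions, not anything taken from the spec: `IntegrityCache`, `sri_sha256`, and the `private` flag are hypothetical names, and a real attacker page would infer a cache hit from load timing rather than read a boolean.

```python
import base64
import hashlib


def sri_sha256(body: bytes) -> str:
    """Return integrity metadata in the 'sha256-<base64 digest>' form."""
    digest = hashlib.sha256(body).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


class IntegrityCache:
    """Toy cache shared across origins and keyed only by content hash (hypothetical)."""

    def __init__(self):
        self._store = {}  # maps integrity string -> response body

    def store(self, body: bytes, private: bool = False) -> str:
        key = sri_sha256(body)
        # Hypothetical opt-out: a resource marked private is never added.
        if not private:
            self._store[key] = body
        return key

    def is_cached(self, integrity: str) -> bool:
        # Stands in for the timing measurement an attacker page would make.
        return integrity in self._store


bank_script = b"/* accountinfo.js, served only to logged-in customers */"

shared = IntegrityCache()
shared.store(bank_script)                   # the victim visited bank.com
probe = sri_sha256(bank_script)             # the attacker knows the script's hash
leaked = shared.is_cached(probe)            # True: the visit is revealed

opted_out = IntegrityCache()
opted_out.store(bank_script, private=True)  # bank.com opted out of the cache
hidden = opted_out.is_cached(probe)         # False: nothing to observe
```

The sketch only shows why a cache lookup by hash reveals membership; it does not model how a browser would actually store or expire entries.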
> Section 4.2 states that integrity metadata cannot be used as a cache identifier unless the resource is delivered with a * CORS header. This would mean that bank.com is safe if it doesn't set CORS headers, or limits CORS to specific domains. However, it would also mean that essentially only resources fetched from a CDN will ever enter the cache, which rather limits the benefits that could be gained from a hash-based cache. A commonly used web app (e.g. WordPress) could otherwise keep installing its subresources locally but still get a very high cache hit probability simply by adding integrity metadata. On top of that, commonly used scripts such as jQuery won't be added to the cache unless they are loaded through a CDN first. This means a page which uses a CDN may still need to load the script even though the browser has fetched the exact same script before.
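The Section 4.2 restriction, as read in the paragraph above, amounts to a simple gate on the response headers. The following is a rough sketch of that reading only; `eligible_for_hash_cache` is a hypothetical helper, the example header sets are made up, and the spec text itself may word the condition differently.

```python
def eligible_for_hash_cache(response_headers: dict) -> bool:
    """Hypothetical gate: only a wildcard CORS response may enter the shared cache."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip() for name, value in response_headers.items()}
    return normalized.get("access-control-allow-origin") == "*"


# Illustrative responses (assumed, not from the thread).
cdn_response = {"Access-Control-Allow-Origin": "*"}
bank_without_cors = {"Content-Type": "application/javascript"}
bank_limited_cors = {"Access-Control-Allow-Origin": "https://partner.example"}

cdn_ok = eligible_for_hash_cache(cdn_response)
bank_plain_ok = eligible_for_hash_cache(bank_without_cors)
bank_limited_ok = eligible_for_hash_cache(bank_limited_cors)
```

Under this reading, a locally installed copy of a common library served without a wildcard CORS header never populates the cache, which is the limitation the paragraph points out.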
> Regards,
> Arjan Veenstra
>> On 20 September 2014 01:19, Arjan Veenstra <arjan@veenstra.cx> wrote:
>>> Hi,
>>> I've been looking at this proposal mostly out of interest in the
>>> improved caching of common resources it might provide, so my mindset
>>> might be tainted somewhat. But looking at section 6.3, it
>>> occurred to me that most of the risks mentioned there could be
>>> mitigated if the document author could specify the intended usage of
>>> the resource. A simple 'private' marker which tells the UA it's not
>>> allowed to add the resource to its hash-based cache could protect
>>> sensitive resources from these types of attacks. Or perhaps the spec
>>> should err on the safe side and only allow caching when resources
>>> are marked public.
>>> Behind this is the assumption there are two types of resources you
>>> want integrity checks on. The first being common public resources
>>> such as javascript libraries, the second being resources specific to
>>> your application which are hosted elsewhere. Wider caching is mostly
>>> useful for resources in the first category, but the presence of
>>> those resources is unlikely to leak any usable information. As a
>>> common resource it could have entered the cache from lots of places.
>>> Resources in the second category generally won't benefit from
>>> caching beyond the currently available caching mechanism; marking
>>> those private won't hurt performance but does effectively remove any
>>> new attack surface introduced by hash-based caching.
>>> Of course there are edge cases, for instance a library which is
>>> 'public' but not commonly used. Finding a cache hit might still give
>>> a lower certainty indication a user visited a specific site. But an
>>> additional flag would allow document authors to act according to
>>> their own assessment of the risks.
>>> I'm unsure whether a request for a resource marked private should be
>>> allowed to be fulfilled from cache. I'm guessing that if the hash is
>>> secure, that shouldn't be an issue.
>>> I'm also missing a description of how to handle scenarios where a
>>> resource might be available in multiple content types. For instance,
>>> a server might prefer to serve an image as SVG but fall back to
>>> serving a PNG file when the Accept header doesn't include SVG. I
>>> could see something similar happening in the future with alternative
>>> scripting languages (e.g. Dart, CoffeeScript, TypeScript) where a
>>> server might serve either the original script or the
>>> compiled-to-JavaScript version based on the Accept header. In more
>>> abstract terms, I'd say that since a URL points to a resource which
>>> might be represented in different ways, you'll always have to account
>>> for the possibility that a resource has different representations.
>>> I guess the obvious solution would be to allow different hashes
>>> with different content types to be specified. The spec doesn't seem
>>> to forbid this, but it doesn't explicitly allow it either and tends
>>> to speak about the content type in singular form.
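One way to picture the suggestion above is integrity metadata that carries one expected hash per content type, with the check picking the hash that matches the representation actually served. This is only a sketch of the idea: the mapping shape, `verify_representation`, and the sample bodies are assumptions for illustration, and the spec does not define such a structure.

```python
import base64
import hashlib


def sri_sha256(body: bytes) -> str:
    """Return integrity metadata in the 'sha256-<base64 digest>' form."""
    digest = hashlib.sha256(body).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def verify_representation(body: bytes, content_type: str, expected: dict) -> bool:
    """Check the body against the hash listed for its content type (hypothetical)."""
    wanted = expected.get(content_type)
    if wanted is None:
        # No hash was supplied for this representation, so it cannot be validated.
        return False
    return sri_sha256(body) == wanted


# Placeholder payloads standing in for two representations of one URL.
svg_body = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
png_body = b"\x89PNG placeholder bytes"

expected_hashes = {
    "image/svg+xml": sri_sha256(svg_body),
    "image/png": sri_sha256(png_body),
}

svg_ok = verify_representation(svg_body, "image/svg+xml", expected_hashes)
png_ok = verify_representation(png_body, "image/png", expected_hashes)
mismatch = verify_representation(png_body, "image/svg+xml", expected_hashes)
```

Failing closed when no hash is listed for the served content type is a design choice of this sketch, not something the thread settles.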
>>> The same applies to localized resources, as the content of an
>>> Accept-Language header might cause different content to be served as
>>> well. Perhaps a language attribute needs to be added as well.
>>> Regards,
>>> Arjan Veenstra
>> Links:
>> ------
>> [1]
>> http://w3c.github.io/webappsec/specs/subresourceintegrity/#is-resource-eligible-for-integrity-validation
Received on Tuesday, 23 September 2014 22:15:56 UTC
