Re: [integrity] Different ways to associate integrity information from Mark Nottingham on 2014-10-27 (public-webappsec@w3.org from October 2014)

From: Mark Nottingham <mnot@mnot.net>
Date: Mon, 27 Oct 2014 13:43:34 -0700
To: Brad Hill <hillbrad@gmail.com>
Cc: Mike West <mkwst@google.com>, WebAppSec WG <public-webappsec@w3.org>
Message-Id: <FAF4A63A-1E9E-4E0A-AA3B-D2F5A611475F@mnot.net>
> On 24 Oct 2014, at 10:40 am, Brad Hill <hillbrad@gmail.com> wrote:
> 
> Mike: I think Mark is imagining a header delivered by the parent resource.

Yes. I’m pointing out, in a roundabout way, that the use cases for a hash of the contents of a link are likely to be diverse, and that the constraints we put on the first (I won’t call it “primary”) use, integrity of third-party resources, may not make sense for other uses.


> Mark: At that point, why not use a manifest
> (https://w3c.github.io/manifest/) and find a way to add hashes there?
> It seems that what you'd have to deliver over a header is pretty close
> to that information set.

Yep, something like that would work...


> I've been trying to constrain the group's ambitions here (and Mike,
> Joel, Dev and Freddy have been good at further constraining my own) on
> this sort of thing.  I want to see if this approach is at all
> interesting to a meaningful set of web publishers, and if it is
> manageable by them.  Certainly SRI does introduce fragility.
> 
> I wonder if anyone really has the operational capacity to manage a
> manifest such as Mark proposes and still meaningfully vet that the
> hashes are for authentic content?  A publisher could scrape their
> subresources and update automatically, but then the protection
> devolves to individually targeted attacks on the network, which HTTPS
> should address - any malicious changes at the origin server, e.g. if
> it was compromised, would be automatically propagated into the
> manifest by such tooling.

Well, that’s the thing. Content-addressable caching is a very different use case; it’s not about assuring authentic content, it’s an optimisation. In the case where you don’t get a cache hit on CAN, you simply follow the link to the origin and probably *don’t* display an error if the hash still doesn’t match; that’s the most backwards-compatible, sane thing to do. However, that pretty fundamentally disagrees with the first use case for SRI.
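
To make the difference concrete, here is a minimal sketch of the two behaviours in Python. It is only an illustration under stated assumptions: the function names, the in-memory dict standing in for a content-addressed cache, and the choice of SHA-256 are all hypothetical and do not come from any specification or implementation.

```python
import base64
import hashlib
from typing import Callable, Dict


def sha256_digest(body: bytes) -> str:
    """Return an SRI-style digest string of the form 'sha256-<base64>'."""
    raw = hashlib.sha256(body).digest()
    return "sha256-" + base64.b64encode(raw).decode("ascii")


class IntegrityError(Exception):
    """Raised by the strict (integrity) path when the digest does not match."""


def load_for_caching(digest: str,
                     cache: Dict[str, bytes],
                     fetch_from_origin: Callable[[], bytes]) -> bytes:
    """Optimisation only: a mismatch is not an error (hypothetical helper)."""
    if digest in cache:
        return cache[digest]          # cache hit: no network fetch needed
    body = fetch_from_origin()        # cache miss: follow the link
    if sha256_digest(body) == digest:
        cache[digest] = body          # assumption: only matching bodies are stored
    return body                       # use the response even if it does not match


def load_for_integrity(digest: str,
                       fetch_from_origin: Callable[[], bytes]) -> bytes:
    """First SRI use case: a mismatch is fatal (hypothetical helper)."""
    body = fetch_from_origin()
    if sha256_digest(body) != digest:
        raise IntegrityError("digest mismatch")
    return body
```

The same digest and the same mismatching response produce a usable body in the first function and an error in the second, which is the disagreement described above.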

E.g., an intermediary (e.g., a CDN) could scrape content and create manifests to enable CAN for referenced content; this would be of significant value for performance, but of course would have zero value for end-to-end integrity.
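
As a rough sketch of what such scraping tooling might do, the following Python builds a URL-to-digest mapping from content that has already been fetched. The JSON layout and the `build_manifest` name are hypothetical; this is not the format of the W3C manifest referenced earlier, and SHA-256 is assumed purely for illustration.

```python
import base64
import hashlib
import json
from typing import Dict


def build_manifest(resources: Dict[str, bytes]) -> str:
    """Map each URL to an SRI-style digest of its body (hypothetical format)."""
    entries = {}
    for url, body in sorted(resources.items()):
        raw = hashlib.sha256(body).digest()
        entries[url] = "sha256-" + base64.b64encode(raw).decode("ascii")
    return json.dumps(entries, indent=2)


# Example with a made-up scraped resource:
print(build_manifest({"https://cdn.com/foolib.1.2.3.js": b"/* foolib */"}))
```

Because the digests are computed from whatever the intermediary happened to fetch, the result is useful as a cache key but says nothing about whether the content is what the author intended.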

This makes me wonder whether CAN is really something separate from SRI (as much as I’d like to see it get some traction).

Cheers,


> So I think as a first step, it's best to address the easiest and
> highest possible value cases first - large JS libraries, commonly
> downloaded over CDNs, for which most resource authors already want
> stable versioning independent of any security threats.  It's
> serendipitous that this also happens to be a use case where
> content-addressable-caching could also provide a large benefit.
> 
> If we can make that work without horrible security and privacy
> side-effects, and people use it and like it and it doesn't make the
> web horribly brittle, then we can take the next baby steps.
> 
> The directions of those baby steps also can be guided by at least
> three major motivations, which we should probably discuss as part of
> our rechartering effort:
> 
> 1) Decrease the performance and other costs associated with delivering
> an all-secure web.  TLS is very cheap, but caching is still a big
> deal, especially for people in remote locations living on very modest
> means.  There are over 5 billion people with no internet connectivity
> at all today, and these costs are meaningful to them.
> 
> 2) Allow specification of applications built with web technologies and
> possibly delivered over the web that are more concrete and verifiable,
> perhaps with the intent of being able to grant more sensitive
> permissions to such applications.  I think that the SysApps group and
> work on app manifests that I pointed to above is important to consider
> for any such efforts, and perhaps we should cultivate more formal
> coordination on this front.
> 
> 3) Reduce single points of failure for security on the web.  This has
> always been my main motivation.  How do we make it so that compromise
> of a single web server providing script libraries, analytics, sign-in,
> social widgets, or the like doesn't automatically transitively
> compromise the web applications of millions of sites that include
> script from those servers?   Again, next-steps here don't necessarily
> entail adding more to SRI, but maybe providing better and less fragile
> privilege separation mechanisms for script.  (maybe better secure
> modularization in JS itself, or maybe pulling two scripts - an
> SRI-tagged interface layer that goes directly in your environment, and
> an implementation that gets forced into something like a cross-origin
> sandboxed worker.)
> 
> -Brad
> 
> On Fri, Oct 24, 2014 at 3:00 AM, Mike West <mkwst@google.com> wrote:
>> The security improvement we get from integrity checks comes from the fact
>> that the digest is delivered out-of-band with the resource. If jQuery's
>> server is compromised, it's only the sloppiest of attackers who would update
>> the resource, but not the headers.
>> 
>> It's not clear to me what benefit we'd obtain from a response header that
>> contained information that could be easily calculated from the resource
>> itself. Could you explain the use-case a little bit?
>> 
>> -mike
>> 
>> --
>> Mike West <mkwst@google.com>
>> Google+: https://mkw.st/+, Twitter: @mikewest, Cell: +49 162 10 255 91
>> 
>> Google Germany GmbH, Dienerstrasse 12, 80331 München, Germany
>> Registergericht und -nummer: Hamburg, HRB 86891
>> Sitz der Gesellschaft: Hamburg
>> Geschäftsführer: Graham Law, Christine Elizabeth Flores
>> (Sorry; I'm legally required to add this exciting detail to emails. Bleh.)
>> 
>> On Fri, Oct 24, 2014 at 5:47 AM, Mark Nottingham <mnot@mnot.net> wrote:
>>> 
>>> Has there been any discussion of how the integrity information is
>>> associated with a resource?
>>> 
>>> I think using the integrity attribute on the link makes sense for the most
>>> current use case -- assuring that off-site content (e.g., on a CDN) is what
>>> you think it's going to be. That's because in these cases, the URL is most
>>> likely to be a version-specific one (e.g.,
>>> <https://cdn.com/foolib.1.2.3.js>), so if the author wants to update the
>>> library version used, they'll need to update the link, and the integrity
>>> information is right next to it.
>>> 
>>> However, in the cache reuse case -- which seems to be getting *some*
>>> traction (or at least consideration) -- next to the link is about the worst
>>> place the integrity information can go; if the author updates the library,
>>> they'll need to update each and every instance of a link to it, which can be
>>> quite onerous.
>>> 
>>> In that use case, it makes more sense to put integrity information into
>>> HTTP headers or even a separate resource, so that it can more easily be
>>> updated (e.g., by a separate process, or automatically by the server at
>>> response time).
>>> 
>>> So, I'm wondering if the WG would consider allowing integrity information
>>> to be carried in HTTP response headers (e.g., Link), at least for the cache
>>> reuse case.
>>> 
>>> Cheers,
>>> 
>>> --
>>> Mark Nottingham   https://www.mnot.net/
>>> 
>>> 
>> 

--
Mark Nottingham   http://www.mnot.net/
Received on Monday, 27 October 2014 20:43:59 UTC