Re: "Subresource Integrity" spec up for review. from Ryan Sleevi on 2014-01-13 (public-webappsec@w3.org from January 2014)

From: Ryan Sleevi <rsleevi@chromium.org>
Date: Mon, 13 Jan 2014 12:46:05 -0800
To: Brad Hill <hillbrad@gmail.com>
Cc: Mike West <mkwst@google.com>, Ryan Sleevi <rsleevi@chromium.org>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Joel Weinberger <jww@chromium.org>, Frederik Braun <fbraun@mozilla.com>, Devdatta Akhawe <dev.akhawe@gmail.com>
Message-ID: <CACvaWvY67BxPyq+VaQFnuCNBY4G9_u1nUW5zb4K+5ROGUAJUFQ@mail.gmail.com>
On Sat, Jan 11, 2014 at 6:00 PM, Brad Hill <hillbrad@gmail.com> wrote:

> First, I'd also like to thank Ryan for his review and comments.
>
> Here's a shot at defending the "mixed-content with integrity" use-case.
> (#6 in Ryan's objections)
>
> First, it's just not true to say that there are no meaningful performance
> issues with HTTPS.  Caching is a pretty huge thing in terms of performance
> and efficiency.  HTTPS is getting cheaper all the time, but it's still a
> really big deal vs. HTTP when you're trying to reach a global audience.
> RTTs and latency matter.  And even if the hardware cost of encryption is
> cheap, the cost of building out low-latency endpoints all over the globe
> that are physically secure and contractually/legally trustworthy enough to
> keep a publicly-trusted certificate with your name on it isn't.
>
> A couple of theses on the tradeoffs involved:
>
> A) HTTPS does not provide any sort of by-design protections against
> traffic analysis.  A user navigating around a website with a structure
> known to an adversary is actually pretty distinguishable in their
> activities simply from things like IP address, navigation timing, resource
> sizes, etc.   (e.g.
> http://blog.ioactive.com/2012/02/ssl-traffic-analysis-on-google-maps.html)
>  Some authors will go out of their way to try to make this harder, like
> Twitter's padding the size of their avatar images, but this is a goal that
> requires very special attention and such authors would surely know better
> than to use this proposed mixed-content integrity mechanism.  I think
> there's a fair case to be made that loading things that aren't actually
> private (the Google doodle of the day, the standard pictures on the PayPal
> home page) over HTTP with integrity is not significantly worse than the
> status quo.  (yes, HTTP/2 makes traffic analysis harder to some degree, but
> that is still a darn long way from a formal guarantee of
> indistinguishability of known plaintext)
>

I think it's a bit of a stretch to suggest that, because traffic analysis
exists as a possibility, HTTPS provides limited-to-no privacy, which is
the only reason I can see why you'd bring up the traffic analysis case in
the context of integrity protection/privacy. It's also, as you recognize, a
problem that sites can themselves alter behaviour to deal with (as attacks
like BREACH have shown).


>
> B) I think many content authors would say that their explicit goal in
> loading resources like images, JS libraries, etc. over HTTPS is integrity
> and not privacy.  Some of us on this list may have our own opinions and
> goals about trying to raise the cost of pervasive surveillance, but that
> isn't everyone's goal for their particular application, or at least not
> their most important goal. I think we, as web standards authors, should be
> careful in how much we try to dictate these goals to our customers instead
> of letting them choose the things they want and need for themselves.
>

Yet, as browser vendors, we have an obligation to our users to ensure that
their security is preserved, and, whenever both possible and reasonable,
that their *expectations* of security are preserved.

Today, there is a simple duality of the web. Either you're browsing in HTTP
- in which there is absolutely no security whatsoever - or you're browsing
with HTTPS, which provides a combination of assertions about identity
(namely, domain ownership), privacy, and integrity.

If a user visits https://site.example and it loads sub-resources over HTTP
with integrity protection - which is, at its core, the crux of #6 - what
would or should the UI indicate? Is it reasonable to show the famed 'lock'
icon in this case - even when the traffic is visible to an
attacker/observer? Does that align with users' expectations? I don't think
it does.
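
As a rough illustration of the distinction being drawn here (a sketch
only; the names below are hypothetical and nothing in it comes from the
draft spec or from any browser), an integrity hash changes whether
tampering in transit can be detected, but not whether an observer can see
the load:

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LoadProperties:
    """What a single sub-resource load offers (illustrative model only)."""
    tamper_evident: bool  # can modification in transit be detected?
    confidential: bool    # is the transfer hidden from a network observer?


def load_properties(url: str, has_integrity_hash: bool) -> LoadProperties:
    """Model a load: TLS gives both properties, a hash gives only the first."""
    over_tls = urlsplit(url).scheme == "https"
    return LoadProperties(
        tamper_evident=over_tls or has_integrity_hash,
        confidential=over_tls,
    )


def page_is_fully_private(page_url: str, subresources) -> bool:
    """True only if the page and every (url, has_hash) sub-resource use HTTPS."""
    if urlsplit(page_url).scheme != "https":
        return False
    return all(load_properties(url, has_hash).confidential
               for url, has_hash in subresources)
```

Under this model, an HTTPS page that pulls a hashed script over HTTP is
tamper-evident but not private, which is the case where the 'lock' icon
would be promising more than the page delivers.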


>
> C) Yes, distributed edge-caching over HTTPS is a real thing today, but
> that usually involves delegating to some third party the right to
> impersonate you. (e.g. put your name as a Subject Alt Name on a certificate
> they control)  If we are genuinely worried about state-level attacks
> against various parts of end-to-end web security, these third parties look
> like a very attractive target for compulsion (especially as Certificate
> Transparency gets going).  If a site can keep closer control over its
> public authenticity credentials, in fewer jurisdictions and on many fewer
> servers, by using caching services in a "trust-but-verify" manner, instead
> of today's much more expansive grant of trust, perhaps we have achieved a
> substantial improvement after all.
>
> -Brad
>

I'm not sure this is a fair representation of what certificates assert, nor
does edge caching necessarily involve delegating to a third party the right
to impersonate you. You can always refer to edge-cache controlled names
within your resource loading URLs. If, for various reasons (eg: SOP, CORS,
etc), that isn't workable, then you can always delegate a sub-domain, as
many organizations are already doing.

For example, https://joes-awesome-site.example does NOT need to cede
control of the 'identity' of *.joes-awesome-site.example to a CDN. They can
simply delegate cdn.joes-awesome-site.example to the CDN.

If your threat model is state level attackers and/or legal compulsion, you
can *still* use the integrity protected sub-resources - but deliver those
resources over HTTPS. HTTPS avoids the mixed content, and provides real and
meaningful integrity protection (eg: without worrying about the
hash-collision implications of vastly unstructured data like JS), and then
this use case just fits into #1/#2.
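
For concreteness, the check behind #1/#2 can be sketched as below. This is
a minimal sketch under assumptions, not the draft's algorithm: the
function names are hypothetical, SHA-256 with base64 encoding is an
assumed choice, and the resource body is taken to have already been
fetched (over HTTPS, in the scenario above).

```python
import base64
import hashlib


def integrity_digest(body: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of a resource body.

    SHA-256 and base64 are assumptions made for this sketch.
    """
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def verify_resource(body: bytes, expected_digest: str) -> bool:
    """Accept the body only if it matches the digest pinned by the author."""
    return integrity_digest(body) == expected_digest


# The author pins the digest of the code they reviewed...
reviewed = b"console.log('hello');"
pinned = integrity_digest(reviewed)

# ...and the loader rejects anything that differs by even one byte.
print(verify_resource(reviewed, pinned))
print(verify_resource(reviewed + b" ", pinned))
```

The hash tells the author that the bytes are the ones they reviewed; the
transport still decides who else gets to see them.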


>
>
>
> On Sat, Jan 11, 2014 at 4:19 AM, Mike West <mkwst@google.com> wrote:
>
>> -security-dev to BCC
>> +public-webappsec, Brad Hill
>>
>> I'm moving this thread to public-webappsec so that folks there can
>> comment directly.
>>
>> On Sat, Jan 11, 2014 at 5:03 AM, Ryan Sleevi <rsleevi@chromium.org> wrote:
>>>
>>> On Wed, Jan 8, 2014 at 3:29 AM, Mike West <mkwst@chromium.org> wrote:
>>>
>>>> Hello, lovely friends of Chromium security!
>>>>
>>>> Frederik, Devdatta, Joel, and I have been working with folks in the
>>>> webappsec WG to put together a specification of the ages-old idea of
>>>> jamming hashes into an HTML page in order to verify the integrity of
>>>> resources that page requests. A strawman draft is up at
>>>> http://w3c.github.io/webappsec/specs/subresourceintegrity/ for review.
>>>>
>>>> Given that some of the proposals are interesting from a security
>>>> perspective (in particular, using hashes as cache identifiers, and
>>>> potentially relaxing mixed-content checks if the hashes are delivered over
>>>> HTTPS), it'd be brilliant to get early feedback so we can make sure the
>>>> spec is sane.
>>>>
>>>
>>> I... have a hard time with this proposal and its use cases beyond the
>>> first and third.
>>>
>>
>> I think you'll find that #2 is really #1 in disguise.
>>
>>
>>> I apologize that I don't have the bandwidth to jump into the fray and
>>> really engage in the W3C group right now, but I hope you can convey the
>>> message.
>>>
>>
>> No worries. I appreciate the feedback, and we'll pick up the conversation
>> on the W3C list. Please don't feel obligated to keep up with this thread;
>> feel free to mute it.
>>
>>
>>> +1 to 1) Site wants to ensure third-party code doesn't change from what
>>> they reviewed. Cool
>>>
>>
>> Good! This is more or less the core of what I want to achieve. Everything
>> else is nice to have.
>>
>>
>>> +wtf to 2) Site wants to ensure... code review? How is that an HTML
>>> problem? How is it reasonable to induce the CPU costs on millions of users
>>> to enforce what is ultimately a procedural problem at the company? How is
>>> it in the interest of the users?
>>>
>>
>> The use-case was written unclearly. I've rewritten it in the hopes of
>> making it something you'd agree with.
>>
>> In short: an advertising network like Doubleclick delegates the actual
>> delivery of advertising content to third-party servers, and relies on
>> contractual obligations (and probably automated checks, etc) to ensure that
>> the advertisement delivered is the advertisement that was reviewed. Those
>> third-parties sometimes accidentally (or maliciously) deliver altered
>> content. By adding integrity metadata to the iframe that wraps an ad, and
>> by requiring the ad HTML to contain integrity metadata for subresources, ad
>> networks can mitigate this risk.
>>
>>
>>> +0 to 3) I'd say this is where you use HTTPS, especially in light of
>>> discussions to 'downgrade' HTTP downloads.
>>>
>>
>> 1. HTTPS gives different integrity promises: it verifies that the server
>> you're talking to is the one you're expecting, and gives some protection
>> against MITM alterations.
>>
>> 2. For the same reason that use-case #1 is valuable, even over HTTPS,
>> validating download integrity is valuable.
>>
>>
>>> +? to 4) "altered Javascript from the filesystem" is certainly an
>>> unrealistic threat that a UA cannot and should not pretend it can defend
>>> against, unless that UA is running itself in a higher privilege than the
>>> files it's accessing - in which case, it should be storing those files
>>> securely.
>>>
>>
>> Freddy's original proposal was meant to cover browser UI that loads
>> script off the net directly. This would cover, for example, Chrome's NTP.
>>
>> I've removed the "filesystem" language, as I agree with your criticism.
>>
>>
>>> +wtfbbqomg to 5) We have had this amazing way of expressing versions
>>> for resources since the introduction of HTTP. It's called the URL. If the
>>> author wants to express a dependency on a particular version, the amazing
>>> power of the web allows them to put a version within a URL and depend on
>>> that.
>>>
>>
>> This is probably also poorly worded: I believe the intent was another
>> spin on #1. If I load a resource from a server, I'd like to ensure that it
>> hasn't been swapped out behind my back. If it has been swapped out, the
>> reporting functionality will alert me to the fact, I'll go review the new
>> code, and either update the integrity metadata, or rework the mashup to use
>> some other resource if I don't like the changes.
>>
>>
>>> +awwhellnaw to 6) "performance reasons" is not and has not been a
>>> realistic problem for properly-configured SSL for some time. The example
>>> already establishes that the user supports some degree of
>>> properly-configured SSL at rockin-resources.com, so there's no reason
>>> *not* to use it.
>>>
>>
>> I'm still hopeful that Brad (Hi Brad!) will take some time to give more
>> detail around the value proposal for mixed content relaxation and the
>> fallback mechanism. I think several of the editors share your opinion here.
>>
>>
>>> The only 'performance reasons' I'm aware of are those ever-insidious
>>> transparent, caching proxies. Yes, they're ubiquitous. But if you're trying
>>> to solve that problem, you need to come out and say it. "An author wishes
>>> to load a resource that a person in a privileged position on the network
>>> would prefer to intercept and redirect."
>>>
>>
>> I'll work that in somewhere. I do think creating transparency into that
>> manipulation is part of the intent: in much the same way that CSP showed
>> folks like Twitter how much code was being injected into their HTML pages,
>> integrity verification can make it clear how many resources are changed
>> in-flight.
>>
>>
>>> Especially in the nature of our Post-Snowden World, integrity without
>>> privacy seems to be setting the bar too low.
>>>
>>
>> The current suggestion is that any HTTP->HTTPS fallback system would omit
>> credentials from requests to the HTTP server. I agree that that still opens
>> some windows into your browsing activity that might be better left closed.
>>
>>
>>> And integrity with privacy is easily obtained - with HTTPS.
>>>
>>
>> I don't think that's the case, at least, not in the sense of verifying
>> that the resource you're loading hasn't been altered on the server. HTTPS
>> mitigates the risk of middle-men in between you and the server you're
>> talking to. It does nothing to verify that the server itself hasn't been
>> compromised.
>>
>> Thanks again for spending some time on this, Ryan. I appreciate it.
>>
>> -mike
>>
>
>
Received on Tuesday, 14 January 2014 07:09:19 UTC