Re: "Subresource Integrity" spec up for review. from Joel Weinberger on 2014-01-14 (public-webappsec@w3.org from January 2014)

From: Joel Weinberger <jww@chromium.org>
Date: Tue, 14 Jan 2014 15:00:44 -0800
To: Ryan Sleevi <rsleevi@chromium.org>
Cc: Brad Hill <hillbrad@gmail.com>, Mike West <mkwst@google.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Frederik Braun <fbraun@mozilla.com>, Devdatta Akhawe <dev.akhawe@gmail.com>
Message-ID: <CAHQV2K=+a5SUsOd9gzw3FDSsKBMHGuQGgm8PpryTHDDk1Xh4fQ@mail.gmail.com>
On Tue, Jan 14, 2014 at 10:52 AM, Ryan Sleevi <rsleevi@chromium.org> wrote:

>
>
>
> On Tue, Jan 14, 2014 at 10:27 AM, Joel Weinberger <jww@chromium.org> wrote:
>
>>
>>
>>
>> On Tue, Jan 14, 2014 at 7:55 AM, Brad Hill <hillbrad@gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 12:46 PM, Ryan Sleevi <rsleevi@chromium.org> wrote:
>>>>
>>>>  ...
>>>>>
>>>> I think it's a bit of a stretch to suggest that because traffic
>>>> analysis exists as a possibility that HTTPS provides limited-to-no-privacy,
>>>>  ...
>>>>
>>>>
>>> That's a much stronger claim than I'm making. I'm only suggesting that
>>> some resource loads aren't very privacy-sensitive to begin with, and
>>> probably can be observed or inferred anyway over HTTPS, so I think there is
>>> limited or no harm in performing them with only integrity protections.
>>>  Granted, of course, we can't enforce in spec language or code that only
>>> such resources would be used with this technology, but privacy is always a
>>> matter of trusting your counterparty to be responsible.
>>>
>>>
>>>>
>>>>> ...
>>>>>
>>>> As browser vendors, we have an obligation to our users to ensure that
>>>> their security is preserved, and, whenever both possible and reasonable,
>>>> that their *expectations* of security are preserved.
>>>>
>>>> Today, there is a simple duality of the web. Either you're browsing in
>>>> HTTP - in which there is absolutely no security whatsoever - or you're
>>>> browsing with HTTPS, which provides a combination of assertions about
>>>> identity (namely, domain ownership), privacy, and integrity.
>>>>
>>>> If a user visits https://site.example and it loads sub-resources over
>>>> HTTP with integrity protection - which is, at its core, the crux of #6 -
>>>> what would or should the UI indicate? Is it reasonable to show the famed
>>>> 'lock' icon in this case - even when the traffic is visible to an
>>>> attacker/observer? Does that align with users' expectations? I don't think
>>>> it does.
>>>>
>>>
>>> I think this is a very good question indeed.  I appreciate the effort to
>>> make clear security statements to the user, and the lock, while mysterious
>>> in its inner workings, is about all we have right now.  I agree that it is
>>> a bad idea to devalue it, and it could risk trust in the web overall.
>>>
>>> Along these lines, I wonder about the integrity cache idea.  What's the
>>> effective difference between allowing an HTTPS resource with the lock to be
>>> composed from pieces that might've been fetched as part of a different
>>> (secure or not) resource or delivered with an app, versus doing an
>>> immediate fetch-with-integrity over insecure channels?  What are the actual
>>> essential properties we're trying to communicate to the user with the lock,
>>> and what violates them?  Just something to think about and discuss further,
>>> since I like the integrity cache idea even more than I like the
>>> mixed-content with integrity idea.
>>>
>>>
>>>> You can always refer to edge-cache controlled names within your
>>>> resource loading URLs. If, for various reasons (eg: SOP, CORS, etc), that
>>>> is not an option, then you can always delegate a sub-domain, as many
>>>> organizations are already doing.
>>>>
>>>
>>> Good point.
>>>
>>>>
>>>> If your threat model is state level attackers and/or legal compulsion,
>>>> you can *still* use the integrity protected sub-resources - but deliver
>>>> those resources over HTTPS. HTTPS avoids the mixed content, and provides
>>>> real and meaningful integrity protection (eg: without worrying about the
>>>> hash collisions implications of vastly unstructured data like JS), and then
>>>> this use case just fits into the #1/#2.
>>>>
>>>> I still think that the integrity attribute is useful here, even if we
>>> assume HTTPS, because the distributed nature of a CDN puts so many more
>>> entities in a position of privilege, and if you're loading script,
>>> importing HTML or even loading images, it's still your origin at the end of
>>> the day from the user's perspective.
>>>
>> +1 to Brad's point here. From my perspective, this is, in fact, probably
>> the most important part of the integrity spec. Without integrity,
>> regardless of HTTPS or not, many websites are instilling trust in CDNs that
>> is simply unnecessary. It provides many more attack vectors, and there
>> isn't a reason that should be. With the integrity check, it allows the
>> original server to be the authoritative source of content. CDNs are reduced
>> to what they originally were meant to be: content distribution only, with
>> no authority. This seems extraordinarily useful to me, with HTTP or HTTPS.
>>
>>
> Right,
>
> Just in case it was not clear, I'm 100% on board with the integrity spec
> as a way of dealing with untrusted hosters who may serve different content
> than intended. I think the decentralized way of serving resources really
> does need a way of the embedder to be able to specify policy - whether it
> be simple integrity or more complex security policy (as proposed by
> http://www.secure-links.org/ ).
>
> That said, my concern with the use case is simply that I have trouble
> with a use case positioning it as an alternative/replacement for HTTPS, or
> to allow mixed-content within an HTTPS context, provided that it's
> integrity protected. I think that's a shakier use case, with ramifications
> on UI and user expectations, but also with processing model.
>
Yikes! I certainly hope no one promotes integrity over HTTPS. HTTPS good.
Integrity bad. Perhaps it's worth even putting that into the intro of the
spec?

>
> Example: Consider if sub-resource integrity was handled via MD5 - an
> algorithm with known collisions that can be computed. In the "untrusted
> hoster" scenario, it's the embedder who can and should make the security
> decision about the integrity of links, and the worst that would happen when
> a hash breaks (such as MD5) is that it falls back to "the normal web" of no
> sub-resource integrity. There are no implications to browser processing.
>
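
A minimal sketch of the "untrusted hoster" behavior described above, in Python rather than anything a UA would actually ship. The function name, the return values, and the TRUSTED_ALGORITHMS set are assumptions made for illustration; none of them come from the draft spec. The point it shows is that an untrusted hash degrades to "no integrity check" instead of changing what gets loaded.

```python
import hashlib
import hmac

# Assumption: the set of hash algorithms this hypothetical UA still accepts.
TRUSTED_ALGORITHMS = {"sha256", "sha512"}


def check_subresource(body: bytes, algorithm: str, expected_hex: str) -> str:
    """Classify a fetched sub-resource against the embedder's digest.

    Returns "verified", "rejected", or "unchecked". "unchecked" models the
    fallback to the normal web when the hash is no longer trusted.
    """
    if algorithm not in TRUSTED_ALGORITHMS:
        # A broken hash (eg: MD5) gives no guarantee, so skip the check.
        return "unchecked"
    actual_hex = hashlib.new(algorithm, body).hexdigest()
    if hmac.compare_digest(actual_hex, expected_hex.lower()):
        return "verified"
    return "rejected"
```

In this sketch, removing an algorithm from TRUSTED_ALGORITHMS changes only the label on the result; the resource is still fetched from the same place.
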
> Now consider if sub-resources were allowed to be mixed. First and
> foremost, you'd still need to support HTTPS (for all "downlevel"/non
> subresource integrity aware browsers). When the chosen hash (eg: MD5) is
> broken, a UA will disable that hash from being acceptable, and then all
> updated users of those browsers find themselves fetching over HTTPS again
> (because the HTTP version is unacceptable). This effectively means that, as
> a site operator, you're always required to handle the capacity of the
> thundering herd hitting your HTTPS deployments, and HTTP is just a "nice to
> have".
>
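
The fallback described above can be sketched roughly as follows. This is an illustration only: choose_fetch_url, its parameters, and the example URLs are hypothetical, and the draft does not define any such selection algorithm.

```python
def choose_fetch_url(http_url, https_url, algorithm, trusted, integrity_aware=True):
    """Pick the URL a UA would fetch in the hypothetical mixed-content scheme.

    `trusted` is the set of hash algorithms the UA currently accepts.
    """
    # Only an integrity-aware UA with a still-trusted hash may use plain HTTP.
    if integrity_aware and algorithm in trusted:
        return http_url
    # Downlevel UAs, and updated UAs that dropped the hash, go to HTTPS.
    return https_url


http_url = "http://cdn.example/lib.js"
https_url = "https://cdn.example/lib.js"

# While MD5 is accepted, integrity-aware UAs fetch over HTTP.
before = choose_fetch_url(http_url, https_url, "md5", {"md5", "sha256"})
# Once UAs disable MD5, the same page sends every client to HTTPS.
after = choose_fetch_url(http_url, https_url, "md5", {"sha256"})
```

Under these assumptions, the site operator does not control when `trusted` shrinks, which is why the HTTPS side has to be provisioned for the full load from day one.
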
> However, the reality is that sites will no doubt come to rely on UAs
> preferring HTTP over HTTPS, and fail to plan capacity for HTTPS. This
> then makes it harder for UAs to disable the hash, because if the sites
> failed to plan capacity, disabling the hash may make the UA appear slower
> for users - because the HTTPS is under-provisioned. It then becomes this
> weird and tricky game of trying to find out what security guarantees you
> can reasonably make about the connection, and how do you communicate that
> to the user? Slowing down sites users go to is very much a "regression" -
> yet allowing insecure content through is equally untenable from a security
> perspective.
>
> Having dealt with deprecating crypto on the SSL and PKI side - including
> MD5 and RSA keys < 1024-bits, I'm painfully aware that it's incredibly
> tricky to balance the usability/performance concerns (eg: 20% of the top
> 10,000 sites break) with the security concerns, and I see the mixed-content
> use case as only amplifying this confusion and hand-wringing from UAs. I
> don't think that some of the tricks we use for SSL/TLS "weirdness" (eg:
> internal server names getting a red URL bar, but no interstitial) will work
> for sub-resource integrity, and I think foisting more security decisions on
> users would be a bad thing. That's why I have trouble seeing how it
> would/could work to the benefit of users (and their security).
>
As I see it, the tradeoff is really pretty straightforward:

   1. Integrity provides better security than straight HTTP and
   mixed-content HTTPS, from a purely cryptographic perspective. It gives
   integrity, but does nothing about privacy.
   2. From a usability perspective, we risk confusing both developers and
   users into thinking that it's better than HTTPS and that's a very dangerous
   and bad thing.

It seems to me that (2) is up to the User Agent to fix. Now, if we all come
to a point where we all agree that no User Agent could *possibly* fix it,
that's a serious issue, and would probably mean we shouldn't allow mixed
content. But that's not obvious to me yet (in fact, we haven't even begun
discussing how User Agents might deal with it). Maybe we should, as Brad
suggested on another point, mark it as volatile (or whatever the proper W3C
term is), and see if we can come up with some ideas on how to get (1) while
minimizing (2).

For the record, I'm not sure that we can deal with (2) yet. If we can't, I
don't think the spec should allow for mixed-content. But I also believe
it's possible there are good ideas on how to solve (2), and I think we should
hash them out.
Received on Tuesday, 14 January 2014 23:01:13 UTC