
Re: [whatwg] Subresource Integrity-based caching

From: James Roper <james@lightbend.com>
Date: Fri, 3 Mar 2017 09:02:05 +0900
Message-ID: <CABY0rKMXJSu+4F9PB67CsscOzWzuL7+EySHGj8J9LfH9at=YZA@mail.gmail.com>
To: Roger Hågensen <rh_whatwg@skuldwyrm.no>
Cc: whatwg@lists.whatwg.org
On 3 Mar. 2017 00:09, "Roger Hågensen" <rh_whatwg@skuldwyrm.no> wrote:

On 2017-03-02 02:59, Alex Jordan wrote:

> Here's the basic problem: say I want to include jQuery in a page. I
> have two options: host it myself, or use a CDN.
Not to be overly pedantic, but you might re-evaluate the need for jQuery
and other such frameworks. "HTML5" now does pretty much the same as these
older frameworks with the same or less code.

> The fundamental issue is that there isn't a direct correspondence to
> what a resource's _address_ is and what the resource _itself_ is. In
> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
> CDN are the exact same resource in terms of content, but are
> considered different because they have different addresses.
Yes and no. The URI is a unique identifier for a resource. If the URI is
different, then it is not the same resource. The content may be the same, but
the resource is different. You are mixing up resource and content in your
explanation. Address and resource are, in this case, the same thing.

> 2. This could potentially be a carrot used to encourage adoption of
> Subresource Integrity, because it confers a significant performance
> benefit.
This can be solved by improved web design. Serve a static page (and don't
forget gzip compression), then background-load the script, extra CSS,
etc. By the time the visitor has read/looked/scanned down the page, the
scripts are loaded. There is, however, some bandwidth-savings merit in your
proposal.

> ...That's okay, though, because the fact that it's based on a hash
> guarantees that the cache
> matches what would've been sent over the network - if these were
> different, the hash wouldn't match and the mechanism wouldn't kick in.
> ...
> Anyway, this email is long enough already but I'd love to hear
> thoughts about things I've missed, etc.
How about you misunderstanding the fact that a hash can only ever
guarantee that two resources are different? A hash cannot guarantee that
two resources are the same. A hash does imply a high probability that they are
the same, but can never guarantee it; such is the nature of a hash. A
carefully tailored jquery.js that matches the hash of the "original
jquery.js" could be crafted and contain a hidden payload. The browser would
then inject this script into every website the user visits that uses that
particular version of jquery.js, which I'd call an extremely serious
security hole. You can't rely on length either, as that could also be padded
to match. Not to mention that this also crosses the CORS threshold (the
first instance is from a different domain than the current page, for
example). Accidental (natural) collision probabilities for
sha256/sha384/sha512 are very low, but intentional ones are higher than
accidental ones.

This is completely wrong. No one has *ever* produced an intentional
collision in sha256 or greater. That's the whole point of cryptographic
hashes: it is impossible to intentionally create a collision. If it were
possible to create a collision, the algorithm would need to be declared
broken and never used again. In case you missed the headlines, last week
Google announced it created a sha1 collision. That is the first, and only
known, sha1 collision ever created. This means sha1 is broken and must not
be used.
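For reference, the integrity value that Subresource Integrity compares is just a base64-encoded digest of the resource bytes, in the spec's `<alg>-<base64>` format. A minimal sketch in Python (the function name and example script bytes are my own illustration, not from the spec):

```python
import base64
import hashlib

def sri_integrity(resource: bytes, alg: str = "sha384") -> str:
    """Compute an SRI-style integrity value: "<alg>-<base64 digest>"."""
    digest = hashlib.new(alg, resource).digest()
    return f"{alg}-{base64.b64encode(digest).decode('ascii')}"

script = b"console.log('hello');"
print(sri_integrity(script))

# Identical bytes always produce the identical integrity value,
# regardless of which URL served them.
assert sri_integrity(script) == sri_integrity(bytes(script))
```

This is why the caching proposal keys on the hash: two copies of jquery.js with the same bytes yield the same integrity string no matter which CDN serves them.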

Now, it's unlikely (as in, it's not likely to happen in the history of a
billion universes), but it is possible that at some point in the history of
sha256 a collision was accidentally created. This probability is non-zero,
which is greater than the impossibility of intentionally creating a
collision; hence it is more likely that we will get an accidental collision
than an intentional one.
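The "non-zero but astronomically small" point can be made concrete with the birthday bound. A rough sketch (the trillion-script count is an illustrative assumption, and the formula is the standard approximation, not an exact probability):

```python
import math

def birthday_collision_probability(n_items: int, bits: int) -> float:
    """Approximate probability of at least one accidental collision
    among n_items uniformly random values of the given bit length
    (birthday bound: p ~= n^2 / 2^(bits + 1))."""
    return (n_items * n_items) / math.pow(2.0, bits + 1)

# Even a trillion distinct scripts hashed with sha256 gives a
# collision probability far below anything observable.
p = birthday_collision_probability(10**12, 256)
print(p)  # on the order of 10**-54
```

So an accidental sha256 collision among every script ever published is, for practical purposes, not going to happen, but it is still "more likely" than an intentional one while the hash remains unbroken.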

While I haven't checked the browser source code, I would not be surprised
if browsers in certain situations cache a single instance of a script that
is used on multiple pages of a website (different URL but the same hash).
This would be within the same domain and usually not a security issue.

It might be better to use UUIDs instead, plus a trusted "cache"; this cache
could be provided by a third party or by the browser developers themselves.

Such a solution would require a uuid="{some-uuid-number}" attribute added
to the script tag. If encountered, the browser could ignore the script
URL and integrity attribute and use either a local cache (from earlier) or
a trusted cache on the net somewhere.

The type of scripts that would benefit from this are the ones that follow a
Major.Minor.Patch version format, and a UUID would apply to the major
version only, so if the major version changed, the script would require a
new UUID.
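One way to make such per-major-version identifiers deterministic would be a name-based (version 5) UUID derived from the library name and major version. The namespace string and naming scheme below are purely illustrative assumptions, not part of any spec or proposal:

```python
import uuid

# Hypothetical namespace for shared-script identifiers (illustrative only).
SCRIPT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "shared-scripts.example")

def script_uuid(library: str, major_version: int) -> uuid.UUID:
    """Derive a stable UUID from the library name and major version.
    A new major version yields a new UUID; minor/patch releases do not."""
    return uuid.uuid5(SCRIPT_NAMESPACE, f"{library}-{major_version}")

print(script_uuid("jquery", 2))
print(script_uuid("jquery", 3))  # different major version, different UUID
```

With a scheme like this, everyone who agrees on the namespace derives the same identifier without any central registry, though the trust problem of who vouches for the cached bytes remains.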

Only the most popular scripts, and major versions of such, would be cached,
but those are usually the larger and more important ones anyway: your
jQuery, Bootstrap, Angular, Modernizr, and so on.

Roger Hågensen,
Freelancer, Norway.
Received on Friday, 3 March 2017 00:02:39 UTC
