Re: [whatwg] Subresource Integrity-based caching

On 2017-03-02 02:59, Alex Jordan wrote:
> Here's the basic problem: say I want to include jQuery in a page. I
> have two options: host it myself, or use a CDN.
Not to be overly pedantic, but you might re-evaluate the need for jQuery 
and other such frameworks. "HTML5" can now do pretty much the same as 
these older frameworks with the same amount of code or less.
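For example, two common jQuery idioms next to their plain-DOM 
equivalents (a quick sketch; the selectors and element IDs are made up):

  <script>
    // jQuery:
    //   $('.menu-item').addClass('active');
    //   $('#load-more').on('click', () => console.log('clicked'));

    // Plain modern JavaScript, no framework needed:
    document.querySelectorAll('.menu-item')
            .forEach(el => el.classList.add('active'));
    document.getElementById('load-more')
            .addEventListener('click', () => console.log('clicked'));
  </script>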


> The fundamental issue is that there isn't a direct correspondence to
> what a resource's _address_ is and what the resource _itself_ is. In
> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
> CDN are the exact same resource in terms of content, but are
> considered different because they have different addresses.
Yes and no. A URI is a unique identifier for a resource; if the URI is 
different, then it is not the same resource. The content may be the same, 
but the resource is different. You are mixing up resource and content in 
your explanation. Address and resource are, in this case, the same thing.

> 2. This could potentially be a carrot used to encourage adoption of
> Subresource Integrity, because it confers a significant performance
> benefit.
This can be solved by improved web design. Serve a static page (not 
forgetting gzip compression), then load the script and extra CSS in the 
background; by the time the visitor has read/looked/scanned down the 
page, the scripts are loaded. There is, however, some bandwidth-saving 
merit in your suggestion.
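A rough sketch of the idea (the filenames are made up):

  <!-- Small static page; critical CSS loaded up front -->
  <link rel="stylesheet" href="/css/critical.css">

  <!-- Fetched in parallel while the page renders, but executed
       only after the document is parsed, so it never blocks -->
  <script src="/js/jquery.min.js" defer></script>
  <script src="/js/app.js" defer></script>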

> ...That's okay, though, because the fact that it's based on a hash guarantees that the cache
> matches what would've been sent over the network - if these were
> different, the hash wouldn't match and the mechanism wouldn't kick in.
>
> ...
> Anyway, this email is long enough already but I'd love to hear
> thoughts about things I've missed, etc.
How about the fact that a hash can only ever guarantee that two 
resources are different? A hash cannot guarantee that two resources are 
the same. A matching hash implies a high probability that they are the 
same, but it can never guarantee it; such is the nature of a hash. A 
carefully tailored jquery.js could be crafted to match the hash of the 
"original jquery.js" while carrying a hidden payload. The browser would 
then inject this script into every website the user visits that uses 
that particular version of jquery.js, which I'd call an extremely 
serious security hole. You can't rely on length either, as the file 
could also be padded to match the length. Not to mention that this 
crosses the CORS boundary (the first cached instance comes from a 
different domain than the current page, for example). The accidental 
(natural) collision probabilities for SHA-256/SHA-384/SHA-512 are very 
low, but intentional collisions are more likely than accidental ones.
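For reference, this is roughly the markup under discussion; the digest 
shown is a placeholder, not the real hash of any jQuery release:

  <script src="https://cdn.example.com/jquery-2.0.0.min.js"
          integrity="sha384-[base64-digest-here]"
          crossorigin="anonymous"></script>

Under the proposal, any previously cached resource whose digest matches 
would be substituted for the network fetch, which is exactly the 
substitution the collision concern above is about.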

While I haven't checked the browsers' source code, I would not be 
surprised if browsers in certain situations cache a single instance of a 
script that is used on multiple pages of a website (different URL but 
the same hash). That would be within the same domain and usually not a 
security issue.


It might be better to use UUIDs instead, together with a trusted 
"cache". This cache could be provided by a third party or by the browser 
developer themselves.

Such a solution would require a uuid="{some-uuid-number}" attribute 
added to the script tag. If encountered, the browser could ignore the 
script's URL and integrity attribute and use either a local cache (from 
earlier) or a trusted cache somewhere on the net.
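A minimal sketch of what that might look like (the uuid attribute is 
hypothetical and the UUID value is made up):

  <!-- If the browser recognizes the UUID it can serve the script
       from a local or trusted shared cache and skip the fetch,
       presumably falling back to the src URL otherwise. -->
  <script src="https://example.com/js/jquery-2.0.0.min.js"
          uuid="1b4db7eb-4057-5ddf-91e0-36dec72071f5"></script>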

The kinds of scripts that would benefit from this are the ones that 
follow a Major.Minor.Patch version format; a UUID would apply to the 
major version only, so if the major version changed, the script would 
require a new UUID.

Only the most popular scripts, and only major versions of them, would be 
cached, but those are usually the larger and more important ones anyway: 
your jQuery, Bootstrap, Angular, Modernizr, and so on.

-- 
Roger Hågensen,
Freelancer, Norway.

Received on Thursday, 2 March 2017 15:09:56 UTC