- From: Alex Jordan <alex@strugee.net>
- Date: Wed, 1 Mar 2017 20:59:32 -0500
- To: whatwg@whatwg.org
Heya!

So recently I've been thinking about caching on the web, and I think I've come up with a pretty neat trick to improve things. However, before I go file a bunch of bugs against browsers, I thought it prudent to get feedback from spec folks.

Here's the basic problem: say I want to include jQuery in a page. I have two options: host it myself, or use a CDN. If I host it myself, then I don't get caching benefits for first-time visitors, because they (obviously) haven't visited my page and requested jQuery from my domain before. Using a sufficiently widespread CDN will fix this for me, because the more widespread the CDN is, the more likely the user is to have encountered a page using that CDN. However, this is somewhat problematic because it leaks data to the CDN operator.

The fundamental issue is that there isn't a direct correspondence between what a resource's _address_ is and what the resource _itself_ is. In other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google CDN are the exact same resource in terms of content, but are considered different because they have different addresses.

Here's the proposal: when browsers encounter a <script> tag, etc. with an integrity= attribute, they try to find a resource in their cache that matches the specified hash. If one is found, it is used regardless of the domain it originated from (and thus was cached for).

Some notes:

1. This is very similar to existing cache-busting techniques employed by websites today, just baked into the browser.

2. This could potentially be a carrot used to encourage adoption of Subresource Integrity, because it confers a significant performance benefit.

3. This sidesteps existing HTTP caching and will probably ignore/violate some HTTP caching semantics. That's okay, though, because the fact that it's based on a hash guarantees that the cached copy matches what would've been sent over the network; if these were different, the hash wouldn't match and the mechanism wouldn't kick in.

4. In cases where the integrity= attribute matches some resource in the user's cache, but not what would normally be returned from the server, the request will succeed where it otherwise would have failed. I don't _think_ this is a problem, but it *is* technically a possible fingerprinting vector. The risks are similar to those associated with intermediate CA caching, which is already shipping and AFAIK is considered an acceptable risk.

Anyway, this email is long enough already, but I'd love to hear thoughts about things I've missed, etc.

Cheers!
AJ
Received on Thursday, 2 March 2017 02:00:09 UTC