Re: [whatwg] Subresource Integrity-based caching from Roger Hågensen on 2017-03-03 (public-whatwg-archive@w3.org from March 2017)

From: Roger Hågensen <rh_whatwg@skuldwyrm.no>
Date: Fri, 3 Mar 2017 14:18:20 +0100
To: whatwg@lists.whatwg.org
Message-ID: <381ce622-b801-b60d-4817-24afec100756@skuldwyrm.no>

On 2017-03-03 01:02, James Roper wrote:
>
>     How about you misunderstanding the fact that a hash can only
>     ever guarantee that two resources are different. A hash can not
>     guarantee that two resources are the same. A hash does imply a high
>     probability they are the same but can never guarantee it, such is
>     the nature of a hash. A carefully tailored jquery.js that
>     matches the hash of the "original jquery.js" could be crafted and
>     contain a hidden payload. Now the browser suddenly injects this
>     script into all websites that the user visits that use that
>     particular version of jquery.js which I'd call an extremely serious
>     security hole. You can't rely on length either as that could also
>     be padded to match the length. Not to mention that this is also
>     crossing the CORS threshold (the first instance is from a
>     different domain than the current page is for example). Accidental
>     (natural) collision probabilities for sha256/sha384/sha512 are very
>     low, but intentional ones are higher than accidental ones.
>
>
> This is completely wrong. No one has *ever* produced an intentional 
> collision in sha256 or greater.
>
Huh? When did I ever state that? I have never said that sha256 or higher 
has been broken, so please do not put words/lies in my mouth. I find 
that highly offensive.
I said "could", just ask any cryptographer. An intentional collision is 
theoretically possible but highly improbable, and fully impractical to 
attempt today (the current state of quantum computing has not produced 
any magic bullet yet).

I'm equally concerned with a natural collision. The probability is 
incredibly small: by the birthday bound the chance only reaches 50/50 
after roughly 2^128 distinct files have been hashed with sha256 (if we 
imagine all files containing random data and file lengths, which they 
don't).
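
One way to read the 50/50 figure is as the birthday bound for an ideal 
hash. The sketch below is only an approximation under that assumption 
(uniformly random hash values); the function names are made up for 
illustration.

```python
import math

def hashes_for_collision_probability(bits: int, p: float = 0.5) -> float:
    """Approximate count of random `bits`-bit hashes needed before the
    chance of at least one collision reaches p (birthday approximation)."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))

def collision_probability(bits: int, n: float) -> float:
    """Approximate chance of at least one collision among n random hashes."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))

# Number of hashes for a 50/50 chance, expressed as a power of two.
for bits in (32, 160, 256):
    n = hashes_for_collision_probability(bits)
    print(f"{bits}-bit hash: about 2^{math.log2(n):.1f} hashes for 50/50")

# Chance of any natural collision among a trillion sha256-sized hashes.
print(collision_probability(256, 1e12))
```

For a 256-bit hash the 50/50 point lands a little above 2^128 hashes, 
which is why a natural collision is not a practical worry even though 
it is never impossible.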

And as to my statement "a hash can only ever guarantee that two 
resources are different. A hash can not guarantee that two resources are 
the same": again, that is true. You can even test this by using a small 
enough hash (CRC-4 or something simple) and editing a file until the 
hash repeats, and you'll see that what I say is true.
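
A minimal sketch of that test, assuming a truncated SHA-256 as a 
stand-in for a tiny hash like CRC-4 (the helper names and the "file-N" 
inputs are made up). With only 8 bits there are 256 possible values, so 
a repeat must show up within 257 different inputs.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, bits: int = 8) -> int:
    """Keep only the top `bits` bits of SHA-256 to imitate a very small hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int = 8):
    """Try inputs until two different ones share the same tiny hash."""
    seen = {}
    for i in count():
        message = f"file-{i}".encode()
        h = tiny_hash(message, bits)
        if h in seen:
            return seen[h], message, h
        seen[h] = message

first, second, shared = find_collision()
print(first, second, shared)
```

Equal hashes here clearly do not mean equal data, while different 
hashes always mean different data.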

You know how these types of hashes work, right? They are NOT UNIQUE. 
If you want something unique, that is called a "Perfect Hash", 
which is not something you want to use for cryptography.
If a hash like sha256 were unique it would be a compression miracle, as 
you could then just "uncompress" the hash.

Only if the data you hash is the same size as the hash can you perfectly 
re-create the data that is hashed. Which is what I proposed with my UUID 
suggestion.
Do note that I'm talking about Version 1 UUIDs and not the random 
Version 4 ones, which are not unique.
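
For reference, a small sketch of the two UUID versions using Python's 
standard uuid module. Note that the uniqueness of a Version 1 UUID 
still relies on the node ID and clock behaving as intended.

```python
import uuid

v1 = uuid.uuid1()  # built from a timestamp, a clock sequence and a node ID
v4 = uuid.uuid4()  # built from random bits

print(v1, "version", v1.version)
print(v4, "version", v4.version)

# The Version 1 fields can be read back out of the identifier.
print("timestamp field:", v1.time)
print("node field:", hex(v1.node))
```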

> In case you missed the headlines, last week Google announced it 
> created a sha1 collision. That is the first, and only known sha1 
> collision ever created. This means sha1 is broken, and must not be used.
>
> Now it's unlikely (as in, it's not likely to happen in the history of 
> a billion universes), but it is possible that at some point in the 
> history of sha256 that a collision was accidentally created. This 
> probability is non zero, which is greater than the impossibility of 
> intentionally creating a collision, hence it is more likely that we 
> will get an accidental collision than an intentional collision.
>
Sha1 still has its uses. Now I haven't checked, but sha1, just like md5, 
should still be ok to use with HMAC. Also it's odd that you say sha1 
should not be used at all. Nothing wrong with using it as a file 
hash/checksum. With the number of files and the increase in data, CRC32 
is not that useful (unless you divide the file into chunks and provide a 
CRC32 array instead).
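
The chunked CRC32 idea could look roughly like this. It is only a 
sketch: the 4096 byte chunk size and the function names are arbitrary 
choices for illustration.

```python
import zlib
from typing import List

def crc32_chunks(data: bytes, chunk_size: int = 4096) -> List[int]:
    """Return one CRC32 per fixed-size chunk of the data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def damaged_chunks(expected: List[int], data: bytes,
                   chunk_size: int = 4096) -> List[int]:
    """Return the indexes of chunks whose CRC32 no longer matches."""
    actual = crc32_chunks(data, chunk_size)
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

original = bytes(range(256)) * 64      # 16 KiB of sample data
checksums = crc32_chunks(original)     # four checksums, one per chunk

corrupted = bytearray(original)
corrupted[5000] ^= 0xFF                # flip one byte inside the second chunk
print(damaged_chunks(checksums, bytes(corrupted)))
```

The array also tells you which chunk is damaged, so only that part 
needs to be fetched again.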

A hash is not the right way to do what you want; a UUID and one (or 
more) trusted shared cache(s) is.
The issue with using a hash is that at some point sha256 could become 
deprecated. Does the browser start ignoring it then? Should it behave as 
if the javascript file had no hash, or as if it's potentially dangerous 
now?

Also take note that a UUID can also be made into a valid URI, but I 
suggested adding an attribute as that would make older browsers/versions 
"forward compatible" as the URI still works normally.


And to try and not entirely run your idea into the ground: it's not 
detailed enough. By that I mean you would need a way for the webdesigner 
to inform the browser that they do not want the scripts hosted on their 
site replaced by those from another site. Now requiring an opt-out is a 
pain in the ass, and where security is concerned one should never have 
to "opt out to get more secure"; one should be more secure by default.
Which means that you would need to add another attribute or modify the 
integrity one to allow cache sharing.

Now myself I would never do that, even if the hash matches I'd never 
feel comfortable running a script originating from some other site in 
the page I'm delivering to my visitor.
I would not actually want the browser to even cache my script and 
provide it to other sites' pages.

I might however feel comfortable adding a UUID and letting the browser 
fetch that script from its local cache or from a trusted cloud cache.

If you are going to use the integrity attribute for authentication then 
you also need to add a method of revocation. If for example the hashing 
used is deemed weak/compromised (due to, say, a so far undiscovered 
design flaw), then only the browsers that are up to date will be able to 
consider those hashes unsafe. Older browsers will be clueless, and all 
of a sudden some porn site includes a manipulated banking.js. Whenever 
an older browser with a stale cache encounters that, it replaces its 
cached copy, and the next time the user goes to their bank the browser 
will happily use a trojan script instead. The end result is that banks 
etc. will not use the integrity attribute, or they will serve a 
differently versioned script for each visit/page load, which kinda nukes 
caching in general.
Remember, you did not specify an opt-in/opt-out for the shared 
integrity-based caching.
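
To make the two missing pieces concrete, here is a rough sketch of a 
shared cache that only serves entries when the page opted in and the 
hash algorithm has not been revoked. Everything in it is hypothetical 
(the class, the allow_sharing flag, the revocation set); no browser 
implements this.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

# Hypothetical revocation list; a real one would need to be kept up to date.
REVOKED_ALGORITHMS: Set[str] = {"md5", "sha1"}
SUPPORTED_ALGORITHMS: Set[str] = {"sha256", "sha384", "sha512"}

class SharedIntegrityCache:
    """Cross-site cache keyed by (algorithm, hex digest)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def _usable(self, algorithm: str, allow_sharing: bool) -> bool:
        return (allow_sharing
                and algorithm in SUPPORTED_ALGORITHMS
                and algorithm not in REVOKED_ALGORITHMS)

    def store(self, algorithm: str, body: bytes,
              allow_sharing: bool) -> Optional[str]:
        """Store the body and return its digest, or None if not allowed."""
        if not self._usable(algorithm, allow_sharing):
            return None
        digest = hashlib.new(algorithm, body).hexdigest()
        self._entries[(algorithm, digest)] = body
        return digest

    def lookup(self, algorithm: str, digest: str,
               allow_sharing: bool) -> Optional[bytes]:
        """Return the cached body only for opted-in, non-revoked requests."""
        if not self._usable(algorithm, allow_sharing):
            return None
        return self._entries.get((algorithm, digest))

cache = SharedIntegrityCache()
key = cache.store("sha256", b"console.log('hi');", allow_sharing=True)
print(cache.lookup("sha256", key, allow_sharing=True))   # served from cache
print(cache.lookup("sha256", key, allow_sharing=False))  # page did not opt in
print(cache.store("sha1", b"console.log('hi');", allow_sharing=True))
```

Note that the revocation set only helps browsers that actually receive 
the update, which is exactly the problem with older browsers described 
above.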

You might say that this is all theoretical, but you yourself proclaimed 
sha1 is no longer safe. Imagine if the most popular version of jquery 
became a trojan, we're talking tens of thousands of very high profile 
sites possible victims of cache poisoning.


Now I'm not saying the integrity attribute is useless; for CDNs it's 
pretty nice. It ensures that when your site uses say awesomescript12.js, 
it really is awesomescript12.js and not a misnamed awesomescript10.js 
or, worse, notsoawesomescript4.js.
But at this point you already trust the CDN (why else would you use 
them, right?).
Another thing the integrity hash is great for is reducing the chance of 
a damaged script being loaded (sha512 has way more bits than CRC32, for 
example).
And if I were to let a webpage fetch a script from a CDN I would 
probably use the integrity attribute, but that is because I trust that 
CDN.
If a browser just caches the first copy of whatever it encounters and 
then uses that for all subsequent requests for that script, then I want 
no part of that. It's a security boundary I'm not willing to cross, hash 
or no hash. So an opt-in would be essential here.
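
For the CDN case, the integrity value is the algorithm name, a dash, 
and the base64 of the digest. A minimal sketch of producing and checking 
one follows; the helper names are made up, and a real integrity 
attribute may also list several hashes, which this ignores.

```python
import base64
import hashlib
import hmac

SUPPORTED = ("sha256", "sha384", "sha512")

def sri_value(body: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def matches_integrity(body: bytes, integrity: str) -> bool:
    """Check a fetched body against a single integrity value."""
    algorithm = integrity.partition("-")[0]
    if algorithm not in SUPPORTED:
        return False
    return hmac.compare_digest(sri_value(body, algorithm), integrity)

script = b"window.awesome = 12;\n"   # stand-in for awesomescript12.js
value = sri_value(script)
print(value)
print(matches_integrity(script, value))                       # intact copy
print(matches_integrity(b"window.awesome = 10;\n", value))    # wrong file
```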

Now many sites have their own CDN; I assume these are your focus. But 
many use global ones (sometimes provided directly/indirectly with the 
blessing of the developers of a script). I don't see this as a major 
caching issue. The main issue is multiple versions of a script. Many 
scripts are not always that backward compatible; I have seen cases where 
there are 3-4 versions of the same script on the same site. A shared 
browser cache may help with that if those are the unedited official 
scripts of jquery, but often they are not. They may have been run 
through a minifier or similar, or they have been minified but not with 
the same settings as the official one.
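
A tiny illustration of why re-minified copies defeat a hash-keyed 
cache: the two snippets below behave the same but hash differently. The 
script contents are made up.

```python
import hashlib

# Same function, once as shipped and once run through different minifier settings.
official = b"function add(a, b) {\n  return a + b;\n}\n"
reminified = b"function add(a,b){return a+b}"

def sha256_hex(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()

print(sha256_hex(official))
print(sha256_hex(reminified))
print("cache hit" if sha256_hex(official) == sha256_hex(reminified)
      else "cache miss")
```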

This is why I stress that a UUID-based idea is better on the whole, as 
the focus would be on the versions/APIs/interoperability instead. I.e. 
v1.1 and v1.2 have the exact same calls, just some bug fixes? They can 
both be given the same UUID and the CDN or trusted cache will provide 
v1.2 all the time.
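
A rough sketch of how a trusted cache might resolve such a UUID. The 
registry class, the UUID value and the script bodies are all made up 
for illustration.

```python
import uuid
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]

class ApiRegistry:
    """Maps one API-level UUID to every compatible build of a script."""

    def __init__(self) -> None:
        self._builds: Dict[uuid.UUID, Dict[Version, bytes]] = {}

    def publish(self, api_id: uuid.UUID, version: Version, body: bytes) -> None:
        self._builds.setdefault(api_id, {})[version] = body

    def resolve(self, api_id: uuid.UUID) -> Optional[Tuple[Version, bytes]]:
        """Return the newest build registered under this UUID, if any."""
        builds = self._builds.get(api_id)
        if not builds:
            return None
        latest = max(builds)
        return latest, builds[latest]

# Hypothetical UUID shared by all builds that expose the same calls.
SCRIPT_API_V1 = uuid.UUID("00000000-0000-1000-8000-000000000001")

registry = ApiRegistry()
registry.publish(SCRIPT_API_V1, (1, 1, 0), b"/* v1.1 */")
registry.publish(SCRIPT_API_V1, (1, 2, 0), b"/* v1.2, bug fixes only */")

print(registry.resolve(SCRIPT_API_V1))  # a page asking for v1.1 gets v1.2
```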



PS! Not trying to sound like an ass here but could you trim the email 
next time? While I do enjoy hearing my own voice/reading my own text as 
much as the next person there is no need to quote the whole thing. Also 
why did you CC me a full quote of my email but did not write anything 
yourself, did you hit reply by accident or is there a bug in the email 
system somewhere?
Which brings me to a nitpick of mine: if you reply to the list then 
there is no need to also CC me. If I'm posting to the list then I'm 
also reading the list, and I'd rather not have multiple email copies in 
my inbox. Hit the "Reply to list" button instead of "Reply to all" next 
time (these options depend on your email client).


-- 
Roger Hågensen,
Freelancer, Norway.
Received on Friday, 3 March 2017 13:19:15 UTC