- From: Roger Hågensen <rh_whatwg@skuldwyrm.no>
- Date: Fri, 3 Mar 2017 14:18:20 +0100
- To: whatwg@lists.whatwg.org
On 2017-03-03 01:02, James Roper wrote:
>> How about you misunderstanding the fact that a hash can only ever
>> guarantee that two resources are different? A hash cannot guarantee
>> that two resources are the same. A hash does imply a high probability
>> that they are the same, but it can never guarantee it; such is the
>> nature of a hash. A carefully tailored jquery.js that matches the hash
>> of the "original jquery.js" could be crafted to contain a hidden
>> payload. The browser would then suddenly inject this script into every
>> website the user visits that uses that particular version of
>> jquery.js, which I'd call an extremely serious security hole. You
>> can't rely on length either, as that could also be padded to match.
>> Not to mention that this also crosses the CORS threshold (the first
>> instance is from a different domain than the current page, for
>> example). Accidental (natural) collision probabilities for
>> sha256/sha384/sha512 are very low, but intentional ones are more
>> likely than accidental ones.
>
> This is completely wrong. No one has *ever* produced an intentional
> collision in sha256 or greater.

Huh? When did I ever state that? I have never said that sha256 or higher
has been broken; please do not put words in my mouth. I find that highly
offensive. I said "could" -- ask any cryptographer. It is highly
improbable but theoretically possible, and at present completely
impractical to attempt (the current state of quantum computing has shown
no magic bullet yet).

I'm equally concerned with a natural collision: while the probability is
incredibly small, in practice it either happens or it doesn't (and the
usual estimates imagine all files containing random data at random
lengths, which they don't).

And as to my statement that "a hash can only ever guarantee that two
resources are different; a hash cannot guarantee that two resources are
the same" -- again, that is true. You can even test this yourself with a
small enough hash (CRC-4 or something similarly simple) by editing a
file, and you will see collisions (there is a quick sketch of this
further down). You know how these types of hashes work, right? They are
NOT unique. If you want something unique, that is called a "perfect
hash", and a perfect hash is not something you want to use for
cryptography. If a hash like sha256 were unique, it would be a
compression miracle, as you could then simply "uncompress" the hash.
Only if the data you hash is the same size as the hash can you perfectly
re-create the hashed data -- which is what I proposed with my UUID
suggestion. Do note that I am talking about Version 1 UUIDs, not the
random Version 4 ones, which are not unique.

> In case you missed the headlines, last week Google announced it
> created a sha1 collision. That is the first, and only known, sha1
> collision ever created. This means sha1 is broken, and must not be
> used.
>
> Now it's unlikely (as in, it's not likely to happen in the history of
> a billion universes), but it is possible that at some point in the
> history of sha256 a collision was accidentally created. This
> probability is non-zero, which is greater than the impossibility of
> intentionally creating a collision, hence it is more likely that we
> will get an accidental collision than an intentional one.

Sha1 still has its uses. I have not checked recently, but sha1, like
md5, is still OK to use with HMAC. It is also odd to say sha1 should not
be used at all; there is nothing wrong with using it as a file
hash/checksum.
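To make the "small enough hashes" test above concrete, here is a quick
Node.js sketch (my own illustration, not anything from this thread):
truncate SHA-256 to its first byte, i.e. an 8-bit hash, and two
different random "files" collide within a few dozen attempts -- a pure
pigeonhole effect, not a weakness of SHA-256 itself.

    // tiny-hash-collision.js -- run with: node tiny-hash-collision.js
    const crypto = require('crypto');

    // Keep only the first byte (8 bits) of the SHA-256 digest,
    // giving a deliberately tiny, CRC-8-sized hash.
    function tinyHash(buf) {
      return crypto.createHash('sha256').update(buf).digest()[0];
    }

    const seen = new Map();
    for (let i = 1; i <= 10000; i++) {
      const blob = crypto.randomBytes(32); // a random 32-byte "file"
      const h = tinyHash(blob);
      if (seen.has(h) && !seen.get(h).equals(blob)) {
        console.log(`collision after ${i} tries:`);
        console.log('  ' + seen.get(h).toString('hex'));
        console.log('  ' + blob.toString('hex'));
        console.log(`  both map to the 8-bit hash 0x${h.toString(16)}`);
        break;
      }
      seen.set(h, blob);
    }

With only 256 possible hash values, the birthday bound predicts a
collision after roughly 20 random inputs; a full 256-bit digest pushes
that bound out to around 2^128 inputs, which is why nobody has ever
observed a sha256 collision -- but the guarantee is still only
probabilistic.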
With the number of files, and the growth in data sizes, CRC32 is not
that useful (unless you divide the file into chunks and provide a CRC32
array instead).

A hash is not the right way to do what you want; a UUID plus one or more
trusted shared caches is. The issue with using a hash is that at some
point sha256 could become deprecated -- does the browser then start
ignoring it? Should it behave as if the javascript file had no hash, or
treat it as potentially dangerous? Also note that a UUID can be made
into a valid URI, but I suggested adding an attribute because that makes
older browsers/versions "forward compatible": the URI still works
normally.

And, trying not to run your idea entirely into the ground: it is not
detailed enough. By that I mean you would need a way for the web
designer to tell the browser that they do not want the scripts hosted on
their site replaced by copies from another site. Requiring an opt-out is
a pain, and where security is concerned one should never have to opt out
to become more secure; one should be more secure by default. That means
you would need to add another attribute, or modify the integrity one, to
allow cache sharing.

Myself, I would never do that. Even if the hash matches, I would never
feel comfortable running a script originating from some other site in
the page I am delivering to my visitor. I would not actually want the
browser to even cache my script and provide it to other sites' pages. I
might, however, feel comfortable adding a UUID and letting the browser
fetch that script from its local cache or from a trusted cloud cache.

If you are going to use the integrity attribute for authentication, then
you also need a method of revocation, so that if, for example, the hash
in use is deemed weak/compromised (due to a so-far undiscovered design
flaw), only up-to-date browsers can treat those hashes as unsafe. Older
browsers will be clueless: suddenly some porn site includes a
manipulated banking.js, an older browser with a stale cache encounters
it and replaces its cached copy, and the next time the user visits their
bank the browser happily runs a trojan script instead. The end result is
that banks etc. will not use the integrity attribute, or they will serve
a differently versioned script on every visit/page load, which rather
nukes caching in general. Remember, you did not specify an opt-in or
opt-out for the shared integrity-based caching. You might say this is
all theoretical, but you yourself proclaimed sha1 no longer safe.
Imagine if the most popular version of jquery became a trojan; we are
talking tens of thousands of very high-profile sites as possible victims
of cache poisoning.

Now, I am not saying the integrity attribute is useless; for CDNs it is
pretty nice. It ensures that when your site uses, say,
awesomescript12.js, it really is awesomescript12.js and not a misnamed
awesomescript10.js or, worse, notsoawesomescript4.js. But at that point
you already trust the CDN (why else would you use them, right?). Another
thing the integrity hash is great for is reducing the chance of loading
a damaged script (sha512 has far more bits than CRC32, for example). And
if I were to let a webpage fetch a script from a CDN, I would probably
use the integrity attribute -- but that is because I trust that CDN.
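For reference, this is roughly what that CDN case looks like with the
existing Subresource Integrity syntax (the host name and digest below
are placeholders, not real values):

    <script src="https://cdn.example.com/awesomescript12.js"
            integrity="sha384-[base64 SHA-384 digest of the exact file]"
            crossorigin="anonymous"></script>

If the bytes the browser fetches do not hash to the declared value, the
script is refused and never executes -- exactly the misnamed-file and
damaged-file protection described above.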
If a browser just caches the first copy of whatever it encounters and
then uses that for all subsequent requests for that script, then I want
no part of it; that is a security boundary I am not willing to cross,
hash or no hash. So an opt-in would be essential here.

Now, many sites have their own CDN, and I assume those are your focus,
but many use global ones (sometimes provided directly or indirectly with
the blessing of the script's developers). I do not see this as a major
caching issue. The main issue is multiple versions of a script. Many
scripts are not all that backward compatible; I have seen cases with 3-4
versions of the same script on the same site. A shared browser cache
might help with that if those are the unedited official jquery scripts,
but usually they are not: they may have been run through a minifier, or
minified with different settings than the official build. This is why I
stress that a UUID-based idea is better on the whole, as the focus would
be on versions/APIs/interoperability instead. I.e. v1.1 and v1.2 expose
exactly the same calls, just with some bug fixes? Then both can be given
the same UUID, and the CDN or trusted cache will provide v1.2 every
time.
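Purely as a hypothetical sketch of that idea (the attribute name and the
UUID value here are my own illustration -- the UUID is the Version 1
example from RFC 4122 -- nothing specified anywhere):

    <script src="/js/mylib-1.2.min.js"
            uuid="urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"></script>

An older browser that does not know the attribute simply ignores it and
fetches src as usual -- the forward compatibility mentioned earlier --
while a newer browser could use the UUID as the lookup key into its
local or trusted shared cache, regardless of which interoperable point
release that cache happens to hold.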
PS: Not trying to sound like an ass here, but could you trim the email
next time? While I enjoy hearing my own voice / reading my own text as
much as the next person, there is no need to quote the whole thing.
Also, why did you CC me a full quote of my email without writing
anything yourself? Did you hit reply by accident, or is there a bug in
the email system somewhere? Which brings me to a nitpick of mine: if you
reply to the list, there is no need to also CC me. If I am posting to
the list then I am also reading the list, and I would rather not have
multiple copies of each email in my inbox. Hit the "Reply to list"
button instead of "Reply to all" next time (these options depend on your
email client).

--
Roger Hågensen, Freelancer, Norway.

Received on Friday, 3 March 2017 13:19:15 UTC