- From: Chris Bojarski <chris@cbojar.net>
- Date: Sat, 10 Aug 2013 21:26:35 -0400
- To: public-html@w3.org
I will try to address some of these objections/alternatives one by one.

First, there really cannot be a system where a file can be downloaded from one arbitrary site and then trusted for use on any other site. That is a textbook cache-poisoning scenario. To really trust something, it (or a verification of it) would need to come from a trusted source. I understand that, if we could do this, it might seem to offer some benefits, but the downsides heavily outweigh them. Some of those benefits are also merely illusory. For example, François' major objection is that browsers would be picking winners and losers under my proposed system, but in truth a high-priority cross-site cache would produce the same effect. Because the most popular libraries _are_ popular, they would be encountered most often anyway and would get the same speed-up after the first download as under my proposal. Less popular libraries would be penalized just as much under either system, since they would be encountered less frequently. For popular libraries, the first site would still be penalized with the initial download, and you would introduce significant security concerns on top of that. Conversely, winners and losers could still be chosen by popular home pages (e.g. google.com) by loading those scripts in the background as soon as the browser opens that home page, but this would happen less transparently and could be more easily bent toward a particular commercial or otherwise non-neutral bias. (For example, google.com could silently load all the scripts for Google, Gmail, and GDrive, making those sites look much faster, while Yahoo! Mail would get no such benefit simply because it is not the home page.)

Second, Nathaneal's suggestion is interesting, but it brings its own challenges. By mandating any particular cryptographic algorithm, one places an immediate expiration date on the resulting standard. Today's uncrackable codes are tomorrow's ROT13s, so even with an algorithm that is cryptographically strong today, there would still be enormous security concerns. We would also have to define "decompressed, encoding-normalized file contents." I'll start: tabs or spaces? 4 spaces or 2? CRLF, CR, or LF? I'm not saying such a standard would be impossible to reach eventually, but it would be a significant process full of holy wars that really don't need to be fought. Browsers would then have to implement a system for normalizing downloaded files before hashing them, introducing more (and more cumbersome) code and overhead, and possibly new vectors of attack on the browsers themselves. (This is where the input of a browser vendor would be helpful to confirm or refute that.) Library developers would then have to understand how to produce a "decompressed, encoding-normalized" version of their files. The authors of large libraries might have a grasp of how to do this, but smaller library authors might find it too complicated or cumbersome. They would then have to communicate all of this to web developers, telling them that to (maybe) speed up their pages they have to include an incomprehensible chunk of code gibberish. And the "maybe" comes from the fact that web developers have to assume their page is not the first one the browser encounters with that script; otherwise there is no benefit to them, because the file would still have to be downloaded just as before. This gets back to the other objection about favoritism of popular libraries.
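To make the normalization concern concrete, here is a minimal sketch (my own illustration in Python, not part of any proposal; the script contents are invented): the "same" one-line script, saved with different line endings or indentation, yields completely different SHA-512 digests unless a normalization step is defined and applied identically by everyone first.

    # Illustration only: byte-level differences that a human would call
    # "the same file" produce unrelated SHA-512 digests.
    import hashlib

    variants = {
        "lf + spaces":   b"function hi() {\n    return 'hi';\n}\n",
        "crlf + spaces": b"function hi() {\r\n    return 'hi';\r\n}\r\n",
        "lf + tab":      b"function hi() {\n\treturn 'hi';\n}\n",
    }

    for name, source in variants.items():
        print(name, hashlib.sha512(source).hexdigest()[:16] + "...")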
I see no reason to fill my pages with pseudo-random base-64 strings in the hope that someone else takes the download hit before I do.

There was also talk of ensuring universal file names and of pushing some aspects of this into HTTP. As for the HTTP part, if that were done it would offer almost no benefit over standard caching, while bringing with it the normalizing and hashing overhead mentioned above, so the point would be moot. It would also involve a whole other set of standards changes that would become complicated, and it would require changes to HTTP servers, something system administrators would be loath to make. As for universal file names, I've seen jquery.js, jquery.min.js, jquery-1.8.1.js, jquery-1.8.1.min.js, and the ever-popular script.js all used to refer to the same file. People don't like that kind of constriction; if they did, we could already be using file names to accomplish this.

I appreciate all the feedback from you guys, and I'm glad that you're looking at this from all sorts of angles I hadn't considered. I see two major concerns emerging: security (which I expected, though it is being taken in directions I didn't expect) and favoritism/neutrality (which I also expected, and want more feedback about, since this is a particularly hard wall). Security will be an ongoing concern any time anything is shared between two parties, and favoritism can be hard to negate, as fighting it can sometimes result in more or different favoritism, or other unintended consequences. Transparency, of course, can help address both issues, but we need to make sure we build a good house on a good foundation. And the only way we get there is to have smarter people than I keep talking. :)

-Chris.

On 08/10/2013 07:46 PM, François REMY wrote:
>> A 512-bit SHA-2 hash cannot have conflicts.
> Wait, what? This is absurd. Every hash system has conflicts, by definition. You can even calculate how many conflicts exist: you simply divide the number of possible files of a certain length by the number of possible hashes. Suppose all JS files are exactly 64 kilobytes long; then an enormous number of files must share each hash. Arguably, the probability that those files are JavaScript files is very low (most of them will be garbage), but you cannot base your system on a hash alone; it doesn't make sense.
>
> The point of the hash is not to identify the resource, but to make sure no attacker could poison the cache by sending a fake file with the same name as a popular library in order to execute code on other websites (hashes are secure in the sense that it is hard to find another file with the same hash, let alone to create one that is also valid JavaScript with the same length and hash; that is generally impossible).
>
>> Filename matching would make the feature unreliable, adding another point of failure.
> I don't see how it makes the feature unreliable. DLLs are loaded by filename on all OSes, and I don't think that has ever been an issue... By the way, .NET uses exactly the combination of filename+GUID to identify DLLs and their versions.
>
>> There's also no point to this feature if you move it to the HTTP layer; it only provides a benefit if it (a) eliminates network traffic and (b) securely identifies a file across multiple websites so that (c) browsers can implement higher-order caching.
> Not true. Moving the feature to HTTP does not remove benefits (b) and (c), and it conserves most of benefit (a). It also adds the possibility of accepting multiple versions of the same file on the server side, and of updating this in real time as new versions become supported, without modifying all the pages.
>
>> Think about how *trivial* it is to implement a text editor plugin that updates these hashes.
> Not an argument.
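As a rough, purely illustrative sketch of the pigeonhole counting in the quoted reply above (the 64 KiB file size and 512-bit digest length are just the figures from that example, not part of any proposal):

    # Back-of-the-envelope pigeonhole count: on average, how many distinct
    # 64 KiB files map onto each 512-bit digest. Illustrative figures only.
    FILE_BITS = 64 * 1024 * 8    # 64 KiB = 524,288 bits -> 2**524288 possible files
    DIGEST_BITS = 512            # SHA-512 -> 2**512 possible digest values

    # Dividing possible files by possible digests leaves, on average,
    # 2**(524288 - 512) same-length preimages per digest. Collisions exist
    # in abundance; deliberately finding one is what remains infeasible.
    print(f"~2**{FILE_BITS - DIGEST_BITS} possible 64 KiB files per SHA-512 digest")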
Received on Sunday, 11 August 2013 01:27:03 UTC