
RE: [Feature Proposal] New attributes "library" and "version" on script tags

From: François REMY <francois.remy.dev@outlook.com>
Date: Sun, 11 Aug 2013 17:27:19 -0700
Message-ID: <DUB402-EAS79234C3A62FC216DD162EAA55B0@phx.gbl>
To: "'Nathanael D. Jones'" <nathanael.jones@gmail.com>
CC: "'Patrick H. Lauke'" <redux@splintered.co.uk>, "'HTML WG LIST'" <public-html@w3.org>, "'Glenn Adams'" <glenn@skynav.com>
> You've evidently misinterpreted my
> proposal. I'm not suggesting any server-side or HTTP-level behavior

I know. This is what I'm proposing. What you propose is basically to put an ETag in the HTML file, outside of the server's control. What I propose is to create a sort of "shared ETag", so that the browser can send the "ETag of a file it already knows" to avoid downloading a file that may be the same. In the end, it's up to the server to decide.
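
To make the "shared ETag" idea concrete, here is a minimal server-side sketch. The header name `X-Known-Hashes` and the tuple-based response are purely illustrative assumptions of mine, not part of any spec or of the original proposal; the point is only that the server stays in control of the decision, exactly as with a conditional ETag request.

```python
import hashlib

# Hypothetical header carrying hashes of files the browser already has
# cached; the name and comma-separated format are illustrative only.
KNOWN_HASHES_HEADER = "X-Known-Hashes"

def respond(request_headers: dict, resource_body: bytes):
    """Sketch: skip the body when the client already holds an identical
    file, the same way an ETag match yields a 304 Not Modified."""
    resource_hash = hashlib.sha256(resource_body).hexdigest()
    known = request_headers.get(KNOWN_HASHES_HEADER, "")
    client_hashes = {h.strip() for h in known.split(",") if h.strip()}
    if resource_hash in client_hashes:
        # The server decides: the client may reuse its cached copy.
        return (304, b"")
    return (200, resource_body)
```

Note that the client never claims what the content *is*; it only offers hashes, and the hosting server confirms or denies the match.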

If the reason you think the hash should be included inline is that it avoids an RTT for hash discovery, I disagree, because HTTP/2.0 will allow a website to reply to requests the browser didn't send as part of its initial response. The server could, when a client connects, push the full HEAD of the script files it expects the client to already have, so that the browser does not have to issue that request if the caching conditions or the hash match [*].

This system does not require you to modify your website in any way; it's purely a server-side implementation trick. It also doesn't force you to modify your static HTML files when you update a library to a new version.



[*] You lose the RTT only if you store your script on a server other than your own (your server cannot provide the HEAD responses with the hashes on behalf of another server; that would be a security issue). However, including scripts from another domain is already bad practice, because you're handing your security over to another website. The whole point of this proposal is that you may spare the download of a resource even if, for security or performance reasons (DNS lookups, already-open connections), you decide not to rely on someone else's CDN in the hope of avoiding the download. (Today people do rely on CDNs because some are popular enough that the time saved by not downloading a resource shared across websites outweighs the time lost by the few visitors who don't have it cached and must open a new connection to that CDN while loading the page.)

An example of this issue is people using the Google Web Fonts store to serve their fonts, because everybody who uses the Google CDN and the same font only needs to download it once. My proposal allows you to store the file on your own website and still get the benefit if anyone else uses the same file, hosted on their own server (or even the Google CDN), because the hash will work like an ETag (your server replies with a 304 under HTTP/1, or sends the hash in a pushed response under HTTP/2).



> whatsoever - the hash only appears once, in the HTML.
> It is never transmitted again. 

I can't help but repeat...

I didn't say SHA-2 was insecure. My proposal uses it as well. My issue with your proposal is that it's not the role of HTML to deal with transport-layer issues.

My real point is: no browser should ever load a resource without asking the server that hosts it for authorization to do so, and for the metadata (CORS, CSP, ...) under which that server operates. I don't say SHA-2 is insecure; I say that loading a resource based only on a claim (an attribute on the <script> tag), potentially sent by someone other than the website that hosts the resource and subject to XSS attacks, is a bad idea. Additionally, transmitting the hash of every file as part of the URL (or as an attribute in the HTML, or whatever) is a bad idea too. The identifier of a resource should never include content-based information, because there's always a risk of this information being out-of-sync [EDIT] and partial, in the sense that it doesn't cover all the HTTP headers the server may want to send [/EDIT].
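
A small sketch of the out-of-sync risk (the file contents and the publish-time workflow here are invented for illustration): a hash baked into the HTML at publish time silently stops describing what the server actually serves once the file is updated.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of a resource body, as a content-based identifier."""
    return hashlib.sha256(data).hexdigest()

# At publish time, the page author embeds the hash of version 1.
published_hash = sha256_hex(b"/* library v1 */")

# Later, the server is updated to version 2, but the static HTML is not.
current_file = b"/* library v2 */"

# The content-based identifier is now stale: the claim in the HTML no
# longer matches what the server serves, and nothing forces them back
# into sync.
out_of_sync = published_hash != sha256_hex(current_file)
```

With a server-driven ETag, by contrast, the identifier is regenerated from the current content on every response, so it cannot drift.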

My overarching belief is that this "super-cache" feature should be built into the HTTP layer, reusing the HTTP caching semantics, and should not be defined at the HTML level, because it deals with a transport-layer issue.



> Until you can show mathematical evidence otherwise,
> we can safely assume that collisions for SHA-2 (512-bit)
> cannot presently be found through chance or malicious
> intent. There doesn't seem to be much hope for this
> happening anytime soon, either. 

The security issue with your proposal is not the SHA-2 hash (yet) but the fact that you load a resource without asking the server that hosts it for permission to do so!



> Provide specific attack scenarios for your hand-wavy
> references to XSS, or stop generating FUD. Assuming
> a secure hash function, I can't see how this could
> possibly be useful for XSS.

Someone could poison the cache for a file by mapping it to another, very well-known one:

    <script src="http://mybank/secure.js" hash="jquery-hash"></script>

Or someone may expect that a file does not load unless certain security restrictions are met; those server-checked restrictions could be bypassed by attributing a hash to the file when it's loaded the first time (while the conditions are met) and then reusing that hash the next times, when the conditions are not met.

    On the secure website, with the hash changing every day:
    <script src="http://mybank/check-security.php" onload="doSomething()" hash="hash-for-people-having-already-passed-the-test-today"></script>
    
    The attacker uses a computer that passes the tests, downloads the file, and creates a malicious image/script/iframe pointing to the file, associating the right hash with it:
    <script src="http://attackersite/check-security.php" hash="hash-for-people-having-already-passed-the-test-today"></script>
    
    Now people visiting the attacker's site will have the security check disabled on the bank site, because there exists some other site that already filled the super-cache with the right file without performing any security check. This is clearly a case where the bank site owner made a mistake (he should have used a normal Expires header and no hash), but it's not easy for him to understand that.

Or if someone wants to trick a webpage, they could create an image whose src points to one image (scripts will recognize it as an image coming from the right server, because the src will be right) but whose hash points to some other image the attacker downloaded before (i.e. the image will contain something different from what its src attribute says it does).



Again, I continue to claim that if there's a reference to a script, an image, or whatever hosted on some server, that server MUST give its authorization before the resource is loaded, even if there's some identification process going on.



> It should be noted that the hash of a resource and the resource
> itself should be protected with equal vigor; the hash contains
> 'part of the resource', and can be used to reconstruct the entire
> resource (through a directed attack at shared visitors). 
>
> This may not be obvious at first glance.

I was about to say the exact same thing. It's not obvious; people will make mistakes. A hash is supposedly something you can leak. Example: we don't store passwords in a DB but (salted) hashes, so that if someone takes over the DB, they can't recover the passwords. Here you're using hashes to recover the resource; people will definitely not like that.
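
A tiny sketch of why a content hash is not leak-safe the way a salted password hash is (the candidate bodies below are invented for illustration): when the set of plausible contents is small and enumerable, a leaked, unsalted hash identifies, and thus effectively reconstructs, the exact resource.

```python
import hashlib

# Candidate resources an attacker can enumerate, e.g. the handful of
# library builds or config variants a site might plausibly serve.
candidates = [
    b"/* jquery 1.9 */",
    b"/* jquery 1.10 */",
    b"/* private-but-guessable config */",
]

def recover(leaked_hash: str):
    """Dictionary attack: match an unsalted content hash back to the
    exact resource body it was computed from, or None if unknown."""
    for body in candidates:
        if hashlib.sha256(body).hexdigest() == leaked_hash:
            return body
    return None
```

A salt would defeat this lookup, but the whole point of a shared-cache hash is that it must be computed the same way by everyone, so it cannot be salted.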
Received on Monday, 12 August 2013 00:27:56 UTC
