Re: [Feature Proposal] New attributes "library" and "version" on script tags from Nathanael D. Jones on 2013-08-12 (public-html@w3.org from August 2013)

From: Nathanael D. Jones <nathanael.jones@gmail.com>
Date: Sun, 11 Aug 2013 21:58:55 -0400
To: François REMY <francois.remy.dev@outlook.com>
Cc: "Patrick H. Lauke" <redux@splintered.co.uk>, HTML WG LIST <public-html@w3.org>, Glenn Adams <glenn@skynav.com>
Message-ID: <CAG3DbfX18d5DxXUkqQ2Orubu+EBMbSbbMsRK9hhsobtVJyADuA@mail.gmail.com>
I guess I wasn't clear enough.

The browser NEVER uses or trusts the provided hash for WRITING to the
cache; only for READING from the cache.

A resource is written to the cache using the hash *locally calculated by
the BROWSER* after downloading the resource from the URI.

This eliminates the possibility of cache poisoning and all your points
predicated upon that.
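
To make the read/write asymmetry concrete, here is a minimal sketch in
Python. It assumes SHA-512 as the hash. The names SharedCache, load and
fetch are hypothetical and exist only for illustration; this is not how any
browser actually implements its cache.

```python
import hashlib


class SharedCache:
    """Hypothetical content-addressed cache keyed by SHA-512 hex digests."""

    def __init__(self):
        self._by_hash = {}

    def load(self, uri, claimed_hash, fetch):
        # READ path: the hash from the markup is only ever a lookup key.
        cached = self._by_hash.get(claimed_hash)
        if cached is not None:
            return cached
        # Cache miss: download from the URI (fetch is an assumed callable
        # that takes a URI and returns the response body as bytes).
        body = fetch(uri)
        # WRITE path: the key is the digest computed locally from the
        # downloaded bytes; the claimed hash is ignored here.
        actual_hash = hashlib.sha512(body).hexdigest()
        self._by_hash[actual_hash] = body
        return body
```

If a page claims the hash of a well-known library for some other file, that
file is stored under its own digest, so a later lookup by the library's hash
does not find it.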

On Sun, Aug 11, 2013 at 8:27 PM, François REMY <
francois.remy.dev@outlook.com> wrote:

> > You've evidently misinterpreted my
> > proposal. I'm not suggesting any server-side or HTTP-level behavior
>
> I know. This is what I'm proposing. What you propose is basically to put
> an ETag in the HTML file, outside of the control of the server. What I
> propose is to create a sort of "shared ETag" so that the browser can send
> the "ETag of a file it already knows" to avoid downloading a file that may
> be the same. In the end, it’s up to the server to decide.
>
> If the reason you think the hash should be included inline is that it
> avoids an RTT for the hash discovery, I disagree: HTTP/2.0 will allow a
> website to reply to requests the browser didn't send as part of its
> initial response. When a client connects, the server could reply with the
> full HEAD of the script files it expects the client to have already, so
> that the browser does not have to issue that request if the caching
> conditions or the hash match [*].
>
> This system does not require you to modify your website in any way; it's
> a purely server-side implementation trick. It also doesn't force you to
> modify your static HTML files when you update some library to a new version.
>
>
>
> [*] You lose your RTT only if you store your script on a server other than
> your own (because your server cannot provide the HEAD responses with the
> hashes on behalf of another server; that would be a security issue).
> However, including scripts from another domain is already a bad practice
> because you're handing your security over to some other website. The whole
> point of this proposal is that you may be spared the download of the
> resource even if you decide, for security or performance reasons (DNS
> lookup and an already open connection), that you do not want to rely on
> someone else's CDN in the hope of avoiding the download. (People do so now
> because some CDNs are popular: the time saved by not downloading a
> resource that is shared across websites is higher than the time lost by
> the few people who do not have the resource already and have to open a new
> connection to that CDN while loading the page.)
>
> An example of this issue is people using Google Web Font store to download
> their fonts, because it means that everybody who uses the Google CDN and
> the same font only needs to download it once. My proposal allows you to
> store the file on your website but still get the benefits if anyone else
> uses the same file hosted on their own server (or even the Google CDN),
> because the hash will work like the ETag (your server will reply with a
> 304 under HTTP/1, or send the hash as an aside reply under HTTP/2).
>
>
>
> > whatsoever - the hash only appears once, in the HTML.
> > It is never transmitted again.
>
> I can't help but repeat...
>
> I didn't say SHA-2 was insecure. My proposal uses it as well. My issue with
> this proposal is that it's not the role of HTML to deal with
> transport-layer issues.
>
> My real point is: no browser should ever load a resource without asking
> the server that hosts it for authorization to do so, and for the metadata
> (CORS, CSP, ...) under which that server operates. I'm not saying SHA-2 is
> insecure; I'm saying that loading a resource based only on a claim (an
> attribute on the <script> tag) which is potentially sent by someone other
> than the website which hosts the resource, and which is subject to XSS
> attacks, is a bad idea. Additionally, transmitting the hash of every file
> as part of the URL (or an attribute found in the HTML, or whatever) is a
> bad idea, too. The identifier of a resource should never include
> content-based information because there's always a risk of this
> information being out of sync [EDIT] and partial, in the sense that it
> doesn't cover all the HTTP headers the server may want to send [/EDIT].
>
> My high-order belief is that this "super-cache" feature should be built
> into the HTTP layer and reuse the HTTP caching semantics, and should not
> be defined at the HTML level, because it deals with a transport-layer issue.
>
>
>
> > Until you can show mathematical evidence otherwise,
> > we can safely assume that collisions for SHA-2 (512-bit)
> > cannot presently be found through chance or malicious
> > intent. There doesn't seem to be much hope for this
> > happening anytime soon, either.
>
> The security issue of your proposal is not the SHA-2 hash (yet) but the
> fact that you load a resource without asking any server for permission to
> do so!
>
>
>
> > Provide specific attack scenarios for your hand-wavy
> > references to XSS, or stop generating FUD. Assuming
> > a secure hash function, I can't see how this could
> > possibly be useful for XSS.
>
> Someone could poison the cache for a file by mapping it to another very
> well-known one.
>
>     <script src="http://mybank/secure.js" hash="jquery-hash" />
>
> Or someone may expect that some file does not load if some security
> restrictions are not met, and those restrictions checked by the server
> could be bypassed by attributing a hash to the file when it's loaded the
> first time (when the conditions are met) and then reusing that hash on
> later loads when the conditions are not met.
>
>     On the secure website, with the hash changing every day:
>     <script src="http://mybank/check-security.php" onload="doSomething()"
> hash="hash-for-people-having-already-passed-the-test-today" />
>
>     The attacker uses a computer that passes the tests, downloads the file,
> and creates a malicious image/script/iframe pointing to the file and
> associating the right hash with it
>     <script src="http://attackersite/check-security.php"
> hash="hash-for-people-having-already-passed-the-test-today" />
>
>     Now people visiting the site will have the security check disabled on
> the bank site because there exists some other site which already filled the
> super-cache with the right file, without doing any security check. This is
> clearly a case where the bank site owner made a mistake (he should have
> used a normal Expires header and no hash) but it's not easy for him to
> understand that.
>
> Or if someone wants to trick a webpage, they could create an image whose
> src points to one image (scripts will recognize it as an image coming from
> the right server because the src will be right) but whose hash points to
> some other image the attacker downloaded before (i.e. the image will
> contain something different from what its src attribute says it does).
>
>
>
> Again, I continue to claim that if there's a reference to a script, an
> image, or whatever hosted on some server, the server MUST give its
> authorization before a resource is loaded, even if there's some
> identification process going on.
>
>
>
> > It should be noted that the hash of a resource and the resource
> > itself should be protected with equal vigor; the hash contains
> > 'part of the resource', and can be used to reconstruct the entire
> > resource (through a directed attack at shared visitors).
> >
> > This may not be obvious at first glance.
>
> I was about to say the exact same thing. It's not obvious; people will
> make mistakes. A hash is supposedly something you can leak. Example: we
> don't store passwords in a DB but (salted) hashes, so that if someone
> takes over the DB, they can't recover the passwords. Here you're using
> hashes to recover the resource; people will definitely not like that.
>
Received on Monday, 12 August 2013 01:59:43 UTC