Re: [Feature Proposal] New attributes "library" and "version" on script tags from Felipe Nascimento de Moura on 2013-08-12 (public-html@w3.org from August 2013)

From: Felipe Nascimento de Moura <felipenmoura@gmail.com>
Date: Sun, 11 Aug 2013 23:24:52 -0300
To: "Nathanael D. Jones" <nathanael.jones@gmail.com>
Cc: François REMY <francois.remy.dev@outlook.com>, "HTML WG, (public-html@w3.org)" <public-html@w3.org>, Glenn Adams <glenn@skynav.com>, "Patrick H. Lauke" <redux@splintered.co.uk>
Message-ID: <CAJVBkVnj_2=0935YKk=-ZJzsUo4=jmTv4EsMeYO_fNZXDxDDFg@mail.gmail.com>
I really like this idea with the hashes.
It would also help in cases like a customized jQuery UI build (say, two pages
that use exactly the same customization, which is not rare, would load it from
this cache because the hash would be the same).
I just think browsers shouldn't store every script on the page in the cache
along with its hash. To avoid unnecessary processing, calculation, and
storage, they could do this only for scripts with a defined hash attribute.
Actually, not only scripts but anything external, including URLs in
CSS (imagine an image used as an icon sprite sheet across many
pages, as happens with jQuery UI).
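
To make the idea concrete, here is a minimal Python sketch of the cache behaviour as I understand it from the message quoted below, combined with my opt-in suggestion. The class name `HashCache`, the `load` method, the `fetch` callback and the example URLs are all made up for illustration; no browser exposes anything like this, and SHA-512 is assumed only because it is the function mentioned in this thread.

```python
import hashlib
from typing import Callable, Dict, List, Optional


class HashCache:
    """Toy content-addressed cache keyed by a locally computed SHA-512 digest."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def load(self, url: str, fetch: Callable[[str], bytes],
             declared_hash: Optional[str] = None) -> bytes:
        # READ path: the declared hash is only ever used as a lookup key.
        if declared_hash is not None and declared_hash in self._store:
            return self._store[declared_hash]

        body = fetch(url)

        # WRITE path: only resources that opted in with a hash attribute are
        # stored, and the key is the digest computed here, never the declared one.
        if declared_hash is not None:
            self._store[hashlib.sha512(body).hexdigest()] = body
        return body


# Hypothetical usage: two sites reference the same customized build.
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)  # record every simulated network request
    return b"custom jquery-ui build"


cache = HashCache()
digest = hashlib.sha512(b"custom jquery-ui build").hexdigest()
first = cache.load("http://site-a.example/ui.js", fake_fetch, digest)
second = cache.load("http://site-b.example/ui.js", fake_fetch, digest)
```

In this sketch the second `load` never calls `fake_fetch`, and a page that declares a wrong hash only ends up storing its own bytes under their real digest.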
On 11/08/2013 23:01, "Nathanael D. Jones" <nathanael.jones@gmail.com>
wrote:

> I guess I wasn't clear enough.
>
> The browser NEVER uses or trusts the provided hash for WRITING to the
> cache; only for READING from the cache.
>
> A resource is written to the cache using the hash *locally calculated by
> the BROWSER* after downloading the resource from the URI.
>
> This eliminates the possibility of cache poisoning and all your points
> predicated upon that.
>
> On Sun, Aug 11, 2013 at 8:27 PM, François REMY <
> francois.remy.dev@outlook.com> wrote:
>
>> > You've evidently misinterpreted my
>> > proposal. I'm not suggesting any server-side or HTTP-level behavior
>>
>> I know. This is what I'm proposing. What you propose is basically to put
>> an ETAG in the HTML file, outside of the control of the server. What I
>> propose is to create a sort of "shared etag" so that the browser can send
>> the "etag of a file it already knows" to avoid downloading a file that may
>> be the same. In the end, it’s up to the server to decide.
>>
>> If the reason you think the hash should be included inline is that it
>> avoids an RTT for the hash discovery, I disagree, because HTTP/2.0 will
>> allow a website to reply to requests a browser didn't send as part of its
>> initial response. The server could reply, when a client connects, with the
>> full HEAD of the script files it expects the client to have already, so
>> that the browser does not have to issue that request if the caching
>> conditions are met or if the hash matches [*].
>>
>> This system does not require you to modify your website in any way; it's
>> a purely server-side implementation trick. It also doesn't force you to
>> modify your static HTML files when you update some library to a new version.
>>
>>
>>
>> [*] You lose your RTT only if you store your script on a server other
>> than yours (because your server cannot provide the HEAD responses with the
>> hashes on behalf of another server; that would be a security issue). However,
>> including scripts from another domain is already a bad practice because
>> you're giving away your security to some other website. The whole point of
>> this proposal is that you may be spared the download of the resource even if
>> you decide, for security or performance (DNS and already-open connection)
>> reasons, that you do not want to rely on someone else's CDN in the hope of
>> avoiding the download. (People do that now because some CDNs are popular, so
>> the time saved when the resource is already shared across websites is higher
>> than the time lost by the few people who don't have the resource yet and
>> have to open a new connection to that CDN while loading the page.)
>>
>> An example of this issue is people using the Google Web Fonts store to
>> download their fonts because it means that everybody who uses the Google CDN
>> and the same font only needs to download it once. My proposal allows you to
>> store the file on your website but still get the benefits if anyone else
>> uses the same file hosted on their own server (or even the Google CDN),
>> because the hash will work like the etag (your server will reply with a
>> 304 under HTTP1, or send the hash as an aside reply under HTTP2).
>>
>>
>>
>> > whatsoever - the hash only appears once, in the HTML.
>> > It is never transmitted again.
>>
>> I can't help but repeat...
>>
>> I didn't say SHA-2 was insecure. My proposal uses it as well. My issue
>> with this proposal is the fact that it's not the role of HTML to deal with
>> transport-layer issues.
>>
>> My real point is: no browser should ever load a resource without asking
>> the server that hosts it for authorization to do so, and for the metadata
>> (CORS, CSP, ...) under which that server operates. I don't say SHA2 is
>> insecure; I say loading a resource based only on a claim (an attribute on
>> the <script> tag) which is potentially sent by someone other than the
>> website which hosts the resource, and which is subject to XSS attacks, is a
>> bad idea. Additionally, transmitting the hash of every file as part of the
>> url (or an attribute found in the HTML, or whatever) is a bad idea, too. The
>> identifier of a resource should never include content-based information
>> because there's always a risk of this information being out-of-sync [EDIT]
>> and partial, in the sense that it doesn't cover all the http headers the
>> server may want to send [/EDIT].
>>
>> My high-level belief is that this "super-cache" feature should be built
>> into the HTTP layer and reuse the HTTP caching semantics, and should not be
>> defined at the HTML level because it deals with a transport-layer issue.
>>
>>
>>
>> > Until you can show mathematical evidence otherwise,
>> > we can safely assume that collisions for SHA-2 (512-bit)
>> > cannot presently be found through chance or malicious
>> > intent. There doesn't seem to be much hope for this
>> > happening anytime soon, either.
>>
>> The security issue of your proposal is not the SHA-2 hash (yet) but the
>> fact that you load a resource without asking any server for permission to
>> do so!
>>
>>
>>
>> > Provide specific attack scenarios for your hand-wavy
>> > references to XSS, or stop generating FUD. Assuming
>> > a secure hash function, I can't see how this could
>> > possibly be useful for XSS.
>>
>> Someone could poison the cache for a file by mapping it to another very
>> well-known one.
>>
>>     <script src="http://mybank/secure.js" hash="jquery-hash" />
>>
>> Or someone may expect that some file does not load if some security
>> restrictions are not met, and those restrictions checked by the server
>> could be bypassed by attributing a hash to the file when it's loaded the
>> first time (when the conditions are met) and then reusing that hash on
>> later loads, when the conditions are not met.
>>
>>     On the secure website, with the hash changing every day:
>>     <script src="http://mybank/check-security.php"
>>             onload="doSomething()"
>>             hash="hash-for-people-having-already-passed-the-test-today" />
>>
>>     The attacker uses a computer that passes the tests, downloads the file,
>> and creates a malicious image/script/iframe pointing to the file and
>> associating the right hash with it:
>>     <script src="http://attackersite/check-security.php"
>>             hash="hash-for-people-having-already-passed-the-test-today" />
>>
>>     Now people visiting the site will have the security check disabled on
>> the bank site because there exists some other site which already filled the
>> super-cache with the right file, without doing any security check. This is
>> clearly a case where the bank site owner made a mistake (he should have used
>> a normal expire header and no hash) but it's not easy for him to understand
>> that.
>>
>> Or if someone wants to trick a webpage, they could create an image whose
>> src points to one image (scripts will recognize it as an image coming from
>> the right server because the src will be right) but which has a hash pointing
>> to some other image the attacker downloaded before (i.e. the image will
>> contain something different from what its src attribute says it
>> does).
>>
>>
>>
>> Again, I continue to claim that if there's a reference to a script, an
>> image, or whatever hosted on some server, the server MUST give its
>> authorization before a resource is loaded, even if there's some
>> identification process going on.
>>
>>
>>
>> > It should be noted that the hash of a resource and the resource
>> > itself should be protected with equal vigor; the hash contains
>> > 'part of the resource', and can be used to reconstruct the entire
>> > resource (through a directed attack at shared visitors).
>> >
>> > This may not be obvious at first glance.
>>
>> I was about to say the exact same thing. It's not obvious, and people will
>> make mistakes. A hash is supposedly something you can leak. Example: we
>> don't store passwords in the DB but (salted) hashes, so that if someone takes
>> over the DB, they can't recover the passwords. Here you're using hashes to
>> recover the resource; people will definitely not like that.
>>
>
>
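
For comparison, François's counter-proposal quoted above leaves the final decision with the server, much like an ETag check. A rough Python sketch, assuming the client somehow sends the list of hashes it already holds; the function name `respond` and the way the hashes are passed are my invention, not part of HTTP or of his message:

```python
import hashlib
from typing import Iterable, Optional, Tuple


def respond(resource_body: bytes,
            known_hashes: Iterable[str]) -> Tuple[int, Optional[bytes]]:
    """Return (status, body) for a request carrying the client's known hashes."""
    # The server hashes its own current copy; the client's claim is only compared.
    current = hashlib.sha512(resource_body).hexdigest()
    if current in set(known_hashes):
        return 304, None           # client may reuse the copy it already has
    return 200, resource_body      # otherwise send the full resource


# Hypothetical usage: the client already holds the same bytes from another site.
body = b"shared font file"
hit = respond(body, [hashlib.sha512(body).hexdigest()])
miss = respond(body, [])
```

The difference from the HTML-attribute approach is that the server is always contacted, so it can still apply its own access checks and headers before answering 304.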
Received on Monday, 12 August 2013 02:25:19 UTC