- From: Chris Drechsler <chris.drechsler@etit.tu-chemnitz.de>
- Date: Fri, 23 May 2014 10:06:04 +0200
- To: Guille -bisho- <bishillo@gmail.com>
- CC: ietf-http-wg@w3.org
Hi Guille,

thank you for reading my draft and giving me feedback - this is very helpful! My answers are below:

On 21.05.2014 18:10, Guille -bisho- wrote:
> My 2 cents after reading the draft:
>
> Etag and If-None-Match already give a conditional get feature with
> hash that does not need a reset of the connection.

You are right, Etag + If-None-Match is an option, but there are some drawbacks:

1) The Etag is only consistent within one domain. The SHA-256 hash value in the Cache-NT header identifies the transferred representation completely independently of the URLs used (and therefore across domains).

2) Caching via Etag + If-None-Match in [Part6] can only be used in combination with the URL. If content providers use varying URLs for one specific resource (e.g. due to load balancing/CDNs or session IDs in the query string), then the cache system stores several copies of the same resource.

> Your proposal adds very little caching for bad practices. Nobody
> should be using two urls for the same resource, if you need load
> balancing between two cdns, you should be consistent and js1 always be
> requested from site1 and js2 from site2. And this is being improved by
> HTTP2 that will elimitate the need for sharding among domains to
> overcome limits if parallel requests and head-of-line blocking.

I agree with you, nobody should be using two URLs for the same resource. But reality looks different:

1) Due to load balancing and the use of CDNs, one specific resource is available via different URLs. Especially with larger ISPs that connect to several Internet exchange points and/or several transit links, requests are redirected to different server locations for the same resource. This can change within minutes due to changing BGP routes and/or the load-balancing mechanisms of the content producer/CDN provider.

2) URLs can also change for another reason: changing parameters in the query string.
For example, if content producers use personalization via session IDs or implement access mechanisms via parameters in the query string, then the cache system would store several copies of the same content. In this use case, caching is mostly disabled by the content producer (e.g. via the Cache-Control header). The caching mechanism proposed in my draft exchanges all headers of the request and response messages, so all information such as parameters in the query string is exchanged. There is no need to disable caching. The SHA-256 hash value identifies resources independently of the URL used, so varying URLs don't matter.

> The Cache-NT header can only be applied within a domain, and even
> there is risky. A malicious user could inject malicious content with a
> Cache-NT header that matches other resource to poison the cache. Even
> if intermediate caches check the hash, there is still pre-image
> attacks, won't be hard to find a collision and append malicious code
> to a js file.

I don't see how the cache can be poisoned. Can you please explain it in more detail? I see the following: SHA-256 has strong collision resistance, so it is practically impossible to find two different inputs that result in the same hash value. When the cache system receives a response with a specific hash value in the Cache-NT header for the first time, it computes the SHA-256 value over the received representation in the body. If both hash values are equal, the cache system stores a copy of the representation and uses it for subsequent requests. If they are not equal, nothing is stored (but the response is still forwarded to the client). So the cache stores and uses only validated content.

One security concern is that an origin server sends a hash value that does not fit the representation in the body of the response message (by mistake or by intention).
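(As an aside, the validate-before-store rule I describe above can be sketched roughly like this - the class and method names are illustrative, not taken from the draft:)

```python
import hashlib

class HashCache:
    """Rough sketch of validate-before-store; names are illustrative."""

    def __init__(self):
        self.store = {}  # Cache-NT hash value -> validated representation body

    def on_response(self, cache_nt: str, body: bytes) -> bytes:
        computed = hashlib.sha256(body).hexdigest()
        if computed == cache_nt:
            # Hash matches the body: safe to keep a copy for later requests.
            self.store[cache_nt] = body
        # On a mismatch nothing is stored, but the response is still
        # forwarded to the client unchanged.
        return body
```

A forged Cache-NT value therefore never creates a cache entry, because an entry is only written after the cache itself has recomputed the hash over the received body.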
In that case - a hash value that does not match the body - the client will get a different body if the cache system has a cache item that matches the hash value in the Cache-NT header of the response from the origin server. I don't think this is a drawback of my proposed caching mechanism - it is a problem that we already have today: if the origin server (or an intermediary in between) is compromised, clients can already receive malicious content. What do you think?

> With Cache-NT you are only avoiding the transfer of the content, but
> still incurring in the request to the backend server. Most of the
> times that is the expensive part, and before you reset the connection
> the backend would have probably sent you another 8 packets of
> information (the recommended initcwnd is 9 by this days). If the
> request should be cached, better get the provider to configure cache
> properly to avoid doing the request altogether than this oportunistic
> but dangerous way of avoiding some extra transfers over the wire.

You are right, INITCWND can be 9 or larger, and if an HTTP transfer is stopped, some KB will still go over the wire. Therefore the caching mechanism proposed in my draft should only be applied to larger representations (significantly larger than 20 KB), e.g. larger images or videos.

Many content providers disable caching for several reasons: implementation of access mechanisms (e.g. via cookies or session IDs), user tracking, statistics (to evaluate the usage of a service or to account for advertising), or transferring client-specific information in the query string (e.g. as YouTube does). They all want to receive the client request, so they disable caching (since in [Part6] the client request terminates at the cache in case of a cache hit). My draft builds a bridge: all headers are exchanged and caching is still possible.

> Guille -bisho-
> <bisho@freedreams.org|fb.com>
> :wq

Thank you again for reading my draft and taking the time! I'm really looking forward to your answer.

Chris
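PS: To make the last point concrete, here is a rough sketch of the cache-hit path under the proposed mechanism (all names are illustrative): the request headers always travel end to end, so session IDs, cookies and query-string parameters reach the provider in every case; only the body transfer is skipped when the hash is already known.

```python
import hashlib

def handle_response(store: dict, cache_nt: str, read_body) -> bytes:
    """Sketch of a cache deciding between hit and miss; names illustrative.

    By the time this runs, the client's request headers have already been
    forwarded to the origin; only the (large) body transfer can be avoided.
    """
    if cache_nt in store:
        # Hit: stop the transfer (e.g. reset the connection) and serve
        # the locally stored, already-validated representation.
        return store[cache_nt]
    body = read_body()  # miss: receive the full body from the origin
    if hashlib.sha256(body).hexdigest() == cache_nt:
        store[cache_nt] = body  # validate before storing
    return body
```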
Received on Friday, 23 May 2014 08:06:38 UTC