- From: Chris Drechsler <chris.drechsler@etit.tu-chemnitz.de>
- Date: Tue, 27 May 2014 12:10:17 +0200
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- CC: ietf-http-wg@w3.org
On 23.05.2014 16:20, Poul-Henning Kamp wrote:
> In message <537F52B0.5080209@etit.tu-chemnitz.de>, Chris Drechsler writes:
>
>>> It's also not enough, I believe; you may need to stick some of the
>>> HTTP headers into the hash too, to get the expected behaviour.
>>>
>>> Transfer-Encoding, Content-Type?
>>
>> Why do you believe this - can you explain it in more detail?
>
> The way HTTP is defined, the headers are an unholy mix of transport
> encapsulation and object metadata. Some headers are both.
>
> Although I have yet to see any non-contrived examples, there is
> absolutely nothing preventing the exact same body from having
> two entirely different meanings, depending on the headers.
>
> For instance:
>
>     Content-Encoding: gzip
>     Content-Type: text/html
>
>     <gzip'ed body>
>
> vs.
>
>     Content-Type: application/x-gzip
>
>     <exact same gzip'ed body>
>
> Are two very different responses for any sensible client, yet your
> proposal would deliver either one for the other.

In the first case the input is, for example, an HTML document, and the
server computes the SHA256 value before the document is compressed for
sending. The client decompresses the body, obtains the HTML document
and displays it (see RFC 2616, section 14.11):

foo.html:

    Content-Encoding: gzip
    Content-Type: text/html
    Cache-NT: sha256=AAAAAA....

    <gzip'ed body>

In the second case the input is an already compressed file, e.g.
foo.html.gz. It will not be compressed again by the server while being
sent (no Content-Encoding is applied), and the SHA256 value differs
from the former case:

foo.html.gz:

    Content-Type: application/x-gzip
    Cache-NT: sha256=BBBBBB....

    <same gzip'ed body>

The client must not decompress this body on receipt because there is no
Content-Encoding header (see RFC 2616, section 14.11). If the client
did decompress it, the result would effectively be text/html rather
than application/x-gzip, and the identity of the media type would be
lost.

From the cache system's point of view, two different SHA256 values mean
two different cache items, even though the bodies are bit-identical.
This could be deduplicated internally in the cache system, but that is
not my focus. As compressed text/html documents are usually quite small
(a mean compression gain of roughly 75%, see [1]), the Cache-NT header
should not be applied to them; the focus is on larger transfers.

Thank you for this pathological corner case - it helps me very much. Do
you see other corner cases in the context of my draft?

> Just SHA256()'ing the body will not be enough to work with all the
> pathological corner-cases inherent in HTTP's lack of architecture.
>
> You could try to divine which headers you have to feed into the SHA256
> to unravel the mess. I wouldn't bother.
>
>> As I see it: caching should/must ensure that the client will get
>> exactly what the origin server has sent.
>
> Yes, *including* some of the headers under *certain* conditions.

Let's figure out how many corner cases we have and how they influence
the caching in my draft. At the moment I think there is no need to
include any of the headers in the hash, as leaving them out causes no
significant loss of caching performance. The main requirement is that
the client gets exactly the response the origin server has sent, and I
think my draft always fulfills it.
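To make the two cases concrete, here is a minimal sketch (Python; not
part of the draft, all names illustrative) of how the two responses end
up with different Cache-NT values even though the bytes on the wire are
identical:

import gzip
import hashlib

html = b"<html><body>example</body></html>"  # stands in for foo.html

# Case 1: the representation is text/html; the server hashes it
# *before* applying Content-Encoding: gzip for sending.
cache_nt_1 = "sha256=" + hashlib.sha256(html).hexdigest()
wire_body_1 = gzip.compress(html)

# Case 2: the input is an already compressed file (foo.html.gz);
# no Content-Encoding is applied, so the hash covers the gzip bytes.
gz_file = wire_body_1  # bit-identical body on the wire
cache_nt_2 = "sha256=" + hashlib.sha256(gz_file).hexdigest()

assert wire_body_1 == gz_file    # same bytes on the wire ...
assert cache_nt_1 != cache_nt_2  # ... but two different cache items

And the internal deduplication mentioned above could look like this
(purely hypothetical, since the draft leaves it to the cache
implementation): several Cache-NT keys can map onto one stored body.

store = {}  # SHA256 of the stored bytes -> body (each body kept once)
index = {}  # Cache-NT value -> key into store

def cache_put(cache_nt, body):
    key = hashlib.sha256(body).hexdigest()
    store.setdefault(key, body)  # bit-identical bodies are stored once
    index[cache_nt] = key

cache_put(cache_nt_1, wire_body_1)
cache_put(cache_nt_2, gz_file)
assert len(store) == 1  # one stored copy despite two cache items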
>>> Case 2: You allow the content owner to define what goes into the SHA256
>>> ------------------------------------------------------------------------
>>
>> No, I don't allow the content owner to define what goes into the
>> SHA256. It's clearly defined how the hash value should be computed.
>
> And that doesn't work, see above.
>
> And neither does case 2, as detailed in my previous email.
>
> If the HTTPbis working group had as goal to make HTTP work better
> than it does today, something like your proposal could have been
> accommodated by eliminating some past mistakes, most notably
> the mingling of transport/metadata headers.
>
> But given the WG's laser-like focus on compatibility with the limited
> past, with almost no regard to the potentially infinite future,
> your proposal is impossible to make work, and would just add to the
> already overwhelming complexity of HTTP.
>
> Poul-Henning

Thanks again for your comments!

Chris

[1] P. Destounis, J. Garofalakis, P. Kappos, J. Tzimas (2001),
"Measuring the mean Web page size and its compression to limit latency
and improve download time", Internet Research, Vol. 11, Iss. 1,
pp. 10-17.