Re: improved caching in HTTP: new draft

On 23.05.2014 16:20, Poul-Henning Kamp wrote:
> In message <537F52B0.5080209@etit.tu-chemnitz.de>, Chris Drechsler writes:
>
>>> It's also not enough I believe, you may need to stick some of the
>>> HTTP headers into the hash too, to get the expected behaviour.
>>>
>>> Transfer-encoding, Content-type ?
>>
>> Why do you believe this - can you explain it in more detail?
>
> The way HTTP is defined, the headers are an unholy mix of transport
> encapsulation and object metadata.  Some headers are both.
>
> Although I have yet to see any non-contrived examples, there is
> absolutely nothing preventing the exact same body from having
> two entirely different meanings, depending on the headers.
>
> For instance:
> 	Content-Encoding: gzip
> 	Content-Type: text/html
> 	<gzip'ed body>
> vs.
> 	Content-Type: application/x-gzip
> 	<exact same gzip'ed body>
> 	
> Are two very different responses for any sensible client, yet your
> proposal would deliver either one for the other.

In the first case the input is, for example, an HTML document, and 
the server computes the SHA256 value before the document is compressed 
for sending. The client decompresses the body, gets back the HTML 
document and displays it (see RFC 2616, Section 14.11).

foo.html:
     Content-Encoding: gzip
     Content-Type: text/html
     Cache-NT: sha256=AAAAAA....
     <gzip'ed body>

In the second case the input is an already compressed file, e.g. 
foo.html.gz. It is not compressed again by the server when it is sent 
(no Content-Encoding is applied). The SHA256 value is therefore 
different from the one in the former case (a short sketch of both 
computations follows below):

foo.html.gz:
     Content-Type: application/x-gzip
     Cache-NT: sha256=BBBBBB....
     <same gzip'ed body>
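
To make the difference concrete, here is a minimal sketch in Python of 
how the two Cache-NT values come about (the file names are the 
hypothetical ones from above, and the hex encoding of the digest is 
only for illustration, not a statement about the draft's wire format): 
in the first case the hash is taken over the uncompressed document 
before the server gzips it for transfer, in the second case over the 
stored gzip'ed bytes.

    import gzip
    import hashlib

    html = open("foo.html", "rb").read()      # uncompressed input (case 1)

    # Case 1: Content-Type: text/html, Content-Encoding: gzip
    # The SHA256 value is computed before compression ...
    cache_nt_1 = "sha256=" + hashlib.sha256(html).hexdigest()
    # ... and the body is gzip'ed only for the transfer.
    body_1 = gzip.compress(html)

    # Case 2: Content-Type: application/x-gzip, no Content-Encoding
    # The input is the already compressed file; the SHA256 value is
    # computed over exactly these bytes.
    gz = open("foo.html.gz", "rb").read()
    cache_nt_2 = "sha256=" + hashlib.sha256(gz).hexdigest()

    # Even if body_1 and gz are bit-identical on the wire,
    # cache_nt_1 != cache_nt_2, because different inputs were hashed.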

The client should not decompress the body on reception because there 
is no Content-Encoding header (see RFC 2616, Section 14.11). If the 
client did decompress the body, its Content-Type would effectively be 
text/html afterwards and not application/x-gzip - the identity of the 
media type would be lost.
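
A rough client-side sketch of that rule (my reading of RFC 2616, 
Section 14.11, not text from the draft): the body is gunzip'ed only 
when a Content-Encoding: gzip header is present; otherwise the gzip'ed 
bytes are the representation itself.

    import gzip

    def decode_body(headers, body):
        # headers: dict with lower-cased header names (an assumption of
        # this sketch, not something the draft prescribes).
        if headers.get("content-encoding", "").lower() == "gzip":
            # Case 1: the transfer compression is removed and the
            # client ends up with the text/html document.
            return gzip.decompress(body)
        # Case 2: no Content-Encoding, so the gzip'ed bytes are kept
        # as-is and the application/x-gzip media type stays intact.
        return body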

From the point of view of the cache system, two different SHA256 
values mean two different cache items even though the stored bodies 
are bit-identical. This could be fixed internally in the cache system, 
but that is not my focus. As compressed text/html documents are often 
very small (a mean compression gain of roughly 75%, see [1]), the 
Cache-NT header should not be applied to them. The focus is more on 
larger transfers.
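
In other words, a cache keyed purely by the Cache-NT value would 
behave roughly like this sketch (the store function and its arguments 
are made up for illustration):

    cache = {}

    def store(cache_nt, headers, body):
        # The Cache-NT value is the only key, so sha256=AAAAAA... and
        # sha256=BBBBBB... are two distinct entries, even if the two
        # stored bodies are bit-identical gzip'ed bytes.
        cache[cache_nt] = (headers, body)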

Thank you for this pathological corner case - it helps me a lot. 
Do you see any other corner cases in the context of my draft?

> Just SHA256()'ing the body will not be enough to work with all the
> pathological corner-cases inherent in HTTP's lack of architecture.
>
> You could try to divine which headers you have to feed into the SHA256
> to unravel the mess.  I wouldn't bother,
>
>> As I see it: caching should/must ensure that the client will get exactly
>> what the origin server has sent.
>
> Yes, *including* some of the headers under *certain* conditions.

Let's figure out how many corner cases we have and how they influence 
the caching in my draft. At the moment I think there is no need to 
include any of the headers in the hash under certain conditions, as 
there is no significant loss of caching performance from leaving them 
out. The main requirement is that the client gets exactly the response 
the origin server has sent. I think my draft always fulfills this 
requirement.
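
One way to see why that holds (the verification step below is my own 
illustration, not wording from the draft): before reusing a stored 
body for a response carrying a given Cache-NT value, the cache can 
check that the bytes still hash to exactly that value, so the client 
can only ever receive the bytes the origin server hashed.

    import hashlib

    def serve_from_cache(cache, cache_nt):
        # cache_nt is a value like "sha256=<hex digest>" (hex encoding
        # is an assumption of this sketch).
        entry = cache.get(cache_nt)
        if entry is None:
            return None                        # miss: fetch from origin
        headers, body = entry
        algo, _, digest = cache_nt.partition("=")
        if algo != "sha256" or hashlib.sha256(body).hexdigest() != digest:
            return None                        # mismatch: treat as a miss
        return headers, body                   # exactly what the origin hashed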

>
>>> Case 2:  You allow the content owner to define what goes into the SHA256
>>> ------------------------------------------------------------------------
>>
>> No, I don't allow the content owner to define what goes into the SHA256.
>> It's clearly defined how the hash value should be computed.
>
> And that doesn't work, see above.
>
> And neither does case 2, as detailed in my previous email.
>
>
> If the HTTPbis working group had as goal to make HTTP work better
> than it does today, something like your proposal could have been
> accommodated by eliminating some past mistakes, most notably
> the mingling of transport/metadata headers.
>
> But given the WG's laser-like focus on compatibility with the limited
> past, with almost no regard to the potentially infinite future,
> your proposal is impossible to make work, and would just add to the
> already overwhelming complexity of HTTP.
>
> Poul-Henning
>

Thanks again for your comments!

Chris

[1]
P. Destounis, J. Garofalakis, P. Kappos, J. Tzimas (2001), "Measuring 
the mean Web page size and its compression to limit latency and improve 
download time", Internet Research, Vol. 11, Iss. 1, pp. 10-17.
