Re: improved caching in HTTP: new draft

Hi Amos,

thank you for your comments, they are very helpful! Below are my 
answers:

Am 19.05.2014 16:34, schrieb Amos Jeffries:
> On 19/05/2014 7:13 p.m., Chris Drechsler wrote:
>> Dear editors of [Part6],
>> dear working group members,
>>
>> I've written a draft about an improved caching mechanism in HTTP
>> especially for shared caches (to improve caching efficiency and reduce
>> costly Interdomain traffic). It can deal with personalization (e.g.
>> cookies, session IDs in query strings) and varying URLs due to load
>> balancing or the use of CDNs. The caching mechanism ensures that all
>> headers (request and response messages) are exchanged between origin
>> server and client even if the real content is coming from a cache.
>>
>> The draft is available under the following URL:
>>
>> http://tools.ietf.org/id/draft-drechsler-httpbis-improved-caching-00.txt
>>
>> I kindly request you for comments - thank you!
>
>
> * The introduction states several abuses and deliberate non-use of
> HTTP/1.1 features as the reasons for this proposal.

In a way I agree: varying URLs due to load balancing/CDNs or session 
IDs in query strings are a kind of abuse; personalization via cookies, 
I think, is not. On the other hand, these practices are widely used on 
the Internet and ISPs have to deal with them, and caching in 
particular suffers from them.

>
>   Which would not usually be bad, but it is actually simpler for the few
> problematic systems to start using existing DNS and HTTP features
> properly than it is for the entire existing software environment to be
> re-implemented to support this proposed mechanism.

How would you properly solve this with existing DNS and HTTP features 
to ensure that a representation from URL A is the same as the one from 
URL B? I think this is very difficult because you don't know how the 
various CDNs work internally.
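
To make that concrete, here is a rough sketch in Python (purely my own 
illustration; the URLs, data structures and names are made up, not 
from the draft): a cache keyed only by the URL misses when the same 
bytes are served under a different CDN URL, whereas an index keyed by 
the SHA-256 of the representation finds the stored copy:

    import hashlib

    # The same representation served under two CDN-varied URLs.
    # A URL-keyed cache sees two distinct keys; an index keyed by the
    # SHA-256 of the body maps both URLs to one stored copy.
    body = b"...identical representation bytes..."
    url_a = "http://cdn1.example.com/assets/video.mp4?session=123"
    url_b = "http://cdn2.example.net/c0ffee/video.mp4"

    url_keyed = {url_a: body}
    hash_keyed = {hashlib.sha256(body).hexdigest(): body}

    print(url_b in url_keyed)                              # False: miss
    print(hashlib.sha256(body).hexdigest() in hash_keyed)  # True: hit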

>
>
> * Section 2.1 appears to be proposing a new header. But what does "NT" mean?
>   The use of this header seems similar to an extension of the RFC 3230
> message digest mechanism.

I was looking for a header field name different from Cache-Control and 
chose it that way; NT stands for "new technology". And you are right, 
it is similar to RFC 3230, but there is no extra RTT for negotiating 
the hash algorithm.
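
As a rough sketch of what happens on the origin side (the header name 
Cache-NT is from the draft, but the hex encoding and the helper below 
are just my assumptions for illustration): the origin computes the 
SHA-256 of the full representation and attaches it to the response, so 
there is no Want-Digest/Digest negotiation round trip as in RFC 3230:

    import hashlib

    def add_cache_nt_header(headers, body):
        """Attach a Cache-NT header carrying the hex SHA-256 of the
        full representation. The algorithm is fixed to SHA-256, so no
        extra round trip is needed to negotiate it."""
        new_headers = dict(headers)
        new_headers["Cache-NT"] = hashlib.sha256(body).hexdigest()
        return new_headers

    # Example: the origin attaches the hash to an ordinary 200 response
    body = b"...full representation bytes..."
    resp = add_cache_nt_header(
        {"Content-Type": "video/mp4", "Content-Length": str(len(body))},
        body)
    print(resp["Cache-NT"])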

>
>
> * Section 2.2 things get really weird:
>   - 2.2.1 is requiring mandatory disabling of all conditional request
> mechanisms in HTTPbis part4.

Maybe my wording is not clear. If the content producer uses 
Cache-Control, then [Part6] caching applies, conditional requests make 
sense, and they should not be disabled.

If the content producer uses CDNs and/or personalization and wants to 
benefit from caching, he can use the new caching mechanism instead 
(and would omit the Cache-Control header). Then there is no need for 
conditional requests, because the cache will abort the transfer if the 
hash values match.

It's always controlled by the content producer.
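
To make the intended cache behaviour concrete, here is a minimal 
sketch (function and variable names are mine, not from the draft): if 
the response carries Cache-Control, normal [Part6] caching with 
conditional requests applies; if it carries only the Cache-NT hash and 
the cache already holds a representation with that hash, the cache 
aborts the remaining transfer and serves the stored copy:

    def handle_response_headers(headers, store):
        """Decide what the cache does when response headers arrive.
        'store' maps hex SHA-256 values to stored representations."""
        if "Cache-Control" in headers:
            return "part6"                  # classic caching, conditionals allowed
        digest = headers.get("Cache-NT")
        if digest is not None and digest in store:
            return "abort-and-serve-cached" # duplicate: stop the transfer
        return "download-verify-store"      # fetch, verify hash, store by hash

    # Example: the announced hash is already in the store
    store = {"ab12cd34": b"...cached bytes..."}
    print(handle_response_headers({"Cache-NT": "ab12cd34"}, store))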

>   - 2.2.2 is requiring mandatory TCP connection termination after every
> duplicate response. Effectively removing all benefits HTTP/1.1 pipeline
> and persistent connections bring. It does this in order to replicate the
> 304 conditional response using just 200 status code + TCP termination.

You are right. In practice, persistent connections and pipelining are 
not that widely used (see [1]); pipelining in particular is used in 
only roughly 6% of all connections, and when it is used, it is mostly 
for relatively small content. The caching mechanism in my draft is 
meant for large content (significantly larger than 20 KB), such as 
bigger images or videos.

>   - 2.2.2 also places several impossible requirements on intermediaries:
>    1) to re-open a just closed TCP connection, presumably without delays
> which would violate TCP TIME_WAIT requirements.

I wrote "re-open", but that's not the right word; I meant opening a 
new connection (with a new port). I will fix this in the draft. And 
you are absolutely right: re-opening the same TCP connection would 
violate TCP TIME_WAIT requirements.

>    2) to decode the Content-Encoding of responses to compare SHA-256
> values. Even if the response was a 206 status with byte range from
> within an encoding such as gzip.

No, you cannot compare the hash value until you have the full 
transferred representation. So the cache would store the byte range 
from the 206 response and wait until other responses deliver the 
missing parts. After that, the cache can compute the hash value, and 
if it matches the Cache-NT fields of the earlier responses, the cache 
stores the representation and uses it for subsequent requests. The 
computation can/should be done during periods of low load (not in the 
busy hour).
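
A small sketch of that process (my own illustration in Python; the 
class and method names are assumptions, not part of the draft): the 
cache collects the 206 byte ranges and, only once the representation 
is complete, e.g. during low load, computes the SHA-256 and compares 
it with the Cache-NT value announced in the responses before storing 
the object:

    import hashlib

    class RangeAssembler:
        """Collect 206 byte ranges of one representation; verify later."""

        def __init__(self, total_length, announced_hash):
            self.total = total_length
            self.announced = announced_hash   # hex SHA-256 from Cache-NT
            self.parts = {}                   # first byte position -> bytes

        def add_range(self, first_byte, data):
            self.parts[first_byte] = data

        def is_complete(self):
            pos = 0
            for start in sorted(self.parts):
                if start > pos:
                    return False              # a gap is still missing
                pos = max(pos, start + len(self.parts[start]))
            return pos >= self.total

        def verify(self):
            """Run during low load: return the body if the computed
            hash matches the announced Cache-NT value, else None."""
            if not self.is_complete():
                return None
            buf = bytearray(self.total)
            for start in sorted(self.parts):
                chunk = self.parts[start]
                buf[start:start + len(chunk)] = chunk
            body = bytes(buf)
            if hashlib.sha256(body).hexdigest() == self.announced:
                return body                   # safe to store and reuse
            return None

    # Example: two 206 responses deliver the halves of a 10-byte object
    obj = b"0123456789"
    asm = RangeAssembler(10, hashlib.sha256(obj).hexdigest())
    asm.add_range(0, obj[:5])
    asm.add_range(5, obj[5:])
    print(asm.verify() == obj)                # True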

>
>
> In summary, the proposed feature completely disables the two most
> valuable cache features of HTTP/1.1 and replaces them with an equivalent
> process requiring mandatory use of the worst behaviours from HTTP/1.0.

In a way I agree. For HTTP/1.0 and HTTP/1.1 my draft is a workaround. 
What I would really need there is something like a "stop sending this 
response" request. In general, the new caching mechanism is most 
useful for larger representations, and those are seldom pipelined.

What about HTTP/2? Since each HTTP request/response exchange is 
assigned to its own stream, no side effects should arise, in my 
opinion. What do you think, do you see any side effects?

>
> Amos
>
>

Thanks a lot,
Chris


[1]
Fabian Schneider, Bernhard Ager, Gregor Maier, Anja Feldmann, and Steve 
Uhlig. 2012. Pitfalls in HTTP traffic measurements and analysis. In 
Proceedings of the 13th international conference on Passive and Active 
Measurement (PAM'12), Nina Taft and Fabio Ricciato (Eds.). 
Springer-Verlag, Berlin, Heidelberg, 242-251.
