Re: Submitted new I-D: Cache Digests for HTTP/2 from Alex Rousskov on 2016-01-09 (ietf-http-wg@w3.org from January to March 2016)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Sat, 9 Jan 2016 00:27:30 -0700
To: ietf-http-wg@w3.org
Cc: Kazuho Oku <kazuhooku@gmail.com>
Message-ID: <5690B662.4070006@measurement-factory.com>

On 01/08/2016 11:27 PM, Kazuho Oku wrote:

> If we are to generalize the proposal to support other purposes such as
> exchanging cache states between proxies, I think we should also
> consider of defining a way for sending a digest divided into multiple
> HTTP/2 frames in case the size of the digest exceeds 16KB, in addition
> to providing space to define which encoding is being used.

This is an important caveat that I have missed! Squid Cache Digests are
often many megabytes in size... Perhaps the Draft should be renamed to
"Small Cache Digests for HTTP/2" to emphasize that the proposed
mechanism is not applicable to large caches?
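
To give a feel for the scale, here is a rough sketch of how many frames a
large digest would need if it were simply split into frame-sized pieces.
It assumes the HTTP/2 default maximum frame payload of 16384 octets; the
split_digest helper and the 4 MiB digest size are made up for
illustration and are not part of the Draft.

```python
from typing import List

# HTTP/2 default SETTINGS_MAX_FRAME_SIZE (2^14 octets).
MAX_FRAME_PAYLOAD = 16384


def split_digest(digest: bytes, max_payload: int = MAX_FRAME_PAYLOAD) -> List[bytes]:
    """Hypothetical helper: cut a digest into consecutive chunks,
    each at most max_payload octets long."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [digest[i:i + max_payload] for i in range(0, len(digest), max_payload)]


# A made-up 4 MiB digest, roughly the scale mentioned above.
example_digest = bytes(4 * 1024 * 1024)
example_chunks = split_digest(example_digest)
print(len(example_chunks))  # 256 frame-sized pieces
```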


> Or if there is no immediate demand to use an encoding other than
> Golomb-coded sets for sending a small-sized digest, then we can add a
> sentence stating that:
> 
> * sender of a CACHE_DIGEST frame must set its flags to zero
> * receiver of the frame must ignore if its flags are not set to zero
> 
> , and if such demand arises, define new flags to extend the semantics.

It feels wrong to use frame flags to specify digest _encoding_, but
perhaps that is appropriate in HTTP/2 context.


> Also, Golomb-coded sets
> will be the only practical choice, the size of the digest will become
> significantly larger if Bloom filter was chosen (in case false
> positive rate is set to 1/256, it will be about 8x as large).

I would not limit the possibilities to Bloom filters and Golomb-coded
sets. For example, I can imagine a client talking to a server with a
small set of *known-a-priori* objects and using a small 1:1 bitmap to
reliably represent the current client cache digest.
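
A minimal sketch of such a bitmap, assuming both sides agree in advance
on an ordered list of objects. The KNOWN_OBJECTS list, the function
names, and the most-significant-bit-first layout are my assumptions for
illustration only:

```python
# Hypothetical list of objects known a priori to both client and server.
# The order matters: bit i of the bitmap refers to KNOWN_OBJECTS[i].
KNOWN_OBJECTS = ["/app.css", "/app.js", "/logo.png", "/font.woff2"]


def encode_bitmap(cached, known=KNOWN_OBJECTS) -> bytes:
    """Set bit i (most significant bit first) if known[i] is cached."""
    bits = bytearray((len(known) + 7) // 8)
    for i, url in enumerate(known):
        if url in cached:
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


def decode_bitmap(bitmap: bytes, known=KNOWN_OBJECTS) -> set:
    """Recover the exact set of cached objects from the bitmap."""
    return {
        url
        for i, url in enumerate(known)
        if bitmap[i // 8] & (0x80 >> (i % 8))
    }


client_cache = {"/app.js", "/font.woff2"}
bitmap = encode_bitmap(client_cache)
print(bitmap.hex())           # 50
print(decode_bitmap(bitmap) == client_cache)  # True
```

Unlike a probabilistic digest, this representation has no false
positives, at a cost of one bit per known object.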

You only need to "waste" an octet to open up support for other digest
formats without changing the overall semantics of the "small cache
digest" feature...
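
Something along these lines, as a sketch only. The format codes, names,
and the "ignore unknown formats" rule below are hypothetical and are not
taken from the Draft:

```python
# Hypothetical format codes carried in the first octet of the payload.
FORMAT_GCS = 0     # Golomb-coded set
FORMAT_BITMAP = 1  # some other format, e.g. a 1:1 bitmap
KNOWN_FORMATS = {FORMAT_GCS, FORMAT_BITMAP}


def pack_digest(fmt: int, body: bytes) -> bytes:
    """Prefix the encoded digest with a single format octet."""
    if not 0 <= fmt <= 255:
        raise ValueError("format code must fit in one octet")
    return bytes([fmt]) + body


def unpack_digest(payload: bytes):
    """Return (format, body), or None if the payload is empty or the
    format is unknown, so that the receiver can ignore the digest."""
    if not payload:
        return None
    fmt, body = payload[0], payload[1:]
    if fmt not in KNOWN_FORMATS:
        return None
    return fmt, body


payload = pack_digest(FORMAT_GCS, b"\x12\x34")
print(unpack_digest(payload))           # (0, b'\x124')
print(unpack_digest(b"\x7f\x00\x00"))   # None: unknown format is ignored
```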


>>>    servers ought not
>>>    expect frequent updates; instead, if they wish to continue to utilise
>>>    the digest, they will need update it with responses sent to that
>>>    client on the connection.

>> Perhaps I am missing some important HTTP/2 caveats here, but how would
>> an origin server identify "that client" when the "connection" is coming
>> from a proxy and multiplexes responses to many user agents?

> Proxies understanding the frame can simply transfer it to the upstream
> server

Yes, but how would an origin server identify "that client" when the
"connection" is coming from a CACHE_DIGEST-aware proxy and multiplexes
responses to many user agents served by that proxy? AFAICT, the server
cannot know whether the responses sent "on the connection" are going to
be cached by the proxy (and, hence, should not be pushed again) or are
going to be forwarded to the user agent without proxy caching (and,
hence, should be pushed again in case other user agents need them).

IIRC, from a terminology point of view, the proxy is the "client" in this
context so there is no problem in the current Draft wording if that is
what you meant. There may be a problem if, by "that client", you meant
"that user agent" instead.

Please note that I am _not_ saying that there is a protocol bug here. I
am just noting that it is not clear what should happen when proxies
multiplex streams from different user agents to the same origin server,
and whether there are some specific strategies that caching and
non-caching proxies should deploy to maximize the savings. There seem
to be at least three cases to consider: CACHE_DIGEST-unaware proxies,
aware caching proxies, and aware non-caching proxies.


Thank you,

Alex.
Received on Saturday, 9 January 2016 07:28:13 UTC