
Re: Submitted new I-D: Cache Digests for HTTP/2

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Tue, 12 Jan 2016 10:50:22 +0900
Message-ID: <CANatvzwuKHpXFHFWY4RX_WnpFC7MTz2-WFc54=gFooDzHHnmVA@mail.gmail.com>
To: Alex Rousskov <rousskov@measurement-factory.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
2016-01-09 16:27 GMT+09:00 Alex Rousskov <rousskov@measurement-factory.com>:
> On 01/08/2016 11:27 PM, Kazuho Oku wrote:
>
>> If we are to generalize the proposal to support other purposes such as
>> exchanging cache states between proxies, I think we should also
>> consider defining a way to send a digest split across multiple
>> HTTP/2 frames in case the size of the digest exceeds 16KB, in addition
>> to providing space to define which encoding is being used.
>
> This is an important caveat that I have missed! Squid Cache Digests are
> often many megabytes in size... Perhaps the Draft should be renamed to
> "Small Cache Digests for HTTP/2" to emphasize that the proposed
> mechanism is not applicable to large caches?
>
>
>> Or if there is no immediate demand to use an encoding other than
>> Golomb-coded sets for sending a small-sized digest, then we can add a
>> sentence stating that:
>>
>> * sender of a CACHE_DIGEST frame must set its flags to zero
>> * receiver of the frame must ignore it if its flags are not set to zero
>>
>> , and if such demand arises, define new flags to extend the semantics.
>
> It feels wrong to use frame flags to specify digest _encoding_, but
> perhaps that is appropriate in HTTP/2 context.

Sorry for my wording; what I meant is that we could add a flag
indicating that the payload uses a new format, one that includes the
ID of the encoding together with any other fields required to
support such an extension.
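To make the idea concrete, here is a minimal sketch of how such a flag might be consumed on the receiving side. Everything here is hypothetical: the flag bit, the encoding-ID registry values, and the function itself are illustrations, not part of the draft.

```python
# Hypothetical layout, not part of the draft: if the NEW_FORMAT flag
# (bit 0x1) is set on a CACHE_DIGEST frame, the payload begins with a
# one-octet encoding ID; otherwise the payload is a Golomb-coded set
# as currently specified.
FLAG_NEW_FORMAT = 0x1

ENCODING_GCS = 0x00     # hypothetical registry value
ENCODING_BITMAP = 0x01  # hypothetical registry value

def parse_cache_digest_payload(flags: int, payload: bytes):
    """Return (encoding_id, digest_bytes) for a CACHE_DIGEST payload."""
    if flags & FLAG_NEW_FORMAT:
        if not payload:
            raise ValueError("extended payload missing encoding octet")
        return payload[0], payload[1:]
    # Legacy payload: implicitly a Golomb-coded set.
    return ENCODING_GCS, payload
```

The point of the octet is that new encodings can be registered later without touching the frame semantics; receivers that do not recognize the flag simply ignore the frame, as proposed above.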

>> Also, Golomb-coded sets
>> will be the only practical choice; the size of the digest would become
>> significantly larger if a Bloom filter were chosen (in case the false
>> positive rate is set to 1/256, it would be about 8x as large).
>
> I would not limit the possibilities to Bloom filters and Golomb-coded
> sets. For example, I can imagine a client talking to a server with a
> small set of *known-a-priori* objects and using a small 1:1 bitmap to
> reliably represent the current client cache digest.
>
> You only need to "waste" an octet to open up support for other digest
> formats without changing the overall semantics of the "small cache
> digest" feature...

The question is whether adding a single octet while keeping the other
semantics unchanged is sufficient to support other purposes.  To me the
answer is uncertain at this point; I doubt whether such an octet would
actually be used.  But these are only my thoughts; I am open to making
the spec more generic if the general consensus is that I should.
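For readers unfamiliar with the Golomb-coded sets discussed above, the encoding can be sketched roughly as follows. This is a toy illustration, not the draft's exact wire format: the choice of SHA-256 as the hash and the unpacked bit-string output are assumptions made here for clarity.

```python
import hashlib

def gcs_encode(urls, log2_p_inv=8):
    """Encode a set of URLs as a Golomb-coded set with false-positive
    probability 2**-log2_p_inv (1/256 here).  Returns a bit string for
    readability; a real implementation would pack the bits into octets."""
    n = len(urls)
    domain = n << log2_p_inv  # hash values fall in [0, N/P)
    values = sorted({int(hashlib.sha256(u.encode()).hexdigest(), 16) % domain
                     for u in urls})
    bits, prev = [], 0
    for v in values:
        delta, prev = v - prev, v
        q, r = delta >> log2_p_inv, delta & ((1 << log2_p_inv) - 1)
        bits.append("1" * q + "0")                  # quotient in unary
        bits.append(format(r, f"0{log2_p_inv}b"))   # remainder in binary
    return "".join(bits)

def gcs_decode(bits, log2_p_inv=8):
    """Recover the sorted hash values from the encoded bit string."""
    values, i, prev = [], 0, 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q, i = q + 1, i + 1
        i += 1  # skip the terminating "0"
        r = int(bits[i:i + log2_p_inv], 2)
        i += log2_p_inv
        prev += (q << log2_p_inv) + r
        values.append(prev)
    return values
```

Because the deltas between sorted hash values average about 2**log2_p_inv, each entry costs roughly log2_p_inv plus a couple of bits, which is what makes the encoding compact relative to the false-positive rate.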

>>>>    servers ought not
>>>>    expect frequent updates; instead, if they wish to continue to utilise
>>>>    the digest, they will need update it with responses sent to that
>>>>    client on the connection.
>
>>> Perhaps I am missing some important HTTP/2 caveats here, but how would
>>> an origin server identify "that client" when the "connection" is coming
>>> from a proxy and multiplexes responses to many user agents?
>
>> Proxies understanding the frame can simply transfer it to the upstream
>> server
>
> Yes, but how would an origin server identify "that client" when the
> "connection" is coming from a CACHE_DIGEST-aware proxy and multiplexes
> responses to many user agents served by that proxy? AFAICT, the server
> cannot know whether the responses sent "on the connection" are going to
> be cached by the proxy (and, hence, should not be pushed again) or are
> going to be forwarded to the user agent without proxy caching (and,
> hence, should be pushed again in case other user agents need them).
>
> IIRC, from terminology point of view, the proxy is the "client" in this
> context so there is no problem in the current Draft wording if that is
> what you meant. There may be a problem if, by "that client", you meant
> "that user agent" instead.
>
> Please note that I am _not_ saying that there is a protocol bug here. I
> am just noting that it is not clear what should happen when proxies
> multiplex streams from different user agents to the same origin server,
> and whether there are some specific strategies that caching and
> non-caching proxies should deploy to maximize the savings. There seems
> to be at least three cases to consider: CACHE_DIGEST-unaware proxies,
> aware caching proxies, and aware non-caching proxies.

Thank you for bringing up the case of a CACHE_DIGEST-aware proxy that
multiplexes requests from many clients.

As you pointed out, it is not sensible for such a proxy to simply
forward the digest value sent by one client to the server.  What it
should do instead is:

* send the digest of the proxy's own cache to the server
* store the resources pushed by the server in the proxy's cache
* when the server's response includes a `Link: rel=preload` header,
and the resource designated for preload exists in the proxy's cache,
push that resource to the client
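A minimal sketch of that division of labour, with a toy `ProxyCache` standing in for the real cache and digest machinery (all names here are hypothetical, not any existing API):

```python
class ProxyCache:
    """Minimal stand-in for the proxy's shared cache (hypothetical)."""
    def __init__(self):
        self._entries = {}

    def store(self, url, resource):
        # Step 2: retain resources pushed by the server so that later
        # clients can benefit from them.
        self._entries[url] = resource

    def get(self, url):
        return self._entries.get(url)

    def digest(self):
        # Step 1: the digest sent upstream describes the proxy's own
        # cache, never a single client's.  A real implementation would
        # return a Golomb-coded set; a plain set stands in for it here.
        return set(self._entries)

def resources_to_push(preload_urls, proxy_cache):
    """Step 3: of the URLs named in `Link: rel=preload` headers on a
    response, return those the proxy can push to the client from its
    own cache."""
    return [proxy_cache.get(u) for u in preload_urls
            if proxy_cache.get(u) is not None]
```

The key property is that the server only ever reasons about the proxy's cache, while the proxy uses the preload hints to decide what to push onward to each individual client.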

The performance may not be optimal compared to that of a
CACHE_DIGEST-aware proxy that does not multiplex requests from
multiple clients, but it would still be better than the case of a
CACHE_DIGEST-unaware proxy.

> Thank you,
>
> Alex.
>



-- 
Kazuho Oku
Received on Tuesday, 12 January 2016 01:50:53 UTC
