Re: Submitted new I-D: Cache Digests for HTTP/2 from Kazuho Oku on 2016-01-13 (ietf-http-wg@w3.org from January to March 2016)

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Wed, 13 Jan 2016 14:04:07 +0900
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CANatvzyOnMSLHfXcDrGSjbtZi5nFX2e9_4tHOjmR2OqBWEYUcg@mail.gmail.com>

2016-01-13 8:17 GMT+09:00 Amos Jeffries <squid3@treenet.co.nz>:
> On 12/01/2016 2:20 p.m., Kazuho Oku wrote:
>> 2016-01-12 0:39 GMT+09:00 Ilya Grigorik:
>>> Glad to see this proposal!
>>>
>>> FWIW, another +1 for enabling this functionality via an HTTP header.
>>> Limiting it to h2 frames makes it effectively inaccessible to web developers
>>> that want to experiment with own cache management logic (via ServiceWorker,
>>> etc).
>>
>> Glad to hear from you.
>>
>> While it is possible to use an HTTP header to implement cache-digest
>> (and that is what we are doing now in H2O + ServiceWorker/cookie), I
>> believe it should ideally be implemented as an HTTP/2 header since:
>>
>> * including the digest value (the value changes as client receives
>> responses) in every HTTP request is a waste of bandwidth
>
> Bandwidth may or may not be a problem relative to the digest design and
> amount of compression applied by the protocol (eg. h2 dynamic table vs
> HTTP/1.1 repetition).

That is generally true.

However, the specification would become complicated if we were to
carry the cache digest in headers while still achieving a good
compression ratio.

To send cache digests in headers while achieving a good compression
ratio with HPACK, we would need to split the digest into at least two
headers: a large value that changes infrequently, and small values
that change frequently.  A client would attach those headers to every
HTTP request, and a server would have to merge the header values into
one and decode the digest for every HTTP request.
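
To make the cost concrete, here is a rough sketch of such a split.  It
is only an illustration under assumptions: the header names
`x-digest-base` and `x-digest-delta` are hypothetical, and a plain set
of truncated SHA-256 keys stands in for the digest encoding, which is
not what the draft specifies.

```python
import hashlib


def url_key(url: str, bits: int = 16) -> int:
    """Hash a URL to a short integer key (truncated SHA-256; an assumption)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def encode_keys(keys) -> str:
    """Serialize keys as sorted, comma-separated hex (stand-in encoding)."""
    return ",".join(format(k, "x") for k in sorted(keys))


def decode_keys(value: str) -> set:
    """Parse the stand-in encoding back into a set of integer keys."""
    return {int(part, 16) for part in value.split(",") if part}


def client_headers(base_keys, recent_keys) -> dict:
    # The base value stays byte-identical across requests, so HPACK's
    # dynamic table can index it; only the small delta value changes.
    return {
        "x-digest-base": encode_keys(base_keys),     # hypothetical header name
        "x-digest-delta": encode_keys(recent_keys),  # hypothetical header name
    }


def server_merge(headers: dict) -> set:
    # The server has to repeat this merge-and-decode step on every request.
    base = decode_keys(headers.get("x-digest-base", ""))
    delta = decode_keys(headers.get("x-digest-delta", ""))
    return base | delta
```

The client also needs a rule for when to fold the delta back into the
base (which invalidates the indexed entry), and that rule is part of
what the specification would have to define.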

And even if we did include such headers in every HTTP request, it
would still be better for a server to maintain the client's cache
state per connection, since there is a time window between when a
server sends a resource (so that it gets cached by the client) and
when the client is able to notify the server that it has actually
cached it.  For example, if a client requests `/a.html` and `/b.html`
(both of which rely on `/style.css`), a server should push
`/style.css` only once.  To do so, the draft expects a server to
maintain and update its estimate of the cache digest of the client
connected over a single TCP connection.
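
The `/a.html` and `/b.html` case can be sketched as follows.  This is a
minimal illustration, not how the draft or H2O implements it: the class
name, the dependency map, and the use of a plain set of URLs in place
of a real digest are all assumptions.

```python
class ConnectionCacheState:
    """Server-side estimate of one client's cache, kept per connection."""

    def __init__(self, initial_digest=()):
        # Seeded from whatever digest the client sent (empty here).
        self.estimated = set(initial_digest)

    def resources_to_push(self, dependencies):
        to_push = []
        for url in dependencies:
            if url not in self.estimated:
                # Record the push immediately, without waiting for the
                # client to report that it has cached the resource.
                self.estimated.add(url)
                to_push.append(url)
        return to_push


# Hypothetical dependency map: both pages rely on the same stylesheet.
DEPENDENCIES = {
    "/a.html": ["/style.css"],
    "/b.html": ["/style.css"],
}

state = ConnectionCacheState()
first = state.resources_to_push(DEPENDENCIES["/a.html"])   # pushes the stylesheet
second = state.resources_to_push(DEPENDENCIES["/b.html"])  # nothing left to push
```

Because the estimate is updated at the moment the push is scheduled,
the second request sees `/style.css` as already cached even though the
client has not yet had a chance to say so.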

To summarize, the draft uses the fact that HTTP/2 multiplexes HTTP
requests into a single, ordered stream to keep things simple.
Considering that we need to rely on HTTP/2 to push things anyway
(push is the primary target of the draft), I think that is a
reasonable trade-off.

>> * cache state is an information that is bound to the connection, not
>> to a request
>
> You assume a browser endpoint cache.
>
> Intermediary caches are constructed from content flowing over multiple
> parallel connections. Potentially from multiple origins. Which makes it
> very likely to have changed between any two given requests to contain
> things that cannot be inferred by the server from those two requests.

Do you think that caching proxies should establish multiple
connections to an upstream server when using HTTP/2?

> This type of problem is also more likely to happen in the presence of
> domain sharding. Where the temporal locality of the index request is
> different from the 100's of content requests.
>
> ALTSVC may also make similar things happen with browser caches.
>
> Amos
>

-- 
Kazuho Oku

Received on Wednesday, 13 January 2016 05:04:38 UTC