Re: Submitted new I-D: Cache Digests for HTTP/2

2016-01-13 18:40 GMT+09:00 Amos Jeffries <squid3@treenet.co.nz>:
> On 13/01/2016 6:04 p.m., Kazuho Oku wrote:
>> 2016-01-13 8:17 GMT+09:00 Amos Jeffries <squid3@treenet.co.nz>:
>>> On 12/01/2016 2:20 p.m., Kazuho Oku wrote:
>>>> 2016-01-12 0:39 GMT+09:00 Ilya Grigorik:
>>>>> Glad to see this proposal!
>>>>>
>>>>> FWIW, another +1 for enabling this functionality via an HTTP header.
>>>>> Limiting it to h2 frames makes it effectively inaccessible to web
>>>>> developers who want to experiment with their own cache management
>>>>> logic (via ServiceWorker, etc.).
>>>>
>>>> Glad to hear from you.
>>>>
>>>> While it is possible to use an HTTP header to implement cache-digest
>>>> (and that is what we are doing now in H2O + ServiceWorker/cookie), I
>>>> believe it should ideally be implemented as an HTTP/2 frame, since:
>>>>
>>>> * including the digest value (which changes as the client receives
>>>> responses) in every HTTP request is a waste of bandwidth
>>>
>>> Bandwidth may or may not be a problem, depending on the digest design
>>> and the amount of compression applied by the protocol (e.g. the h2
>>> dynamic table vs. HTTP/1.1 repetition).
>>
>> That is generally true.
>>
>> However, the specification would become complicated if we were to
>> include the cache digest in headers while still achieving a good
>> compression ratio.
>>
>> To send cache digests using headers while achieving a good compression
>> ratio with HPACK, we would need to split the digest into at least two
>> headers: a large value that changes infrequently, and small values
>> that change frequently.  A client would send those headers with every
>> HTTP request, and a server would merge the header values back into one
>> and decode the digest for every request.
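>>
>> As a rough illustration of that split (the header names and the
>> JSON/base64 encoding below are hypothetical, invented only for this
>> sketch; they are not what the draft specifies):
>>
>>     import base64, json
>>
>>     def encode_digest_headers(base_digest, recent_additions):
>>         # "base" changes rarely, so HPACK's dynamic table can index
>>         # it; "delta" carries only entries added since the last rebuild.
>>         enc = lambda keys: base64.b64encode(
>>             json.dumps(sorted(keys)).encode()).decode()
>>         return {"cache-digest-base": enc(base_digest),
>>                 "cache-digest-delta": enc(recent_additions)}
>>
>>     def decode_digest_headers(headers):
>>         # The server merges the two values back into a single digest.
>>         dec = lambda v: set(json.loads(base64.b64decode(v)))
>>         return (dec(headers["cache-digest-base"]) |
>>                 dec(headers["cache-digest-delta"]))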
>>
>> And even if we did include such headers in every HTTP request, it
>> would still be better for a server to maintain the client's cache
>> state per connection, since there is a window between when a server
>> sends a resource (so that it gets cached by the client) and when the
>> client is able to notify the server that it has actually cached it.
>> For example, if a client requests `/a.html` and `/b.html` (both of
>> which rely on `/style.css`), a server should push `/style.css` only
>> once.  To do so, the draft expects a server to maintain and update an
>> estimate of the client's cache digest for each TCP connection.
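>>
>> A minimal sketch of that per-connection bookkeeping (the API here is
>> invented for illustration; it is not H2O's actual code):
>>
>>     class ConnectionState:
>>         def __init__(self, initial_digest):
>>             # Seeded from the digest the client sent on this connection.
>>             self.estimated_cache = set(initial_digest)
>>
>>         def should_push(self, url):
>>             if url in self.estimated_cache:
>>                 return False  # the client (probably) already has it
>>             # Optimistically record the push so a later request on the
>>             # same connection does not trigger a duplicate push.
>>             self.estimated_cache.add(url)
>>             return True
>>
>>     conn = ConnectionState(initial_digest=[])
>>     assert conn.should_push("/style.css")      # pushed with /a.html
>>     assert not conn.should_push("/style.css")  # skipped for /b.html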
>>
>> To summarize, the draft utilizes the fact that HTTP/2 multiplexes HTTP
>> requests into a single, ordered stream to keep things simple.
>> Considering that we need to rely on HTTP/2 to push things anyway (push
>> being the primary target of the draft), I think that is a reasonable
>> trade-off.
>>
>>>> * cache state is information bound to the connection, not to a request
>>>
>>> You assume a browser endpoint cache.
>>>
>>> Intermediary caches are constructed from content flowing over multiple
>>> parallel connections, potentially from multiple origins. That makes it
>>> very likely that the cache has changed between any two given requests,
>>> coming to contain things the server cannot infer from those requests.
>>
>> Do you consider that caching proxies should establish multiple
>> connections to the upstream when using HTTP/2?
>
> Typical ISP intermediaries service on the order of 20K requests per
> second. Most of that traffic goes to just a handful of origins these
> days. So yes, multiple connections to at least those origins are going
> to happen.

Thank you for the clarification!

That means that regardless of whether we use an H2 frame or an HTTP
header to represent the cache digest, we face a trade-off between: a)
the cost of spending upstream bandwidth to update the cache digest, and
b) the cost of spending downstream bandwidth when servers push the same
resource through different connections.

Would that be a practical issue for caching proxies implementing cache
digests?  If it is, it might be better for us to define a way to make
delta updates to the cache digest maintained on the server side.  That
would help caching proxies update the digest more frequently
(ultimately, every time the cache state changes), leading to fewer
unnecessary pushes.
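
To illustrate, a delta update could carry just the keys that changed
since the last full digest, rather than retransmitting the whole thing
(the shape below is purely hypothetical; nothing like it exists in the
current draft):

    def apply_delta(server_side_digest, added, removed):
        # The proxy sends only what changed; the server patches its
        # estimate of the proxy's cache in place.
        server_side_digest |= set(added)
        server_side_digest -= set(removed)
        return server_side_digest

    digest = {"/style.css", "/app.js"}
    digest = apply_delta(digest, added={"/logo.png"}, removed={"/app.js"})
    assert digest == {"/style.css", "/logo.png"}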

> Amos



-- 
Kazuho Oku

Received on Thursday, 14 January 2016 09:14:44 UTC