
Re: Submitted new I-D: Cache Digests for HTTP/2

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Wed, 13 Jan 2016 22:40:44 +1300
To: Kazuho Oku <kazuhooku@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <56961B9C.2030407@treenet.co.nz>
On 13/01/2016 6:04 p.m., Kazuho Oku wrote:
> 2016-01-13 8:17 GMT+09:00 Amos Jeffries <squid3@treenet.co.nz>:
>> On 12/01/2016 2:20 p.m., Kazuho Oku wrote:
>>> 2016-01-12 0:39 GMT+09:00 Ilya Grigorik:
>>>> Glad to see this proposal!
>>>>
>>>> FWIW, another +1 for enabling this functionality via an HTTP header.
>>>> Limiting it to h2 frames makes it effectively inaccessible to web
>>>> developers that want to experiment with their own cache management
>>>> logic (via ServiceWorker, etc).
>>>
>>> Glad to hear from you.
>>>
>>> While it is possible to use an HTTP header to implement cache-digest
>>> (and that is what we are doing now in H2O + ServiceWorker/cookie), I
>>> believe it should ideally be implemented as an HTTP/2 frame since:
>>>
>>> * including the digest value (which changes as the client receives
>>> responses) in every HTTP request is a waste of bandwidth
>>
>> Bandwidth may or may not be a problem relative to the digest design and
>> amount of compression applied by the protocol (eg. h2 dynamic table vs
>> HTTP/1.1 repetition).
> 
> That is generally true.
> 
> However, the specification will become complicated if we include the
> cache digest in headers while still achieving a good compression
> ratio.
> 
> To send cache digests in headers while achieving a good compression
> ratio with HPACK, we would need to split the digest into at least two
> headers: a large value that changes infrequently, and small values
> that change frequently.  A client would send those headers with every
> HTTP request, and a server would merge the header values into one and
> decode the digest for every request.
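The two-header split Kazuho describes could be sketched as below. This is a hypothetical illustration, not anything from the draft: the header names `cache-digest-base` and `cache-digest-delta` are invented here, and a plain set of URL hashes stands in for the draft's Golomb-coded set. The point is that the large "base" value rarely changes, so HPACK's dynamic table can index it once and reference it cheaply on later requests, while the small "delta" carries only recent changes.

```python
import base64
import hashlib

def url_key(url):
    # 31-bit hash of the URL, standing in for one digest entry
    return int.from_bytes(hashlib.sha256(url.encode()).digest()[:4], "big") >> 1

def encode(keys):
    # serialize a set of keys as base64 so it fits in a header value
    data = b"".join(k.to_bytes(4, "big") for k in sorted(keys))
    return base64.b64encode(data).decode()

def decode(value):
    data = base64.b64decode(value)
    return {int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)}

def client_headers(stable_keys, recent_keys):
    # "base" changes infrequently (HPACK can index it in the dynamic table);
    # "delta" is small and holds only entries added since the base was built
    return {"cache-digest-base": encode(stable_keys),
            "cache-digest-delta": encode(recent_keys)}

def server_merge(headers):
    # the server reassembles the full digest from the two header values
    return decode(headers["cache-digest-base"]) | decode(headers["cache-digest-delta"])
```

Even in this toy form, the server-side merge-and-decode work happens on every request, which is part of the complexity being argued against.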
> 
> And even if we did include such headers in every HTTP request, it is
> still better for a server to maintain the client's cache state per
> connection, since there is a window between when a server sends a
> resource (so that it would get cached by the client) and when the
> client is able to notify the server that it has actually cached it.
> For example, if a client requests `/a.html` and `/b.html` (both of
> which rely on `/style.css`), a server should push `/style.css` only
> once.  To do so, the draft expects a server to maintain and update an
> estimate of the client's cache digest for each TCP connection.
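The per-connection bookkeeping described above might look like the following sketch. The class and method names are assumed for illustration, and a plain set again stands in for the draft's digest structure; the key behavior is that a pushed resource is recorded in the estimate immediately, so a second request arriving before the client could possibly report its updated cache state does not trigger a duplicate push.

```python
class ConnectionPushState:
    """Hypothetical per-connection estimate of the client's cache contents."""

    def __init__(self, initial_digest=frozenset()):
        # seed the estimate from whatever digest the client sent, if any
        self.estimated = set(initial_digest)

    def maybe_push(self, url):
        # Push only if we believe the client lacks the resource, and record
        # it right away so later requests on this connection skip it even
        # before the client has had a chance to update its digest.
        if url in self.estimated:
            return False
        self.estimated.add(url)
        return True
```

With this state, requests for `/a.html` and `/b.html` on the same connection cause `/style.css` to be pushed exactly once, matching the example in the quoted text.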
> 
> To summarize, the draft exploits the fact that HTTP/2 multiplexes HTTP
> requests into a single, ordered stream to keep things simple.
> Considering that we need to rely on HTTP/2 to push things anyway (push
> being the primary target of the draft), I think that is a reasonable
> trade-off.
> 
>>> * cache state is information bound to the connection, not to a
>>> request
>>
>> You assume a browser endpoint cache.
>>
>> Intermediary caches are constructed from content flowing over multiple
>> parallel connections, potentially from multiple origins. That makes it
>> very likely that the cache has changed between any two given requests
>> to contain things the server cannot infer from those two requests.
> 
> Do you consider that caching proxies should establish multiple
> connections to upstream when using HTTP/2?

Typical ISP intermediaries serve on the order of 20K requests per
second. Most of that traffic goes to just a handful of origins these
days. So yes, multiple connections to at least those origins are going
to happen.

Amos
Received on Wednesday, 13 January 2016 09:41:37 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 22 March 2016 12:47:10 UTC