Re: Submitted new I-D: Cache Digests for HTTP/2 from Kazuho Oku on 2016-02-02 (ietf-http-wg@w3.org from January to March 2016)

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Tue, 2 Feb 2016 10:45:36 +0900
To: Eliezer Croitoru <eliezer@ngtech.co.il>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CANatvzxEgfSPJoNT_MHe46SbJuMKu5zzdJWOFF+mr8ewaHh6Qg@mail.gmail.com>
Thank you for your feedback.

2016-01-27 11:45 GMT+09:00 Eliezer Croitoru <eliezer@ngtech.co.il>:
> I would like to join the discussion at this point and ask the list, since I
> couldn't follow exactly what was proposed and implemented, and I want to
> make sure I understand it correctly.
>
> On 22/01/2016 19:47, Richard Bradbury wrote:
>>
>> Hello. The general thrust of this I-D seems like a useful optimisation
>> of HTTP/2 server push. It is wasteful to push a representation to a
>> client when the client already has a fresh copy cached. But the reverse
>> is equally true, I think...
>
>
> In relation to the quote above, I would like to ask:
> which is more important, the client's resources or the server's?
>
> From what I understood, the basic proposal was to add the cache digest to
> every request; am I right? Is that still the case?

The original draft adds a cache digest to every H2 connection.  Recent
discussion has been about conveying the digest within every HTTP
request as an HTTP header.
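
To make "cache digest" concrete, here is a minimal Python sketch of a
digest built as a Golomb-coded set, similar in spirit to the draft: each
cached URL is hashed with SHA-256 into the range [0, N*P), the sorted
values are delta-encoded with Golomb-Rice coding, and the server tests
membership against the decoded set.  The function names and the exact
hashing and bit layout are assumptions for illustration, not the draft's
wire format.  A miss means the URL is definitely not in the digest; a hit
may be a false positive (roughly 1 in P when at most N URLs are added).

```python
import hashlib


def url_hash(url: str, width: int) -> int:
    """Top `width` bits of SHA-256(url): a value in [0, 2**width)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - width)


def build_digest(urls, log2_n: int, log2_p: int) -> list[int]:
    """Golomb-Rice code the sorted, de-duplicated hashes into a bit list."""
    hashes = sorted({url_hash(u, log2_n + log2_p) for u in urls})
    bits, prev = [], 0
    for h in hashes:
        delta = h - prev
        q, r = delta >> log2_p, delta & ((1 << log2_p) - 1)
        bits.extend([1] * q + [0])  # quotient in unary, terminated by a 0
        bits.extend((r >> i) & 1 for i in reversed(range(log2_p)))  # remainder
        prev = h
    return bits


def decode_digest(bits: list[int], log2_p: int) -> set[int]:
    """Recover the set of hash values from the bit list."""
    values, pos, prev = set(), 0, 0
    while pos < len(bits):
        q = 0
        while bits[pos] == 1:  # count the unary quotient
            q += 1
            pos += 1
        pos += 1  # skip the terminating 0
        r = 0
        for _ in range(log2_p):  # read the fixed-width remainder
            r = (r << 1) | bits[pos]
            pos += 1
        prev += (q << log2_p) | r
        values.add(prev)
    return values


def maybe_cached(digest_values: set[int], url: str, log2_n: int, log2_p: int) -> bool:
    """False: definitely not cached.  True: probably cached (may be a false positive)."""
    return url_hash(url, log2_n + log2_p) in digest_values


# Hypothetical example: room for N = 8 URLs with P = 128.
cached = ["https://example.com/style.css", "https://example.com/app.js"]
bits = build_digest(cached, log2_n=3, log2_p=7)
values = decode_digest(bits, log2_p=7)
print(len(bits), all(maybe_cached(values, u, 3, 7) for u in cached))
```

Each entry costs about log2(P) + 2 bits on average, which is why the
digest stays small compared to listing URLs.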

> Aside from some privacy issues with sending the client's cache digest, even
> with TLS considered secure, there are other issues with it, for example for
> mobile clients or metered WAN and LAN connections.
> If the client sends some KB (which can be more than a couple of cookies) on
> each request, then for 20 requests the usage will be 10KB * 20 = 200KB,
> which can become an issue for some, but not all, clients.

In the case of HTTP/2, the overhead will be much smaller thanks to HPACK.
With HPACK, a cache-digest header that is sent repeatedly can typically
be compressed to one or two octets, because after its first appearance
it can be sent as a reference to an entry in the dynamic table.  And
unless HTTP/2 is being used, there is practically no reason to send the
cache digest over a public network; only HTTP/2 supports push.
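
The effect can be checked with a toy HPACK (RFC 7541) encoder.  The first
time a header is sent as a "literal with incremental indexing" it is added
to the dynamic table; a later identical header becomes an "indexed header
field", which is a single octet while the index is below 127.  This is a
sketch only: it assumes no Huffman coding, no static-table lookup, and no
table-size limit or eviction, and the header name `cache-digest` with a
1000-byte value is a hypothetical stand-in for a real digest.

```python
STATIC_TABLE_LEN = 61  # RFC 7541 Appendix A defines 61 static entries


def encode_int(value: int, prefix_bits: int, flags: int) -> bytes:
    """Prefix-integer encoding from RFC 7541 section 5.1."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([flags | value])
    out = [flags | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) | 0x80)  # 7 bits plus continuation flag
        value //= 128
    out.append(value)
    return bytes(out)


def encode_str(s: str) -> bytes:
    """String literal without Huffman coding (H bit = 0)."""
    data = s.encode("ascii")
    return encode_int(len(data), 7, 0x00) + data


class ToyHpackEncoder:
    """Toy encoder: no Huffman, no static-table lookup, no size limit or eviction."""

    def __init__(self) -> None:
        self.dynamic: list[tuple[str, str]] = []  # newest entry first

    def encode(self, name: str, value: str) -> bytes:
        entry = (name, value)
        if entry in self.dynamic:
            # Indexed header field; dynamic indices start after the static table.
            index = STATIC_TABLE_LEN + 1 + self.dynamic.index(entry)
            return encode_int(index, 7, 0x80)
        # Literal with incremental indexing, new name (first octet 0x40).
        self.dynamic.insert(0, entry)
        return bytes([0x40]) + encode_str(name) + encode_str(value)


enc = ToyHpackEncoder()
value = "x" * 1000  # hypothetical digest value
first = enc.encode("cache-digest", value)
second = enc.encode("cache-digest", value)
print(len(first), len(second), second.hex())  # 1017 1 be
```

The second occurrence is the single octet 0xBE (index 62, the newest
dynamic-table entry).  Once other headers push the entry past index 126,
the reference takes two octets instead.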

> Maybe for YouTube, which sends files/objects ranging from 3MB to 500MB+,
> it's not always an issue, but sites that send/push X*3MB images for the
> homepage to a mobile app are kind of an issue. If I'm not wrong, this is
> one of the reasons mod_pagespeed was designed: to reduce wastefully
> consumed bandwidth.
>
> From my point of view and understanding, a cache digest will probably
> require some per-client "cache-digest dictionary", which can cause issues
> for systems/servers with lots of clients/connections. The other side would
> be the ongoing re-validation and maintenance of these dictionaries.

That is a fair argument.  However, servers are already required to
maintain such a per-connection dictionary for HTTP/2 (i.e. the HPACK
dynamic table).

> It opens both the clients and the servers to some vulnerabilities. Also,
> what would be the scope of the cache digest: per connection? Per request?
> Per some client session ID?
>
> And to polish out some aspects: what would happen if the server (which in
> many cases doesn't care about a couple of KB on the wire) sent a push offer
> for 20 objects and each and every one of them were declined by the client
> with some kind of 304?
> - It would not require opening a new connection to the client; it would
> use the same open connection.
> - It would not create a situation in which client resources (asymmetric
> DSL clients) are exhausted (imagine an office with 100+ PCs and two
> 15Mbit/1Mbit DSL connections...).
> - It would simplify the server software implementation and avoid the need
> to store and look up the client "cache-digest dictionary" each and every
> time.
>
> And also, if the HTML page contains the list of URLs for objects that the
> client/browser can validate by itself somehow, why does the client need to
> be pushed some objects/content? (This is yet to be fully understood by me.)
> I am looking for a couple of scenarios that would justify and clarify the
> need for such an implementation. Where is it needed other than for
> advertisements?
> My basic understanding is that a cache digest doesn't help interactive
> applications, chats, or real-time applications, since the content there is
> always new or updated compared to what the client has. By contrast, a
> static-file site might require the client to send the cache digest once,
> but not on each and every request.
>
> I am almost convinced that:
> - Implementing a special "update" request for a specific set/batch of
> files/objects would be much more efficient for both the client and the
> server than sending the cache digest even once in a header.
> - Using some kind of push/offer of 20 objects that the client declines
> would be much better than the client publishing the list of objects it
> already has.

Such an approach is already defined as part of HTTP/2.

By using server push as defined in HTTP/2, it is possible for a server
to start sending resources that are expected to be used by the client.
However, the issue is that doing so aggressively wastes downstream
bandwidth: without knowing what the client has already cached, a server
will repeatedly try to push the same objects, which the client rejects
every time, but only after it has received the pushed resource.

This draft is an attempt to fix that problem by eliminating the
downstream bandwidth wasted by pushing blindly, at the cost of some
upstream bandwidth.
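
The trade-off can be sketched as a push-planning step on the server:
before pushing, it drops every candidate that the client's digest lists as
cached.  In this minimal Python sketch the digest is abstracted as anything
supporting `in` (a plain set stands in for a real digest), and the
`Resource` type, URLs, and sizes are hypothetical.  If the digest wrongly
reports a resource as cached (a false positive), the only cost is a
skipped push; the client then fetches the resource with an ordinary
request.

```python
from collections.abc import Container, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    url: str
    size: int  # bytes on the wire (hypothetical figures below)


def plan_pushes(candidates: Iterable[Resource], client_digest: Container[str]) -> list[Resource]:
    """Keep only the candidates the client's digest does not list as cached."""
    return [r for r in candidates if r.url not in client_digest]


def pushed_bytes(resources: Iterable[Resource]) -> int:
    """Total downstream bytes spent pushing these resources."""
    return sum(r.size for r in resources)


candidates = [
    Resource("/style.css", 20_000),
    Resource("/app.js", 150_000),
    Resource("/hero.jpg", 3_000_000),
]
# Repeat visit: the client already holds all three (a set stands in for the digest).
digest = {"/style.css", "/app.js", "/hero.jpg"}

print(pushed_bytes(candidates))                       # blind push: 3170000
print(pushed_bytes(plan_pushes(candidates, digest)))  # digest-aware push: 0
```

On a repeat visit, blind push spends the full 3,170,000 bytes downstream
only to have the client reject them, while the digest-aware plan sends
nothing, paid for by the upstream bytes of the digest itself.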

> - For a client that doesn't mind sending the header for 20 objects, it
> would be pointless not to send If-Modified-X requests for each and every
> one of these objects as an entity.
> - There are some security risks in the client sending a cache digest in a
> specific scope, which I would like to read about.
>
> Thanks,
> Eliezer

-- 
Kazuho Oku
Received on Tuesday, 2 February 2016 01:46:05 UTC