Re: Call for Adoption: Cache Digests for HTTP/2 from Kazuho Oku on 2016-07-11 (ietf-http-wg@w3.org from July to September 2016)

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Tue, 12 Jul 2016 00:18:13 +0900
To: Alcides Viamontes E <alcidesv@zunzun.se>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CANatvzxX6aAfMJA2cSc+T_P+Ur_NCHU93FHtJc4wTF5GdLvtoQ@mail.gmail.com>
2016-07-07 22:35 GMT+09:00 Alcides Viamontes E <alcidesv@zunzun.se>:
> Our report on cache digests for HTTP/2, partially funded by the Swedish
> Internet Development fund:
>
> https://if-report.shimmercat.com/dirhtml/
>
> The main content of the report is a set of numbers on per-site cache
> size, which can be used to estimate the size of the cache digest.  I hope
> it can be of some use.

Thank you very much for the report.

Your estimate of the size of the cache digests was very interesting.
I am glad to know that the size of the digest is expected to be small.

Reading your report, my understanding is that your model is based on
the expectation that once a user visits a website, the TCP connection
would be kept open until the user leaves the website after viewing some
pages (therefore a user connects to a website _sporadically_, at the
moment when the browser does not have many responses cached for the
website).

The model sounds fine to me, but may I ask if you have collected any
data based on different models?

For example, do you have an estimate of how large a cache digest
would be when a user revisits a website just after the TCP connection
to the website has been shut down?

I think an estimate under such conditions would represent the worst
case, and having such data would be a good complement to your research,
which represents the average case.
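
To make the question concrete, here is a rough sketch of how the digest
size could be bounded for a given number of cached responses. It is an
illustration only, and rests on several assumptions: that the digest is
a Golomb-Rice coded set over truncated SHA-256 hashes of the URLs, that
N is rounded up to a power of two, and that a 10-bit header carries
log2(N) and log2(P). The function name gcs_digest_bits and the
example.com URLs are hypothetical, and the exact wire format in the
draft may differ.

```python
import hashlib


def gcs_digest_bits(urls, p_log2=7):
    """Count the bits of a Golomb-Rice coded set built from `urls`.

    p_log2 is log2(P), where 1/P is the target false-positive rate.
    The layout used here is an assumption, not a copy of the draft.
    """
    unique_urls = set(urls)
    n = len(unique_urls)
    if n == 0:
        return 0

    # Round N up to a power of two: n_log2 == ceil(log2(n)).
    n_log2 = (n - 1).bit_length()
    width = n_log2 + p_log2  # bits kept from each hash

    # Truncate each SHA-256 hash to `width` bits and drop duplicates.
    hashes = set()
    for url in unique_urls:
        full = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest(), "big")
        hashes.add(full >> (256 - width))

    bits = 10  # assumed header: 5 bits for log2(N), 5 bits for log2(P)
    previous = -1
    for value in sorted(hashes):
        delta = value - previous - 1
        quotient = delta >> p_log2
        # Unary quotient, one stop bit, then p_log2 remainder bits.
        bits += quotient + 1 + p_log2
        previous = value
    return bits


if __name__ == "__main__":
    cached = [f"https://example.com/asset-{i}.js" for i in range(1000)]
    bits = gcs_digest_bits(cached, p_log2=7)
    print(f"{len(cached)} URLs -> {bits} bits ({bits / 8:.0f} bytes)")
```

Under these assumptions the result can never exceed
10 + n * (p_log2 + 1) + 2**n_log2 bits, so 1000 cached URLs with
P = 2**7 stay below about 1.1 KB. Feeding it the per-site cache sizes
for the "revisit right after disconnect" case would give a worst-case
figure.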

> Best regards,
>
> ./Alcides.
>
>
> On Mon, Jun 27, 2016 at 3:20 PM, Amos Jeffries <squid3@treenet.co.nz> wrote:
>>
>> On 27/06/2016 8:21 p.m., Alcides Viamontes E wrote:
>> > I just went over these emails again and there are a few things which I
>> > don't
>> > completely understand.
>> >
>> >
>> >> Also, since cache digests are scoped to the lifetime of the connection
>> >> they're sent within, they can't grow indefinitely, in practical terms.
>> >> And,
>> >> of course, servers can discard digests if they don't want to keep the
>> >> state.
>> >>
>> >
>> > Isn't the cache supposed to contain assets obtained during previous
>> > visits?
>> > In that case, wouldn't the cache digest contain assets that the browser
>> > obtained in a previous connection?
>> >
>>
>> If viewed that way, yes. However the digest itself is not the objects it
>> lists. It is supposed to be regenerated from the cache state at the
>> beginning of the newer connection. So objects which have been dropped
>> out of the cache since that older connection are not even considered for
>> the new digest.
>>
>>
>>
>> >
>> >>
>> >> Finally, clients aren't required to represent the complete state of
>> >> their
>> >> cache in a digest; they're allowed to choose what representations to
>> >> include, so (for example) a very large cache can use a heuristic to
>> >> include
>> >> only the most likely candidates in a digest, to limit its size.
>> >>
>> >>
>> >>> We turn on cache digests during development, and we have seen the
>> >>> cache
>> >> digests grow due to the accumulation of different versions of assets.
>> >> Because we are using a cookie implementation, we can have the server
>> >> reset
>> >> the cookie when it gets too big. Would that be possible with the
>> >> current
>> >> draft?
>> >>
>> >> Yes; see the RESET flag.
>> >>
>> >
>> > I went to the draft to understand better the RESET flag. What is the
>> > typical scenario that this flag is intended for? I can see that RESET
>> > helps
>> > if something happens to the cache; then the browser can use the flag to
>> > tell the server. But is the browser supposed to clear the cache often in
>> > the middle of a connection? If not, how is this substantially better
>> > than
>> > just resetting the connection? Both Chrome and Firefox reset
>> > connections
>> > when the user clears the cache.
>>
>> Resetting the whole connection is a huge overhead. Frames with digests
>> should not really be frequent enough to warrant a whole connection
>> close and re-open cycle just to optimize away a single bit per frame.
>>
>> Having the frame cleared when that huge overhead is being paid anyway
>> makes sense. But having the dependency the reverse way around would be a
>> major pain.
>>
>>
>> >
>> > Also, if the server's idea of assets held by the browser is bigger than
>> > the
>> > assets actually held by the browser, the assets won't be pushed, but the
>> > browser can just request them "normally".
>> >
>> > What I was alluding to before was close to having the RESET in the other
>> > direction, from server to browser, to ask the browser to clear its cache
>> > for the current origin. This could be good for a number of use cases,
>> > not
>> > least of them decreasing the chance of false positives in queries to the
>> > digest. But adding a mechanism for the server to clear the browser's
>> > cache
>> > may be a bit out of scope :-( ...
>> >
>>
>> I suspect you are mistaking what "cache" is referring to. There are two
>> caches involved with these drafts.
>> * a cache of URL + objects. Stored on the browser.
>> * a cache of digest values. Stored on the server.
>>
>> The RESET flag on these frames is about the sender controlling the cache of
>> digest values on the recipient. So it makes no sense to send it from the
>> agent which holds the cache, to the agent which does not.
>>
>> It can be used to indicate to the server that the browser objects cache
>> just got wiped for some unspecified and otherwise irrelevant reason.
>>
>> Amos
>>
>>
>
>
>
> --
> Alcides Viamontes
> www.shimmercat.com



-- 
Kazuho Oku
Received on Monday, 11 July 2016 15:18:44 UTC