- From: Alcides Viamontes E <alcidesv@shimmercat.com>
- Date: Sat, 20 Aug 2016 12:04:23 +0200
- To: Mark Nottingham <mnot@mnot.net>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Kazuho Oku <kazuhooku@gmail.com>
- Message-ID: <CAAMqGzZJ0rD_2DkruvHNu1vwVs2ERngcun9jGnSD22dq3eY07g@mail.gmail.com>
Just a few quick opinions below.

On Sat, Aug 20, 2016 at 2:59 AM, Mark Nottingham <mnot@mnot.net> wrote:

> [ with my "cache digest co-author" hat on ]
>
> In discussions about Cache Digest, one of the questions that came up was
> whether or not it was necessary to use a digest mechanism (e.g., Bloom
> filter, Golomb compressed set), or whether or not we could just send a list
> of the cached representations.
>
> Curious about this, I whipped up a script to parse the contents of
> Chrome's cache, to get some idea as to how many cached responses per origin
> a browser keeps.
>
> See:
> https://gist.github.com/mnot/793fcfb0d003e87ea7e8035c43eafdb9
> and responses to:
> https://twitter.com/mnot/status/766542805980155905
>
> The caveats around this are too numerous to cover, but to mention a few:
> - this is just anecdata, and a very small sample at that
> - it's skewed towards:
>   a) people who follow me on Twitter;
>   b) people who use Chrome;
>   c) people who can easily run a Python program (leaving most
>      Windows users out)
> - it includes both fresh and stale cached responses
> - it assumes that the Chrome URL gives the complete and correct state of
>   the cache
>
> Looking at the responses (five so far) and keeping that in mind, a few
> observations:
>
> 1. Unsurprisingly, the number of cached responses per origin appears to
>    follow (roughly) a Zipf curve, like so many other Web stats do
> 2. Origins with tens of cached responses appear to be very common
> 3. Origins with hundreds of cached responses appear to be not uncommon at
>    all
> 4. Origins with thousands of cached responses are encountered
>
> More data is, of course, welcome.
>
> My early take-away is that if we design a mechanism where the cached
> responses are enumerated, instead of having the entire cache's contents for
> the origin digested, there needs to be some mechanism whereby the most
> relevant cached responses are selected.

I would very much like a selection mechanism even with cache digests.
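To put rough numbers on the digest-versus-enumeration question, here is a minimal Python sketch; it is not the encoding defined in the Cache Digest draft. It builds a simplified Golomb-coded set (GCS) using Golomb-Rice coding: each URL is hashed with SHA-256 truncated to log2(N) + log2(P) bits (an assumed hash choice), the sorted deltas are coded as a unary quotient plus a fixed log2(P)-bit remainder, and the result is compared with sending the URLs as a plain comma-separated list (a hypothetical baseline). `select_within_budget` is a hypothetical selection policy, one way a client might cap the bytes spent on a digest by keeping only the most relevant entries. The example URLs are invented.

```python
import hashlib
import math


def url_hash(url: str, bits: int) -> int:
    # Assumption: SHA-256 truncated to its top `bits` bits; the draft's
    # exact hashing and key derivation may differ.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def gcs_values(urls, log2_p: int) -> list:
    # N is rounded up to a power of two; hashes fall in [0, N * P).
    log2_n = max(1, math.ceil(math.log2(len(urls))))
    return sorted({url_hash(u, log2_n + log2_p) for u in urls})


def gcs_encode(urls, log2_p: int) -> bytes:
    """Golomb-Rice code the sorted hash deltas.

    False-positive rate is at most about 1 / 2**log2_p.
    """
    if not urls:
        return b""
    bits, prev = [], 0
    for v in gcs_values(urls, log2_p):
        delta, prev = v - prev, v
        q, r = delta >> log2_p, delta & ((1 << log2_p) - 1)
        bits += [1] * q + [0]                                    # quotient in unary
        bits += [(r >> i) & 1 for i in reversed(range(log2_p))]  # remainder, MSB first
    bits += [0] * (-len(bits) % 8)                               # pad to whole bytes
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


def gcs_decode(data: bytes, count: int, log2_p: int) -> list:
    # Recover the sorted hash values (used here only to check the encoder).
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    pos, value, out = 0, 0, []
    for _ in range(count):
        q = 0
        while bits[pos] == 1:          # unary quotient
            q, pos = q + 1, pos + 1
        pos += 1                       # skip the 0 terminator
        r = 0
        for _ in range(log2_p):        # fixed-width remainder
            r, pos = (r << 1) | bits[pos], pos + 1
        value += (q << log2_p) | r
        out.append(value)
    return out


def enumeration_size(urls) -> int:
    # Hypothetical baseline: URLs sent as a comma-separated list.
    return len(", ".join(urls).encode("utf-8"))


def select_within_budget(urls_by_relevance, budget_bytes: int, log2_p: int = 7) -> list:
    # Hypothetical policy: keep the longest most-relevant prefix whose
    # digest fits in budget_bytes.
    k = len(urls_by_relevance)
    while k and len(gcs_encode(urls_by_relevance[:k], log2_p)) > budget_bytes:
        k -= 1
    return urls_by_relevance[:k]


if __name__ == "__main__":
    urls = [f"https://example.com/static/asset-{i}.js" for i in range(300)]
    print("enumerated list:", enumeration_size(urls), "bytes")
    print("GCS, P = 2**7:  ", len(gcs_encode(urls, 7)), "bytes")
    print("URLs kept in a 100-byte budget:", len(select_within_budget(urls, 100)))
```

With P = 2^7 (a false-positive rate of at most about 1 in 128), each entry costs roughly log2(P) plus a couple of bits no matter how long the URL is, so the digest stays small even for origins with hundreds of cached responses, while an enumerated list grows with the length of every URL. The relevance ordering passed to `select_within_budget` (for example, most recently used first) is an assumption left to the client.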
In my experience with cache-digests-as-a-cookie, the digest size is far smaller than most authentication cookies, but there may be scenarios where people will want more control over the number of bytes spent on a digest.

> The most likely time to do that is when the responses themselves are first
> cached; e.g., with a cache-control extension. I think the challenges that
> such a scheme would face are:
>
> a) Keeping the advertisement concise (because it should fit into a
> navigation request, without bumping into another RT of congestion window)
> b) Being able to express the presence of a larger number of URLs (since
> one of the effects of HTTP/2 is atomisation into a larger number of smaller
> resources), with bits of state like "fresh/stale" attached
> c) Being manageable for the origin (since they'll effectively have to
> predict what URLs are important to know about ahead of time, and in the
> face of site changes)
>
> To me, this makes CD more attractive, because we have more confidence that
> (a) and (b) are in hand, and (c) isn't a worry because the entire origin's
> cache state will be sent. Provided that the security/privacy issues are in
> hand, and that it's reasonably implementable by clients, I think CD also
> has a better chance of success because it decouples the sending of the
> cache state from its use, making it easier to reuse the data on the server
> side without close client coordination.
>
> So, I think the things that we do need to work on in CD are:
>
> 1) Choosing a more efficient hash algorithm and assuring that it's
> reasonable to implement in browsers
> 2) Refining the flags / operation models so that it's as simple and
> sensible as possible (but we need feedback on how clients want to send it)
> 3) Defining a way for origins to opt into getting CD, rather than always
> sending it.

Thumbs up for all of this! I do see 1) as difficult to achieve in practice, though GCS is already quite good.

-- Alcides Viamontes E.
Zunzun AB (+46) 722294542 (www.shimmercat.com is a property of Zunzun AB)
Received on Saturday, 20 August 2016 10:04:52 UTC