- From: Alcides Viamontes E <alcidesv@zunzun.se>
- Date: Tue, 12 Jul 2016 10:27:34 +0200
- To: Amos Jeffries <squid3@treenet.co.nz>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAAMqGzbuCeMT=F8Z3zuOkJ38peFgx_dGOye1C4v6kpU-tg9fUA@mail.gmail.com>
Hi Kazuho and Amos,

Thank you for taking the time to look at this report!

Chrome is my default browser, but I couldn't find an efficient way to get an inventory of its cache contents, so these numbers represent a day of typical browsing with Firefox. They were obtained by repeatedly scanning the "about:cache" page reported by Firefox. Firefox uses two caches, an in-memory one and an on-disk one, and contents move between the two according to certain heuristics. Therefore, to get a good estimate it is better to scan the "about:cache" contents multiple times and unify the entries before counting them. The whole procedure is in this file:

https://github.com/shimmercat/internet_fonden_report/blob/master/source/AnalyzingAFirefoxCache.ipynb

Addressing Kazuho's question:

- Yes, they span multiple TCP connections. The main question for us when doing this little study was whether cache digests are useful in a direct browser-server scenario, with a typical user who visits a site and returns a few times.

- Concretely addressing your question, Amos: these numbers come from the inventory that Firefox reports on "about:cache" and, as far as I know, that inventory reflects what Firefox collects across multiple sessions. You can find the entire procedure in the GitHub repository; in particular, the samples were scanned at input 24 in this notebook:

https://github.com/shimmercat/internet_fonden_report/blob/master/source/AnalyzingAFirefoxCache.ipynb

The data itself is also in the same repository, here:

https://github.com/shimmercat/internet_fonden_report/blob/master/data.tar.gz

We do reckon that covering more data would be helpful, but short of creating an add-on and having some Firefox users install it, we haven't come up with any good ideas.

One thing we learned is that cache contents vary a lot from site to site, which is why I think we need more per-site mechanisms to control digest contents and their extent.
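To make the "scan several times, unify, then count" step more concrete, here is a minimal sketch of the idea. It is not the code from the notebook: the snapshot format (plain lists of URL strings), the function name `count_entries_per_site`, and the example URLs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_entries_per_site(snapshots):
    """Unify several cache snapshots and count distinct URLs per host.

    `snapshots` is assumed to be an iterable of lists of URL strings,
    one list per scan of the cache inventory (hypothetical format).
    """
    unified = set()
    for snapshot in snapshots:
        # An entry seen in several scans (or in both the memory and the
        # disk cache) is only kept once.
        unified.update(snapshot)

    counts = Counter()
    for url in unified:
        host = urlsplit(url).hostname
        if host:  # skip entries without a host part
            counts[host] += 1
    return counts


# Two made-up scans taken at different times of the day.
morning = ["https://example.com/a.css", "https://example.com/b.js"]
evening = ["https://example.com/b.js", "https://example.org/logo.png"]

per_site = count_entries_per_site([morning, evening])
print(per_site)
```

Grouping by host name is only one possible choice; grouping by origin (scheme, host and port) would match the scope of a digest more closely, at the cost of slightly more bookkeeping.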
In particular, we recommend some scoping mechanism (besides using different origins), and a way to disable digests altogether or, even better, to make digests opt-in for sites.

Best,

./Alcides.

On Tue, Jul 12, 2016 at 10:02 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> On 8/07/2016 1:35 a.m., Alcides Viamontes E wrote:
> > Our report on cache digests for HTTP/2, partially funded by the Swedish
> > Internet Development fund:
> >
> > https://if-report.shimmercat.com/dirhtml/
> >
> > The main content of the report are some numbers regarding per-site cache
> > size, which can be used to estimate the size of the cache digest. Hope it
> > can be of some use.
>
> Thank you.
>
> Can you clarify for me whether these numbers are gathered from object
> counts on a single page of the visited sites, from an entire session
> browsing around those sites, or from a scan of all objects the site
> produces that are cacheable?
>
> I am a little worried that they may be from the first two use-case and
> thus under-representing how much a shared cache might accumulate per-site.
>
> Amos
>

--
Alcides Viamontes
www.shimmercat.com
Received on Tuesday, 12 July 2016 08:28:11 UTC