- From: Felix Handte <felixh@fb.com>
- Date: Wed, 5 Sep 2018 13:01:23 -0700
- To: Benjamin Kaduk <bkaduk@akamai.com>, Felix Handte <felixh@fb.com>
- CC: Mark Nottingham <mnot@mnot.net>, Jyrki Alakuijala <jyrki@google.com>, Charles McCathie-Neville <chaals@yandex-team.ru>, Evgenii Kliuchnikov <eustas@google.com>, Vlad Krasnov <vlad@cloudflare.com>, Nick Terrell <terrelln@fb.com>, Yann Collet <cyan@fb.com>, HTTP Working Group <ietf-http-wg@w3.org>
On 09/01/2018 09:05 PM, Benjamin Kaduk wrote:
> One topic that came up during IESG review of draft-kucherawy-dispatch-zstd was
> whether/when third-party or standard dictionaries would become available and how
> dictionary IDs would be assigned for those cases (since at present, IIUC, the
> dictionary IDs would need to be pre-negotiated between the two parties). No
> IANA registry was created at that time, but with a 4-byte dictionary identifier space
> to work with, it seems like there might be space to create a registry for dictionary
> IDs (including private use space, of course), and just publishing well-known
> dictionaries.

Yes, we continue to think about whether and how to produce a standard set of dictionaries for public consumption. Zstandard reserves dictionary IDs 1-32767 for that purpose.

Dictionaries become more effective when they are targeted toward, and trained on, a narrower set of content. A solution that lets site operators build and use custom dictionaries will enable sufficiently motivated parties to achieve the best possible compression. Zstandard provides tooling for that purpose, allowing users to easily train and use their own dictionaries.

OTOH, distributing and storing dictionaries is not without cost, so a large number of highly targeted dictionaries introduces its own inefficiencies. So even in a world with custom dictionaries, we think that a standard set of dictionaries probably has utility. Site operators could simply use the standard ones if they don't expect enough repeat traffic to amortize the cost of distributing a custom dictionary, or if they don't want to spend the effort of building one. A standard set of dictionaries would also enable shipping "batteries-included" plugins for HTTP servers, lowering the barrier to use.

Building a standard set of dictionaries is not trivial, though. We recently performed experiments training a set of dictionaries on a dataset from the HTTP Archive[1].
We found that performance degrades significantly over time. A dictionary trained on 2016 traffic and applied to 2018 traffic performs worse than a 2018 dictionary does on 2018 traffic (anywhere from one to five percent compression ratio loss per year). So ideally, even in the context of a standard set of dictionaries, we would find a way to update or introduce new dictionaries as time goes on.

- Felix

[1] https://httparchive.org/
Received on Wednesday, 5 September 2018 20:02:22 UTC