- From: Jyrki Alakuijala <jyrki@google.com>
- Date: Wed, 2 Nov 2016 20:59:03 +0100
- To: Vlad Krasnov <vlad@cloudflare.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>
Brotli has two separate ways of using a static dictionary. The first way is the traditional way that zlib supports. The current custom dictionary interface in brotli supports this method. The second way is denser. It allows for every pair of (length, distance) to point to a unique dictionary sequence. Because of this, dictionary sequences that point to length N strings would save log2(N) bits in distance specification in comparison to traditional dictionaries. Further, second way of coding dictionaries allows for simple transformations of the dictionary, i.e., primitive grammar to be defined. We didn't add a user-programmable interface to define these yet in libbrotli, but the second way for static dictionaries is already used for the internal static dictionary. Our experience with it is that it is significantly more powerful in comparison than a simple sequence of bytes. Ideally, when used with brotli, a dictionary sharing mechanism would allow for both of these to be used, and to allow for primitive grammar support, too. The primitive grammar support allows for a small dictionary of prefixes and suffixes, for example the current prefix-suffix-dictionary is 208 bytes -- the \0s here are separators: kPrefixSuffix := "\0 \0, \0 of the \0 of \0s \0.\0 and \0 in \0\"\0 to \0\">\0\n\0. \0]\0" " for \0 a \0 that \0\'\0 with \0 from \0 by \0(\0. The \0 on \0 as \0" " is \0ing \0\n\t\0:\0ed \0=\"\0 at \0ly \0,\0=\'\0.com/\0. This \0" " not \0er \0al \0ful \0ive \0less \0est \0ize \0\xc2\xa0\0ous " In addition to this, there is a combination table that combines a prefix + middle-out-transform + suffix (choosing from 21 pre-programmed 'middle-out' transforms that operate on the dictionary entries). These "grammar tables" are less than 1 kB in size, but give about 1.5 % in compression density, much higher win than elsewhere in the static dictionary. It is possible that such grammar tables give more gain for languages that are heavy on their prefix use (articles, prepositions, etc.). The second way allows for about 3 % increase in compression density in comparison to the first way, or alternatively one can reach to same compression density by using smaller dictionaries (possibly about half the size).
Received on Wednesday, 2 November 2016 19:59:36 UTC