Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt from Jyrki Alakuijala on 2016-11-02 (ietf-http-wg@w3.org from October to December 2016)

From: Jyrki Alakuijala <jyrki@google.com>
Date: Wed, 2 Nov 2016 20:59:03 +0100
To: Vlad Krasnov <vlad@cloudflare.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>

Brotli has two separate ways of using a static dictionary. The first way is
the traditional way that zlib supports. The current custom dictionary
interface in brotli supports this method.

The second way is denser. It allows for every pair of (length, distance) to
point to a unique dictionary sequence. Because of this, dictionary
sequences that point to length N strings would save log2(N) bits in
distance specification in comparison to traditional dictionaries.

Further, second way of coding dictionaries allows for simple
transformations of the dictionary, i.e., primitive grammar to be defined.

We didn't add a user-programmable interface to define these yet in
libbrotli, but the second way for static dictionaries is already used for
the internal static dictionary. Our experience with it is that it is
significantly more powerful in comparison than a simple sequence of bytes.

Ideally, when used with brotli, a dictionary sharing mechanism would allow
for both of these to be used, and to allow for primitive grammar support,
too. The primitive grammar support allows for a small dictionary of
prefixes and suffixes, for example the current prefix-suffix-dictionary is
208 bytes -- the \0s here are separators:

kPrefixSuffix :=
    "\0 \0, \0 of the \0 of \0s \0.\0 and \0 in \0\"\0 to \0\">\0\n\0.
\0]\0"
    " for \0 a \0 that \0\'\0 with \0 from \0 by \0(\0. The \0 on \0 as \0"
    " is \0ing \0\n\t\0:\0ed \0=\"\0 at \0ly \0,\0=\'\0.com/\0. This \0"
    " not \0er \0al \0ful \0ive \0less \0est \0ize \0\xc2\xa0\0ous "

In addition to this, there is a combination table that combines a prefix +
middle-out-transform + suffix (choosing from 21 pre-programmed 'middle-out'
transforms that operate on the dictionary entries).

These "grammar tables" are less than 1 kB in size, but give about 1.5 % in
compression density, much higher win than elsewhere in the static
dictionary. It is possible that such grammar tables give more gain for
languages that are heavy on their prefix use (articles, prepositions, etc.).

The second way allows for about 3 % increase in compression density in
comparison to the first way, or alternatively one can reach to same
compression density by using smaller dictionaries (possibly about half the
size).

Received on Wednesday, 2 November 2016 19:59:36 UTC