Re: hpack huffman codes

From: Johnny Graettinger <jgraettinger@chromium.org>
Date: Tue, 10 Jun 2014 15:46:40 -0400
Message-ID: <CAEn92TqOSEJTjytRknS-=ZBE0_S-xNiVokG_=6ik3EXxAU+P=w@mail.gmail.com>
To: Martin Thomson <martin.thomson@gmail.com>
Cc: Martin Nilsson <nilsson@opera.com>, HTTP Working Group <ietf-http-wg@w3.org>
All,

As discussed during the interim, Chrome has been running a trial to
estimate an optimal HPACK Huffman code. Briefly, the methodology used is to
maintain an LRU of HPACK delta encoders for visited origins, to process all
request and response headers (including those delivered via HTTP/1) through
an origin-specific encoder, and to aggregate character counts of literals
emitted by those encoders. This roughly approximates the behavior of an
HPACK encoder if the origin had switched to long-lived HTTP/2 connections.
The implementation is available for inspection [1], and I'm happy to answer
questions.
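As a rough illustration of the methodology described above, the sketch below keeps an LRU of per-origin "encoder" states and aggregates the characters of any header that the origin's state would emit as a literal (i.e. that is not already in its simulated dynamic table). This is a hypothetical, much-simplified model for exposition; the names, the LRU bound, and the set-based table stand in for Chromium's actual delta-encoder implementation.

```python
# Hypothetical sketch: LRU of per-origin encoder states, aggregating
# character counts of headers that would be emitted as HPACK literals.
# Structure and names are illustrative, not Chromium's actual code.
from collections import Counter, OrderedDict

LRU_CAPACITY = 32  # assumed bound on tracked origins


class LiteralAggregator:
    def __init__(self):
        self.encoders = OrderedDict()   # origin -> set of seen (name, value) pairs
        self.char_counts = Counter()    # aggregated literal character counts

    def process(self, origin, headers):
        # Fetch (or create) the origin's encoder state, moving it to the
        # most-recently-used position and evicting the oldest if over capacity.
        seen = self.encoders.pop(origin, set())
        self.encoders[origin] = seen
        if len(self.encoders) > LRU_CAPACITY:
            self.encoders.popitem(last=False)  # evict least recently used
        for name, value in headers:
            entry = (name, value)
            if entry not in seen:          # would be emitted as a literal
                self.char_counts.update(name)
                self.char_counts.update(value)
                seen.add(entry)            # later occurrences hit the "table"


agg = LiteralAggregator()
agg.process("example.com", [(":path", "/"), ("accept", "text/html")])
agg.process("example.com", [(":path", "/"), ("accept", "text/html")])
# The repeated request hits the simulated table, so counts reflect one emission.
print(agg.char_counts.most_common(3))
```

The second, identical request contributes nothing, which is the point: a long-lived connection's repeated headers are index-referenced, so only first-occurrence literals shape the character distribution.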

Reviewing the last month of data, a Huffman code constructed from the observed
character-count distribution (PDF) compresses that same distribution about 2.1
percentage points better than the current code table, after adjusting the code
to remove unique code lengths (75.75% vs. 77.87% of uncompressed size). The
observed PDF, the constructed code, and a comparison with the current code are
available at [2].
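The comparison itself can be sketched as follows: build a Huffman code from a character-count distribution, then compute the expected compressed size as a fraction of the 8-bits-per-symbol baseline. The toy `counts` table below is illustrative only, not the trial's data, and the helper names are assumptions.

```python
# Hypothetical sketch: derive Huffman code lengths from symbol counts and
# measure expected compressed size against an 8-bit-per-symbol baseline.
import heapq
from collections import Counter


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code over counts."""
    # Heap entries: (weight, tiebreak, {symbol: depth so far}).
    # The unique tiebreak integer keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # deepen both subtrees
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def compression_ratio(counts):
    """Expected compressed bits / uncompressed bits (8 bits per symbol)."""
    lengths = huffman_code_lengths(counts)
    total = sum(counts.values())
    compressed = sum(counts[s] * lengths[s] for s in counts)
    return compressed / (8 * total)


# Illustrative distribution skewed toward common header characters.
counts = Counter("www.example.com/index.html?session=abc123" * 100)
print(f"{compression_ratio(counts):.2%} of uncompressed size")
```

Running the same ratio computation once with the observed lengths of the current code table and once with the lengths of a freshly constructed code yields the two percentages being compared.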

A caveat of the result is that this trial is running only on our Canary &
Dev channels, so it's possible the distribution is somewhat biased by those
populations. Ideally the trial would also include Stable-channel samples, but
that wasn't feasible in the time available. That said, this is still a large
population, and having also looked at a couple of smaller windows, the
distribution appears to have converged quickly and to be internally
consistent over time.

I'm personally on the fence as to whether the degree of improvement
warrants updating the current code, given the limitations of the trial. A
reasonable position is that this simply provides confirmation that the
current code is near-optimal.

[1]
https://code.google.com/p/chromium/codesearch#chromium/src/net/spdy/hpack_huffman_aggregator.h&sq=package:chromium
[2]
https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0Ao3snhnDWuTvdGJNckVoWGpMN0tobTRERmVLdV8zRWc#gid=0

cheers,
-johnny

On Tue, Jun 10, 2014 at 11:08 AM, Martin Thomson <martin.thomson@gmail.com>
wrote:

> It costs nothing to disable Huffman coding. Thus the table really isn't
> the issue. The actual problem here is the grammar of the various header
> fields, and the potential need to translate into HTTP/1.1.
>  On Jun 10, 2014 7:49 AM, "Martin Nilsson" <nilsson@opera.com> wrote:
>
>>
>> Regarding the process of validating the current HPACK Huffman codes against
>> a large set of real headers, I think there is a risk that we'll paint
>> ourselves into a corner dictated by what HTTP/1 looks like. As pointed out,
>> a lot of base64- or hex-encoded headers benefit greatly from Huffman
>> encoding. However, if we can carry binary data there is no point in having
>> the data encoded in the first place, so it is not something to train the
>> code table on. These headers will change over time, because even if they
>> are taken into consideration for the Huffman table, it is still more space
>> efficient not to encode them. The code lengths for the characters of the
>> other headers might suffer, though.
>>
>> /Martin Nilsson
>>
>> --
>> Using Opera's revolutionary email client: http://www.opera.com/mail/
>>
>>
Received on Tuesday, 10 June 2014 19:47:08 UTC