Re: hpack huffman codes from Roberto Peon on 2014-06-10 (ietf-http-wg@w3.org from April to June 2014)

From: Roberto Peon <grmocg@gmail.com>
Date: Tue, 10 Jun 2014 14:02:29 -0700
To: Mike Bishop <Michael.Bishop@microsoft.com>
Cc: Martin Thomson <martin.thomson@gmail.com>, Johnny Graettinger <jgraettinger@chromium.org>, Martin Nilsson <nilsson@opera.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNfVTOySnPhVh98_18GLJghCnMEb1GAmq3Eq8obc4W8moA@mail.gmail.com>

IIRC, when I created the code to generate the current table, it was done
using the same methodology as Johnny's-- a character is counted if it would
have been emitted in a literal, and not otherwise.
-=R


On Tue, Jun 10, 2014 at 1:40 PM, Mike Bishop <Michael.Bishop@microsoft.com>
wrote:

> Yes, thanks for this, Johnny!
>
> Given that the difference is slight (less than 3% better), we would
> support accepting this as confirmation that the current code is good and
> leave things as they are, unless broader deployment shows that the Huffman
> table is further off-base for common headers than this indicates.
>
> Martin, I'm going to throw out a hypothesis that probably can't be
> confirmed without changing the deployed code....  The original table was
> based on a sampling of HTTP headers, while this is based on a sample of
> *post-HPACK* literals.  That means headers which get optimized out by HPACK
> (header table, reference set) don't get counted every time in Johnny's
> model, where they probably did in the original analysis.  The less-steep
> highest-frequency characters (particularly 0/1) may be things that show up
> frequently but don't change from request to request over the lifetime of a
> connection.
>
> -----Original Message-----
> From: Martin Thomson [mailto:martin.thomson@gmail.com]
> Sent: Tuesday, June 10, 2014 1:28 PM
> To: Johnny Graettinger
> Cc: Martin Nilsson; HTTP Working Group
> Subject: Re: hpack huffman codes
>
> On 10 June 2014 12:46, Johnny Graettinger <jgraettinger@chromium.org>
> wrote:
> > I'm personally on the fence as to whether the degree of improvement
> > warrants updating the current code, given the limitations of the
> > trial. A reasonable position is that this simply provides confirmation
> > the current code is near-optimal.
>
> Thanks for doing this Johnny.  This would seem to validate the original
> table, though there are some interesting differences in lengths.  It seems
> like the new numbers show less of a steep drop from the high frequency
> 0/1/2 characters and a slightly more gradual roll off in the mid range.
>
> My little Monte Carlo estimation rig seems to produce better results with
> common character set restrictions (ignore percentages):
>
> ~/code/bhpack master
> $ node compare.js 1000000 40
> 1000000 random sequences of lengths:     40
> Average sizes: min/ave/max (size compared against Base64+Huffman)
> Raw Huffman coding (invalid):         62/95.944842/123 (203.71%)
> Base 64 (no Huffman):             54/54/54 (114.65%)
> Base 64 with Huffman:             41/47.099388/54 (100%)
> bhpack cookie with Huffman:         40/46.334863/56 (98.38%)
> bhpack URI safe with Huffman:         40/45.126209/52 (95.81%)
> bhpack URI query with Huffman:         40/45.200022/53 (95.97%)
>
> ~/code/bhpack chromium-test
> $ node compare.js 1000000 40
> 1000000 random sequences of lengths:     40
> Average sizes: min/ave/max (size compared against Base64+Huffman)
> Raw Huffman coding (invalid):         60/91.417749/123 (209.74%)
> Base 64 (no Huffman):             54/54/54 (123.89%)
> Base 64 with Huffman:             40/43.587016/47 (100%)
> bhpack cookie with Huffman:         39/44.91345/56 (103.04%)
> bhpack URI safe with Huffman:         40/43.424513/48 (99.63%)
> bhpack URI query with Huffman:         39/43.268/49 (99.27%)
>
>

Received on Tuesday, 10 June 2014 21:02:56 UTC