
RE: hpack huffman codes

From: Mike Bishop <Michael.Bishop@microsoft.com>
Date: Tue, 10 Jun 2014 20:40:08 +0000
To: Martin Thomson <martin.thomson@gmail.com>, Johnny Graettinger <jgraettinger@chromium.org>
CC: Martin Nilsson <nilsson@opera.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <0b9a9f5bcb4c4b05a983118a0e6c5e3c@BL2PR03MB132.namprd03.prod.outlook.com>
Yes, thanks for this, Johnny!

Given that the difference is slight (less than 3% better), we would support accepting this as confirmation that the current code is good and leaving things as they are, unless broader deployment shows that the Huffman table is further off-base for common headers than this analysis indicates.

Martin, I'm going to throw out a hypothesis that probably can't be confirmed without changing the deployed code...  The original table was based on a sampling of HTTP headers, while this is based on a sample of *post-HPACK* literals.  That means headers which get optimized out by HPACK (header table, reference set) don't get counted every time in Johnny's model, whereas they probably did in the original analysis.  The less-steep highest-frequency characters (particularly 0/1) may be things that show up frequently but don't change from request to request over the lifetime of a connection.
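The effect Mike describes can be sketched as a toy frequency count (hypothetical data and helper names, not the actual analysis code): a header that repeats across requests is indexed by HPACK after its first use and never emits another literal, so its characters are counted once in a post-HPACK sample but once per request in a raw-header sample.

```javascript
// Sketch: character frequencies over raw headers vs. over only the
// literals an HPACK-style encoder would emit. A Set stands in for the
// HPACK header table; an indexed (already-seen) header emits no literal.
function countFrequencies(requests, dedupe) {
  const seen = new Set();   // stand-in for the HPACK header table
  const freq = {};
  for (const headers of requests) {
    for (const h of headers) {
      if (dedupe && seen.has(h)) continue;  // indexed: no literal emitted
      seen.add(h);
      for (const ch of h) freq[ch] = (freq[ch] || 0) + 1;
    }
  }
  return freq;
}

// A digit-heavy header repeated on every request dominates the raw
// count, but contributes literals only once under indexing.
const requests = [
  ['cookie: id=1234567890', 'path: /a'],
  ['cookie: id=1234567890', 'path: /b'],
  ['cookie: id=1234567890', 'path: /c'],
];
const raw = countFrequencies(requests, false);
const post = countFrequencies(requests, true);
console.log(raw['1'], post['1']);  // digit '1': 3 raw vs. 1 post-HPACK
```

Under this toy model the digits' share of the literal stream drops sharply once repeats are deduplicated, which is consistent with the flatter top of the new frequency curve.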

-----Original Message-----
From: Martin Thomson [mailto:martin.thomson@gmail.com] 
Sent: Tuesday, June 10, 2014 1:28 PM
To: Johnny Graettinger
Cc: Martin Nilsson; HTTP Working Group
Subject: Re: hpack huffman codes

On 10 June 2014 12:46, Johnny Graettinger <jgraettinger@chromium.org> wrote:
> I'm personally on the fence as to whether the degree of improvement 
> warrants updating the current code, given the limitations of the 
> trial. A reasonable position is that this simply provides confirmation 
> the current code is near-optimal.

Thanks for doing this, Johnny.  This would seem to validate the original table, though there are some interesting differences in lengths.  It seems like the new numbers show less of a steep drop from the high-frequency 0/1/2 characters and a slightly more gradual roll-off in the mid range.

My little Monte Carlo estimation rig seems to produce better results with common character set restrictions (ignore percentages):

~/code/bhpack master
$ node compare.js 1000000 40
1000000 random sequences of lengths:     40
Average sizes: min/ave/max (size compared against Base64+Huffman)
Raw Huffman coding (invalid):    62/95.944842/123  (203.71%)
Base 64 (no Huffman):            54/54/54          (114.65%)
Base 64 with Huffman:            41/47.099388/54   (100%)
bhpack cookie with Huffman:      40/46.334863/56   (98.38%)
bhpack URI safe with Huffman:    40/45.126209/52   (95.81%)
bhpack URI query with Huffman:   40/45.200022/53   (95.97%)

~/code/bhpack chromium-test
$ node compare.js 1000000 40
1000000 random sequences of lengths:     40
Average sizes: min/ave/max (size compared against Base64+Huffman)
Raw Huffman coding (invalid):    60/91.417749/123  (209.74%)
Base 64 (no Huffman):            54/54/54          (123.89%)
Base 64 with Huffman:            40/43.587016/47   (100%)
bhpack cookie with Huffman:      39/44.91345/56    (103.04%)
bhpack URI safe with Huffman:    40/43.424513/48   (99.63%)
bhpack URI query with Huffman:   39/43.268/49      (99.27%)
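The shape of a rig like this can be sketched as follows (a hypothetical stand-in, not Martin's compare.js: the toy code-length function below is NOT the real HPACK Huffman table, just an assumed 5-bits-for-alphanumerics placeholder). It draws random opaque octet strings, Huffman-codes them byte by byte, and compares against unpadded Base64, whose size formula matches the 54-byte figure for 40 input bytes in the output above.

```javascript
// Monte Carlo size comparison sketch: random binary inputs, coded
// either with a per-byte Huffman length function or as unpadded Base64.
function huffmanBytes(buf, huffmanBits) {
  let bits = 0;
  for (const b of buf) bits += huffmanBits(b);
  return Math.ceil(bits / 8);    // HPACK pads the final byte with 1s
}

function base64Bytes(n) {
  return Math.ceil((n * 4) / 3); // unpadded Base64: 4 chars per 3 bytes
}

function trial(len, huffmanBits) {
  const buf = Array.from({ length: len }, () =>
    Math.floor(Math.random() * 256));  // random opaque octets
  return {
    rawHuffman: huffmanBytes(buf, huffmanBits),
    base64: base64Bytes(len),
  };
}

// Toy code-length function (an assumption for illustration): 5-bit
// codes for ASCII alphanumerics, 8 bits for everything else.
const toyBits = (b) =>
  (b >= 48 && b <= 57) || (b >= 65 && b <= 90) || (b >= 97 && b <= 122)
    ? 5 : 8;

let sum = 0;
const N = 1000;
for (let i = 0; i < N; i++) sum += trial(40, toyBits).rawHuffman;
console.log('avg raw-Huffman bytes:', sum / N,
            '  base64 bytes:', base64Bytes(40));
```

Random binary input makes raw Huffman coding expand on average (most bytes get long codes), which is why the "Raw Huffman coding (invalid)" rows above come out far larger than the Base64 variants; restricting the alphabet before Huffman coding, as the bhpack rows do, is what recovers the advantage.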

Received on Tuesday, 10 June 2014 20:40:41 UTC
