Re: bohe implementation for compression tests

Just continuing the investigation of various compression strategies. I
spent part of the day going through delta to make sure I understand it and
how it compares with bohe... I'll have some additional thoughts (and
concerns) with regard to that later on... The other half of the day was
spent on various other bohe variations. Late in the afternoon I hit upon
a particularly interesting variation... I've checked it in here:
https://github.com/jasnell/compression-test/tree/master/compressor/bohe4

This variation encodes the headers and randomly assigns each one to one of
two separate buckets. The headers in each bucket are then randomly ordered,
and each bucket is compressed with its own compressor instance within the
header block...

# +-------------+--------------------------+
# | num_headers |   block 1 len (4 bytes)  |
# +-------------+--------------------------+
# |        compressed header block 1       |
# +----------------------------+-----------+
# |  block 2 len (4 bytes)     |           |
# +----------------------------+           |
# |        compressed header block 2       |
# +----------------------------+-----------+
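
A minimal sketch of the layout above, for discussion only; this is not the
bohe4 code itself. It assumes a 1-byte num_headers field, big-endian 4-byte
block lengths, a simple name\0value\n serialization, and a one-shot zlib call
per block (the real thing keeps two compressor instances around). The names
encode_block and decode_block are hypothetical.

```python
import random
import struct
import zlib


def encode_block(headers, rng=random):
    """Split headers randomly across two buckets and compress each one."""
    buckets = ([], [])
    for name, value in headers:
        # Each header lands in bucket 0 or 1 with equal probability.
        buckets[rng.randrange(2)].append((name, value))

    out = [struct.pack("!B", len(headers))]  # assumed 1-byte header count
    for bucket in buckets:
        rng.shuffle(bucket)  # random order within the bucket
        raw = b"".join(
            name.encode("ascii") + b"\0" + value.encode("ascii") + b"\n"
            for name, value in bucket
        )
        packed = zlib.compress(raw)
        out.append(struct.pack("!I", len(packed)))  # 4-byte block length
        out.append(packed)
    return b"".join(out)


def decode_block(data):
    """Reverse encode_block; header order is not preserved."""
    (count,) = struct.unpack_from("!B", data, 0)
    offset = 1
    headers = []
    for _ in range(2):
        (length,) = struct.unpack_from("!I", data, offset)
        offset += 4
        raw = zlib.decompress(data[offset:offset + length])
        offset += length
        for line in raw.split(b"\n"):
            if line:
                name, value = line.split(b"\0", 1)
                headers.append((name.decode("ascii"), value.decode("ascii")))
    assert len(headers) == count
    return headers
```

The decoder does not need to know which bucket a header went into; it just
reads both blocks and merges the results.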

Because of the randomization, there is no way of determining in advance
which block any individual piece of data will land in... making it much
harder for an attacker to use the compression ratio to reverse engineer any
particular value... every time the information is sent, it can be in a
different location. You can take the exact same request and encode it
multiple times and end up with a different message size each time (up to a
given limit, of course).
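
A quick way to see the size variation, as a self-contained sketch (one-shot
zlib rather than the bohe4 compressors; random_split_size and the sample
header lines are made up for illustration):

```python
import random
import zlib


def random_split_size(lines, rng):
    """Total compressed size after a random two-way split and shuffle."""
    buckets = ([], [])
    for line in lines:
        buckets[rng.randrange(2)].append(line)
    total = 0
    for bucket in buckets:
        rng.shuffle(bucket)
        total += len(zlib.compress("\n".join(bucket).encode("ascii")))
    return total


# Made-up request headers, only for illustration.
headers = [
    "host: en.wikipedia.org",
    "user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)",
    "accept: text/html,application/xhtml+xml",
    "accept-language: en-US,en;q=0.8",
    "accept-encoding: gzip,deflate",
    "cookie: session=abc123; token=abc123",
]

rng = random.Random()
# Encode the identical input many times and collect the distinct sizes.
sizes = {random_split_size(headers, rng) for _ in range(200)}
print(sorted(sizes))
```

The set of sizes is bounded, since there are only so many ways to split and
order a fixed list of headers, which is the "given limit" mentioned above.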

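Some numbers from two test runs over the same input... note how bohe4
produces a different compressed size each time (13,496 vs. 13,820 bytes for
requests, 20,610 vs. 21,644 for responses), while bohe and delta are
identical across both runs.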

./compare_compressors.py -c bohe -c bohe4 -c delta -t
/Users/james/git/http_samples/mnot/wikipedia.org.har
408 req messages processed
             compressed | ratio min   max   std
req  bohe        10,784 | 0.13  0.05  0.65  0.07
req bohe4        13,496 | 0.16  0.05  0.69  0.08
req delta        16,725 | 0.20  0.04  0.72  0.09
req http1        84,388 | 1.00  1.00  1.00  0.00

408 res messages processed
             compressed | ratio min   max   std
res  bohe        19,882 | 0.25  0.06  0.58  0.10
res bohe4        20,610 | 0.26  0.09  0.63  0.09
res delta        24,523 | 0.30  0.04  0.60  0.12
res http1        80,613 | 1.00  1.00  1.00  0.00

./compare_compressors.py -c bohe -c bohe4 -c delta -t
/Users/james/git/http_samples/mnot/wikipedia.org.har
408 req messages processed
             compressed | ratio min   max   std
req  bohe        10,784 | 0.13  0.05  0.65  0.07
req bohe4        13,820 | 0.16  0.07  0.67  0.08
req delta        16,725 | 0.20  0.04  0.72  0.09
req http1        84,388 | 1.00  1.00  1.00  0.00

408 res messages processed
             compressed | ratio min   max   std
res  bohe        19,882 | 0.25  0.06  0.58  0.10
res bohe4        21,644 | 0.27  0.09  0.61  0.09
res delta        24,523 | 0.30  0.04  0.60  0.12
res http1        80,613 | 1.00  1.00  1.00  0.00
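
For anyone reading the tables: the ratio column appears consistent with total
compressed bytes divided by the http1 baseline. That is inferred from the
numbers above, not checked against the compare_compressors.py source. A quick
check using the request totals from the two runs:

```python
# Request totals copied from the two wikipedia.org runs above.
http1 = 84388
runs = {
    "bohe": [10784, 10784],
    "bohe4": [13496, 13820],
    "delta": [16725, 16725],
}

for name, sizes in runs.items():
    # Ratio of compressed size to the uncompressed http1 size, per run.
    ratios = [round(size / http1, 2) for size in sizes]
    print(name, sizes, ratios)
```

At two decimal places the bohe4 ratio looks the same in both runs, so the
variation is easier to see in the byte counts than in the ratio column.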

Again, this is just intended as fodder for discussion right now. I'll have
some comments specifically on delta encoding tomorrow sometime.

- James


On Thu, Jan 10, 2013 at 11:08 AM, James M Snell <jasnell@gmail.com> wrote:

> I have an initial bohe implementation for the compression tests... it's
> very preliminary and uses the same gzip compression as the current spdy3.
> I'm going to be playing around with the delta compression mechanism as well
> and see how much of an impact that has. Initial results are very promising
> but I haven't done much debugging yet. Just wanted folks to know that this
> work was underway...
>
> https://github.com/jasnell/compression-test/tree/master/compressor/bohe
>
> Some test runs....
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta
> ../http_samples/mnot/amazon.com.har
> 732 req messages processed
>              compressed | ratio min   max   std
> req  bohe        26,122 | 0.13  0.04  0.70  0.08
> req delta        33,955 | 0.17  0.02  0.71  0.09
> req http1       195,386 | 1.00  1.00  1.00  0.00
> req spdy3        27,238 | 0.14  0.04  0.71  0.08
>
> 732 res messages processed
>              compressed | ratio min   max   std
> res  bohe        39,628 | 0.25  0.04  0.66  0.07
> res delta        44,499 | 0.28  0.02  0.65  0.09
> res http1       159,968 | 1.00  1.00  1.00  0.00
> res spdy3        41,325 | 0.26  0.04  0.67  0.08
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta
> ../http_samples/mnot/craigslist.org.har
> 66 req messages processed
>              compressed | ratio min   max   std
> req  bohe         1,948 | 0.15  0.06  0.73  0.11
> req delta         2,036 | 0.16  0.07  0.71  0.11
> req http1        12,894 | 1.00  1.00  1.00  0.00
> req spdy3         2,016 | 0.16  0.07  0.75  0.11
>
> 66 res messages processed
>              compressed | ratio min   max   std
> res  bohe         1,786 | 0.18  0.07  0.77  0.13
> res delta         2,858 | 0.28  0.08  0.69  0.12
> res http1        10,147 | 1.00  1.00  1.00  0.00
> res spdy3         1,869 | 0.18  0.09  0.78  0.13
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta
> ../http_samples/mnot/flickr.com.har
> 438 req messages processed
>              compressed | ratio min   max   std
> req  bohe        11,988 | 0.10  0.02  0.69  0.07
> req delta        26,372 | 0.22  0.01  0.71  0.14
> req http1       121,854 | 1.00  1.00  1.00  0.00
> req spdy3        12,550 | 0.10  0.02  0.71  0.07
>
> 438 res messages processed
>              compressed | ratio min   max   std
> res  bohe        13,073 | 0.09  0.05  0.66  0.06
> res delta        25,236 | 0.18  0.02  0.70  0.11
> res http1       140,457 | 1.00  1.00  1.00  0.00
> res spdy3        14,142 | 0.10  0.05  0.66  0.06
>
>
> ./compare_compressors.py -c bohe -c spdy3 -c delta
> ../http_samples/mnot/facebook.com.har
> 234 req messages processed
>              compressed | ratio min   max   std
> req  bohe         6,091 | 0.15  0.06  0.78  0.07
> req delta         7,800 | 0.19  0.02  0.70  0.07
> req http1        41,980 | 1.00  1.00  1.00  0.00
> req spdy3         6,301 | 0.15  0.06  0.77  0.07
>
> 234 res messages processed
>              compressed | ratio min   max   std
> res  bohe         9,458 | 0.23  0.07  0.68  0.07
> res delta        12,045 | 0.30  0.13  0.60  0.08
> res http1        40,252 | 1.00  1.00  1.00  0.00
> res spdy3         9,788 | 0.24  0.07  0.69  0.07
>

Received on Tuesday, 15 January 2013 01:30:42 UTC