Re: delta-encoding compressor code from Roberto Peon on 2012-11-30 (ietf-http-wg@w3.org from October to December 2012)

From: Roberto Peon <grmocg@gmail.com>
Date: Fri, 30 Nov 2012 13:03:18 -0800
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNfSkehELu0f9xnjKNMefwgE4dE_QK=ZtMW7yx-ZyK8O=g@mail.gmail.com>

The ratios are correct, but 'uncompressed' is labeled as 'compressed' and
vice versa. *sigh* I'll fix it with the next commit.
-=R


On Thu, Nov 29, 2012 at 2:51 PM, Roberto Peon <grmocg@gmail.com> wrote:

> I've rewritten the Python version, which now does encoding/decoding of the
> new format, which is smaller than the old version.
> I've also cleaned up the code a fair bit and added some documentation for
> the more important bits.
>
> As usual, the code is available here:
>   https://github.com/grmocg/SPDY-Specification/tree/gh-pages
>
> An example output over my dataset, generated by running
>   ./headers_sample.py -v 0 ../test-data/*.har
> is this:
>                                        http1   |   spdy3   |   spdy4
> Req                Compressed Sums:    830525  |   944453  |   106185
> Req              Uncompressed Sums:     67237  |    87886  |   106185
> Rsp                Compressed Sums:    508189  |   627505  |   152962
> Rsp              Uncompressed Sums:    105626  |   128226  |   152962
> Req   Compressed/uncompressed HTTP:   0.08096  |  0.10582  |  0.12785
> Rsp   Compressed/uncompressed HTTP:   0.20785  |  0.25232  |  0.30099
>
> As a reminder, compressing HTTP/1.X or SPDY3 (raw) with gzip isn't safe,
> and is included only for comparison/reference.
>
> There are a few parameters in headers_codec.py which may be interesting to
> play with (that file also has the majority of the TODOs, which indicate my
> thoughts on future research direction/work).
> In particular, look for:
> string_length_field_bitlen, strings_use_eof, strings_padded_to_byte_boundary,
> and strings_use_huffman
>
> One TODO here: I haven't ensured that the first bit of any huffman-encoded
> string is 1. I did intend to have that in there (it isn't super critical
> right now), as one possible way of indicating that a string is not
> huffman-encoded.
>
> As a reminder, though the Python version is not made for performance, the
> compressor here is made with performance in mind and includes features
> intended to let high-throughput proxies operate efficiently; as a result,
> we do trade off some compression.
>
> The C++ version of this (which currently only does compression) is useful
> for determining approximate speed (~3X faster than gzip when doing
> compression, which proxies should be able to avoid).
> -=R
>
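
For reference, the ratio rows in the quoted table can be rechecked from the
byte sums. The following is a minimal Python sketch, not code from the
repository: the dictionary layout and the names SUMS and ratios are made up
for illustration. It assumes, per the correction at the top of this message,
that the rows labeled 'Compressed' actually hold the uncompressed sizes, and
that every ratio is taken against the uncompressed HTTP/1 total.

```python
# Byte sums copied from the quoted table. Per the correction above, the rows
# labeled "Compressed" in the table are really the uncompressed sizes.
# SUMS and ratios are illustrative names, not taken from headers_sample.py.
SUMS = {
    "Req": {
        "uncompressed": {"http1": 830525, "spdy3": 944453, "spdy4": 106185},
        "compressed": {"http1": 67237, "spdy3": 87886, "spdy4": 106185},
    },
    "Rsp": {
        "uncompressed": {"http1": 508189, "spdy3": 627505, "spdy4": 152962},
        "compressed": {"http1": 105626, "spdy3": 128226, "spdy4": 152962},
    },
}


def ratios(direction):
    """Return each format's compressed size over the uncompressed HTTP/1 size."""
    baseline = SUMS[direction]["uncompressed"]["http1"]
    compressed = SUMS[direction]["compressed"]
    return {fmt: size / baseline for fmt, size in compressed.items()}


if __name__ == "__main__":
    for direction in SUMS:
        row = "  |  ".join(f"{r:.5f}" for r in ratios(direction).values())
        print(f"{direction}   Compressed/uncompressed HTTP:   {row}")
```

Run as a script, this prints the two ratio rows, which agree with the values
in the quoted table to five decimal places.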

Received on Friday, 30 November 2012 21:03:49 UTC