Re: #578: getting real-ish numbers for option 3 from Mark Nottingham on 2014-10-27 (ietf-http-wg@w3.org from October to December 2014)

From: Mark Nottingham <mnot@mnot.net>
Date: Mon, 27 Oct 2014 13:48:31 -0700
To: Willy Tarreau <w@1wt.eu>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <430B8997-74E6-40B9-8239-899D6DDF3831@mnot.net>

Thanks for that, Willy.

Asking the question directly — of the people who -1’d making changes for #578, have these numbers changed your mind?

Cheers,


> On 24 Oct 2014, at 11:56 am, Willy Tarreau <w@1wt.eu> wrote:
> 
> Hi Mark,
> 
> On Fri, Oct 24, 2014 at 09:33:12PM +1100, Mark Nottingham wrote:
>> Toy up at:
>>  https://gist.github.com/mnot/434ab029a6e878b2af4c
> 
> Thank you, I could use it. I noticed that the random names it produces
> can sometimes be used as a custom fixed header, sometimes as a custom
> random header. It's no big deal, but I think it does not accurately
> model reality, where we'd rather have some fixed headers (e.g. customer
> name) and some always-random ones (e.g. signature, timestamp). I also
> thought that we could have a few partially random values (those that
> change from time to time, such as x-forwarded-for behind a proxy), but
> I don't think it would change things much anyway.
> 
> So in turn I have worked today :-)
> 
> I implemented a simple encoder which parses your program's output and
> emits statistics on the encoded data. It does not emit the output bytes,
> it just performs the encoding and counts. Emitting the bytes would be
> almost nothing to add; I just had no use for the output.
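
For readers who want to see where a "bytes per integer" figure comes from, here is a minimal sketch of a counting encoder for HPACK's prefix-integer representation as it was later published in RFC 7541, Section 5.1 (to my knowledge draft-09 uses the same scheme). It is not taken from http2-exp: the function names are hypothetical, and the alternative encodings discussed in this thread are not shown because this message does not spell them out.

```python
def encode_prefix_int(value: int, prefix_bits: int) -> bytes:
    """Encode a non-negative integer with an N-bit prefix (RFC 7541, 5.1).

    The high (8 - N) bits of the first octet are left at zero here; a real
    encoder would OR the representation's pattern bits into them.
    """
    if value < 0 or not 1 <= prefix_bits <= 8:
        raise ValueError("value must be >= 0 and prefix_bits in 1..8")
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([value])          # fits entirely in the prefix
    out = bytearray([limit])           # prefix saturated: all ones
    value -= limit
    while value >= 128:
        out.append(value % 128 + 128)  # 7 payload bits + continuation bit
        value //= 128
    out.append(value)                  # last octet, continuation bit clear
    return bytes(out)


def integer_stats(items):
    """Count integers and encoded bytes without keeping the output.

    `items` is an iterable of (value, prefix_bits) pairs.
    Returns (count, total_bytes, average_bytes_per_integer).
    """
    count = total = 0
    for value, prefix_bits in items:
        total += len(encode_prefix_int(value, prefix_bits))
        count += 1
    return count, total, (total / count if count else 0.0)
```

With the three examples from RFC 7541 Appendix C.1 (10 and 1337 on a 5-bit prefix, 42 on an 8-bit prefix), `integer_stats` counts 3 integers and 5 bytes. The average can only move toward 1.0 if more values fit in their prefix.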
> 
> It supports 4 encodings :
>  - draft 09
>  - the proposal I sent that was called "option 3"
>  - the proposed revision I sent just after it
>  - Greg's proposed revision
> 
> It reports various statistics such as the number of strings encoded, the
> number of integers encoded, the average integer size, etc. I have run some
> tests, all on the same output from your program, and already got some
> interesting findings:
> 
> Draft-09 :
>        Total input bytes : 7455384
>        Total output bytes : 2318395            (100%)
>        Overall compression ratio : 0.310969    (100%)
>        Total encoded integers: 218865
>        Total encoded integers bytes: 295036    (100%)
>        Avg bytes per integers: 1.348027        (100%)
> 
> option3 :
>        Total input bytes : 7455384
>        Total output bytes : 2268350            (97.84%)
>        Overall compression ratio : 0.304257    (97.84%)
>        Total encoded integers: 218865
>        Total encoded integers bytes: 244991    (83.03%)
>        Avg bytes per integers: 1.119370        (83.03%)
> 
> revised option3 :
>        Total input bytes : 7455384
>        Total output bytes : 2264722            (97.68%)
>        Overall compression ratio : 0.303770    (97.68%)
>        Total encoded integers: 218865
>        Total encoded integers bytes: 241363    (81.81%)
>        Avg bytes per integers: 1.102794        (81.81%)
> 
> Greg's revision :
>        Total input bytes : 7455384
>        Total output bytes : 2280713            (98.37%)
>        Overall compression ratio : 0.305915    (98.37%)
>        Total encoded integers: 218865
>        Total encoded integers bytes: 257354    (87.23%)
>        Avg bytes per integers: 1.175857        (87.23%)
> 
> First, the overall compression ratio is never exceptional given that the
> input contains a significant amount of random data, so that's expected.
> Second, we observe that the integer encoding is 17-18% smaller compared
> to draft-09. And if we consider only the integer encoding's overhead (the
> average size beyond one byte per integer), then it is even divided by 3.4
> for the revised option 3 (about 0.35 byte down to 0.10 byte per integer).
> 
> The overall savings are 2.2% for "option 3", 2.3% for its revision, and
> 1.6% for Greg's proposal. To my initial surprise, Greg's proposal provides
> less savings here despite being balanced. But in the end there's a reason:
> it offers more bits to literals, which is the case where we already have
> to pay for the literal overhead, so the occasional saving of 1 byte doesn't
> save much.
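
The percentages quoted above can be rechecked directly from the four result blocks. The following is a small sketch that does only that arithmetic; the byte counts are copied from the tables in this message, while the names (`RESULTS`, `summarize`) and the reading of "overhead" as average bytes per integer minus one are assumptions made for illustration.

```python
INPUT_BYTES = 7455384   # "Total input bytes", identical for every run
INTEGERS = 218865       # "Total encoded integers", identical for every run

# name -> (total output bytes, total encoded integer bytes), from the tables
RESULTS = {
    "draft-09": (2318395, 295036),
    "option 3": (2268350, 244991),
    "revised option 3": (2264722, 241363),
    "Greg's revision": (2280713, 257354),
}


def summarize(results, baseline="draft-09"):
    """Derive ratios and savings relative to the baseline encoding."""
    base_out, base_int = results[baseline]
    rows = {}
    for name, (out_bytes, int_bytes) in results.items():
        rows[name] = {
            # output size relative to input size
            "ratio": out_bytes / INPUT_BYTES,
            # overall saving versus the baseline, in percent
            "saving_pct": 100 * (1 - out_bytes / base_out),
            # saving on integer bytes alone, in percent
            "int_saving_pct": 100 * (1 - int_bytes / base_int),
            # assumed meaning of "overhead": bytes beyond one per integer
            "overhead": int_bytes / INTEGERS - 1,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(RESULTS).items():
        print(f"{name:18s} ratio={row['ratio']:.6f} "
              f"saving={row['saving_pct']:.2f}% "
              f"int_saving={row['int_saving_pct']:.2f}% "
              f"overhead={row['overhead']:.3f}")
```

Note that the difference in total output bytes between any two encodings equals the difference in their integer bytes, so on this data set all of the overall saving comes from the integer encoding.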
> 
> I have experimented with an option in the code to write fully random headers
> as literal-without-indexing (as the producer would do, but not a gateway,
> which doesn't know which ones are stable and which ones are not). And while
> doing so improves the compression ratio, the gap between draft-09 and the
> other encodings does not change.
> 
> I have not yet tried to modify your program to vary the output between a
> browser (less custom) and a partner site (more custom). But I wanted to share
> these results already as I think they can be helpful.
> 
> All the code is available here :
> 
>  https://github.com/wtarreau/http2-exp
> 
> The readme is ugly when parsed as md, I've never written md docs so it
> seems I'm lacking some basic practice here. But I'm sure nobody will
> care, reading it in the console or as raw is OK.
> 
> Ah, there's also a debug mode which indicates what encoding is chosen for
> each field and how long the resulting sequence is. It helped me debug it,
> and I found it useful to understand how the table evolves.
> 
> Comments welcome. It's my first HPACK encoder, so it's quite possible that
> I got some things wrong, though I haven't noticed any problems so far. In
> any case, feel free to comment/fork/fix/etc.
> 
> Best regards,
> Willy
> 

--
Mark Nottingham   http://www.mnot.net/
Received on Monday, 27 October 2014 20:48:57 UTC