Re: #578: getting real-ish numbers for option 3

I'll chime in. Short answer - no, these have not changed my mind at all.
If anything, they've reinforced my view.

These numbers are, once again, along the same lines of other changes
we've already rejected as a working group - relatively miniscule
improvements that are not worth invalidating testing we have already
done. If this change improved the compression ratio significantly then
we might have something worth investigating in another implementation
draft. These numbers don't indicate any significant improvement in
compression ratio, and so don't warrant any further investigation.

On Mon, Oct 27, 2014, at 13:48, Mark Nottingham wrote:
> Thanks for that, Willy.
> 
> Asking the question directly — of the people who -1’d making changes for
> #578, have these numbers changed your mind?
> 
> Cheers,
> 
> 
> > On 24 Oct 2014, at 11:56 am, Willy Tarreau <w@1wt.eu> wrote:
> > 
> > Hi Mark,
> > 
> > On Fri, Oct 24, 2014 at 09:33:12PM +1100, Mark Nottingham wrote:
> >> Toy up at:
> >>  https://gist.github.com/mnot/434ab029a6e878b2af4c
> > 
> > Thank you, I could use it. I noticed that the random names it produces
> > can sometimes be used as a custom fixed header, sometimes as a custom
> > random header. It's no big deal, but I think it does not accurately
> > model reality since we'd rather have some fixed headers (eg: customer
> > name) and some always random ones (eg: signature, timestamp). I also
> > thought that we could have a few partially random values (those who
> > change from time to time such as x-forwarded-for behind a proxy), but
> > I don't think it will change things a lot anyway.
> > 
> > So in turn I have worked today :-)
> > 
> > I implemented a simple encoder which parses your program's output and
> > emits statistics on the output data. It does not emit the output bytes,
> > it just performs the encoding and counts. It's almost nothing to add,
> > it is just that I had no use for the output.
> > 
> > It supports 4 encodings :
> >  - draft 09
> >  - the proposal I sent that was called "option 3"
> >  - the proposed revision I sent just after it
> >  - Greg's proposed revision
> > 
> > It reports various statistics such as number of strings encoded, number
> > of integers encoded, average integer size etc... I have run some tests
> > all on the same output from your program, and got interesting findings
> > already :
> > 
> > Draft-09 :
> >        Total input bytes : 7455384
> >        Total output bytes : 2318395            (100%)
> >        Overall compression ratio : 0.310969    (100%)
> >        Total encoded integers: 218865
> >        Total encoded integers bytes: 295036    (100%)
> >        Avg bytes per integers: 1.348027        (100%)
> > 
> > option3 :
> >        Total input bytes : 7455384
> >        Total output bytes : 2268350            (97.84%)
> >        Overall compression ratio : 0.304257    (97.84%)
> >        Total encoded integers: 218865
> >        Total encoded integers bytes: 244991    (83.03%)
> >        Avg bytes per integers: 1.119370        (83.03%)
> > 
> > revised option3 :
> >        Total input bytes : 7455384
> >        Total output bytes : 2264722            (97.68%)
> >        Overall compression ratio : 0.303770    (97.68%)
> >        Total encoded integers: 218865
> >        Total encoded integers bytes: 241363    (81.81%)
> >        Avg bytes per integers: 1.102794        (81.81%)
> > 
> > Greg's revision :
> >        Total input bytes : 7455384
> >        Total output bytes : 2280713            (98.37%)
> >        Overall compression ratio : 0.305915    (98.37%)
> >        Total encoded integers: 218865
> >        Total encoded integers bytes: 257354    (87.23%)
> >        Avg bytes per integers: 1.175857        (87.23%)
> > 
> > First, the overall compression ratio is never exceptional given that the
> > input contains a significant amount of random data, so that's expected.
> > Second, we observe that the integer encoding is 17-18% smaller compared
> > to draft-09. And if we consider the integer encoding's overhead, then it
> > is even divided by 3.4 (0.34 byte to 0.10 byte per integer).
> > 
> > The overall savings are 2.1% for "option 3", 2.3% for its revision, and
> > 1.7% for Greg's proposal. To my initial surprise, Greg's proposal provides
> > less savings here despite being balanced. But in the end there's a reason,
> > it offers more bits to literals while it's the case where we already have
> > to pay for the literal overhead so the occasional saving of 1-byte doesn't
> > save much.
> > 
> > I have experimented with an option in the code to write fully random headers
> > as literal-without-indexing (as the producer would do, but not a gateway
> > which doesn't know which ones are stable and which ones are not). And while
> > doing so improves the compression ratio, the offset from draft-09 and the
> > other ones does not change.
> > 
> > I have not yet tried to modify your program to vary the output between a
> > browser (less custom) or a partner site (more custom). But I wanted to share
> > these results already as I think they can be helpful.
> > 
> > All the code is available here :
> > 
> >  https://github.com/wtarreau/http2-exp
> > 
> > The readme is ugly when parsed as md, I've never written md docs so it
> > seems I'm lacking some basic practice here. But I'm sure nobody will
> > care, reading it in the console or as raw is OK.
> > 
> > Ah, there's also a debug mode which indicates what encoding is chosen for
> > each field and how long the resulting sequence is. It helped me debug it,
> > and I found it useful to understand how the table evolves.
> > 
> > Comments welcome. It's my first HPACK encoder, it's very possible that I
> > messed up a lot with certain things, though I didn't notice that. In any
> > case, feel free to comment/fork/fix/etc.
> > 
> > Best regards,
> > Willy
> > 
> 
> --
> Mark Nottingham   http://www.mnot.net/
> 
> 
> 
> 


-- 
Peace,
  -Nick

Received on Tuesday, 28 October 2014 00:05:26 UTC