- From: Willy Tarreau <w@1wt.eu>
- Date: Fri, 24 Oct 2014 20:56:08 +0200
- To: Mark Nottingham <mnot@mnot.net>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
Hi Mark, On Fri, Oct 24, 2014 at 09:33:12PM +1100, Mark Nottingham wrote: > Toy up at: > https://gist.github.com/mnot/434ab029a6e878b2af4c Thank you, I could use it. I noticed that the random names it produces can sometimes be used as a custom fixed header, sometimes as a custom random header. It's no big deal, but I think it does not accurately model reality since we'd rather have some fixed headers (eg: customer name) and some always random ones (eg: signature, timestamp). I also thought that we could have a few partially random values (those who change from time to time such as x-forwarded-for behind a proxy), but I don't think it will change things a lot anyway. So in turn I have worked today :-) I implemented a simple encoder which parses your program's output and emits statistics on the output data. It does not emit the output bytes, it just performs the encoding and counts. It's almost nothing to add, it is just that I had no use for the output. It supports 4 encodings : - draft 09 - the proposal I sent that was called "option 3" - the proposed revision I sent just after it - Greg's proposed revision It reports various statistics such as number of strings encoded, number of integers encoded, average integer size etc... I have run some tests all on the same output from your program, and got interesting findings already : Draft-09 : Total input bytes : 7455384 Total output bytes : 2318395 (100%) Overall compression ratio : 0.310969 (100%) Total encoded integers: 218865 Total encoded integers bytes: 295036 (100%) Avg bytes per integers: 1.348027 (100%) option3 : Total input bytes : 7455384 Total output bytes : 2268350 (97.84%) Overall compression ratio : 0.304257 (97.84%) Total encoded integers: 218865 Total encoded integers bytes: 244991 (83.03%) Avg bytes per integers: 1.119370 (83.03%) revised option3 : Total input bytes : 7455384 Total output bytes : 2264722 (97.68%) Overall compression ratio : 0.303770 (97.68%) Total encoded integers: 218865 Total encoded integers bytes: 241363 (81.81%) Avg bytes per integers: 1.102794 (81.81%) Greg's revision : Total input bytes : 7455384 Total output bytes : 2280713 (98.37%) Overall compression ratio : 0.305915 (98.37%) Total encoded integers: 218865 Total encoded integers bytes: 257354 (87.23%) Avg bytes per integers: 1.175857 (87.23%) First, the overall compression ratio is never exceptional given that the input contains a significant amount of random data, so that's expected. Second, we observe that the integer encoding is 17-18% smaller compared to draft-09. And if we consider the integer encoding's overhead, then it is even divided by 3.4 (0.34 byte to 0.10 byte per integer). The overall savings are 2.1% for "option 3", 2.3% for its revision, and 1.7% for Greg's proposal. To my initial surprise, Greg's proposal provides less savings here despite being balanced. But in the end there's a reason, it offers more bits to literals while it's the case where we already have to pay for the literal overhead so the occasional saving of 1-byte doesn't save much. I have experimented with an option in the code to write fully random headers as literal-without-indexing (as the producer would do, but not a gateway which doesn't know which ones are stable and which ones are not). And while doing so improves the compression ratio, the offset from draft-09 and the other ones does not change. I have not yet tried to modify your program to vary the output between a browser (less custom) or a partner site (more custom). But I wanted to share these results already as I think they can be helpful. All the code is available here : https://github.com/wtarreau/http2-exp The readme is ugly when parsed as md, I've never written md docs so it seems I'm lacking some basic practice here. But I'm sure nobody will care, reading it in the console or as raw is OK. Ah, there's also a debug mode which indicates what encoding is chosen for each field and how long the resulting sequence is. It helped me debug it, and I found it useful to understand how the table evolves. Comments welcome. It's my first HPACK encoder, it's very possible that I messed up a lot with certain things, though I didn't notice that. In any case, feel free to comment/fork/fix/etc. Best regards, Willy
Received on Friday, 24 October 2014 18:56:34 UTC