- From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
- Date: Fri, 5 Apr 2013 16:02:43 +0000
- To: Roberto Peon <grmocg@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
> -----Original Message-----
> From: Roberto Peon [mailto:grmocg@gmail.com]
> Sent: Friday, 5 April 2013 01:56
> To: HTTP Working Group
> Subject: Compression analysis of perfect atom-based compressor
>
> [snip]
> Take aways?
>
> * We need a better survey of headers from everywhere :)
>
> * Compression over our corpus should scale favorably with small table
> (and state) size.
>
> * Encoding index as dist-from-newest really works well, and LRU
> appears to be extremely effective as an expiration policy (the attached graph
> looks good).

For HeaderDiff, we changed the expiration policy on the encoder side to use LRU: we found it was more effective than our previous "smart" algorithm.

> * We're getting substantial compression from both key and value
> backreferences/tokenization.
>
> * Algorithmically, there isn't a whole lot to do-- the devil is really in the
> serialization details and the tradeoffs involved in generating/parsing. There
> are obvious tweaks that compressors could do when space constrained (e.g.
> looking at the first table, above, as the likely benefit and making decisions
> based upon that), but the data which suggests that the LRU is so effective
> also suggests that this benefit is likely limited unless they can predict the
> future :)
>

For reference, here are the sizes of the literal values that still have to be transmitted, per header name.
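Roberto's quoted point that indices encoded as distance-from-newest work well with LRU expiration (the policy HeaderDiff's encoder also moved to) can be illustrated with a small sketch. This is not Roberto's compressor or HeaderDiff's code: the class `LruHeaderTable` is hypothetical, it only matches whole (name, value) pairs (no separate key backreferences), and it assumes encoder and decoder each apply identical table updates so their state stays in sync.

```python
from collections import OrderedDict


class LruHeaderTable:
    """Hypothetical header table keyed by whole (name, value) pairs.

    Entries are addressed by their distance from the most recently used
    entry (0 = newest); the least recently used entry is evicted when the
    table is full. Encoder and decoder each keep their own instance.
    """

    def __init__(self, max_entries):
        self.max_entries = max_entries
        # Ordered oldest -> newest; the last key is the most recently used.
        self.entries = OrderedDict()

    def _insert(self, key):
        self.entries[key] = None
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def encode(self, name, value):
        """Return ("index", dist) on a table hit, else ("literal", name, value)."""
        key = (name, value)
        if key in self.entries:
            keys = list(self.entries)  # O(n) scan; acceptable for a sketch
            dist = len(keys) - 1 - keys.index(key)
            self.entries.move_to_end(key)  # a hit makes the entry the newest
            return ("index", dist)
        self._insert(key)
        return ("literal", name, value)

    def decode(self, token):
        """Inverse of encode(), applying the same table updates."""
        if token[0] == "index":
            keys = list(self.entries)
            key = keys[len(keys) - 1 - token[1]]
            self.entries.move_to_end(key)
            return key
        _, name, value = token
        self._insert((name, value))
        return (name, value)


# Usage: a two-entry table shared by a hypothetical encoder/decoder pair.
enc, dec = LruHeaderTable(2), LruHeaderTable(2)
headers = [(":method", "GET"), (":path", "/a"), (":method", "GET"),
           (":path", "/b"), (":path", "/a")]
tokens = [enc.encode(n, v) for n, v in headers]
decoded = [dec.decode(t) for t in tokens]
```

Here the third header becomes `("index", 1)` because `:method: GET` sits one step behind the newest entry, while the last `:path: /a` is sent as a literal again because it was evicted when `:path: /b` arrived.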
Requests
---------
header name         | Encoded value size
:path               | 1454063
referer             |  245761
cookie              |  112619
user-agent          |   63628
accept              |   23975
:host               |   22855
accept-language     |   12325
accept-charset      |   11100
accept-encoding     |    9223
if-modified-since   |    3915
nt_w3c              |    2650
:scheme             |    2623
:method             |    1961

Responses
---------
header name         | Encoded value size
last-modified       |  266750
expires             |  199575
date                |  155803
etag                |  114298
set-cookie          |  110527
via                 |   84709
cache-control       |   65506
location            |   61699
content-length      |   49104
x-amz-cf-id         |   43120
x-amz-id-2          |   38960
x-varnish           |   32757
p3p                 |   28416
content-type        |   27362
age                 |   25023
content-disposition |   22147
x-cache             |   15376
x-cache-lookup      |   13911
server              |   13199
x-amz-request-id    |    9168
x-fb-debug          |    7348
vary                |    4688
x-json              |    4384

To get any compression on these values, we could use:
- Deflate (I'm not sure it will get any traction in the group ;-)).
- Prefix sharing (we're looking for a way to make it fully secure).
- Static Huffman encoding (it adds some computational cost).
- A typed codec (it should work well with last-modified, expires, date...).

Hervé.
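As an illustration of the typed-codec option for date-valued headers (last-modified, expires and date are the top three rows of the response table, roughly 45% of the total listed for responses), here is a minimal sketch. It is not HeaderDiff's or any draft's encoding: the one-byte tags `TAG_TIMESTAMP`/`TAG_LITERAL` and the function names are hypothetical. It assumes values in the 29-character IMF-fixdate form (e.g. "Fri, 05 Apr 2013 16:02:43 GMT") can be replaced by a 4-byte Unix timestamp, and falls back to a literal whenever the compact form would not reproduce the exact original string.

```python
import datetime
import struct
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical one-byte tags for this sketch (not from any draft).
TAG_TIMESTAMP = 0x00  # followed by 4 bytes: seconds since the Unix epoch, big-endian
TAG_LITERAL = 0x01    # followed by the raw header value


def _format_http_date(secs: int) -> str:
    """Render seconds since the epoch as an IMF-fixdate string in GMT."""
    dt = datetime.datetime.fromtimestamp(secs, tz=datetime.timezone.utc)
    return format_datetime(dt, usegmt=True)


def encode_date(value: str) -> bytes:
    """Encode a date-valued header (date, expires, last-modified...).

    Uses the compact 5-byte form only when decoding it reproduces the exact
    original string; anything else (other date formats, "0", "-1") is sent
    as a literal so the value survives unchanged.
    """
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparsable; exception type varies by Python version
        dt = None
    if dt is not None and dt.tzinfo is not None:
        secs = int(dt.timestamp())
        if 0 <= secs < 2**32 and _format_http_date(secs) == value:
            return bytes([TAG_TIMESTAMP]) + struct.pack(">I", secs)
    return bytes([TAG_LITERAL]) + value.encode("latin-1")


def decode_date(data: bytes) -> str:
    """Inverse of encode_date()."""
    if data[0] == TAG_TIMESTAMP:
        (secs,) = struct.unpack(">I", data[1:5])
        return _format_http_date(secs)
    return data[1:].decode("latin-1")
```

With this layout, "Fri, 05 Apr 2013 16:02:43 GMT" shrinks from 29 bytes to 5, while values such as "0" in expires round-trip unchanged as literals at a one-byte overhead.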
Received on Friday, 5 April 2013 16:03:19 UTC