RE: Compression analysis of perfect atom-based compressor

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Fri, 5 Apr 2013 16:02:43 +0000
To: Roberto Peon <grmocg@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E5163F79CF@ADELE.crf.canon.fr>


> -----Original Message-----
> From: Roberto Peon [mailto:grmocg@gmail.com]
> Sent: Friday, 5 April 2013 01:56
> To: HTTP Working Group
> Subject: Compression analysis of perfect atom-based compressor
> 
[snip]

> Take aways?
> 
> *	We need a better survey of headers from everywhere :)
> 
> *	Compression over our corpus should scale favorably with small table
> (and state) size.
> 
> *	Encoding index as dist-from-newest really works well, and LRU
> appears to be extremely effective as an expiration policy (the attached graph
> looks good).

For HeaderDiff, we changed the expiration policy on the encoder side to use LRU: we found it was more effective than our previous "smart" algorithm.
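
To make the combination of LRU expiration and dist-from-newest indexing concrete, here is a minimal sketch. It is a toy illustration only, not the HeaderDiff code or Roberto's compressor; the class name `LruHeaderTable`, its methods, and the entry-count limit are all assumptions made for the example.

```python
from collections import OrderedDict


class LruHeaderTable:
    """Toy header table (hypothetical): index = distance from the newest entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # oldest first, newest last

    def index_of(self, name, value):
        """Return the distance from the newest entry (0 = newest), or None."""
        keys = list(self.entries)
        try:
            pos = keys.index((name, value))
        except ValueError:
            return None
        return len(keys) - 1 - pos

    def touch(self, name, value):
        """Reference a header; return its index before the move, or None if new."""
        idx = self.index_of(name, value)
        if idx is not None:
            # Recently used entries become the newest, so they get small indices.
            self.entries.move_to_end((name, value))
        else:
            self.entries[(name, value)] = True
            if len(self.entries) > self.max_entries:
                # Evict the least recently used entry (the oldest one).
                self.entries.popitem(last=False)
        return idx


table = LruHeaderTable(max_entries=2)
table.touch("accept", "*/*")
table.touch("user-agent", "x")
reused = table.touch("accept", "*/*")  # found at distance 1, then moved to newest
table.touch(":path", "/")              # table is full: "user-agent" is evicted
```

In this sketch a frequently reused header keeps a small index and survives eviction, which is the behaviour the LRU policy is meant to give. A real decoder would have to mirror every table update exactly.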

> *	We're getting substantial compression from both key and value
> backreferences/tokenization.
> 
> *	Algorithmically, there isn't a whole lot to do-- the devil is really in the
> serialization details and the tradeoffs involved in generating/parsing. There
> are obvious tweaks that compressors could do when space constrained (e.g.
> looking at the first table, above, as the likely benefit and making decisions
> based upon that), but the data which suggests that the LRU is so effective
> also suggests that this benefit is likely limited unless they can predict the
> future :)
> 

For information, here are the sizes of the literal values that have to be transmitted, by header name.

Requests
---------
header name                              | Encoded value size
:path                                    |   1454063 
referer                                  |    245761 
cookie                                   |    112619 
user-agent                               |     63628 
accept                                   |     23975 
:host                                    |     22855 
accept-language                          |     12325 
accept-charset                           |     11100 
accept-encoding                          |      9223 
if-modified-since                        |      3915 
nt_w3c                                   |      2650 
:scheme                                  |      2623 
:method                                  |      1961

Responses
---------
header name                              | Encoded value size
last-modified                            |    266750 
expires                                  |    199575 
date                                     |    155803 
etag                                     |    114298 
set-cookie                               |    110527 
via                                      |     84709 
cache-control                            |     65506 
location                                 |     61699 
content-length                           |     49104 
x-amz-cf-id                              |     43120 
x-amz-id-2                               |     38960 
x-varnish                                |     32757 
p3p                                      |     28416 
content-type                             |     27362 
age                                      |     25023 
content-disposition                      |     22147 
x-cache                                  |     15376 
x-cache-lookup                           |     13911 
server                                   |     13199 
x-amz-request-id                         |      9168 
x-fb-debug                               |      7348 
vary                                     |      4688 
x-json                                   |      4384 

To get any compression on these values we can use:
- Deflate (I'm not sure it will get any traction in the group ;-)).
- Prefix sharing (we're looking for a way to make it fully secure).
- Static Huffman encoding (it adds some computational costs).
- Typed codec (it should work well with last-modified, expires, date...).
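
As a rough illustration of two of the options above, the sketch below shows prefix sharing against a previous value and a typed codec for HTTP dates. The function names and the 4-byte big-endian timestamp layout are assumptions made for this example, not a proposed wire format, and the prefix-sharing part does nothing about the security concern mentioned above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def encode_with_prefix(previous, value):
    """Return (shared prefix length, remaining suffix) relative to `previous`."""
    shared = 0
    for a, b in zip(previous, value):
        if a != b:
            break
        shared += 1
    return shared, value[shared:]


def decode_with_prefix(previous, shared, suffix):
    """Rebuild the value from the previous value and the transmitted suffix."""
    return previous[:shared] + suffix


def encode_http_date(value):
    """Pack an HTTP date into 4 bytes: seconds since the Unix epoch (assumed layout)."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # treat a zone-less date as UTC
    # Dates before 1970 or after early 2106 do not fit and raise OverflowError.
    return int(dt.timestamp()).to_bytes(4, "big")


def decode_http_date(data):
    """Unpack the 4-byte timestamp back into the canonical GMT date string."""
    dt = datetime.fromtimestamp(int.from_bytes(data, "big"), tz=timezone.utc)
    return format_datetime(dt, usegmt=True)


shared, suffix = encode_with_prefix("/static/css/main.css", "/static/css/print.css")
packed = encode_http_date("Fri, 05 Apr 2013 16:02:43 GMT")
```

Here the second path is sent as a prefix length plus `print.css`, and the 29-character date shrinks to 4 bytes. The date codec only round-trips values that are already in the canonical GMT form; anything else would need a fallback to a literal.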

Hervé.
Received on Friday, 5 April 2013 16:03:19 UTC
