RE: #445: Transfer-codings

On Saturday, 05 April 2014 21:19, Roland Zink wrote:

> Doing it on each data frame should decrease what you can save with compression.

> At the same time it doesn't require a state between frames ...



Regarding the compression factor, we did a simple investigation to get a feel for how big a hit you take by using an independent compression context on each frame.



The results surprised us.  The increase in compressed size from doing frame-by-frame compression versus single-stream compression is smaller than we expected.



For our experiments we used gzip with the following parameters: compression-level=6 (default), window-bits=14, mem-level=7, strategy=default
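For reference, a comparison along these lines can be sketched with Python's zlib (the sample data and chunking below are illustrative; this is not our measurement harness):

```python
import zlib

FRAME_MAX = 16383  # 2^14 - 1: maximum HTTP/2 DATA frame payload

def gzip_bytes(data: bytes) -> bytes:
    # gzip wrapper via wbits = window-bits + 16; level=6, mem-level=7 as above
    c = zlib.compressobj(6, zlib.DEFLATED, 14 + 16, 7)
    return c.compress(data) + c.flush()

def single_stream_size(data: bytes) -> int:
    return len(gzip_bytes(data))

def frame_by_frame_size(data: bytes, chunk: int = FRAME_MAX) -> int:
    # fresh, independent compression context for every frame's worth of input
    return sum(len(gzip_bytes(data[i:i + chunk]))
               for i in range(0, len(data), chunk))

if __name__ == "__main__":
    data = b"The quick brown fox jumps over the lazy dog. " * 20000
    s, f = single_stream_size(data), frame_by_frame_size(data)
    print(f"single-stream: {s}, frame-by-frame: {f} "
          f"({(f / s - 1) * 100:.1f}% increase)")
```

Note that this sketch chunks the *uncompressed* input at the frame limit for simplicity; packing the compressed output up to the limit is a separate concern (point 0).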



Here are the results...

[units are bytes; value in () is the compression factor; % increase is frame-by-frame vs single-stream]



'Counters2.htm'

original file size:    1219829

gzip(single-stream):     17804 (0.015)

gzip(frame-by-frame):    18148 (0.015) 1.9% increase



'http2-spec.htm'

original file size:     291065

gzip(single-stream):     64165 (0.220)

gzip(frame-by-frame):    67048 (0.230) 4.5% increase



'rs.js'

original file size:     145275

gzip(single-stream):     47409 (0.326)

gzip(frame-by-frame):    48877 (0.336) 3.1% increase



'search.htm'

original file size:     236319

gzip(single-stream):     65850 (0.279)

gzip(frame-by-frame):    68353 (0.289) 3.8% increase



'search.json'

original file size:      92680

gzip(single-stream):     16110 (0.174)

gzip(frame-by-frame):    16807 (0.181) 4.3% increase



'url.htm'

original file size:       1082

gzip(single-stream):       523 (0.483)

gzip(frame-by-frame):      523 (0.483) 0.0% increase





Other points...



0. Packing frames with gzip data

An important point is filling each frame with as much compressed data as possible without exceeding the 16383-byte frame payload limit (going over it would obviously create a corrupted stream). The implementation is simple.
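One simple greedy packing approach can be sketched as follows (this is our assumption of one possible shape, not the implementation referred to above): grow each frame's input until its independently-gzipped output would exceed the limit, then emit the last size that fit.

```python
import zlib

FRAME_MAX = 16383  # maximum DATA frame payload

def gzip_bytes(data: bytes) -> bytes:
    c = zlib.compressobj(6, zlib.DEFLATED, 14 + 16, 7)
    return c.compress(data) + c.flush()

def pack_frames(data: bytes, step: int = 4096) -> list:
    """Greedily grow each frame's input (in step-byte probes) until the
    compressed output would exceed FRAME_MAX, then emit the last fit."""
    frames, pos = [], 0
    while pos < len(data):
        take = min(step, len(data) - pos)
        best = None
        while True:
            out = gzip_bytes(data[pos:pos + take])
            if len(out) > FRAME_MAX:
                break
            best = (out, take)
            if take == len(data) - pos:
                break  # consumed all remaining input
            take = min(take + step, len(data) - pos)
        if best is None:
            step = max(1, step // 2)  # incompressible input: shrink the probe
            continue
        frames.append(best[0])
        pos += best[1]
    return frames
```

The repeated recompression makes this quadratic in the worst case; a real sender would feed the compressor incrementally, but that would make the sketch longer.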



1. Client decompression simplicity

We agree that doing frame-by-frame compression results in *much simpler* logic for the client frame reader. For a single connection, the client needs to maintain only a single inflate context.
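As an illustrative sketch of that simplicity: because every frame is a self-contained gzip member, a reader can inflate frames from interleaved streams in arrival order, with no per-stream compression state at all.

```python
import zlib

def reader(frames):
    """frames: iterable of (stream_id, gzip_payload) in arrival order.
    Reassembles the bytes per stream; note there is no per-stream
    inflate context, only a one-shot inflate per frame."""
    out = {}
    for stream_id, payload in frames:
        out.setdefault(stream_id, bytearray())
        out[stream_id] += zlib.decompress(payload, 14 + 16)
    return {sid: bytes(b) for sid, b in out.items()}
```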



2. Deflate instead of Gzip

Using deflate instead of gzip would obviously improve the compression factor slightly. Concatenation of raw deflate streams is also well-defined.



3. Indexing

As Keith mentioned in an earlier e-mail, the compressed DATA frames could be augmented with the uncompressed offset and length for easier indexing.
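One hypothetical shape for that augmentation (the field layout here is our illustration, not a proposed wire format): prefix each compressed payload with the uncompressed offset and length it covers.

```python
import struct

# hypothetical 8-byte index prefix: u32 uncompressed offset, u32 length
INDEX = struct.Struct("!II")

def add_index(gz_payload: bytes, offset: int, length: int) -> bytes:
    # prepend the index fields to the compressed frame payload
    return INDEX.pack(offset, length) + gz_payload

def read_index(frame: bytes):
    # recover (offset, length) without touching the compressed bytes
    offset, length = INDEX.unpack_from(frame)
    return offset, length, frame[INDEX.size:]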



4. Intermediary Efficiency

Intermediaries could store the DATA frames sequentially e.g. in a single file, and easily serve up the data (or even contiguous subsets) without recompressing.
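A sketch of that lookup, assuming an index of (uncompressed offset, uncompressed length) per stored frame: pick the contiguous run of frames covering a requested byte range and serve those frames as-is, with no recompression.

```python
def frames_for_range(index, start, end):
    """index: ordered list of (uncomp_offset, uncomp_length) per stored
    frame. Returns the positions of frames whose uncompressed bytes
    overlap [start, end] -- these can be served without recompressing."""
    return [i for i, (off, length) in enumerate(index)
            if off < end + 1 and off + length > start]
```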



5. 15-bit vs 14-bit frame size

Using window-bits=15 would result in a better compression factor, but would obviously require increasing the maximum frame length to 2^15 bytes. (Probably a better use of the space than keeping 2 reserved bits forever.)
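The window-size effect shows up clearly on data with long-range repetition (the synthetic input below is illustrative; the exact sizes will vary):

```python
import hashlib, zlib

def gzip_size(data: bytes, wbits: int) -> int:
    c = zlib.compressobj(6, zlib.DEFLATED, wbits + 16, 7)
    return len(c.compress(data) + c.flush())

# ~20 KB of deterministic pseudo-random bytes, repeated: the repeat
# distance (20480) fits in a 32 KB window (window-bits=15) but not
# in a 16 KB one (window-bits=14)
block = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(640))
data = block * 4

print("window-bits=14:", gzip_size(data, 14))
print("window-bits=15:", gzip_size(data, 15))
```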



6. Compression for ranges

As we've mentioned many times before, using a transfer coding is the only way to compress a range response over identity Content-Encoded data.



Regards

Chris & Keith




Received on Thursday, 10 April 2014 18:02:27 UTC