RE: Choosing a header compression algorithm from RUELLAN Herve on 2013-03-21 (ietf-http-wg@w3.org from January to March 2013)

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Thu, 21 Mar 2013 11:35:14 +0000
To: Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E5163F393F@ADELE.crf.canon.fr>
To help everyone better understand the two proposals, I tried to make a comparison of the different mechanisms they use.

* Buffers:
Delta has a static pre-filled (name, value) buffer and a dynamic (name, value) buffer. The size of the dynamic buffer can be limited.
HeaderDiff has a pre-filled name buffer (which is extendable to include new header names) and a dynamic (name, value) buffer. The size of the dynamic buffer can be limited.

* Buffer management for additions:
Delta: the encoder decides whether a new entry is added to the buffer or not.
HeaderDiff: the encoder decides whether a new entry is added to the buffer or not.

* Buffer management for removals:
Delta: the buffer uses a LRU mechanisms to find which entry is removed from the buffer to fulfill the size limitation when a new entry is added to the buffer.
HeaderDiff: the removals of entries from the buffer are at encoder discretion, as long as size limitation are fulfilled (the algorithm used in the implementation is loosely based on LRU).

* Buffer management synchronization between Encoder and Decoder:
Delta: the same algorithm is used on both sides.
HeaderDiff: all buffer updates are transmitted on the wire.

* Default for a new header set:
Delta: the default for a new header set is the previous header set. A header can be removed from the set by encoding it as an index. Delta supports up to 256 different header sets.
HeaderDiff: the default for a new header set is the empty set.

* Encoding possibilities:
Delta supports the following encoding possibilities:
- (Name, Value) index;
- (Name, Value) index list;
- Name index and literal Value;
- Literal Name and literal Value.
HeaderDiff supports the following encoding possibilities:
- (Name, Value) index;
- Name index and literal Value;
- Literal Name and literal Value;
- delta-encoding: (Name, Value) index, with a shared prefix and a new suffix for the Value.

* Encoding choice representation:
Delta uses opcodes, that are encoded as the opcode identifier (1 byte) and the number of encoded headers (1 byte).
HeaderDiff uses a few bits (2-4) in the encoding of each header.
Both proposals are byte-aligned at the header level.

* String encoding:
Delta uses a static Huffman encoding.
HeaderDiff uses a raw literal encoding.
Both proposals could easily support several types of encoding: raw literal, static Huffman, typed codecs...


I hoped I capture correctly the differences and similarities between the two proposals. 
Roberto, if you see any misunderstanding of Delta, you're welcome to correct it.

I'm going to details the reasons of our design choices in further e-mails.

Regards,

Hervé.


> -----Original Message-----
> From: Mark Nottingham [mailto:mnot@mnot.net]
> Sent: jeudi 21 mars 2013 08:11
> To: ietf-http-wg@w3.org Group
> Subject: Choosing a header compression algorithm
> 
> Previously, we've talked about starting with just a delta-encoding approach
> for our first implementation draft. In Orlando, we focused primarily on two
> proposals:
> 
> * Delta2
>   Draft: http://tools.ietf.org/html/draft-rpeon-httpbis-header-compression-
> 03
>   Python Implementation: https://github.com/http2/compression-
> test/tree/master/compressor/delta2
> 
> * HeaderDiff
>   Draft: http://tools.ietf.org/html/draft-ruellan-headerdiff-00
>   Python Implementation: https://github.com/http2/compression-
> test/tree/master/compressor/headerdiff
> 
> As I understand it, Herve et al want to work on making HeaderDiff more
> resistant to CRIME, and hopefully we'll see the results of that in the very near
> future.
> 
> In the meantime, I'd like everyone to become familiar with both drafts and
> the characteristics of their implementations, so that we can have an
> informed discussion of them.
> 
> I'd like to see a few things happen while we do this:
> 
> 1) We need to do apples-to-apples comparison of these compressors to see
> how they behave under a range of constraints (especially, memory).
> 
> 2) I'd like us to verify that they are respecting those constraints, and that
> they're implemented in an equivalent way (this is likely to be manual).
> 
> 3) It would be very good to have a test suite that verifies that they correctly
> reconstitute the semantically significant parts of the headers; in particular,
> large/unusual values, ordering where appropriate, etc. Our current header
> corpus undoubtedly has holes in this regard.
> 
> If you make any progress along these lines (dare I ask for volunteers?), pleas
> share with the list.
> 
> Looking at our issues list, this is one of the major items preventing us from
> getting to a first implementation draft, so I'd like to chose a way forward
> soon -- especially since we're choosing a starting point, and the approach we
> take can evolve or, if necessary, be replaced.
> 
> Regards,
> 
> --
> Mark Nottingham   http://www.mnot.net/
> 
> 
>
Received on Thursday, 21 March 2013 11:35:55 UTC