Speed concern related to bit order in Huffman (was: I-D Action: draft-ietf-httpbis-header-compression-10.txt)

Hi,
I did not follow the list for nearly a year but this aspect as apparently not been discussed.
There’s something that puzzles me when I read the HPACK spec it seems to record the Huffman compressed data in a natural and naive way which is not what state of the art compressors do.
The problem here is not space saving but rather speed and especially decoding speed and it may not be apparent when using high level languages this is rather a fine tune for C and other bare metal (assembly like) implementation.

Bits-at-top is a different approach: Deflate has been doing this for more than two decades and Charles Bloom recently stated that it is still the fastest way to do bit level I/O (Charles Bloom is well respected in the data compression comunity).
http://cbloomrants.blogspot.fr/2014/02/02-18-14-ansfast-implementation-notes.html

If I take the sample given in HPACK 10:
http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-10#appendix-C.4
"example.com" is Huffman encoded this way: f1e3 c2e5 f23a 6ba0 ab90 f4ff

F1E3 C2E5 or 11110001 11100011 1100 0010 11100101 in binary and comes from:
‘w’ (119) 1111000
‘w’ (119) 1111000
‘w’ (119) 1111000
'.' (46)  010111
'e' (101) 00101

Notice that bits have been put into the bytes as they come: 1111000 <- 1111000 <- 1111000 <- 010111 <- 00101, dot used to show code boundary: 1111000.1 111000.11 1100 0.010 111.00101

Bits at top does work this way around:
xxxxxxxx xxxxxxxx xxxxxxxx x1111000
xxxxxxxx xxxxxxxx xx111100 01111000
-> emit 01111000
xxxxxxxx xxxxxxxx xxxxxxxx xx111100
xxxxxxxx xxxxxxxx xxx11110 00111100
-> emit 00111100
xxxxxxxx xxxxxxxx xxxxxxxx xxx11110
xxxxxxxx xxxxxxxx xxxxx010 11111110
-> emit 11111110
xxxxxxxx xxxxxxxx xxxxxxxx xxxxx010
xxxxxxxx xxxxxxxx xxxxxxxx 00101010
-> emit 00101010

gives 01111000 00111100 11111110 00101010 which is impossible to decode… silly me forget to reverse bit order at code level, which gives :
‘w’ (119) 1111000 -> 0001111
‘w’ (119) 1111000 -> 0001111
‘w’ (119) 1111000 -> 0001111
'.' (46)  010111 -> 111010
'e' (101) 00101 -> 10100

xxxxxxxx xxxxxxxx xxxxxxxx x0001111
xxxxxxxx xxxxxxxx xx000111 10001111
-> emit 10001111
xxxxxxxx xxxxxxxx xxxxxxxx xx000111
xxxxxxxx xxxxxxxx xxx00011 11000111
-> emit 11000111
xxxxxxxx xxxxxxxx xxxxxxxx xxx00011
xxxxxxxx xxxxxxxx xxxxx111 01000011
-> emit 01000011
xxxxxxxx xxxxxxxx xxxxxxxx xxxxx111
xxxxxxxx xxxxxxxx xxxxxxxx 10100111
-> emit 10100111

gives 10001111 11000111 01000011 10100111 or 8FC7 43A7, 

Bits-at-top looks a bit more complicated but is much closer to what CPUs are able to handle quickly, Deflate and now Brotli compression work this way and as I wrote before it’s still the way recommended by Charles Bloom for super fast implementation of bit level I/O.

I hope it’s not too late to stop press to evaluate the real impact it could have on HPACK.

Regards
Frédéric Kayser

Received on Tuesday, 2 December 2014 02:11:20 UTC