- From: Frédéric Kayser <f.kayser@free.fr>
- Date: Tue, 2 Dec 2014 03:08:07 +0100
- To: ietf-http-wg@w3.org
Hi, I did not follow the list for nearly a year but this aspect as apparently not been discussed. There’s something that puzzles me when I read the HPACK spec it seems to record the Huffman compressed data in a natural and naive way which is not what state of the art compressors do. The problem here is not space saving but rather speed and especially decoding speed and it may not be apparent when using high level languages this is rather a fine tune for C and other bare metal (assembly like) implementation. Bits-at-top is a different approach: Deflate has been doing this for more than two decades and Charles Bloom recently stated that it is still the fastest way to do bit level I/O (Charles Bloom is well respected in the data compression comunity). http://cbloomrants.blogspot.fr/2014/02/02-18-14-ansfast-implementation-notes.html If I take the sample given in HPACK 10: http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-10#appendix-C.4 "example.com" is Huffman encoded this way: f1e3 c2e5 f23a 6ba0 ab90 f4ff F1E3 C2E5 or 11110001 11100011 1100 0010 11100101 in binary and comes from: ‘w’ (119) 1111000 ‘w’ (119) 1111000 ‘w’ (119) 1111000 '.' (46) 010111 'e' (101) 00101 Notice that bits have been put into the bytes as they come: 1111000 <- 1111000 <- 1111000 <- 010111 <- 00101, dot used to show code boundary: 1111000.1 111000.11 1100 0.010 111.00101 Bits at top does work this way around: xxxxxxxx xxxxxxxx xxxxxxxx x1111000 xxxxxxxx xxxxxxxx xx111100 01111000 -> emit 01111000 xxxxxxxx xxxxxxxx xxxxxxxx xx111100 xxxxxxxx xxxxxxxx xxx11110 00111100 -> emit 00111100 xxxxxxxx xxxxxxxx xxxxxxxx xxx11110 xxxxxxxx xxxxxxxx xxxxx010 11111110 -> emit 11111110 xxxxxxxx xxxxxxxx xxxxxxxx xxxxx010 xxxxxxxx xxxxxxxx xxxxxxxx 00101010 -> emit 00101010 gives 01111000 00111100 11111110 00101010 which is impossible to decode… silly me forget to reverse bit order at code level, which gives : ‘w’ (119) 1111000 -> 0001111 ‘w’ (119) 1111000 -> 0001111 ‘w’ (119) 1111000 -> 0001111 '.' (46) 010111 -> 111010 'e' (101) 00101 -> 10100 xxxxxxxx xxxxxxxx xxxxxxxx x0001111 xxxxxxxx xxxxxxxx xx000111 10001111 -> emit 10001111 xxxxxxxx xxxxxxxx xxxxxxxx xx000111 xxxxxxxx xxxxxxxx xxx00011 11000111 -> emit 11000111 xxxxxxxx xxxxxxxx xxxxxxxx xxx00011 xxxxxxxx xxxxxxxx xxxxx111 01000011 -> emit 01000011 xxxxxxxx xxxxxxxx xxxxxxxx xxxxx111 xxxxxxxx xxxxxxxx xxxxxxxx 10100111 -> emit 10100111 gives 10001111 11000111 01000011 10100111 or 8FC7 43A7, Bits-at-top looks a bit more complicated but is much closer to what CPUs are able to handle quickly, Deflate and now Brotli compression work this way and as I wrote before it’s still the way recommended by Charles Bloom for super fast implementation of bit level I/O. I hope it’s not too late to stop press to evaluate the real impact it could have on HPACK. Regards Frédéric Kayser
Received on Tuesday, 2 December 2014 02:11:20 UTC