- From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
- Date: Fri, 5 Dec 2014 10:55:59 +0000
- To: Frédéric Kayser <f.kayser@free.fr>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
I took a look at your proposal, and I think it should provide some speed-up to HPACK encoding when using Huffman codes. Unfortunately, we are now late in the process and I don't think this optimization brings enough benefit to be included in HPACK. We might consider it again if we had other changes to make to HPACK.

Regards,
Hervé.

> -----Original Message-----
> From: Frédéric Kayser [mailto:f.kayser@free.fr]
> Sent: mardi 2 décembre 2014 03:08
> To: ietf-http-wg@w3.org
> Subject: Speed concern related to bit order in Huffman (was: I-D Action: draft-ietf-httpbis-header-compression-10.txt)
>
> Hi,
> I did not follow the list for nearly a year, but this aspect has apparently not been discussed.
> Something puzzles me when I read the HPACK spec: it seems to record the Huffman-compressed data in a natural and naive way, which is not what state-of-the-art compressors do. The problem here is not space saving but rather speed, especially decoding speed. It may not be apparent when using high-level languages; it is rather a fine tuning for C and other bare-metal (assembly-like) implementations.
>
> Bits-at-top is a different approach: Deflate has been doing this for more than two decades, and Charles Bloom recently stated that it is still the fastest way to do bit-level I/O (Charles Bloom is well respected in the data compression community).
> http://cbloomrants.blogspot.fr/2014/02/02-18-14-ansfast-implementation-notes.html
>
> If I take the sample given in HPACK 10:
> http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-10#appendix-C.4
> "example.com" is Huffman encoded this way: f1e3 c2e5 f23a 6ba0 ab90 f4ff
>
> F1E3 C2E5, or 11110001 11100011 11000010 11100101 in binary, comes from:
> 'w' (119) 1111000
> 'w' (119) 1111000
> 'w' (119) 1111000
> '.' (46) 010111
> 'e' (101) 00101
>
> Notice that bits have been put into the bytes as they come: 1111000 <- 1111000 <- 1111000 <- 010111 <- 00101. With a dot used to show code boundaries: 1111000.1 111000.11 11000.010 111.00101
>
> Bits-at-top works this way instead:
> xxxxxxxx xxxxxxxx xxxxxxxx x1111000
> xxxxxxxx xxxxxxxx xx111100 01111000
> -> emit 01111000
> xxxxxxxx xxxxxxxx xxxxxxxx xx111100
> xxxxxxxx xxxxxxxx xxx11110 00111100
> -> emit 00111100
> xxxxxxxx xxxxxxxx xxxxxxxx xxx11110
> xxxxxxxx xxxxxxxx xxxxx010 11111110
> -> emit 11111110
> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx010
> xxxxxxxx xxxxxxxx xxxxxxxx 00101010
> -> emit 00101010
>
> This gives 01111000 00111100 11111110 00101010, which is impossible to decode: silly me, I forgot to reverse the bit order at the code level, which gives:
> 'w' (119) 1111000 -> 0001111
> 'w' (119) 1111000 -> 0001111
> 'w' (119) 1111000 -> 0001111
> '.' (46) 010111 -> 111010
> 'e' (101) 00101 -> 10100
>
> xxxxxxxx xxxxxxxx xxxxxxxx x0001111
> xxxxxxxx xxxxxxxx xx000111 10001111
> -> emit 10001111
> xxxxxxxx xxxxxxxx xxxxxxxx xx000111
> xxxxxxxx xxxxxxxx xxx00011 11000111
> -> emit 11000111
> xxxxxxxx xxxxxxxx xxxxxxxx xxx00011
> xxxxxxxx xxxxxxxx xxxxx111 01000011
> -> emit 01000011
> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx111
> xxxxxxxx xxxxxxxx xxxxxxxx 10100111
> -> emit 10100111
>
> This gives 10001111 11000111 01000011 10100111, or 8FC7 43A7.
>
> Bits-at-top looks a bit more complicated, but it is much closer to what CPUs are able to handle quickly. Deflate, and now Brotli, compression work this way, and as I wrote before it is still the way recommended by Charles Bloom for super-fast implementations of bit-level I/O.
>
> I hope it's not too late to stop the presses and evaluate the real impact it could have on HPACK.
>
> Regards
> Frédéric Kayser
>
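As a concrete companion to the walkthrough in the quoted message, here is a minimal C sketch of the two packing orders on the "www.e" prefix of the appendix C.4 example. It is an illustration only, under the assumption that the HPACK Huffman codes for 'w', '.' and 'e' are the ones listed above; the function names (pack_msb_first, pack_lsb_first, bit_reverse) are invented for this sketch and do not come from HPACK or any existing library.

```c
/* Sketch of the two bit-packing orders discussed above, applied to the
 * "www.e" prefix of the HPACK appendix C.4 example.  Hypothetical helper
 * names; not taken from any real codebase. */
#include <stdint.h>
#include <stdio.h>

/* Huffman codes for 'w', '.', 'e' as quoted in the message above. */
static const struct { uint32_t code; int len; } syms[] = {
    { 0x78, 7 },  /* 'w' = 1111000 */
    { 0x78, 7 },  /* 'w' */
    { 0x78, 7 },  /* 'w' */
    { 0x17, 6 },  /* '.' = 010111 */
    { 0x05, 5 },  /* 'e' = 00101  */
};

/* MSB-first packing, the "natural" order described in the HPACK draft:
 * each code is appended starting at the most significant free bit. */
static void pack_msb_first(void)
{
    uint64_t acc = 0;   /* bit accumulator, new bits enter at the bottom */
    int nbits = 0;
    printf("MSB-first (as in HPACK): ");
    for (size_t i = 0; i < sizeof syms / sizeof syms[0]; i++) {
        acc = (acc << syms[i].len) | syms[i].code;
        nbits += syms[i].len;
        while (nbits >= 8) {
            nbits -= 8;
            printf("%02x ", (unsigned)((acc >> nbits) & 0xff));
            acc &= (1ULL << nbits) - 1;   /* drop the emitted byte */
        }
    }
    /* (padding of a final partial byte omitted; this input is byte-aligned) */
    printf("\n");
}

/* Reverse the bit order of an n-bit code, done once at table-build time
 * for the LSB-first scheme, as the quoted walkthrough does by hand. */
static uint32_t bit_reverse(uint32_t code, int len)
{
    uint32_t r = 0;
    for (int i = 0; i < len; i++)
        r = (r << 1) | ((code >> i) & 1);
    return r;
}

/* LSB-first ("bits at top", Deflate-style) packing: each new code is OR-ed
 * above the bits already held, and whole bytes are emitted from the bottom. */
static void pack_lsb_first(void)
{
    uint64_t acc = 0;
    int nbits = 0;
    printf("LSB-first (Deflate-style): ");
    for (size_t i = 0; i < sizeof syms / sizeof syms[0]; i++) {
        acc |= (uint64_t)bit_reverse(syms[i].code, syms[i].len) << nbits;
        nbits += syms[i].len;
        while (nbits >= 8) {
            printf("%02x ", (unsigned)(acc & 0xff));
            acc >>= 8;
            nbits -= 8;
        }
    }
    printf("\n");
}

int main(void)
{
    pack_msb_first();   /* prints: f1 e3 c2 e5 (matches the draft example) */
    pack_lsb_first();   /* prints: 8f c7 43 a7 (matches the walkthrough)   */
    return 0;
}
```

With the LSB-first layout, the next code always sits in the low bits of the accumulator, so a decoder can index its lookup table with acc & mask and then shift right by the consumed length; this is the property the quoted message attributes to Deflate and to Charles Bloom's write-up.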
Received on Friday, 5 December 2014 10:56:29 UTC