- From: Roberto Peon <grmocg@gmail.com>
- Date: Mon, 1 Dec 2014 18:21:32 -0800
- To: Frédéric Kayser <f.kayser@free.fr>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNemovm8Huy8-0x+5Zim2S3pQ7ihk0kb2C8u+=CAF+Bofg@mail.gmail.com>
The best way to influence is to show the difference :) I remember bemoaning that little-endian won, which requires such things for speed in bitwise operations.

-=R

On Mon, Dec 1, 2014 at 6:08 PM, Frédéric Kayser <f.kayser@free.fr> wrote:

> Hi,
> I did not follow the list for nearly a year, but this aspect has apparently
> not been discussed.
> There's something that puzzles me when I read the HPACK spec: it seems to
> record the Huffman-compressed data in a natural and naive way, which is not
> what state-of-the-art compressors do.
> The problem here is not space saving but rather speed, especially decoding
> speed. It may not be apparent when using high-level languages; this is
> rather a fine tuning for C and other bare-metal (assembly-like)
> implementations.
>
> Bits-at-top is a different approach: Deflate has been doing this for more
> than two decades, and Charles Bloom recently stated that it is still the
> fastest way to do bit-level I/O (Charles Bloom is well respected in the
> data compression community).
>
> http://cbloomrants.blogspot.fr/2014/02/02-18-14-ansfast-implementation-notes.html
>
> If I take the sample given in HPACK 10:
>
> http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-10#appendix-C.4
>
> "example.com" is Huffman encoded this way: f1e3 c2e5 f23a 6ba0 ab90 f4ff
>
> F1E3 C2E5 is 11110001 11100011 11000010 11100101 in binary and comes from:
>
> 'w' (119) 1111000
> 'w' (119) 1111000
> 'w' (119) 1111000
> '.'  (46) 010111
> 'e' (101) 00101
>
> Notice that bits have been put into the bytes as they come:
> 1111000 <- 1111000 <- 1111000 <- 010111 <- 00101
> with a dot used to show the code boundaries:
> 1111000.1 111000.11 11000.010 111.00101
>
> Bits-at-top works the other way around:
>
> xxxxxxxx xxxxxxxx xxxxxxxx x1111000
> xxxxxxxx xxxxxxxx xx111100 01111000
> -> emit 01111000
> xxxxxxxx xxxxxxxx xxxxxxxx xx111100
> xxxxxxxx xxxxxxxx xxx11110 00111100
> -> emit 00111100
> xxxxxxxx xxxxxxxx xxxxxxxx xxx11110
> xxxxxxxx xxxxxxxx xxxxx010 11111110
> -> emit 11111110
> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx010
> xxxxxxxx xxxxxxxx xxxxxxxx 00101010
> -> emit 00101010
>
> which gives 01111000 00111100 11111110 00101010 — impossible to decode…
> silly me, I forgot to reverse the bit order at the code level, which gives:
>
> 'w' (119) 1111000 -> 0001111
> 'w' (119) 1111000 -> 0001111
> 'w' (119) 1111000 -> 0001111
> '.'  (46) 010111 -> 111010
> 'e' (101) 00101 -> 10100
>
> xxxxxxxx xxxxxxxx xxxxxxxx x0001111
> xxxxxxxx xxxxxxxx xx000111 10001111
> -> emit 10001111
> xxxxxxxx xxxxxxxx xxxxxxxx xx000111
> xxxxxxxx xxxxxxxx xxx00011 11000111
> -> emit 11000111
> xxxxxxxx xxxxxxxx xxxxxxxx xxx00011
> xxxxxxxx xxxxxxxx xxxxx111 01000011
> -> emit 01000011
> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx111
> xxxxxxxx xxxxxxxx xxxxxxxx 10100111
> -> emit 10100111
>
> which gives 10001111 11000111 01000011 10100111, or 8FC7 43A7.
>
> Bits-at-top looks a bit more complicated but is much closer to what CPUs
> are able to handle quickly. Deflate and now Brotli compression work this
> way, and as I wrote before, it is still the way recommended by Charles
> Bloom for super-fast implementations of bit-level I/O.
>
> I hope it's not too late to stop the presses and evaluate the real impact
> this could have on HPACK.
>
> Regards
> Frédéric Kayser
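The two packing orders walked through in the message above can be sketched in a few lines. This is a minimal Python illustration (not part of the original thread; the function names `pack_msb_first`, `reverse_bits`, and `pack_lsb_first` are mine): the first function appends code bits MSB-first into bytes, as HPACK's Huffman coding does, while the second bit-reverses each code and packs LSB-first, as Deflate does.

```python
def pack_msb_first(codes):
    """Pack (value, bit_length) codes MSB-first, as HPACK's Huffman coding does."""
    acc, nbits, out = 0, 0, bytearray()
    for value, length in codes:
        acc = (acc << length) | value          # append new bits at the bottom
        nbits += length
        while nbits >= 8:                      # full bytes leave from the top
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:                                  # pad the last partial byte with 1s
        out.append(((acc << (8 - nbits)) | ((1 << (8 - nbits)) - 1)) & 0xFF)
    return bytes(out)

def reverse_bits(value, length):
    """Reverse the bit order of a length-bit code."""
    r = 0
    for _ in range(length):
        r = (r << 1) | (value & 1)
        value >>= 1
    return r

def pack_lsb_first(codes):
    """Bit-reverse each code and pack LSB-first ("bits-at-top"), as Deflate does."""
    acc, nbits, out = 0, 0, bytearray()
    for value, length in codes:
        acc |= reverse_bits(value, length) << nbits  # new bits go in above the old
        nbits += length
        while nbits >= 8:                            # full bytes leave from the bottom
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)
    return bytes(out)

# The "www.e" example from the message:
codes = [(0b1111000, 7)] * 3 + [(0b010111, 6), (0b00101, 5)]
print(pack_msb_first(codes).hex())   # f1e3c2e5
print(pack_lsb_first(codes).hex())   # 8fc743a7
```

The speed argument favors the second form: an LSB-first decoder only ever shifts and masks the low bits of its accumulator (`acc & ((1 << n) - 1)`), with no per-code realignment against the top of a register, which maps directly onto little-endian CPU operations.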
Received on Tuesday, 2 December 2014 02:21:59 UTC