- From: Roberto Peon <grmocg@gmail.com>
- Date: Tue, 2 Dec 2014 16:27:36 -0800
- To: Frédéric Kayser <f.kayser@free.fr>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNcFDx3Wa04ST==yctHt+vy4H=KjY+623p2E+i1gNRWg1Q@mail.gmail.com>
I'd suggest ignoring it for the purpose of your speed tests :) -=R On Tue, Dec 2, 2014 at 4:10 PM, Frédéric Kayser <f.kayser@free.fr> wrote: > I’ve started the work to benchmark both bit ordering, but now I find that > this EOS thing is a bit too complicated. > > "A Huffman encoded string literal containing the EOS symbol MUST be treated as a decoding error." > > > If you don’t want it to appear in the compressed stream just get rid of it > and free its code (this will make another symbol one bit shorter) > Basically you want to be sure that the unset bits of the last octet cannot > be mismatched with one of the shortest codes (those between 5 and 7 bits) > padding it with bits set to 1 is sufficient, successful decoding is > achieved when either there are no leftover bits or they are all set to 1 > and there's on need to check if EOS has been accidentally decoded since it > does not exist anymore. The size of the compressed stream already holds a > lot of information there’s no need for a full featured EOS in this case, > just get rid of it. > > Regards > > Le 2 déc. 2014 à 03:21, Roberto Peon <grmocg@gmail.com> a écrit : > > The best way to influence is to show the difference :) > I remember bemoaning that little-endian won, which requires such things > for speed for bitwise operations. > > -=R > > On Mon, Dec 1, 2014 at 6:08 PM, Frédéric Kayser <f.kayser@free.fr> wrote: > >> Hi, >> I did not follow the list for nearly a year but this aspect as apparently >> not been discussed. >> There’s something that puzzles me when I read the HPACK spec it seems to >> record the Huffman compressed data in a natural and naive way which is not >> what state of the art compressors do. >> The problem here is not space saving but rather speed and especially >> decoding speed and it may not be apparent when using high level languages >> this is rather a fine tune for C and other bare metal (assembly like) >> implementation. >> >> Bits-at-top is a different approach: Deflate has been doing this for more >> than two decades and Charles Bloom recently stated that it is still the >> fastest way to do bit level I/O (Charles Bloom is well respected in the >> data compression comunity). >> >> http://cbloomrants.blogspot.fr/2014/02/02-18-14-ansfast-implementation-notes.html >> >> If I take the sample given in HPACK 10: >> >> http://tools.ietf.org/html/draft-ietf-httpbis-header-compression-10#appendix-C.4 >> "example.com" is Huffman encoded this way: f1e3 c2e5 f23a 6ba0 ab90 f4ff >> >> F1E3 C2E5 or 11110001 11100011 1100 0010 11100101 in binary and comes >> from: >> ‘w’ (119) 1111000 >> ‘w’ (119) 1111000 >> ‘w’ (119) 1111000 >> '.' (46) 010111 >> 'e' (101) 00101 >> >> Notice that bits have been put into the bytes as they come: 1111000 <- >> 1111000 <- 1111000 <- 010111 <- 00101, dot used to show code boundary: >> 1111000.1 111000.11 1100 0.010 111.00101 >> >> Bits at top does work this way around: >> xxxxxxxx xxxxxxxx xxxxxxxx x1111000 >> xxxxxxxx xxxxxxxx xx111100 01111000 >> -> emit 01111000 >> xxxxxxxx xxxxxxxx xxxxxxxx xx111100 >> xxxxxxxx xxxxxxxx xxx11110 00111100 >> -> emit 00111100 >> xxxxxxxx xxxxxxxx xxxxxxxx xxx11110 >> xxxxxxxx xxxxxxxx xxxxx010 11111110 >> -> emit 11111110 >> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx010 >> xxxxxxxx xxxxxxxx xxxxxxxx 00101010 >> -> emit 00101010 >> >> gives 01111000 00111100 11111110 00101010 which is impossible to decode… >> silly me forget to reverse bit order at code level, which gives : >> ‘w’ (119) 1111000 -> 0001111 >> ‘w’ (119) 1111000 -> 0001111 >> ‘w’ (119) 1111000 -> 0001111 >> '.' (46) 010111 -> 111010 >> 'e' (101) 00101 -> 10100 >> >> xxxxxxxx xxxxxxxx xxxxxxxx x0001111 >> xxxxxxxx xxxxxxxx xx000111 10001111 >> -> emit 10001111 >> xxxxxxxx xxxxxxxx xxxxxxxx xx000111 >> xxxxxxxx xxxxxxxx xxx00011 11000111 >> -> emit 11000111 >> xxxxxxxx xxxxxxxx xxxxxxxx xxx00011 >> xxxxxxxx xxxxxxxx xxxxx111 01000011 >> -> emit 01000011 >> xxxxxxxx xxxxxxxx xxxxxxxx xxxxx111 >> xxxxxxxx xxxxxxxx xxxxxxxx 10100111 >> -> emit 10100111 >> >> gives 10001111 11000111 01000011 10100111 or 8FC7 43A7, >> >> Bits-at-top looks a bit more complicated but is much closer to what CPUs >> are able to handle quickly, Deflate and now Brotli compression work this >> way and as I wrote before it’s still the way recommended by Charles Bloom >> for super fast implementation of bit level I/O. >> >> I hope it’s not too late to stop press to evaluate the real impact it >> could have on HPACK. >> >> Regards >> Frédéric Kayser >> >> >> > >
Received on Wednesday, 3 December 2014 00:28:05 UTC