- From: Martin Thomson <martin.thomson@gmail.com>
- Date: Sun, 13 Apr 2014 15:16:01 -0700
- To: Amos Jeffries <squid3@treenet.co.nz>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On 13 April 2014 02:44, Amos Jeffries <squid3@treenet.co.nz> wrote:
> Therefore selectively URL-encoding characters whose huffman
> representation is over 17 bits in length will result in a compression
> savings.
> Question is whether a decoder can be written with a special case to also
> perform url-decode when locating the % symbol and still be fast overall ?

That would work, but not as well as what I did, which (as Ilari
calculates) is close to ideal, less the partial bit-per-symbol loss in
comparison to using more efficient codings. Most of the 17-bit
sequences are illegal anyway.

(You can run the experiment yourself by adding one of the high-valued
octets to the set of valid "characters". I'm using Unicode code points
for octet values so that they fit in strings. You'll see that the
encoder avoids using the octet as much as possible, in favour of '0',
'1', and other values with shorter encodings.)

> Secondary question; What if the Huffman symbols were further optimized
> to reduce the bit length of the URL-encoding characters, and in
> particular the %.

See above. Yes, it would make it more feasible to %-encode, but it
would also have a negative impact on existing strings. There are more
existing strings, so it would have a negative net effect.
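[For readers following along, here is a minimal sketch of the cost comparison
being discussed: whether %-escaping an octet costs fewer Huffman-coded bits
than encoding it directly. It is not Martin's experiment code, and the code
lengths in it are made-up placeholders, not the draft Huffman table; the
"over 17 bits" threshold quoted above comes from the actual table, which
these placeholders only loosely approximate.]

    # Illustrative sketch only. All code lengths below are assumed values,
    # not the HPACK draft's Huffman table.
    ILLUSTRATIVE_CODE_LENGTHS = {
        ord('%'): 6,                            # assumed: '%' has a short code
        **{ord(c): 5 for c in "0123456789"},    # assumed: digits are short
        **{ord(c): 6 for c in "ABCDEF"},        # assumed: hex letters slightly longer
        0xFF: 26,                               # assumed: a high-valued octet with a long code
    }

    def encoded_bits(octet, lengths):
        """Bits to Huffman-encode the octet directly (assumed default for rare octets)."""
        return lengths.get(octet, 24)

    def pct_encoded_bits(octet, lengths):
        """Bits to Huffman-encode the three-character %XX escape instead."""
        hi, lo = "%02X" % octet
        return (encoded_bits(ord('%'), lengths)
                + encoded_bits(ord(hi), lengths)
                + encoded_bits(ord(lo), lengths))

    def cheaper_to_escape(octet, lengths=ILLUSTRATIVE_CODE_LENGTHS):
        """True if %-escaping the octet costs fewer bits than its raw Huffman code."""
        return pct_encoded_bits(octet, lengths) < encoded_bits(octet, lengths)

    # Under these placeholder lengths, %FF costs 6 + 6 + 6 = 18 bits, so
    # escaping only wins when the raw code for 0xFF is longer than that.
    print(cheaper_to_escape(0xFF))   # True with the assumed 26-bit raw code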
Received on Sunday, 13 April 2014 22:16:29 UTC