- From: Martin Thomson <martin.thomson@gmail.com>
- Date: Sat, 12 Apr 2014 12:01:21 -0700
- To: HTTP Working Group <ietf-http-wg@w3.org>
Now that we have a (promised) final Huffman table in HPACK, I was able to do the little project I've been meaning to do. With differing encoded lengths the Huffman tables add, it's not better to use certain characters in preference to others. Combined with the restrictions on the strings we encode, this can make encoding binary blobs less than optimal. Base64 (and the URI-safe variant) manage pretty well because those characters are favoured by HPACK. Same goes for the cookie characters. I wanted to see if it was possible to do better. It is. But not by much. I've put my code on github [1] and npm [2]. This module is essentially a replacement for base64 that is optimized for HPACK and for different character sets: the characters permitted in cookies, the characters permitted in URI path components, and the characters permitted in query components. The results for 100,000 different 20 byte sequences, chosen at random, are interesting: Average sizes: min/ave/max (size compared against Base64+Huffman) Raw Huffman coding (invalid): 25/48.20803/64 (203.13%) Base 64 (no Huffman): 27/27/27 (113.77%) Base 64 with Huffman: 20/23.73286/27 (100%) bhpack cookie with Huffman: 19/23.6052/30 (99.46%) bhpack URI safe with Huffman: 20/22.98342/28 (96.84%) bhpack URI query with Huffman: 20/23.01238/28 (96.96%) [3] This shows that HPACK is pretty good at encoding cookies, but there are small gains to be had for URI components. If character restrictions are wildly different to these common values, this could be even better. This hasn't been completely optimized, so there might be a few bits left to save. I'm using Huffman encoding as the basis of this - in reverse, of course - which isn't quite as good as arithmetic or range coding. I could probably manage the trailing bits more elegantly. Pull requests happily accepted. --Martin [1] https://github.com/martinthomson/bhpack [2] npm install bhpack [3] In theory, this is better than the URI safe option, but as chance would have it, not in this run.
Received on Saturday, 12 April 2014 19:01:48 UTC