- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 6 Jan 2015 00:39:14 +0100
- To: Frédéric Kayser <f.kayser@free.fr>
- Cc: ietf-http-wg@w3.org
Hello Frédéric,

On Mon, Jan 05, 2015 at 11:52:28PM +0100, Frédéric Kayser wrote:
> Hello,
> to me the major drawback of HPACK is not really technical. I don't really
> mind whether it could be 20% faster or shrink data even further using
> top-notch ANS/FSE compression instead of a compression scheme even older
> than my parents; it's a bit more philosophical: the lengths of the Huffman
> codes* are biased towards a subset of ASCII, to the point where it becomes
> pointless to try to compress anything that is not plain English, and this
> is a major spit in the face of international users. Just put two or three
> code points outside of ASCII and blamo! you take such a huge penalty that
> you can forget about compression and go straight to the uncompressed way.
> This is the 21st century, Unicode has taken over the Web, and nowadays
> IRIs can be written in Japanese, Arabic, Russian or Greek, but deep inside
> HTTP/2 ASCII and percent-encoded strings still rule (revising RFC 3987
> would be welcome at some point). Refrain from using your mother tongue in
> HTTP/2 headers: it's not open to the wide world of tomorrow since it's
> based on stats from yesterday.

Well, I'd object to three things here:
  - most header field values are plain ASCII strings, or even numbers or
    tokens, so most of them are not subject to i18n;
  - for header fields containing i18n, you can decide *not* to compress
    their values. It would indeed be foolish to expand their size for no
    reason when you can send them as-is;
  - most of the compression obtained in HPACK comes from its ability to
    reuse header field names and name-value pairs. That's what allows you
    to compress hundreds of bytes into a few bytes, while Huffman coding
    probably adds a few tens of extra percent to the compression ratio.

My personal feeling is that Huffman coding will be useful to compress
tokens, user-agents and maybe cookies, since base64 characters are shorter
than one byte there, and time will tell what is relevant to compress or
not. I would not be surprised if some implementations performed a quick
test to see which version is smaller. That's easy: just sum up the Huffman
bit lengths for each byte in a value and decide whether the result is
shorter or longer than the raw version.

So in the end I'm absolutely not worried about Huffman coding causing
inefficiencies in non-US-ASCII regions, and to be honest I even think that
my first implementation may not implement Huffman encoding at all
(decoding is mandatory though).

My two cents as well :-)

Cheers,
Willy
Received on Monday, 5 January 2015 23:39:39 UTC
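Below is a minimal sketch, in C, of the size pre-check described in the message:
sum the per-byte Huffman code lengths and compare the result against the raw
length. The function name and the hpack_huff_len[] table are illustrative, not
taken from any particular implementation; the table would need to be filled in
with the code lengths from the HPACK static Huffman table (later published in
RFC 7541, Appendix B), so this will not link without it.

    #include <stddef.h>
    #include <stdint.h>

    /* Per-byte Huffman code lengths, in bits, from the HPACK static
     * Huffman table (later published in RFC 7541, Appendix B). The 256
     * actual values are not reproduced here; fill them in from the spec.
     */
    extern const uint8_t hpack_huff_len[256];

    /* Return nonzero if Huffman-encoding <len> bytes of <val> would take
     * fewer octets than sending them raw: sum the per-byte code lengths,
     * round up to whole octets (the final octet is padded with EOS bits),
     * and compare with the raw length.
     */
    static int huffman_is_smaller(const uint8_t *val, size_t len)
    {
        size_t bits = 0;
        size_t i;

        for (i = 0; i < len; i++)
            bits += hpack_huff_len[val[i]];

        return (bits + 7) / 8 < len;
    }

An encoder that finds the Huffman version is not smaller can simply emit the
value as a plain string literal with the H bit cleared, which is what the
"decide not to compress" option in the message amounts to on the wire.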