Re: comments about draft-ietf-httpbis-header-compression from Michael Sweet on 2015-01-06 (ietf-http-wg@w3.org from January to March 2015)

From: Michael Sweet <msweet@apple.com>
Date: Mon, 05 Jan 2015 20:33:12 -0500
To: Frédéric Kayser <f.kayser@free.fr>
Cc: ietf-http-wg@w3.org, Willy Tarreau <w@1wt.eu>
Message-id: <22C3974B-5E4F-4338-8EE5-7324F811D91E@apple.com>

That would be a valid complaint if HTTP fields could safely carry Unicode/UTF-8 text values, but they can't.  See the mailing list archives for long discussions on the topic...

> On Jan 5, 2015, at 5:52 PM, Frédéric Kayser <f.kayser@free.fr> wrote:
> 
> Hello,
> to me the major drawback of HPACK is not really technical. I don't really care whether it could be 20% faster or shrink data even further using top-notch ANS/FSE compression instead of a compression scheme even older than my parents. It's a bit more philosophical: the lengths of the Huffman codes* are biased towards a subset of ASCII to the point that it becomes pointless to try to compress anything that is not plain English, and this is a major spit in the face of international users. Just put in two or three code points that are outside of ASCII and blamo! You take such a huge penalty that you can forget about using compression and go straight to the uncompressed way. This is the 21st century: Unicode has taken over the Web, and nowadays IRIs can be written in Japanese, Arabic, Russian or Greek, but deep inside HTTP/2, ASCII and percent-encoded strings still rule (revising RFC 3987 would be welcome at some point). Refrain from using your mother tongue in HTTP/2 headers; it's not open to the wide world of tomorrow since it's based on stats from yesterday.
> 
> *30-bit-long codes are ridiculous and make code slower on 32-bit CPUs; capping them to 16 or 15 bits would have no impact on overall compression (since hitting such long codes would still make compression pointless). I still don't get why the Huffman part tries to be a universal encoder, since in practice it can only really compress a small subset of ASCII, and anything else, especially UTF-8, quickly expands. I'd rather see some kind of VLE clearly geared toward this subset (it would be more effective) and not trying to be universal at all. This way, if the string is made only of code points from the subset, it will compress pretty well; otherwise record it uncompressed (don't even try to encode it).
> 
> My two cents
> 
>> HPACK is simple to implement, simple to understand, byte-aligned and specific
>> to a single purpose. And even if it were less efficient than any generic
>> algorithm you would propose, it would always be possible to write a more
>> efficient one dedicated to this task.
> 
>> Willy
> 
> 
> 
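
For readers following the thread, the fallback described in the quoted footnote (use Huffman coding only when it actually shrinks the string, otherwise send the raw octets) can be sketched roughly as below. HPACK string literals carry a flag saying whether the octets are Huffman-coded, so an encoder is free to make this choice per string. The code lengths here are hypothetical placeholders chosen only to show the effect; they are NOT the table from the HPACK draft, and the function names are invented for this sketch.

```python
# Sketch only: decide per string whether Huffman coding is worth using.
# The code lengths below are ILLUSTRATIVE ASSUMPTIONS, not the real HPACK table.

def assumed_code_lengths():
    """Return a hypothetical 256-entry table of code lengths in bits."""
    lengths = [24] * 256                      # assumed: long codes for everything else
    for b in range(0x20, 0x7F):
        lengths[b] = 8                        # assumed: other printable ASCII
    for b in b"abcdefghijklmnopqrstuvwxyz0123456789":
        lengths[b] = 6                        # assumed: short codes for common octets
    return lengths

def huffman_size(data, lengths):
    """Size in octets of the coded string, padded up to a whole octet."""
    bits = sum(lengths[b] for b in data)
    return (bits + 7) // 8

def choose_encoding(data, lengths):
    """Pick Huffman only when it is strictly smaller than the raw octets."""
    coded = huffman_size(data, lengths)
    if coded < len(data):
        return ("huffman", coded)
    return ("raw", len(data))

LENGTHS = assumed_code_lengths()
ascii_choice = choose_encoding(b"example", LENGTHS)
utf8_choice = choose_encoding("日本語".encode("utf-8"), LENGTHS)
mixed_choice = choose_encoding("café".encode("utf-8"), LENGTHS)
```

Under these assumed lengths, a short lowercase ASCII string is sent Huffman-coded, while a UTF-8 string, or even an ASCII string with a single accented letter, falls back to raw octets, which is the penalty the quoted message complains about.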

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

Received on Tuesday, 6 January 2015 01:33:42 UTC