- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 6 Jan 2015 00:39:14 +0100
- To: Frédéric Kayser <f.kayser@free.fr>
- Cc: ietf-http-wg@w3.org
Hello Frédéric,
On Mon, Jan 05, 2015 at 11:52:28PM +0100, Frédéric Kayser wrote:
> Hello,
> to me the major drawback of HPACK is not really technical. I don't really
> mind whether it could be 20% faster or shrink data even further using
> top-notch ANS/FSE compression instead of a compression scheme even older
> than my parents; it's a bit more philosophical: the lengths of the Huffman
> codes* are biased towards a subset of ASCII to the point that it becomes
> pointless to try to compress anything that is not plain English, and this
> is a major spit in the face of international users. Just put two or three
> code points that fall outside of ASCII and blamo! you take such a huge
> penalty that you can forget about compression and go straight to the
> uncompressed form. This is the 21st century: Unicode has taken over the
> Web, and nowadays IRIs can be written in Japanese, Arabic, Russian or
> Greek, but deep inside HTTP/2, ASCII and percent-encoded strings still
> rule (revising RFC 3987 would be welcome at some point). Refrain from
> using your mother tongue in HTTP/2 headers: it's not open to the wide
> world of tomorrow since it's based on stats from yesterday.
Well, I'd object to three things here:
- most header field values are plain ASCII strings, or even just numbers or
  tokens, so most of them are not subject to i18n;
- for header fields carrying i18n values, you can decide *not* to compress
  them. It would indeed be foolish to expand their size for no reason when
  you can send them as-is;
- most of the compression obtained in HPACK comes from its ability to reuse
  header field names and name-value pairs. That's what allows you to
  compress hundreds of bytes into a few bytes, while Huffman probably only
  adds a few tens of percent to the compression ratio.
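(As a quick illustration of those last two points: in HPACK, a name-value
pair already present in the table, such as ":method: GET" from the static
table, is resent as a single indexed octet, and the string-literal
representation carries an H flag that lets the encoder send any value
completely raw when Huffman would not help.)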
My personal feeling is that Huffman will be useful to compress tokens,
user-agents, and maybe cookies, since base64 characters encode to fewer
than eight bits each, and time will tell what is relevant to compress or
not. I would not be surprised if some implementations perform a quick test
to see which version is smaller. That's easy: just sum up the Huffman bit
lengths for each byte in a value and check whether the total is shorter
or larger than the raw version, as in the sketch below.
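For what it's worth, here is a minimal C sketch of that check. It assumes a
huff_len[] table holding the per-octet code lengths in bits from the HPACK
Huffman table (not reproduced here), and the function name is of course only
illustrative:

  #include <stddef.h>
  #include <stdint.h>

  /* Decide whether the Huffman form of a header string would be shorter
   * than the raw form. <huff_len> is assumed to hold the code length in
   * bits of each octet, taken from the HPACK Huffman code table.
   * Returns non-zero if Huffman encoding wins.
   */
  static int huffman_is_smaller(const uint8_t *val, size_t len,
                                const uint8_t huff_len[256])
  {
      size_t bits = 0;
      size_t i;

      for (i = 0; i < len; i++)
          bits += huff_len[val[i]];

      /* the Huffman output is padded up to the next octet boundary */
      return (bits + 7) / 8 < len;
  }

An encoder would then simply set or clear the H flag of the string literal
depending on the result.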
So in the end I'm absolutely not worried about Huffman causing
inefficiencies in non-US-ASCII regions, and to be honest I even think
that my first implementation may not implement Huffman encoding
at all (decoding is mandatory though).
My two cents as well :-)
Cheers,
Willy