- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 6 Jan 2015 00:39:14 +0100
- To: Frédéric Kayser <f.kayser@free.fr>
- Cc: ietf-http-wg@w3.org
Hello Frédéric,
On Mon, Jan 05, 2015 at 11:52:28PM +0100, Frédéric Kayser wrote:
> Hello,
> to me the major drawback of HPACK is not really technical. I don't really
> mind whether it could be 20% faster or shrink data even further using
> top-notch ANS/FSE compression instead of a compression scheme even older
> than my parents; it's a bit more philosophical: the lengths of the Huffman
> codes* are biased towards a subset of ASCII to the point that it becomes
> pointless to try to compress anything that is not plain English, and this
> is a major spit in the face of international users. Just put two or three
> code points that fall outside of ASCII and blamo! you take such a huge
> penalty that you can forget about compression and go straight to the
> uncompressed form. This is the 21st century: Unicode has taken over the
> Web, and nowadays IRIs can be written in Japanese, Arabic, Russian or
> Greek, but deep inside HTTP/2, ASCII and percent-encoded strings still
> rule (revising RFC 3987 would be welcome at some point). Refrain from
> using your mother tongue in HTTP/2 headers: it's not open to the wide
> world of tomorrow since it's based on stats from yesterday.
Well, I'd object to three things here:
- most header field values are plain ASCII strings, or even just numbers or
  tokens, so most of them are not subject to i18n;
- for header fields carrying i18n values, you can decide *not* to compress
  them. It would indeed be foolish to expand their size for no reason when
  you can send them as-is;
- most of the compression obtained in HPACK comes from its ability to reuse
  header field names and name-value pairs. That's what allows you to
  compress hundreds of bytes into a few bytes, while Huffman probably only
  adds a few tens of percent to the compression ratio.
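(As a quick illustration of those last two points: in HPACK, a name-value
pair already present in the table, such as ":method: GET" from the static
table, is resent as a single indexed octet, and the string-literal
representation carries an H flag that lets the encoder send any value
completely raw when Huffman would not help.)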
My personal feeling is that Huffman will be useful to compress tokens,
user-agents, and maybe cookies, since base64 characters encode to fewer
than eight bits each, and time will tell what is relevant to compress or
not. I would not be surprised if some implementations perform a quick test
to see which version is smaller. That's easy: just sum up the Huffman bit
lengths for each byte in a value and check whether the total is shorter
or larger than the raw version, as in the sketch below.
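For what it's worth, here is a minimal C sketch of that check. It assumes a
huff_len[] table holding the per-octet code lengths in bits from the HPACK
Huffman table (not reproduced here), and the function name is of course only
illustrative:

  #include <stddef.h>
  #include <stdint.h>

  /* Decide whether the Huffman form of a header string would be shorter
   * than the raw form. <huff_len> is assumed to hold the code length in
   * bits of each octet, taken from the HPACK Huffman code table.
   * Returns non-zero if Huffman encoding wins.
   */
  static int huffman_is_smaller(const uint8_t *val, size_t len,
                                const uint8_t huff_len[256])
  {
      size_t bits = 0;
      size_t i;

      for (i = 0; i < len; i++)
          bits += huff_len[val[i]];

      /* the Huffman output is padded up to the next octet boundary */
      return (bits + 7) / 8 < len;
  }

An encoder would then simply set or clear the H flag of the string literal
depending on the result.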
So in the end I'm absolutely not worried about Huffman causing
inefficiencies in non-US-ASCII regions, and to be honest I even think
that my first implementation may not implement Huffman encoding
at all (decoding is mandatory though).
My two cents as well :-)
Cheers,
Willy