- From: Willy Tarreau <w@1wt.eu>
- Date: Sat, 9 Dec 2023 08:33:44 +0100
- To: falsandtru@gmail.com
- Cc: ietf-http-wg@w3.org
Hello,

On Sat, Dec 09, 2023 at 07:02:20AM +0900, ?? wrote:
> To all. Please read the proposal and the source code before replying
> with assumptions.

Well, your "proposal" is just a single paragraph with a raw dump of who-knows-what, and there is not a single comment in your source code to explain the intent or what it is trying to do; it is essentially hard-coded values and code sequences. I'm sorry, but you cannot expect participants to spend a lot of time trying to guess what you're trying to achieve this way.

> > HPACK/QPACK uses only one static table of Huffman coding, but the
> > compression ratio can be improved by combining multiple tables. There
> > is *no overhead* in data size due to the combination. Theoretically,
> > two code tables can reduce the bit length of one code table by an
> > average of one bit, since the two code tables itself has one bit of
> > information. *This proposal especially reduces the token size
> > increasing in recent years.* Let me know the link to the conclusions
> > if this approach has been considered.
> >
> > The following is a comparison based on my proof of concept code. The
> > leftmost number is the reduced bit size.

The relevance of the Huffman table was discussed during the design phase, and so was the data set that served to build the HPACK headers table. I vaguely remember some discussions where we wondered whether we could squeeze out more by having a distinct static table per direction and doing the same for Huffman (since it's possible that the byte distribution differs slightly between sides). I seem to remember that the overall feeling was that the gains to be expected there were not significant enough to warrant the extra complexity. There was another discussion about the possibility of using FSE instead of Huffman, which could bring better compression ratios at the expense of a bit more complexity; again, the extra complexity was considered an obstacle, and in general it seems there is not much interest in squeezing slightly more bytes there. Ah, I just found it; here it is, along with another thread on the same topic:

https://lists.w3.org/Archives/Public/ietf-http-wg/2014JanMar/0029.html
https://lists.w3.org/Archives/Public/ietf-http-wg/2014AprJun/1275.html

Overall, the main purpose of the Huffman coding here is to get closer to 6 bits per character for the most common characters found in header field names (and values) that are not found as-is in the HPACK static table. A good example is the first emission of a large base64-encoded cookie. The purpose is not to save a lot there; most of the savings come from the dynamic table, which allows a sender to send a field name and value only once in a while.

> > 'XPACK comp. ratio request', 0.2533415841584158, 1.3393005138405436
> > 'HPACK comp. ratio request', 0.2471534653465347, 1.3282919612033537

If I read this right, it seems to me that this corresponds to a delta of just 2 bytes for a total of 500 bytes of data. That's really small. Some changes to the HPACK encoding that could save significantly more were already proposed during the design phase, and were rejected as being too small to warrant a change. Just think that being able to advertise the use and support of the new table would likely require more bytes on the wire (e.g. a SETTINGS frame) and could not be used before the first round trip, which is where it matters the most.
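For what it's worth, the quoted "one selector bit" claim is easy to demonstrate on toy data. Here is a minimal sketch in Python, assuming two made-up code-length tables (these are NOT the RFC 7541 Huffman table, and TABLE_TEXT, TABLE_B64, encoded_bits and best_table_bits are all invented names for illustration only):

    import string

    # Made-up code lengths: one table favours common lowercase text,
    # the other favours the base64 alphabet. Not the RFC 7541 table.
    TABLE_TEXT = {c: 5 for c in "etaoinshrdlu"}
    TABLE_B64 = {c: 6 for c in string.ascii_letters + string.digits + "+/="}

    def encoded_bits(s: str, table: dict) -> int:
        # Symbols absent from a table fall back to an assumed 8-bit escape.
        return sum(table.get(c, 8) for c in s)

    def best_table_bits(s: str) -> int:
        # One extra bit per string tells the decoder which table was used;
        # that selector bit is the "one bit of information" of the claim.
        return 1 + min(encoded_bits(s, TABLE_TEXT), encoded_bits(s, TABLE_B64))

    for sample in ("attention", "dGVzdA=="):
        print(sample, encoded_bits(sample, TABLE_TEXT), best_table_bits(sample))
    # "attention" stays on the text table (45 -> 46 bits with the selector),
    # while the base64 token drops from 58 to 49 bits.

The gain only materialises on inputs whose byte distribution genuinely differs from what the single table was tuned for, which is exactly the question the threads above already examined.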
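And to put the advertisement cost in numbers, here is a minimal sketch of what signalling such a capability could cost on the wire. The 0xF000 identifier and the settings_frame helper are invented for illustration; only the HTTP/2 frame layout (9-byte frame header, 6 bytes per SETTINGS entry, per RFC 7540) is real:

    import struct

    SETTINGS_TYPE = 0x4
    HYPOTHETICAL_XPACK_TABLES = 0xF000  # invented identifier, not IANA-assigned

    def settings_frame(entries: dict) -> bytes:
        # Each SETTINGS entry is a 16-bit identifier plus a 32-bit value.
        payload = b"".join(struct.pack("!HI", ident, value)
                           for ident, value in entries.items())
        # 9-byte HTTP/2 frame header: 24-bit length, type, flags, stream id 0.
        header = struct.pack("!BHBBI", len(payload) >> 16,
                             len(payload) & 0xFFFF, SETTINGS_TYPE, 0, 0)
        return header + payload

    frame = settings_frame({HYPOTHETICAL_XPACK_TABLES: 1})
    print(len(frame))  # 15 bytes on the wire

Fifteen bytes (or 6 bytes when piggy-backed on an existing SETTINGS frame) to enable a change that saves about 2 bytes per 500, and only after the first round trip.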
Regards,
Willy

PS: please avoid responding to yourself multiple times and top-posting; that makes it difficult to respond to your messages, and likely further reduces the willingness to respond.
Received on Saturday, 9 December 2023 07:33:52 UTC