Re: Multiple Huffman code tables

> Well, your "proposal" is just a single paragraph with a raw dump of
> whoever knows what it shows, and there's zero comment in your source
> code to explain the intent nor what it's trying to do, there are
> essentially hard-coded values and code sequences. I'm sorry but you
> cannot expect participants to spend a lot of time trying to guess what
> you're trying to achieve this way.

Speculating from assumptions is fine, but asserting those assumptions as fact is not.

> I seem to remember that the overall feeling was that gains to
> be expected there were not significant enough to warrant more complexity.

This proposal addresses the current situation, in which tokens have greatly
increased header sizes for security reasons. The situation is different from
when HPACK was designed.

> again the extra complexity was considered as an
> obstacle and in general it seems that there's not that much interest in
> squeezing slightly more bytes there.

There is nothing particularly complex about this algorithm, and the return
is commensurate with the small amount of added complexity, as shown below.

> A
> good example is the first emission of a large base64-encoded cookie. The
> purpose is not to save a lot there. Most of the savings come from the
> dynamic table that allows a sender to send the field name and value only
> once in a while.

As shown below, the improvement in compression ratio is large enough to make
implementing the savings worthwhile.

> If I read this right, it seems to me that this corresponds just to a delta
> of 2 bytes for a total of 500 bytes of data. That's really small.

That is an example with low savings. For the response of Google's home page
in the non-logged-in state, the compression ratio improves by more than 1%.
(In each dump below, the middle number is the fraction of bytes saved; the
last is the equivalent original-to-compressed size ratio, 1/(1 - saved).)

'XPACK   comp. ratio response', 0.25389886578449905, 1.340300870942201
'HPACK   comp. ratio response', 0.24155245746691867, 1.3184827478775605

The compression ratio improves by 2.5% in the logged-in state.

'XPACK   comp. ratio request', 0.24189189189189186, 1.3190730837789661
'HPACK   comp. ratio request', 0.21498410174880767, 1.2738595514151183

Compression also improves by about 2.5% for the Amazon home page. Is a 2.5%
improvement really that small?

'XPACK   comp. ratio request', 0.24909539473684206, 1.3317270835614938
'HPACK   comp. ratio request', 0.22467105263157894, 1.2897751378871447
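
To put the percentages in concrete terms, here is a minimal sketch of mine
(not part of the proposal; it assumes the middle column above is the
fraction of bytes saved, and the 1000-byte block size is invented) showing
what the XPACK/HPACK difference means in bytes:

#include <stdio.h>

/* Illustrative only: the saved fractions are copied from the Amazon
 * request dump above; the block size is made up. */
int main(void)
{
    double xpack_saved = 0.24909539473684206; /* XPACK, Amazon request */
    double hpack_saved = 0.22467105263157894; /* HPACK, Amazon request */
    double block = 1000.0;                    /* uncompressed header bytes */

    printf("XPACK output:  %.1f bytes\n", block * (1.0 - xpack_saved));
    printf("HPACK output:  %.1f bytes\n", block * (1.0 - hpack_saved));
    printf("extra savings: %.1f bytes\n",
           block * (xpack_saved - hpack_saved));
    return 0;
}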

> Just think that being able to advertise the
> use and support of the new table would likely require more bytes

The Huffman code for tokens is so regular that no table or tree is needed;
it can be replaced with a handful of conditional expressions.
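
For illustration only (this is my sketch, not the proposal's code; the code
lengths, limits, and symbol blocks are invented): with a canonical Huffman
code, where codes of equal length are consecutive integers, a decoder needs
only a few comparisons per code length instead of a table or tree:

#include <stddef.h>
#include <stdint.h>

struct bits { const uint8_t *p; size_t bitpos; };

/* Read one bit, MSB first. */
int read_bit(struct bits *b)
{
    int bit = (b->p[b->bitpos >> 3] >> (7 - (b->bitpos & 7))) & 1;
    b->bitpos++;
    return bit;
}

/* Invented canonical code: 10 symbols of length 5 (codes 0..9),
 * 24 of length 6 (codes 20..43), 40 of length 7 (codes 88..127). */
int decode_symbol(struct bits *b)
{
    uint32_t code = 0;
    int len;

    for (len = 1; ; len++) {
        code = (code << 1) | (uint32_t)read_bit(b);
        if (len == 5 && code < 10)   /* codes 0..9   -> symbols 0..9   */
            return (int)code;
        if (len == 6 && code < 44)   /* codes 20..43 -> symbols 10..33 */
            return (int)(code - 20 + 10);
        if (len == 7)                /* codes 88..127 -> symbols 34..73 */
            return (int)(code - 88 + 34);
    }
}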

> couldn't be used before the first
> round trip, which is where it matters the most.

Perhaps there is a misunderstanding. The initial state of the Huffman coder
is fixed and identical on both endpoints, so nothing has to be advertised
before the first round trip; the state changes only deterministically
during encoding and decoding.
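
As a rough sketch of how I mean this (my illustration, not the PoC code;
the context rule and all names are invented): both sides start from the
same fixed context and update it deterministically from the symbols
themselves, so no SETTINGS negotiation and no extra bits are needed:

enum ctx { CTX_DEFAULT, CTX_TOKEN };  /* illustrative contexts */

struct coder_state {
    enum ctx ctx;  /* starts as CTX_DEFAULT on both endpoints */
};

/* Invented rule: bytes from the base64 alphabet select the token table. */
enum ctx next_context(unsigned char sym)
{
    if ((sym >= 'A' && sym <= 'Z') || (sym >= 'a' && sym <= 'z') ||
        (sym >= '0' && sym <= '9') || sym == '+' || sym == '/' || sym == '=')
        return CTX_TOKEN;
    return CTX_DEFAULT;
}

/* Encoder and decoder both call this after every symbol, so they pick the
 * same Huffman table for the next symbol without any bits on the wire. */
void update_state(struct coder_state *s, unsigned char sym)
{
    s->ctx = next_context(sym);
}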

> PS: please avoid responding to yourself multiple times and top-posting,
>    that makes it difficult to respond to your messages, and likely
>    further reduces the willingness to respond.

I will do my best, but I am not good at English, so please bear with me.

On Sat, Dec 9, 2023 at 16:33, Willy Tarreau <w@1wt.eu> wrote:

> Hello,
>
> On Sat, Dec 09, 2023 at 07:02:20AM +0900, ?? wrote:
> > To all. Please read the proposal and the source code before replying with
> > assumptions.
>
> Well, your "proposal" is just a single paragraph with a raw dump of
> whoever knows what it shows, and there's zero comment in your source
> code to explain the intent nor what it's trying to do, there are
> essentially hard-coded values and code sequences. I'm sorry but you
> cannot expect participants to spend a lot of time trying to guess what
> you're trying to achieve this way.
>
> > > HPACK/QPACK uses only one static table of Huffman coding, but the
> > > compression ratio can be improved by combining multiple tables. There
> > > is *no overhead* in data size due to the combination. Theoretically,
> > > two code tables can reduce the bit length of one code table by an
> > > average of one bit, since the two code tables itself has one bit of
> > > information. *This proposal especially reduces the token size
> > > increasing in recent years.* Let me know the link to the conclusions
> > > if this approach has been considered. The following is a comparison
> > > based on my proof of concept code. The leftmost number is the reduced
> > > bit size.
>
> The relevance of the huffman table was discussed during the design phase,
> and even the data set that served to build the HPACK headers table as well.
> I vaguely remember some discussions where we wondered if we could squeeze
> more by having a distinct static table per direction and doing the same
> for huffman (since it's possible that the bytes distribution differs
> slightly between sides). I seem to remember that the overall feeling was
> that gains to be expected there were not significant enough to warrant
> more complexity.
>
> There was another discussion about the possibility to use FSE instead of
> huffman, which could bring better compression ratios, at the expense of a
> bit more complexity, and again the extra complexity was considered as an
> obstacle and in general it seems that there's not that much interest in
> squeezing slightly more bytes there. Ah, I just found it, here it is, along
> with another thread on the same topic:
>
>   https://lists.w3.org/Archives/Public/ietf-http-wg/2014JanMar/0029.html
>   https://lists.w3.org/Archives/Public/ietf-http-wg/2014AprJun/1275.html
>
> Overall the main purpose of the huffman coding here is to get closer to
> 6 bits per character for most common ones that are found in header field
> names (and values) that are not found as-is in the HPACK static table. A
> good example is the first emission of a large base64-encoded cookie. The
> purpose is not to save a lot there. Most of the savings come from the
> dynamic table that allows a sender to send the field name and value only
> once in a while.
>
> > > 'XPACK   comp. ratio request', 0.2533415841584158, 1.3393005138405436
> > > 'HPACK   comp. ratio request', 0.2471534653465347, 1.3282919612033537
>
> If I read this right, it seems to me that this corresponds just to a delta
> of 2 bytes for a total of 500 bytes of data. That's really small. Some
> changes to the HPACK encoding were proposed already during the design
> phase, that could save significantly more and were rejected as being too
> small to warrant a change. Just think that being able to advertise the
> use and support of the new table would likely require more bytes on the
> wire (e.g. a SETTINGS frame) and couldn't be used before the first
> round trip, which is where it matters the most.
>
> Regards,
> Willy
>
> PS: please avoid responding to yourself multiple times and top-posting,
>     that makes it difficult to respond to your messages, and likely
>     further reduces the willingness to respond.
>
