Re: Multiple Huffman code tables from 姓名 on 2023-12-09 (ietf-http-wg@w3.org from October to December 2023)

From: 姓名 <falsandtru@gmail.com>
Date: Sat, 9 Dec 2023 21:42:54 +0900
To: Willy Tarreau <w@1wt.eu>, ietf-http-wg@w3.org
Message-ID: <CA+isZAKrSJh0_TzN=dkUZzoxxE+CxJO9YUJWp_o-5X9FO0YkTA@mail.gmail.com>
> I'm not saying that the *implementation* is complex. However, for a
> low-level protocol change to be effective, it must be widely adopted, and
> modifications applied to most major stacks. And for an implementation,
> having to support two variants instead of one necessarily adds a little
> bit of complexity (even in interoperability testing), so there really
> needs to be a good argument for this.

It appears that you are refusing to change the Huffman code.

> It depends: 2.5% of what? Here we're speaking about 2.5% of something
> already tiny. If I read it well, we're suggesting that the *first*
> occurrence of a 133-byte header block is reduced to 129 bytes. When that
> happens in a series of 100 requests, that's only 0.04 bytes saved per
> request on average. Don't get me wrong, I'm not saying it's nothing, I'm
> saying that all factors must be considered. As I explained, more savings
> could be gained by revisiting the HPACK opcode encoding, that will save
> bytes for each and every header field for each and every request, not
> just the first one. And keep in mind that some implementations do not
> even compress outgoing headers because the savings are not considered
> worth the cost (particularly on the response direction).

The 2.5% is the improvement in the compression ratio of the algorithm
(size before / size after), not a number of bytes. The number of bytes
saved is 50-100 at a time. Is that too little? The number of bytes is
reduced in the response as well.
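
To make the two measures concrete, here is a minimal sketch of how they relate. It assumes that, in the measurements quoted further down in this thread, the first number on each line is the space saving (1 - after / before) and the second is the compression ratio (before / after); the quoted values are consistent with that reading. The function names are made up for illustration and are not taken from any implementation.

```python
def space_saving(before: int, after: int) -> float:
    """Fraction of bytes removed by compression: 1 - after / before."""
    return 1 - after / before


def compression_ratio(before: int, after: int) -> float:
    """Size before compression divided by size after."""
    return before / after


def ratio_from_saving(saving: float) -> float:
    """Convert a space saving into the equivalent compression ratio."""
    return 1 / (1 - saving)


# Space savings quoted in this thread for the Google response headers.
xpack_saving = 0.25389886578449905
hpack_saving = 0.24155245746691867

xpack_ratio = ratio_from_saving(xpack_saving)
hpack_ratio = ratio_from_saving(hpack_saving)
print(f"XPACK ratio: {xpack_ratio:.4f}, HPACK ratio: {hpack_ratio:.4f}")

# How much smaller the XPACK output is relative to the HPACK output.
relative_reduction = 1 - (1 - xpack_saving) / (1 - hpack_saving)
print(f"relative reduction in encoded size: {relative_reduction:.2%}")

# The 133 -> 129 byte example from the quoted message, as both measures.
print(f"{space_saving(133, 129):.4f} {compression_ratio(133, 129):.4f}")
```

Both measures describe the same encoded output, so a difference in either one can be converted into a number of bytes only once the size before compression is known.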

> HPACK compression is extremely effective on the uplink from the client
> to the server, where it gains most savings by using the dynamic table
> and compresses to a single-byte most repetitive header fields, including
> large cookies. Huffman here is just a nice extra bonus but not a major
> difference.

Are improvements to standardized extras prohibited? They should not be. As
mentioned above, with this proposal the output of this extra is 50-100
bytes smaller. That should be an improvement worth proposing.

> That's basically what most of us are already doing I think, e.g:

No. Your code defines a table. I am saying that such a table can be removed
altogether.

> That's not what I'm saying. I'm saying that for a client to use your
> implementation, it must first know that the server will support it, and
> it cannot know this before receiving its SETTINGS frame, hence it's not
> usable before the first round-trip, which is where most of the Huffman
> savings matter.

It appears that you are refusing to change the Huffman code. Is a signal
other than the version needed?

> Now, feel free to prove me wrong with real world examples where you
> observe significant changes on the volume of bytes sent by a client
> before and after your changes, with an emphasis on the first 10 MSS
> (14kB) which is where a first round trip will be needed, but at first
> glance I'm pretty sure this will be fairly marginal.

I cannot know the scenario in your head. But as mentioned above, the saving
is 50-100 bytes at a time. Is that too little?

> You're welcome, many of us are not native English speakers either :-)

I did not say I am not a native English speaker. I said I am not good at
English :-)


On Sat, Dec 9, 2023 at 18:53, Willy Tarreau <w@1wt.eu> wrote:

> On Sat, Dec 09, 2023 at 05:58:43PM +0900, 姓名 wrote:
> > > I seem to remember that the overall feeling was that gains to
> > > be expected there were not significant enough to warrant more
> > > complexity.
> >
> > This proposal addresses the current situation where tokens have greatly
> > increased header size for security reasons. The situation is different
> > from the past.
> >
> > > again the extra complexity was considered as an
> > > obstacle and in general it seems that there's not that much interest in
> > > squeezing slightly more bytes there.
> >
> > There is nothing particularly complex about this algorithm. The return is
> > commensurate with less complexity, as discussed below.
>
> I'm not saying that the *implementation* is complex. However, for a
> low-level protocol change to be effective, it must be widely adopted, and
> modifications applied to most major stacks. And for an implementation,
> having to support two variants instead of one necessarily adds a little
> bit of complexity (even in interoperability testing), so there really
> needs to be a good argument for this.
>
> > > If I read this right, it seems to me that this corresponds just to a
> > > delta of 2 bytes for a total of 500 bytes of data. That's really small.
> >
> > That is an example of low compression. Compression ratio improves by more
> > than 1% for the response of Google's home page in non-logged-in state.
> >
> > 'XPACK   comp. ratio response', 0.25389886578449905, 1.340300870942201
> > 'HPACK   comp. ratio response', 0.24155245746691867, 1.3184827478775605
> >
> > The compression ratio improves by 2.5% when logged in.
> >
> > 'XPACK   comp. ratio request', 0.24189189189189186, 1.3190730837789661
> > 'HPACK   comp. ratio request', 0.21498410174880767, 1.2738595514151183
> >
> > Compression is also improved by 2.5% on the Amazon home page. Is a 2.5%
> > improvement small?
> >
> > 'XPACK   comp. ratio request', 0.24909539473684206, 1.3317270835614938
> > 'HPACK   comp. ratio request', 0.22467105263157894, 1.2897751378871447
>
> It depends: 2.5% of what? Here we're speaking about 2.5% of something
> already tiny. If I read it well, we're suggesting that the *first*
> occurrence of a 133-byte header block is reduced to 129 bytes. When that
> happens in a series of 100 requests, that's only 0.04 bytes saved per
> request on average. Don't get me wrong, I'm not saying it's nothing, I'm
> saying that all factors must be considered. As I explained, more savings
> could be gained by revisiting the HPACK opcode encoding, that will save
> bytes for each and every header field for each and every request, not
> just the first one. And keep in mind that some implementations do not
> even compress outgoing headers because the savings are not considered
> worth the cost (particularly on the response direction).
>
> HPACK compression is extremely effective on the uplink from the client
> to the server, where it gains most savings by using the dynamic table
> and compresses to a single-byte most repetitive header fields, including
> large cookies. Huffman here is just a nice extra bonus but not a major
> difference.
>
> > > Just think that being able to advertise the
> > > use and support of the new table would likely require more bytes
> >
> > The Huffman code for tokens is so regular that no table or tree is
> > needed. It is replaceable with conditional expressions.
>
> That's basically what most of us are already doing I think, e.g:
>
>   https://github.com/haproxy/haproxy/blob/master/src/hpack-huff.c#L784
>
> > > couldn't be used before the first
> > > round trip, which is where it matters the most.
> >
> > Perhaps you misunderstand. The initial state of the Huffman code is fixed
> > and invariant. It changes state only during encoding/decoding.
>
> That's not what I'm saying. I'm saying that for a client to use your
> implementation, it must first know that the server will support it, and
> it cannot know this before receiving its SETTINGS frame, hence it's not
> usable before the first round-trip, which is where most of the Huffman
> savings matter.
>
> Now, feel free to prove me wrong with real world examples where you
> observe significant changes on the volume of bytes sent by a client
> before and after your changes, with an emphasis on the first 10 MSS
> (14kB) which is where a first round trip will be needed, but at first
> glance I'm pretty sure this will be fairly marginal.
>
> > > PS: please avoid responding to yourself multiple times and top-posting,
> > >    that makes it difficult to respond to your messages, and likely
> > >    further reduces the willingness to respond.
> >
> > I will try my best, but I am not good at English, so please forgive me a
> > little.
>
> You're welcome, many of us are not native English speakers either :-)
>
> Willy
>
Received on Saturday, 9 December 2023 12:43:39 UTC