Re: Multiple Huffman code tables from Willy Tarreau on 2023-12-09 (ietf-http-wg@w3.org from October to December 2023)

From: Willy Tarreau <w@1wt.eu>
Date: Sat, 9 Dec 2023 14:13:16 +0100
To: falsandtru@gmail.com
Cc: ietf-http-wg@w3.org
Message-ID: <20231209131316.GA21383@1wt.eu>
On Sat, Dec 09, 2023 at 09:42:54PM +0900, ?? wrote:
> > I'm not saying that the *implementation* is complex. However, for a
> > low-level protocol change to be effective, it must be widely adopted,
> > and modifications applied to most major stacks. And for an
> > implementation, having to support two variants instead of one
> > necessarily adds a little bit of complexity (even in interoperability
> > testing), so there really needs to be a good argument for this.
> 
> It appears that you are refusing to change the Huffman code.

Huh? Not sure what you mean.

> > It depends: 2.5% of what? Here we're speaking about 2.5% of something
> > already tiny. If I read it correctly, we're suggesting that the *first*
> > occurrence of a 133-byte header is reduced to 129 bytes. When that
> > happens in a series of 100 requests, that's only 0.04 bytes saved per
> > request on average. Don't get me wrong, I'm not saying it's nothing, I'm
> > saying that all factors must be considered. As I explained, more savings
> > could be gained by revisiting the HPACK opcode encoding, which would save
> > bytes for each and every header field of each and every request, not
> > just the first one. And keep in mind that some implementations do not
> > even compress outgoing headers because the savings are not considered
> > worth the cost (particularly in the response direction).
> 
> It is the compression ratio of the compression algorithm (before / after).
> It is not the number of bytes. The number of bytes saved by these changes
> is 50-100 bytes at a time. Is that too little?

50-100 bytes per what? Per header? Per request? Per 10 kB of headers
sent? You just sent raw numbers without *any* explanation. I read 1.33
vs 1.29 as the average compression ratios, which sound reasonable since
large values are essentially made of base64, which only carries 0.75
bytes of entropy per byte (6 bits per character). For 100 bytes to be
saved at such ratios, roughly 4300 non-indexable bytes would have to be
sent. Is that a nice improvement? Maybe. Does anybody care about having
to maintain a second implementation to save that, now that the protocol
is widely deployed? I'm much less sure.
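To make the arithmetic in this exchange easier to check, here is a small Python sketch. It assumes, as in the reply, that 1.33 and 1.29 are average compression ratios (raw size divided by encoded size) for non-indexable values; the function names are illustrative only. It also reproduces the earlier amortization of a 4-byte saving (133 down to 129 bytes) over 100 requests.

```python
# Assumption: ratios are raw_size / encoded_size, averaged over
# non-indexable header values (e.g. base64 tokens).

def bytes_saved(raw_bytes: float, ratio_a: float, ratio_b: float) -> float:
    """Encoded-size difference between ratio_b (worse) and ratio_a (better)."""
    return raw_bytes / ratio_b - raw_bytes / ratio_a

def raw_bytes_needed(target_saving: float, ratio_a: float, ratio_b: float) -> float:
    """Raw input size at which the better ratio saves `target_saving` bytes."""
    return target_saving / (1 / ratio_b - 1 / ratio_a)

def per_request_saving(one_time_saving: float, requests: int) -> float:
    """Amortize a saving that only applies to a header's first occurrence."""
    return one_time_saving / requests

if __name__ == "__main__":
    n = raw_bytes_needed(100, 1.33, 1.29)
    print(f"raw bytes for a 100-byte saving: {n:.0f}")        # 4289
    print(f"per-request saving: {per_request_saving(133 - 129, 100):.2f}")  # 0.04
```

Under these assumptions, every 100 bytes saved requires about 4.3 kB of raw, non-indexable header data.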

> The number of bytes is reduced in the response as well.

OK but in practice nobody cares about these ones since they come with
tens to hundreds of kB of extra data.

> > HPACK compression is extremely effective on the uplink from the client
> > to the server, where it gains most of its savings by using the dynamic
> > table, compressing most repetitive header fields, including large
> > cookies, to a single byte. Huffman here is just a nice extra bonus, not
> > a major difference.
> 
> Are improvements to standardized extras prohibited? They should not be.

I don't understand why you're stating this. I suspect it will be difficult
to discuss this on technical grounds... What is important to understand
is that you cannot improve standards by breaking them, so it must always
be done in a backwards-compatible way (i.e. support must first be
discovered on the other side).
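The dynamic-table point quoted above can be illustrated with the HPACK wire format. The sketch below follows the integer representation (RFC 7541, section 5.1) and the indexed header field representation (section 6.1); it is a minimal illustration, not code from any particular stack. Once a field such as a large cookie has been inserted into the dynamic table, whose entries start at index 62, re-sending it costs a single byte as long as its index stays below 127.

```python
def encode_integer(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """HPACK integer representation (RFC 7541, section 5.1).
    `flags` holds the bits of the first octet above the N-bit prefix."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([flags | value])
    out = bytearray([flags | max_prefix])
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)  # low 7 bits, continuation bit set
        value //= 128
    out.append(value)
    return bytes(out)

STATIC_TABLE_SIZE = 61  # static entries are 1..61; dynamic ones start at 62

def encode_indexed_field(index: int) -> bytes:
    """Indexed Header Field Representation (RFC 7541, section 6.1):
    a '1' bit followed by the table index on a 7-bit prefix."""
    if index < 1:
        raise ValueError("index 0 is not used in HPACK")
    return encode_integer(index, 7, 0x80)

if __name__ == "__main__":
    first_dynamic = STATIC_TABLE_SIZE + 1
    print(encode_indexed_field(first_dynamic).hex())  # be   (one byte)
    print(encode_indexed_field(127).hex())            # ff00 (two bytes)
```

This is why the Huffman code only matters for the first occurrence of such a field: later occurrences never carry the string at all.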

> As
> mentioned above, this extra has been reduced by 50-100 bytes. This should
> be an improvement worth proposing.
> 
> > That's basically what most of us are already doing I think, e.g.:
> 
> No. Your code defines a table. I say that such tables can be removed altogether.

The tables here are just maps between sets of bytes. Also you *say* that
you can remove them but your example code has plenty, which is counter-
intuitive. You just dumped your code here with raw data without any
explanation about what is supposed to make it better.

> > That's not what I'm saying. I'm saying that for a client to use your
> > implementation, it must first know that the server will support it, and
> > it cannot know this before receiving its SETTINGS frame, hence it's not
> > usable before the first round-trip, which is where most of the Huffman
> > savings matter.
> 
> It appears that you are refusing to change the Huffman code.

Please stop rehashing this nonsense. I'm trying to help you make your
proposal easier to review and understand. If you want to insult me all
the time, go find someone else to review it.

> Need a signal other than the version?

The version of what? It seems that you need an explanation of how HTTP
works. First, a client connects to a server and advertises the protocols
it's willing to speak using ALPN. The server responds with the ALPN
protocol it selected. From this point the client knows it can use H2 to
speak to the server: it sends its connection preface, its SETTINGS frame,
and a bunch of requests in a conservative way (i.e. assuming the server
is OK with default settings). Then the server sends its own SETTINGS
frame, a SETTINGS ACK for the client's settings, and starts processing
the received requests.
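For reference, here is a minimal Python sketch of the first bytes a client writes on a fresh HTTP/2 connection, following RFC 9113 (connection preface in section 3.4, frame layout in section 4.1, SETTINGS in section 6.5). The chosen setting and its value are arbitrary examples.

```python
import struct

# Client connection preface (RFC 9113, section 3.4): 24 octets.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

FRAME_TYPE_SETTINGS = 0x04
FLAG_ACK = 0x01
SETTINGS_MAX_CONCURRENT_STREAMS = 0x03

def frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """9-octet frame header: 24-bit length, 8-bit type, 8-bit flags,
    1 reserved bit (sent as 0) and a 31-bit stream identifier."""
    if not 0 <= length < (1 << 24):
        raise ValueError("frame length out of range")
    return struct.pack("!I", length)[1:] + struct.pack(
        "!BBI", frame_type, flags, stream_id & 0x7FFFFFFF)

def settings_frame(settings: dict) -> bytes:
    """SETTINGS frame on stream 0: each entry is a 16-bit identifier
    followed by a 32-bit value."""
    payload = b"".join(struct.pack("!HI", ident, value)
                       for ident, value in settings.items())
    return frame_header(len(payload), FRAME_TYPE_SETTINGS, 0, 0) + payload

def settings_ack() -> bytes:
    """Empty SETTINGS frame with the ACK flag set."""
    return frame_header(0, FRAME_TYPE_SETTINGS, FLAG_ACK, 0)

if __name__ == "__main__":
    first_bytes = CLIENT_PREFACE + settings_frame({SETTINGS_MAX_CONCURRENT_STREAMS: 100})
    print(len(first_bytes), first_bytes.hex())
```

The client's requests (HEADERS frames) follow immediately after these bytes, without waiting for the server.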

Here, assuming your client wants to use a new version of the Huffman
encoder, it would need to advertise its support using a SETTINGS frame,
and couldn't use it until it sees the server's SETTINGS frame indicating
that it supports it. It's *only* at this point that it will be able to
switch to the new version. A whole round trip will have been spent, with
up to 14 kB of data uploaded at once. Once the first round trip is over,
there's much less to gain, because the first round trip is precisely
where you're trying to reduce the amount of data to make sure not to
waste an extra round trip.
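The gating described above can be sketched as follows. `SETTINGS_ALT_HUFFMAN` and its identifier are hypothetical (no such setting is registered), and `HpackEncoderState` is an illustrative name, not part of any real library. The point is only that, until the peer's SETTINGS frame has been received, the encoder has to stick to the RFC 7541 Huffman code. RFC 9113 requires unknown settings to be ignored, so advertising a new one is harmless, but it tells the sender nothing until the answer arrives.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL identifier: no "alternate Huffman" setting exists in the
# HTTP/2 settings registry; 0xF0F0 is picked for illustration only.
SETTINGS_ALT_HUFFMAN = 0xF0F0

@dataclass
class HpackEncoderState:
    """Illustrative per-connection state deciding which Huffman code
    the HPACK encoder may use for outgoing header blocks."""
    local_supports_alt: bool = True
    peer_settings_received: bool = False
    peer_settings: dict = field(default_factory=dict)

    def on_peer_settings(self, settings: dict) -> None:
        # Peers that do not implement the setting simply never send it;
        # RFC 9113 requires unknown identifiers to be ignored.
        self.peer_settings.update(settings)
        self.peer_settings_received = True

    def huffman_code(self) -> str:
        # Before the peer's SETTINGS frame arrives (i.e. during the whole
        # first round trip), only the RFC 7541 code can be assumed.
        if (self.local_supports_alt
                and self.peer_settings_received
                and self.peer_settings.get(SETTINGS_ALT_HUFFMAN) == 1):
            return "alternate"
        return "rfc7541"

if __name__ == "__main__":
    state = HpackEncoderState()
    print(state.huffman_code())   # first flight
    state.on_peer_settings({SETTINGS_ALT_HUFFMAN: 1})
    print(state.huffman_code())   # after one round trip
```

Every request sent in the first flight therefore goes out with the existing code, whatever the new table would have saved.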

> > Now, feel free to prove me wrong with real world examples where you
> > observe significant changes on the volume of bytes sent by a client
> > before and after your changes, with an emphasis on the first 10 MSS
> > (14 kB), which is where a first round trip will be needed, but at first
> > glance I'm pretty sure this will be fairly marginal.
> 
> I cannot know the scenario in your head.

There's no scenario in my head, I'm speaking about a client sending many
requests over a just established TCP connection, and using compression
to save bytes and try to save time by avoiding a round trip, which is
the whole point of header compression.
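A rough way to reason about this first flight is to count how many requests fit in the initial congestion window. The sketch assumes an initial window of 10 segments (RFC 6928) and a 1460-byte MSS, i.e. the roughly 14 kB mentioned earlier; the request sizes in the usage line are hypothetical, not measurements.

```python
# Assumptions: RFC 6928 initial window of 10 segments and a 1460-byte MSS.
INITIAL_CWND_SEGMENTS = 10
MSS = 1460

def first_flight_budget(overhead: int = 0) -> int:
    """Bytes the client can send before waiting for the first ACKs,
    minus fixed overhead such as the preface and SETTINGS frame."""
    return INITIAL_CWND_SEGMENTS * MSS - overhead

def requests_in_first_flight(first_req: int, later_req: int, overhead: int = 0) -> int:
    """Requests fitting in the first flight: the first one carries full
    headers, later ones mostly hit the dynamic table. Sizes are encoded
    HEADERS frames, 9-byte frame header included."""
    budget = first_flight_budget(overhead)
    if first_req > budget:
        return 0
    return 1 + (budget - first_req) // later_req

if __name__ == "__main__":
    # Hypothetical sizes: 600-byte first request, 60-byte follow-ups,
    # compared with the same first request trimmed by 4 bytes.
    print(requests_in_first_flight(600, 60), requests_in_first_flight(596, 60))
```

With these hypothetical sizes, trimming 4 bytes from the first request leaves the count unchanged (234 in both cases).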

> But as mentioned above, this extra
> has been reduced by 50-100 bytes at a time. Is that too little?

As asked above, per what unit, and under which scenario? You're
proposing something, you need to back it up with data.

> > You're welcome, many of us are not english natives either :-)
> 
> I did not say I am not a native English speaker. I said I am not good at
> English :-)

I know, but the two usually go together.

Willy
Received on Saturday, 9 December 2023 13:13:23 UTC