Re: HTTP2: Header Field Name. from Felix on 2017-10-10 (ietf-http-wg@w3.org from October to December 2017)

From: Felix <37554884@qq.com>
Date: Tue, 10 Oct 2017 05:57:33 +0000
To: "Willy Tarreau" <w@1wt.eu>
Cc: "ietf-http-wg" <ietf-http-wg@w3.org>
Message-Id: <tencent_94006C570BDE4742D33563B079D3052AA909@qq.com>

Hi. 
I'm just curious about how's this conclusion of "~4.81 bits per character
to encode any letter" coming from?

------------------ Original ------------------
From:  "Willy Tarreau";<w@1wt.eu>;
Date:  Tue, Oct 10, 2017 01:36 PM
To:  "Felix"<37554884@qq.com>;
Cc:  "ietf-http-wg"<ietf-http-wg@w3.org>;
Subject:  Re: HTTP2: Header Field Name.

On Tue, Oct 10, 2017 at 01:19:03PM +0800, Felix wrote:
> > That's wrong, because in HTTP "Cool-Feature" is the same as "cool-feature"
> > so you save one bit per character = 12 bits total by having the dash instead
> > of upper cases, compared to the former where each character must have a bit
> > to indicate the case. As long as you have on average less than one dash per
> > 8 caracters, the dash wins. This counts when you start to use certain
> > compression algorithms such as Huffman.
> 
> 
> Why you need extra bit to indicate the case? They are ascii character, one
> octet each no matter lowercase or uppercase without Huffman Encoding.
> Even by using Huffman Encoding.
> CoolFeature is 64(7+5+5+6+7+5+5+5+6+6+5+2) bits including padding bits.
> cool-feature is 72(5+5+5+6+6+6+5+5+5+6+6+5+7) bits including padding bits.

I'm not talking about the static huffman table we have to encode anything
including URLs and values, I'm talking about encoding in general. What
I'm saying is that (to make things simple) you need ~4.81 bits per character
to encode any letter, a dash and an end marker (28 symbols). You need ~5.73
bits per character to encode any letter with its case and an end marker (53
symbols). So at minima you'll need 68.76 bits for the camel case version,
against 62.40 for the dash version. That's one less octet for the dash
version. You can turn it however you want, you won't change this because
it's a matter of alphabet complexity.

If you want to turn it back to plain h2 encoding, then cool-feature could
very well be one or two bytes by being in the table (static or dynamic).
And there its presence in lower case makes it much easier and cheaper to
lookup existing entries using hashes, trees, dichotomy or even perfect
hashes.

Willy

Received on Tuesday, 10 October 2017 12:57:04 UTC