Re: #578: getting real-ish numbers for option 3

Hi Martin!

On Fri, Oct 24, 2014 at 09:19:54PM +0200, Martin Thomson wrote:
> On 24 October 2014 21:01, Willy Tarreau <w@1wt.eu> wrote:
> > BTW, I forgot to say something, I was very surprised to find the encoding
> > for character zero in the Huffman table be shorter than some non-ascii
> > encodings, and the same size as some chars such as '
> > or '@'. I wonder
> > how this byte could have landed here with such a high frequency. Wouldn't
> > this mean that the trailing zero of analyzed strings was accidentally
> > counted when the table was built? That probably has a very minimal impact
> > on the overall compression ratio but I found this surprising.
> 
> That's because it was being used to delineate values for repeated
> header fields, prior to the removal of the reference set.

Ah yes I remember now, thanks!

> We could
> remove it from the frequency analysis and generate a new character
> table if we were making other changes.  The difference might not be
> noticeable.

Indeed. I don't expect any savings either, but if the table is ever
revisited, it's certainly worth taking care of this.
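
To illustrate why the difference is likely to be small, here is a rough
sketch of rebuilding Huffman code lengths with and without the zero byte
in the frequency analysis. It is not the tooling used to build the HPACK
table: the function names and the symbol counts below are made up purely
for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a dict of symbol counts."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so heapq never has to compare symbol lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol below the
        # new node ends up one bit deeper.
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


def total_bits(freqs, lengths):
    """Bits needed to encode every counted symbol with the given lengths."""
    return sum(n * lengths[sym] for sym, n in freqs.items())


# Hypothetical counts, NOT taken from any real header corpus.
counts_with_nul = {"a": 50, "e": 30, "\x00": 5, "@": 5, "~": 1}
counts_without_nul = {s: n for s, n in counts_with_nul.items() if s != "\x00"}

lengths_with_nul = huffman_code_lengths(counts_with_nul)
lengths_without_nul = huffman_code_lengths(counts_without_nul)

# Cost of the real (non-zero) symbols under each table.
bits_old = total_bits(counts_without_nul, lengths_with_nul)
bits_new = total_bits(counts_without_nul, lengths_without_nul)
```

With these toy counts only the rarest symbol gets one bit shorter once
the zero byte is dropped, so the total changes by a single bit. A real
table would of course have to be regenerated from the real corpus.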

Thanks!
Willy

Received on Friday, 24 October 2014 19:42:14 UTC