Re: Delta Compression and UTF-8 Header Values

>>>>> "RP" == Roberto Peon <grmocg@gmail.com> writes:

RP> The header names are almost completely handled with the pre-seeded
RP> dictionary, so they really don't affect the character frequency
RP> count and/or thus the huffman encoding.

RP> Arithmetic coding gets better compression ratios, at the expense of
RP> gobs of CPU and complexity. I don't think that is a good tradeoff :/

It is sometimes hard to guess whether Huffman is chosen due to inertia,
arithmetic-coding patent angst, or good technical reasons.  It is good
to know that in this case it is the last of these.

I may not have expressed my primary point quite well enough though:

Although I doubt that right now there is any text in the headers which
is both common enough to warrant inclusion in a static table and not
seven-bit clean, my point was that even if such text shows up over time,
the fact that it is not seven-bit clean should not prevent its inclusion
in future, extended versions of the static table.  As such, specifying
that text is defined to be UTF-8 and using a static Huffman table should
not contra-indicate each other.
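
To make that concrete, here is a rough sketch (not the draft's actual
table or algorithm) of a static Huffman code built over all 256 octet
values rather than over seven-bit characters.  The names build_code,
encode and decode are hypothetical, and the floor count of 1 per octet
is my own assumption so that bytes >= 0x80 always get a code word, even
when the sample used to seed the table is pure ASCII.

```python
import heapq
from collections import Counter


def build_code(sample: bytes) -> dict:
    """Build a Huffman code (octet -> bit string) over all 256 octets.

    Assumption: each octet gets a floor count of 1, so octets absent
    from the sample (e.g. UTF-8 lead/continuation bytes) stay encodable.
    """
    counts = Counter(sample)
    # Heap entries: (weight, unique tie-breaker, {octet: code so far}).
    heap = [(counts.get(b, 0) + 1, b, {b: ""}) for b in range(256)]
    heapq.heapify(heap)
    tie = 256
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing one bit to each side.
        merged = {b: "0" + c for b, c in left.items()}
        merged.update({b: "1" + c for b, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text: str, code: dict) -> str:
    """Encode text as a bit string, one code word per UTF-8 octet."""
    return "".join(code[b] for b in text.encode("utf-8"))


def decode(bits: str, code: dict) -> str:
    """Decode a bit string back to text; relies on the code being prefix-free."""
    inverse = {c: b for b, c in code.items()}
    out = bytearray()
    word = ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out.decode("utf-8")
```

Octets that are rare in the sample simply end up with longer code
words, so non-seven-bit text costs more bits but is never excluded.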

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

Received on Tuesday, 12 February 2013 23:50:46 UTC