Re: UTF-8 NFC vs NFD compression, French sample (was: Updated Binary Optimized Header Encoding Draft)

Remember the CRIME exploit, guys. Stream compressors aren't very safe to
use...
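
For anyone who has not looked at CRIME: the leak comes from secret and
attacker-chosen bytes sharing one compression context, so the output gets
shorter when the attacker's text repeats the secret. A minimal sketch of that
effect, assuming Python's zlib as the compressor; the cookie value, the request
layout and the `compressed_len` helper are made up for illustration and are not
taken from any real exploit code:

```python
import zlib

# Hypothetical secret that the attacker wants to learn.
SECRET = "sessionid=7f3a9c"


def compressed_len(attacker_text: str) -> int:
    """Size of the DEFLATE output when attacker text shares a stream with the secret."""
    request = f"GET /{attacker_text} HTTP/1.1\r\nCookie: {SECRET}\r\n\r\n"
    return len(zlib.compress(request.encode("ascii"), 9))


# Two guesses of equal length: one repeats the secret, one does not.
right = compressed_len("sessionid=7f3a9c")
wrong = compressed_len("qwzxkvjpgbmyulr0")

# The matching guess turns the cookie into a single back-reference,
# so its compressed request is the shorter one.
print("matching guess:", right, "bytes")
print("wrong guess:   ", wrong, "bytes")
```

The real attack refines the guess a byte at a time and only needs to see
ciphertext lengths; this sketch compares whole guesses to keep the size
difference obvious.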

-=R
On Nov 15, 2012 2:30 AM, "Willy Tarreau" <w@1wt.eu> wrote:

> Hi Frederic,
>
> On Thu, Nov 15, 2012 at 10:58:44AM +0100, Frédéric Kayser wrote:
> > Hello Martin,
> > I have a short French text sample here, it's a small extract from « Le
> > tour du monde en quatre-vingts jours » ("Around the World in Eighty Days")
> > by Jules Verne.
> >
> > The bzip2 compressed version of the NFD encoded text is smaller by 4
> > bytes.
> > Using gzip it looks like a draw but in fact the Deflate stream itself is
> > 4 bits shorter.
> > On the other hand, when using xz (lzma2), NFC gives a better result.
> >
> > 2553 tdm80j-french-utf8-nfc.txt
> > 2625 tdm80j-french-utf8-nfd.txt
> >
> > 1312 tdm80j-french-utf8-nfc.txt.bz2
> > 1308 tdm80j-french-utf8-nfd.txt.bz2
> >
> > 1352 tdm80j-french-utf8-nfc.txt.gz
> > 1352 tdm80j-french-utf8-nfd.txt.gz
> >
> > defdb -s tdm80j-french-utf8-nfc.txt.gz
> > 10671 bits
> >
> > defdb -s tdm80j-french-utf8-nfd.txt.gz
> > 10667 bits
> >
> > Compressed files are enclosed in the zip archive attached to this email.
>
> Do not forget that the most important thing for HTTP is not the compression
> ratio but the compression speed. If you need a whole datacenter to
> compress 1000 streams, nobody will use it. If the compression induces
> delays, it will not be used either. If you check around, you'll see that
> HTTP compression engines right now compress at gzip-1 to achieve the best
> compression speed allowed on HTTP. And I agree with your comment in a
> previous mail that gzip is totally outdated. I'd like to have much faster
> compression algos such as LZ4, fastlz, etc... which are 10-100 times faster
> than gzip for around the same compression ratios as gzip-1.
>
> Cheers,
> Willy
>
>
>
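
The NFC versus NFD size comparison quoted above can be reproduced roughly with
Python's standard library. This is only a sketch: `SAMPLE` is a made-up
placeholder sentence, not the Jules Verne extract from the attachment, and
`zlib.compress` emits a zlib-wrapped Deflate stream rather than a .gz file, so
the byte counts will not match the figures in the thread.

```python
import bz2
import lzma
import unicodedata
import zlib

# Placeholder French text with accents (not the extract used in the thread).
SAMPLE = (
    "Le détective espérait arrêter le voleur à Bombay, "
    "mais le mandat n'était pas encore arrivé. "
) * 20


def sizes(text: str) -> dict:
    """Raw and compressed byte counts of the text in NFC and in NFD."""
    result = {}
    for form in ("NFC", "NFD"):
        data = unicodedata.normalize(form, text).encode("utf-8")
        result[form] = {
            "raw": len(data),
            "bz2": len(bz2.compress(data)),
            "zlib": len(zlib.compress(data, 9)),
            "xz": len(lzma.compress(data)),  # default container is .xz
        }
    return result


result = sizes(SAMPLE)
for form, row in result.items():
    print(form, row)
```

NFD is always the larger raw input here because each precomposed accented
letter becomes a base letter plus a combining mark; which form compresses
better depends on the compressor and the text, as the numbers in the thread
show.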

Received on Thursday, 15 November 2012 14:32:16 UTC