
Re: UTF-8 NFC vs NFD compression, French sample (was: Updated Binary Optimized Header Encoding Draft)

From: Roberto Peon <grmocg@gmail.com>
Date: Thu, 15 Nov 2012 06:31:48 -0800
Message-ID: <CAP+FsNdtQNz68OT6g0Q8z=korxCrpQQV=VwAcZt9fZFoe7oDZA@mail.gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, HTTP Working Group <ietf-http-wg@w3.org>, Frédéric Kayser <f.kayser@free.fr>
Remember the CRIME exploit, guys. Stream compressors aren't very safe to
use on headers...

-=R
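Roberto's CRIME warning rests on a length oracle. When a secret and attacker-chosen text go through the same DEFLATE context, a correct guess of the secret turns into an LZ77 back-reference instead of literal bytes, so the compressed output gets shorter. CRIME (2012) used this against TLS-level and SPDY header compression. Below is a minimal sketch using Python's `zlib`. The request layout, the secret `7f3a9c`, and the helpers `oracle` and `guess_next_char` are all hypothetical. `zlib.Z_FIXED` forces fixed Huffman codes so that the size signal here comes only from LZ77 matching; real attacks have to cope with Huffman noise.

```python
import zlib

# Hypothetical request: the Cookie value is the secret, and the attacker
# controls the Referer value, which is compressed in the same DEFLATE context.
PREFIX = "GET / HTTP/1.1\r\nCookie: session=7f3a9c\r\nReferer: "

def oracle(attacker_data: str) -> int:
    """Compressed size an eavesdropper would observe on the wire."""
    # Raw DEFLATE (wbits=-15); Z_FIXED forces fixed Huffman codes so the only
    # size difference between guesses comes from LZ77 back-references.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15, 8, zlib.Z_FIXED)
    data = (PREFIX + attacker_data).encode("ascii")
    return len(comp.compress(data) + comp.flush())

def guess_next_char(known: str, alphabet: str) -> str:
    """A correct guess extends the back-reference to the secret; a wrong one
    costs an extra literal, so the candidate with the smallest output wins."""
    return min(alphabet, key=lambda c: oracle(known + c))

# Guess the first character of the session value.
print(guess_next_char("Cookie: session=", "0123456789abcdef"))
```

In this setup a wrong guess adds one 8-bit literal while a correct one only lengthens the existing match, so the correct character should come out one byte smaller. Later characters can land on the same byte count once bit padding is involved, which is why real attacks add padding tricks on top of this basic oracle.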
On Nov 15, 2012 2:30 AM, "Willy Tarreau" <w@1wt.eu> wrote:

> Hi Frederic,
>
> On Thu, Nov 15, 2012 at 10:58:44AM +0100, Frédéric Kayser wrote:
> > Hello Martin,
> > I have a short French text sample here, a small extract from « Le tour du
> > monde en quatre-vingts jours » ("Around the World in Eighty Days") by Jules
> > Verne.
> >
> > The bzip2-compressed version of the NFD-encoded text is 4 bytes smaller.
> > Using gzip it looks like a draw, but the Deflate stream itself is actually
> > 4 bits shorter.
> > On the other hand, when using xz (LZMA2), NFC gives a better result.
> >
> > 2553 tdm80j-french-utf8-nfc.txt
> > 2625 tdm80j-french-utf8-nfd.txt
> >
> > 1312 tdm80j-french-utf8-nfc.txt.bz2
> > 1308 tdm80j-french-utf8-nfd.txt.bz2
> >
> > 1352 tdm80j-french-utf8-nfc.txt.gz
> > 1352 tdm80j-french-utf8-nfd.txt.gz
> >
> > defdb -s tdm80j-french-utf8-nfc.txt.gz
> > 10671 bits
> >
> > defdb -s tdm80j-french-utf8-nfd.txt.gz
> > 10667 bits
> >
> > Compressed files are enclosed in the zip archive attached to this email.
>
> Do not forget that what matters most for HTTP is not the compression
> ratio but the compression speed. If you need a whole datacenter to
> compress 1000 streams, nobody will use it. If the compression induces
> delays, it will not be used either. If you look around, you'll see that
> HTTP compression engines today run gzip at level 1, the fastest setting
> available among the codings HTTP supports. And I agree with your comment
> in a previous mail that gzip is totally outdated. I'd like to have much
> faster compression algorithms such as LZ4, FastLZ, etc., which are 10-100
> times faster than gzip for roughly the same compression ratio as gzip-1.
>
> Cheers,
> Willy
>
>
>
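Frédéric's figures follow from how UTF-8 encodes the two normalization forms. In NFC a French accented letter such as "é" is one precomposed code point (2 bytes); in NFD it becomes a base letter plus a combining mark (1 + 2 bytes). So the NFD file being 72 bytes larger (2625 - 2553) presumably reflects 72 accented letters. His gzip "draw" is also consistent with byte rounding: 10671 and 10667 bits of DEFLATE data both round up to 1334 bytes, and 1352 - 1334 = 18 bytes would match gzip's minimal 10-byte header plus 8-byte trailer. The sketch below repeats the comparison with Python's standard `bz2`, `gzip`, and `lzma` modules. The French sample and the helper `sizes` are made up for illustration, so the numbers will not match his.

```python
import bz2
import gzip
import lzma
import unicodedata

# Hypothetical French sample written for this sketch; it is NOT the
# tdm80j text measured in the message, so its sizes will differ.
SAMPLE = (
    "Phileas Fogg était un gentleman énigmatique. Il déjeunait et dînait "
    "au Reform-Club, à des heures réglées, et ne s'éloignait guère de sa "
    "maison de Saville-row. Personne ne savait s'il était riche, ni "
    "comment il avait gagné sa fortune."
)

def sizes(text: str) -> dict[str, int]:
    """Byte counts of `text` as UTF-8, raw and after three compressors."""
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        "bz2": len(bz2.compress(raw, 9)),
        "gz-1": len(gzip.compress(raw, compresslevel=1)),
        "gz-9": len(gzip.compress(raw, compresslevel=9)),
        "xz": len(lzma.compress(raw)),  # .xz container with an LZMA2 filter
    }

for form in ("NFC", "NFD"):
    # NFC keeps "é" as U+00E9 (2 UTF-8 bytes); NFD splits it into
    # "e" + U+0301 COMBINING ACUTE ACCENT (1 + 2 bytes).
    print(form, sizes(unicodedata.normalize(form, SAMPLE)))
```

On a sample this short the container overhead (headers, checksums) weighs more than the normalization form, so only larger texts like Frédéric's say much about which form compresses better.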
Received on Thursday, 15 November 2012 14:32:16 GMT
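Willy's point is about throughput rather than ratio. A rough way to see the trade-off between gzip levels is to time DEFLATE at levels 1, 6, and 9 on header-like data. The sketch below uses Python's `zlib` (the same DEFLATE engine gzip uses). The synthetic request stream and the helper `measure` are hypothetical. LZ4 and FastLZ are not in the standard library, so they are left out, and the numbers depend on the machine and zlib build.

```python
import time
import zlib

# Hypothetical stream of small HTTP/1.1 requests standing in for real traffic.
HEADERS = b"".join(
    (f"GET /img/{i}.png HTTP/1.1\r\n"
     "Host: www.example.com\r\n"
     "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
     "Accept: image/png,image/*;q=0.8\r\n\r\n").encode("ascii")
    for i in range(5000)
)

def measure(data: bytes, level: int, repeats: int = 3) -> tuple[float, int]:
    """Best-of-`repeats` DEFLATE throughput in MB/s, plus the compressed size."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero timing on very fast runs.
    return len(data) / max(best, 1e-9) / 1e6, len(out)

for level in (1, 6, 9):
    mbps, size = measure(HEADERS, level)
    print(f"level {level}: {mbps:8.1f} MB/s, {len(HEADERS)} -> {size} bytes")
```

Taking the best of several runs filters out scheduler noise; the compression itself runs in C, so Python overhead is small relative to the work measured.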
