Re: Design Issue: GZIP flag on DATA Frames

Hello,
HTML5 favors UTF-8 encoding, and there are better ways than Deflate to compress Unicode text, bzip2 to start with. Deflate has no notion of UTF-8 since it is byte-oriented, and its search window is limited to 32 kilobytes. In UTF-8 a single Devanagari character (used for Hindi and other languages of India) takes 3 bytes, which seriously reduces the amount of text that can actually serve as a reference for string matching. The same goes for other scripts such as Cyrillic (Russian, Ukrainian, Bulgarian…), Greek, Hebrew, Arabic…: they can no longer rely on single-byte charsets, and UTF-8 means 2 bytes per character for them.
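To put numbers on the window argument: Deflate back-references can reach at most 32,768 bytes back (RFC 1951), so the amount of text that fits in that window depends on how many bytes each character takes in UTF-8. A minimal Python sketch; the sample characters are arbitrary picks, and it ignores the ASCII markup that surrounds text in real HTML, so it describes pure running text only.

```python
# Deflate's LZ77 back-reference distance is capped at 32 KiB (RFC 1951).
DEFLATE_WINDOW_BYTES = 32 * 1024

# One arbitrary sample character per script; only its UTF-8 length matters.
SAMPLES = {
    "Latin (ASCII)": "a",
    "Greek": "α",
    "Cyrillic": "ж",
    "Hebrew": "א",
    "Arabic": "ب",
    "Devanagari": "क",
}


def chars_in_window(ch: str) -> int:
    """Number of characters like `ch` that fit in the Deflate window as UTF-8."""
    return DEFLATE_WINDOW_BYTES // len(ch.encode("utf-8"))


if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        width = len(ch.encode("utf-8"))
        print(f"{script:14} {width} byte(s)/char -> {chars_in_window(ch):6,} chars of history")
```

For a 3-byte script such as Devanagari the window holds about 10,900 characters, a third of what it holds for ASCII text.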

For web performance, a compression scheme that could recognize base64-encoded data (data URIs, RFC 2397), decode it before compression and re-encode it after decompression, so that "binary blobs" inside text files are handled properly, would be much appreciated.
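The base64 idea can be sketched as a lossless pre-processing pass: cut the base64 payloads out of data: URIs, hand the decoded bytes to the compressor (or to a coder suited to binary data), and re-encode them on the way back. This is a hypothetical Python sketch; split_blobs, join_blobs and the simplified data-URI regex are illustrative assumptions, not part of any existing scheme.

```python
import base64
import binascii
import re

# Simplified (assumed) pattern: "data:<no comma/space>;base64,<payload>".
# Group 1 is the base64 payload; the media type is not validated.
DATA_URI_B64 = re.compile(rb"data:[^,\s]*;base64,([A-Za-z0-9+/]+=*)")


def split_blobs(text: bytes) -> tuple[bytes, list[tuple[int, bytes]]]:
    """Cut canonical base64 payloads out of data: URIs.

    Returns the text without those payloads, plus (offset, decoded_bytes)
    pairs, where offset is the insertion point in the stripped text.
    """
    out = bytearray()
    blobs: list[tuple[int, bytes]] = []
    pos = 0
    for m in DATA_URI_B64.finditer(text):
        payload = m.group(1)
        try:
            raw = base64.b64decode(payload, validate=True)
        except binascii.Error:
            continue  # bad padding or length: leave it in the text
        if base64.b64encode(raw) != payload:
            continue  # non-canonical encoding would not round-trip: leave it
        out += text[pos:m.start(1)]
        blobs.append((len(out), raw))
        pos = m.end(1)
    out += text[pos:]
    return bytes(out), blobs


def join_blobs(stripped: bytes, blobs: list[tuple[int, bytes]]) -> bytes:
    """Inverse of split_blobs: re-encode each blob and put it back in place."""
    out = bytearray()
    pos = 0
    for offset, raw in blobs:
        out += stripped[pos:offset]
        out += base64.b64encode(raw)
        pos = offset
    out += stripped[pos:]
    return bytes(out)


if __name__ == "__main__":
    blob = base64.b64encode(bytes(range(256)))
    page = b'<img src="data:application/octet-stream;base64,' + blob + b'">'
    stripped, blobs = split_blobs(page)
    assert join_blobs(stripped, blobs) == page
    print(len(page), "bytes ->", len(stripped), "bytes of text +",
          sum(len(b) for _, b in blobs), "raw bytes")
```

Only payloads whose decoded bytes re-encode to exactly the same characters are extracted, which is what guarantees a byte-exact round trip; anything unusual (non-canonical padding, truncated data) is simply left in the text.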

Deflate also lacks flexibility: it has no very fast mode à la LZ4, which would still give decent compression at a much lower CPU cost (no entropy coding), nor anything heavier at the other end (LZMA-like).
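Python's standard library has no LZ4 binding, so the following sketch can only show the Deflate range (levels 1 and 9) next to the heavier bzip2 and LZMA (xz container) codecs, measuring output size and wall-clock time on a UTF-8 sample. The sample text is synthetic and very repetitive, so the numbers it prints say little about real pages; feed it real HTML for a meaningful comparison.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, decompress, data: bytes):
    """Compress once, check the round trip, return (name, size, seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    if decompress(packed) != data:  # every codec here must be lossless
        raise RuntimeError(f"{name}: round trip failed")
    return name, len(packed), elapsed


def compare(data: bytes):
    codecs = [
        ("deflate level 1", lambda d: zlib.compress(d, 1), zlib.decompress),
        ("deflate level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
        ("bzip2 level 9", lambda d: bz2.compress(d, 9), bz2.decompress),
        ("xz/LZMA preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
    ]
    return [measure(n, c, dc, data) for n, c, dc in codecs]


if __name__ == "__main__":
    sample = ("Привет, мир! नमस्ते दुनिया! Γειά σου Κόσμε! " * 2000).encode("utf-8")
    for name, size, secs in compare(sample):
        print(f"{name:18} {size:8,} bytes  {secs * 1000:7.2f} ms")
```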

Deflate was a nice compression scheme in the 90s, but the World (Wide Web) has changed since then. Look at how archivers handle text files nowadays: they switch to PPMd, bzip2… because Deflate is outdated.

Compressing the headers is a good idea, but new compression schemes for the payload should not be overlooked.

Regards
Frédéric Kayser

On 21 May 2013 at 19:17, Poul-Henning Kamp wrote:

> In message <519BAB26.2010501@zinks.de>, Roland Zink writes:
> 
>> This seems to make the introduction of new compression schemes more complex.
> 
> And what is the plausibility that any new compression schemes will ever
> make that worth-while ?
> 
> It's not nil, but it makes a convincing impression of nil.
> 
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.

Received on Tuesday, 21 May 2013 23:16:02 UTC