- From: Frédéric Kayser <f.kayser@free.fr>
- Date: Wed, 22 May 2013 01:15:28 +0200
- To: ietf-http-wg@w3.org
Hello,

HTML5 accepts only UTF-8 encoding, and there are better ways than Deflate to compress Unicode text, bzip2 to start with. Deflate has no clue about UTF-8 since it is byte-oriented, and its search window is limited to 32 kilobytes. In UTF-8 a single Devanagari character (used for Hindi and other languages in India) takes 3 bytes, which seriously reduces the amount of text that can actually be used as a reference for string matching. The same goes for other scripts such as Cyrillic (Russian, Ukrainian, Bulgarian…), Greek, Hebrew, Arabic…: they can no longer rely on single-byte charsets, and UTF-8 means 2 bytes per character for those.

For web performance, a compression scheme that could recognize and reverse/redo base64 encoding (Data URI, RFC 2397) to handle "binary blobs" inside text files would be much appreciated.

Deflate also lacks flexibility: it has no super-fast mode à la LZ4 that would still provide decent compression at a much lower CPU cost (no entropy coding), nor something heavier at the other end (LZMA-like).

Deflate was a nice compression scheme in the 90s, but the World (Wide Web) has changed since then. Look at how archivers handle text files nowadays: they switch to PPMd, bzip2… because Deflate is outdated.

Compressing the headers is a good idea, but new compression schemes for the payload should not be overlooked.

Regards
Frédéric Kayser

On 21 May 2013 at 19:17, Poul-Henning Kamp wrote:

> In message <519BAB26.2010501@zinks.de>, Roland Zink writes:
>
>> This seem to make the introduction of new compression schemes more complex.
>
> And what is the plausibility that any new compression schemes will ever
> make that worth-while ?
>
> It's not nill, but it makes a convincing impression of nill.
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
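
The byte-width argument in Frédéric's message can be checked with a few lines of Python. This is only a rough sketch, not part of the original post: the sample characters, the helper names (`utf8_width`, `chars_in_window`, `compressed_sizes`) and the choice of compression levels are assumptions made for illustration. It shows how many characters of each script fit in a 32 KB window and lets you compare Deflate (via `zlib`), bzip2 and LZMA on your own text.

```python
import bz2
import lzma
import zlib

# Deflate's maximum back-reference distance, in bytes (32 KB).
DEFLATE_WINDOW = 32 * 1024

# One illustrative character per script (picked for this sketch only).
SAMPLES = {
    "Latin": "a",
    "Cyrillic": "\u0436",
    "Greek": "\u03bb",
    "Hebrew": "\u05d0",
    "Arabic": "\u0628",
    "Devanagari": "\u0915",
}


def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def chars_in_window(ch: str) -> int:
    """How many copies of this character fit in Deflate's window."""
    return DEFLATE_WINDOW // utf8_width(ch)


def compressed_sizes(text: str) -> dict:
    """Compressed size in bytes of UTF-8 text under three schemes."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        "deflate": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        print(script, utf8_width(ch), "byte(s) per char,",
              chars_in_window(ch), "chars per window")
```

To compare the schemes on real content, pass a large sample of text in the script of interest to `compressed_sizes`; the results depend heavily on the input, so no particular ranking is claimed here.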
Received on Tuesday, 21 May 2013 23:16:02 UTC