
Re: Updated Binary Optimized Header Encoding Draft

From: Frédéric Kayser <f.kayser@free.fr>
Date: Thu, 15 Nov 2012 01:17:20 +0100
Cc: James M Snell <jasnell@gmail.com>
Message-Id: <D65DBD3C-730B-49AF-846E-3413BD27D37D@free.fr>
To: ietf-http-wg@w3.org

Hi,
could you clarify a few points regarding UTF-8 encoding?

"The next bit (E) indicates, when set, that the header field value contains UTF-8 encoded character content."

- Is a BOM allowed?
- Are there restrictions concerning Unicode Normalization Forms? NFC is used most of the time, but NFD could lead to smaller compressed results.
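
To make the two questions concrete, here is a minimal Python sketch (not part of the draft). The names utf8_sizes and strip_bom are made up for illustration, and silently dropping a BOM is only one possible answer to the first question; the draft does not say what a receiver should do. The sketch measures the raw and deflate-compressed UTF-8 size of the same string in NFC and NFD.

```python
import codecs
import unicodedata
import zlib


def utf8_sizes(text):
    """Return {form: (raw_bytes, deflate_bytes)} for the NFC and NFD forms of text."""
    sizes = {}
    for form in ("NFC", "NFD"):
        data = unicodedata.normalize(form, text).encode("utf-8")
        sizes[form] = (len(data), len(zlib.compress(data, 9)))
    return sizes


def strip_bom(data):
    """Drop a leading UTF-8 BOM (EF BB BF) if present; an assumed policy, not the draft's."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Escapes are used so the literal does not depend on how this file is normalized.
sample = "Fr\u00e9d\u00e9ric Kayser"
for form, (raw, packed) in utf8_sizes(sample).items():
    print(form, raw, packed)
# Raw sizes are 17 bytes for NFC and 19 bytes for NFD.
```

For this sample NFD is larger before compression, because each "é" becomes "e" plus a two-byte combining accent. Whether it ends up smaller after compression depends on the text and the compressor, so it would need to be measured on realistic header values.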

And since UTF-8 is used, why stick to generic zlib/deflate for compression?
UTF-8 encoding has some inherent characteristics: http://en.wikipedia.org/wiki/UTF-8#Description
A compression algorithm aware of those would be more efficient than deflate. Huffman coding in deflate is context-unaware (order 0), unlike PPM (Prediction by Partial Matching) based algorithms.
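
As a rough illustration of the order-0 point, the following Python sketch compares the empirical entropy of a UTF-8 byte stream when every byte is modelled on its own (order 0) and when it is modelled given the previous byte (order 1). The function names are hypothetical, and this is neither deflate nor PPM: it only shows how much a one-byte context can tell a coder about the next byte. On short inputs the figures are optimistic, since a real coder also has to learn or transmit its model.

```python
import math
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Empirical bits per byte when each byte is modelled independently."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order1_entropy(data: bytes) -> float:
    """Empirical bits per byte when each byte is modelled given the previous byte."""
    pair_counts = Counter(zip(data, data[1:]))   # (previous, current) pairs
    context_counts = Counter(data[:-1])          # how often each byte acts as context
    total = len(data) - 1
    return -sum(
        c / total * math.log2(c / context_counts[prev])
        for (prev, _cur), c in pair_counts.items()
    )


# Eight copies of U+00E9, i.e. the bytes C3 A9 repeated: a lead byte, then a continuation byte.
data = ("\u00e9" * 8).encode("utf-8")
print(order0_entropy(data), order1_entropy(data))
```

In this extreme case the order-0 model needs one bit per byte, while the order-1 model needs none, because a lead byte C3 is always followed by A9 and vice versa. Deflate's LZ77 stage would also catch such a literal repetition, so the comparison only concerns the entropy-coding stage.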

By today's standards zlib/deflate is totally outdated. Its search window is limited to 32 KB (imagine how ridiculous this is in today's PNG files, and look at what Google did in WebP lossless), it is dog slow to compress and decompress compared to LZ4, and its compressed size is far from being on par with LZMA or even bzip2. OK, running an LZMA decoder is probably not the best thing to do on power- and memory-limited smartphones, but I wouldn't mind moving away from zlib/deflate to something closer to the Pareto frontier of compressed size vs. compression time.
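
Anyone who wants to check the size and time trade-off on their own data could start from a sketch like the one below. It uses only the compressors shipped with the Python standard library (zlib, bz2, lzma); LZ4 is left out because it needs a third-party package. The function name compare and the sample header text are invented for illustration and are not taken from the draft.

```python
import bz2
import lzma
import time
import zlib


def compare(data: bytes):
    """Return {name: (compressed_size, seconds)} for a few standard-library compressors."""
    compressors = {
        "deflate (zlib, level 9)": lambda d: zlib.compress(d, 9),
        "bzip2": bz2.compress,
        "LZMA (xz container)": lzma.compress,
    }
    results = {}
    for name, fn in compressors.items():
        start = time.perf_counter()
        out = fn(data)
        results[name] = (len(out), time.perf_counter() - start)
    return results


# Hypothetical header-like sample, repeated to mimic many similar requests.
sample = (
    b"accept-language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4\r\n"
    b"user-agent: Example/1.0\r\n"
) * 200

for name, (size, seconds) in compare(sample).items():
    print(f"{name}: {size} bytes in {seconds * 1000:.2f} ms")
```

Timings from a single run like this are noisy, and on inputs as small as one set of headers the fixed container overhead of each format can outweigh any gain from a stronger algorithm, so the results should be read as a first indication only.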

Regards
-- 
Frédéric Kayser

On 3 Oct 2012, at 19:07, James M Snell wrote:

> FYI.. I have submitted an updated draft for the proposed Binary Optimized Header Encoding mechanism for the http2 effort. 
> 
>   http://www.ietf.org/id/draft-snell-httpbis-bohe-01.txt
> 
> A number of fairly significant changes have been made:
> 
>   1. The codepage identifier for registered header tokens has been removed.
>   2. The per-header flags field has been removed and replaced with specific individual bits to indicate character-based values and multiple values.
>   3. Value lengths have been increased from max 16-bit length to a max 22-bit length. 
> 
> The encoding itself remains just as compact with these changes. The HTTP version header field, for example, requires no more than 6 uncompressed bytes to represent.
> 
> As always, feedback is more than welcome.
> 
> - James
Received on Thursday, 15 November 2012 00:17:53 GMT
