
Re: Header Serialization Discussion

From: Frédéric Kayser <f.kayser@free.fr>
Date: Tue, 16 Apr 2013 04:23:41 +0200
To: ietf-http-wg@w3.org
Message-Id: <24370F45-C4B4-41A2-8515-5B239766A943@free.fr>
If text fields can effectively be encoded as UTF-8, would it be wise to use that to send IRIs (RFC 3987) directly?
That is, without Punycode:
http://xn--acadmie-franaise-npb1a.fr/ vs. http://académie-française.fr/
http://www.xn--cigacz-2ib.pl/ vs. http://www.ścigacz.pl/
http://xn--rlcuo9h.xn--wkc4axeaevb3oqbg.xn--xkc2al3hye2a/ vs. http://தளம்.ஆள்களமையம்.இலங்கை/
http://xn--mgbggrfi2ikdb7d.xn--mgberp4a5d4ar/ vs. http://مركزالتسجيل.السعودية/

and without percent-encoding:
zdj%C4%99cia vs. zdjęcia
g%C3%B6r%C3%BCnt%C3%BC vs. görüntü
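
To make the two ASCII-only forms above concrete, here is a minimal Python sketch of how they are derived today. It assumes Python's built-in `idna` codec (which implements the older IDNA 2003 rules) and `urllib.parse.quote`; the helper names `to_ascii_host` and `to_unicode_host` are hypothetical, chosen only for this illustration.

```python
from urllib.parse import quote, unquote


def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII (Punycode) form."""
    # The built-in "idna" codec follows IDNA 2003, not IDNA 2008.
    return host.encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Convert an ASCII (Punycode) host name back to Unicode."""
    return host.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Host names: what goes on the wire today vs. the IRI form.
    print(to_ascii_host("académie-française.fr"))
    print(to_ascii_host("www.ścigacz.pl"))
    print(to_unicode_host("www.xn--cigacz-2ib.pl"))

    # Path segments: UTF-8 octets, each written as %XX.
    print(quote("zdjęcia"))
    print(quote("görüntü"))
    print(unquote("g%C3%B6r%C3%BCnt%C3%BC"))
```

Sending the IRI form directly would replace both conversions with the raw UTF-8 octets of the right-hand column.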

I wouldn't mind if HTTP/2 clearly took the bull by the horns regarding I18N.

The easiest way to (re)encode UTF-8 with variable-length codes would be to collect or define statistics for the leading octets only, and to store the continuation octets as fixed 6-bit values (they are restricted to the range 0x80-0xBF, which is 64 values).
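
A rough Python sketch of that split, to show the bit layout only. No Huffman table exists for this yet, so the leading octet is written here as a plain 8-bit value standing in for its eventual variable-length code; that stand-in, the bit-string representation, and the function names are all assumptions made for the illustration.

```python
def continuation_count(lead: int) -> int:
    """Number of continuation octets implied by a UTF-8 leading octet."""
    if lead < 0x80:
        return 0
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    raise ValueError(f"not a valid UTF-8 leading octet: {lead:#x}")


def encode(text: str) -> str:
    """Encode text as a string of '0'/'1' characters (for readability)."""
    data = text.encode("utf-8")
    bits = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = continuation_count(lead)
        # Placeholder: a real coder would emit the Huffman code for `lead`.
        bits.append(format(lead, "08b"))
        # Continuation octets are 10xxxxxx, so only the low 6 bits matter.
        for octet in data[i + 1 : i + 1 + n]:
            bits.append(format(octet & 0x3F, "06b"))
        i += 1 + n
    return "".join(bits)


def decode(bits: str) -> str:
    """Inverse of encode(): rebuild the UTF-8 octets and decode them."""
    out = bytearray()
    pos = 0
    while pos < len(bits):
        lead = int(bits[pos : pos + 8], 2)
        pos += 8
        out.append(lead)
        # The leading octet tells the decoder how many 6-bit fields follow.
        for _ in range(continuation_count(lead)):
            out.append(0x80 | int(bits[pos : pos + 6], 2))
            pos += 6
    return out.decode("utf-8")


if __name__ == "__main__":
    for word in ("zdjęcia", "görüntü", "தளம்"):
        packed = encode(word)
        assert decode(packed) == word
        print(word, len(word.encode("utf-8")) * 8, "->", len(packed), "bits")
```

Even before any Huffman coding of the leading octets, each continuation octet saves 2 bits, and the decoder never needs a separate length marker because the leading octet already fixes the number of 6-bit fields.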

Frédéric Kayser

James M Snell wrote:

> Text can be either UTF-8 or ISO-8859-1, indicated by a single bit flag
> following the type code. All text strings are prefixed by their length,
> given as an unsigned variant length integer.
> For ISO-8859-1 Text, the Static Huffman Code used by Delta would be
> used for the value. If we can develop an approach to effectively
> handling Huffman coding for arbitrary UTF-8, then we can apply Huffman
> coding to that as well.
Received on Tuesday, 16 April 2013 02:24:10 UTC
