Re: Header Serialization Discussion

Hello,
If text fields can effectively be encoded as UTF-8, would it be wise to use that to send IRIs (RFC 3987) directly? That is, host names without Punycode:
http://xn--acadmie-franaise-npb1a.fr/ vs. http://académie-française.fr/
http://www.xn--cigacz-2ib.pl/ vs. http://www.ścigacz.pl/
http://xn--rlcuo9h.xn--wkc4axeaevb3oqbg.xn--xkc2al3hye2a/ vs. http://தளம்.ஆள்களமையம்.இலங்கை/
http://xn--mgbggrfi2ikdb7d.xn--mgberp4a5d4ar/ vs. http://مركزالتسجيل.السعودية/

and path segments without percent-encoding:
zdj%C4%99cia vs. zdjęcia
g%C3%B6r%C3%BCnt%C3%BC vs. görüntü
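
To make the size difference concrete, here is a small Python sketch (an illustration only, not part of any proposal) that converts the Unicode forms above into the ASCII forms sent on the wire today and counts the octets of each. It assumes Python's built-in "idna" codec, which implements IDNA 2003 (IDNA 2008 needs the third-party `idna` package); `compare_host` and `compare_segment` are hypothetical names made up for this example.

```python
from urllib.parse import quote, unquote


def compare_host(unicode_host: str) -> tuple[str, int, int]:
    """Return the ASCII (Punycode) host and the octet counts of both forms."""
    # Built-in codec: IDNA 2003 (nameprep + Punycode, "xn--" prefix per label).
    ascii_host = unicode_host.encode("idna").decode("ascii")
    return ascii_host, len(ascii_host), len(unicode_host.encode("utf-8"))


def compare_segment(segment: str) -> tuple[str, int, int]:
    """Return the percent-encoded path segment and the octet counts of both forms."""
    escaped = quote(segment, safe="")  # UTF-8 octets, then %XX for non-ASCII
    assert unquote(escaped) == segment  # the escaping is lossless
    return escaped, len(escaped), len(segment.encode("utf-8"))


for host in ["www.ścigacz.pl", "académie-française.fr"]:
    print(host, *compare_host(host))
for seg in ["zdjęcia", "görüntü"]:
    print(seg, *compare_segment(seg))
```

Each percent-encoded non-ASCII character costs 3 ASCII octets per UTF-8 octet, so "görüntü" grows from 10 octets of raw UTF-8 to 22 octets once escaped.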

I wouldn't mind if HTTP/2 clearly took the bull by the horns regarding I18N.

The easiest way to (re)encode UTF-8 with variable-length codes would be to collect (or define) statistics only for the leading octets and to store the continuation octets as fixed 6-bit values: continuation octets are restricted to the 0x80-0xBF range (64 values), so their top two bits are always 10 and only the low six bits carry information.
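
A minimal sketch of that scheme in Python, assuming the Huffman table for leading octets is built from a sample string rather than from real header statistics (a static, predefined table would be used the same way). Bits are kept as a '0'/'1' string for readability, and `build_huffman`, `encode` and `decode` are hypothetical helper names, not part of any draft.

```python
import heapq
from collections import Counter
from itertools import count


def continuation_count(lead: int) -> int:
    """Number of continuation octets that follow a UTF-8 leading octet."""
    if lead < 0x80:
        return 0  # ASCII, single octet
    if 0xC0 <= lead <= 0xDF:
        return 1  # 2-octet sequence
    if 0xE0 <= lead <= 0xEF:
        return 2  # 3-octet sequence
    if 0xF0 <= lead <= 0xF7:
        return 3  # 4-octet sequence
    raise ValueError(f"not a UTF-8 leading octet: {lead:#04x}")


def leading_octets(data: bytes) -> list[int]:
    # Continuation octets are exactly those in 0x80-0xBF; everything else leads.
    return [b for b in data if not 0x80 <= b <= 0xBF]


def build_huffman(freqs: Counter) -> dict[int, str]:
    """Huffman code mapping each leading octet to a prefix-free bit string."""
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: the single symbol gets one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text: str, code: dict[int, str]) -> str:
    bits = []
    for b in text.encode("utf-8"):
        if 0x80 <= b <= 0xBF:
            bits.append(format(b & 0x3F, "06b"))  # fixed 6-bit payload
        else:
            bits.append(code[b])  # variable-length code for the leading octet
    return "".join(bits)


def decode(bits: str, code: dict[int, str]) -> str:
    reverse = {v: k for k, v in code.items()}
    out = bytearray()
    i = 0
    while i < len(bits):
        j = i + 1
        while bits[i:j] not in reverse:  # prefix-free: first match is the symbol
            if j >= len(bits):
                raise ValueError("truncated bit stream")
            j += 1
        lead = reverse[bits[i:j]]
        out.append(lead)
        i = j
        # The leading octet tells us how many 6-bit payloads follow.
        for _ in range(continuation_count(lead)):
            out.append(0x80 | int(bits[i:i + 6], 2))
            i += 6
    return out.decode("utf-8")


sample = "zdjęcia görüntü académie-française ścigacz"
table = build_huffman(Counter(leading_octets(sample.encode("utf-8"))))
bits = encode(sample, table)
assert decode(bits, table) == sample
print(f"{len(sample.encode('utf-8')) * 8} bits as UTF-8, {len(bits)} bits re-encoded")
```

Because the leading octet already determines the sequence length, the decoder never has to guess where a character ends; each continuation octet simply costs 6 bits instead of 8, and only the leading octets need a frequency table.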

-- 
Frédéric Kayser

James M Snell wrote:

> Text can be either UTF-8 or ISO-8859-1, indicated by a single bit flag
> following the type code. All text strings are prefixed by their length
> given as an unsigned variable-length integer
> 
[snip]
> 
> For ISO-8859-1 Text, the Static Huffman Code used by Delta would be
> used for the value. If we can develop an approach to effectively
> handling Huffman coding for arbitrary UTF-8, then we can apply Huffman
> coding to that as well.
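
For reference, the text-field layout in the quoted proposal can be sketched as below. This is a hypothetical illustration: the type code value, packing the flag bit into the same octet as the type code, the LEB128-style varint, and counting the length in octets are all assumptions made for this example, and the static Huffman step from Delta is left out (payloads are sent raw).

```python
TEXT_TYPE = 0x05  # hypothetical type code; the real value is not given in the thread


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per octet, high bit set on all but the last."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_text(text: str) -> bytes:
    # Use ISO-8859-1 when every character fits, otherwise fall back to UTF-8.
    try:
        payload, is_utf8 = text.encode("latin-1"), 0
    except UnicodeEncodeError:
        payload, is_utf8 = text.encode("utf-8"), 1
    header = (TEXT_TYPE << 1) | is_utf8  # flag bit follows the type code
    return bytes([header]) + encode_varint(len(payload)) + payload


def decode_text(buf: bytes) -> str:
    header = buf[0]
    if header >> 1 != TEXT_TYPE:
        raise ValueError("not a text field")
    length, pos = decode_varint(buf, 1)
    payload = buf[pos:pos + length]
    return payload.decode("utf-8" if header & 1 else "latin-1")
```

With this layout "görüntü" fits in ISO-8859-1 and gets the flag cleared, while "zdjęcia" (ę is outside Latin-1) has to be sent as UTF-8.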

Received on Tuesday, 16 April 2013 02:24:10 UTC