- From: Frédéric Kayser <f.kayser@free.fr>
- Date: Tue, 16 Apr 2013 04:23:41 +0200
- To: ietf-http-wg@w3.org
- Message-Id: <24370F45-C4B4-41A2-8515-5B239766A943@free.fr>
Hello,

If text fields can effectively be encoded as UTF-8 would it be wise to use it to send IRIs (RFC3987)?

without punycode:
http://xn--acadmie-franaise-npb1a.fr/ vs. http://académie-française.fr/
http://www.xn--cigacz-2ib.pl/ vs. http://www.ścigacz.pl/
http://xn--rlcuo9h.xn--wkc4axeaevb3oqbg.xn--xkc2al3hye2a/ vs. http://தளம்.ஆள்களமையம்.இலங்கை/
http://xn--mgbggrfi2ikdb7d.xn--mgberp4a5d4ar/ vs. http://مركزالتسجيل.السعودية/

and without percent encoding:
zdj%C4%99cia vs. zdjęcia
g%C3%B6r%C3%BCnt%C3%BC vs. görüntü

I wouldn't mind if HTTP/2 clearly took the bull by the horns regarding I18N.

The easiest way to (re)encode UTF-8 using variable code length would be to collect/define statistics only for the leading octet and store the continuation octets as fixed 6-bit values (since they are restricted to the 80-BF range, 64 values).

--
Frédéric Kayser

James M Snell wrote :

> Text can be either UTF-8 or ISO-8859-1, indicated by a single bit flag
> following the type code. All text strings are prefixed by it's length
> given as an unsigned variant length integer
> [snip]
>
> For ISO-8859-1 Text, the Static Huffman Code used by Delta would be
> used for the value. If we can develop an approach to effectively
> handling Huffman coding for arbitrary UTF-8, then we can apply Huffman
> coding to that as well.
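
To make the leading-octet idea above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not part of any HTTP/2 draft: the names `split_utf8`, `join_utf8`, `continuation_count` and `estimate_bits` are hypothetical, and the per-octet code lengths passed to `estimate_bits` stand in for whatever statistics would actually be collected for leading octets.

```python
def split_utf8(text):
    """Split UTF-8 octets into leading octets and 6-bit continuation payloads."""
    leads, tails = [], []
    for octet in text.encode("utf-8"):
        if 0x80 <= octet <= 0xBF:
            # Continuation octet: only the low 6 bits carry information.
            tails.append(octet & 0x3F)
        else:
            # ASCII or multi-octet leading octet: candidate for a
            # variable-length code built from statistics.
            leads.append(octet)
    return leads, tails


def continuation_count(lead):
    """Number of continuation octets that follow a given leading octet."""
    if lead < 0x80:
        return 0
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    raise ValueError("not a valid UTF-8 leading octet: 0x%02X" % lead)


def join_utf8(leads, tails):
    """Rebuild the original string from the two streams."""
    out = bytearray()
    tail_iter = iter(tails)
    for lead in leads:
        out.append(lead)
        for _ in range(continuation_count(lead)):
            # Restore the fixed '10' prefix of a continuation octet.
            out.append(0x80 | next(tail_iter))
    return bytes(out).decode("utf-8")


def estimate_bits(text, lead_bits, default=8):
    """Estimated size in bits: coded leading octets plus 6 bits per continuation.

    lead_bits maps a leading octet to an assumed code length in bits;
    octets missing from the table are charged `default` bits.
    """
    leads, tails = split_utf8(text)
    return sum(lead_bits.get(lead, default) for lead in leads) + 6 * len(tails)
```

For "zdjęcia" the single continuation octet 0x99 is stored as the 6-bit value 0x19, and the leading octet 0xC4 already tells the decoder that exactly one such value follows, so the two streams can be interleaved without extra markers.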
Received on Tuesday, 16 April 2013 02:24:10 UTC