- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Sun, 10 Feb 2013 14:02:46 +0900
- To: Willy Tarreau <w@1wt.eu>
- CC: Mark Nottingham <mnot@mnot.net>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Hello Willy,

On 2013/02/09 22:33, Willy Tarreau wrote:
> On Sat, Feb 09, 2013 at 09:36:57PM +0900, "Martin J. Dürst" wrote:
>> It would be a good idea to try hard to make the new protocol forward
>> looking (or actually just acknowledge the present, rather than stay
>> frozen in the past) for the next 20 years or so in terms of character
>> encoding, too, and not only in terms of CPU/network performance.
>
> Well, don't confuse UTF-8 and Unicode.

As the main author of http://www.w3.org/TR/charmod/, I sure won't.

> UTF-8 is just a space-efficient way
> of transporting Unicode characters for western countries.

And for transporting ASCII-based commands/headers/markup together with non-ASCII data. That's the main reason the IETF adopted it.

> The encoding can
> become inefficient to transport for other charsets by inflating data by up
> to 50%

Well, that's actually an urban myth. The 50% is for CJK (Chinese/Japanese/Korean). For the languages/scripts of India, South East Asia, and a few more places, it can be 200%. (For texts purely in an alphabet in the Supplemental Planes such as Old Italic, Shavian, Osmanya, ..., it can be 300%, but I guess we can ignore these.) But these numbers only apply to texts that don't contain any ASCII at all.

> and may make compression less efficient.

That depends very much on the method of compression that's used.

> Also, processing it is
> particularly inefficient as you have to parse each and every byte to find
> a length, making string comparisons quite slow.

[See the follow-up mails in this thread.]

> I'm not saying I'm totally against UTF-8 in HTTP/2 (eventhough I hate using
> it), I'm saying that it's not *THE* solution to every problem. It's just *A*
> solution to *A* problem : "how to extend character sets in existing documents
> without having to re-encode them all". I don't think this specific problem is
> related to the scope of the HTTP/2 work, so at first glance, I'd say that
> UTF-8 doesn't seem to solve a known problem here.
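[Editor's note: the inflation figures above can be checked directly from UTF-8 byte lengths. A quick sketch follows; the legacy baselines (2 bytes per CJK character in Shift-JIS/GB2312, 1 byte per character in ISCII for Indic scripts) are assumptions for comparison, not part of the original mail.]

```python
# UTF-8 byte lengths for single characters from the scripts mentioned above.
# Inflation percentages are relative to common legacy encodings (assumed):
#   CJK: 2 bytes in Shift-JIS/GB2312 -> 3 bytes in UTF-8 = +50%
#   Indic: 1 byte in ISCII           -> 3 bytes in UTF-8 = +200%
#   Supplemental Planes (e.g. Old Italic): 4 bytes in UTF-8
samples = {
    "ASCII 'a'": "a",                    # 1 byte
    "CJK U+65E5": "\u65e5",              # 3 bytes
    "Devanagari U+0915": "\u0915",       # 3 bytes
    "Old Italic U+10300": "\U00010300",  # 4 bytes
}
for label, ch in samples.items():
    print(f"{label}: {len(ch.encode('utf-8'))} byte(s)")
```

As the mail notes, these worst-case ratios hold only for text with no ASCII content at all; any interleaved ASCII (markup, headers) stays at 1 byte per character and pulls the average down.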
The fact that I mentioned Websockets may have led to a misunderstanding. I'm not proposing to use UTF-8 only in bodies, just in headers (I wouldn't object, though). My understanding was that James was talking about headers, and I was doing so, too.

Regards, Martin.
Received on Sunday, 10 February 2013 05:03:21 UTC