- From: Chris Newman <Chris.Newman@innosoft.com>
- Date: Tue, 15 Apr 1997 13:33:35 -0700 (PDT)
- To: John C Klensin <klensin@mci.net>
- Cc: IETF URI list <uri@bunyip.com>
On Tue, 15 Apr 1997, John C Klensin wrote:
> It would have been better had URLs been carefully and
> thoughtfully internationalized from the very beginning.
> For whatever reasons, they weren't. A conversion now is
> going to be painful. But, if the pain is worth it, and I
> suspect it might be, then let's look to a balanced,
> equitable, *international* solution, not using UTF-8
> encoding in the hope that no one who uses ideographic
> characters will be bothered about what happens to them.

UTF-8 requires 2 octets to encode characters from the 8859-1 set which normally take 1 octet. UTF-8 requires 3 octets to encode ideographic characters from UCS-2 which normally require 2 octets. So western Europeans take a worse storage hit from UTF-8 than ideographic languages do.

I'd be willing to consider an alternative proposal to hex-encoded UTF-8 in URLs, but I can't think of one that's viable in practice other than MIME encoded words (which are too disgusting to consider).

I will say that it took me about 10 minutes to write a hex-encoded UTF-8 to UCS 2 converter which looked up the character descriptions in the publicly available Unicode tables.
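For readers who want to check the octet counts, or see roughly what such a converter involves, here is a minimal Python sketch. It is not the converter described in the message; the function names (`utf8_octets`, `hex_utf8_to_ucs2`, `describe`) are hypothetical, and Python's bundled `unicodedata` module stands in for the public Unicode tables. Note that the 2-octet cost applies to the upper half of 8859-1 (0xA0 to 0xFF); ASCII characters remain 1 octet in UTF-8.

```python
import unicodedata
from typing import List
from urllib.parse import unquote_to_bytes


def utf8_octets(ch: str) -> int:
    """Number of octets UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def hex_utf8_to_ucs2(escaped: str) -> List[int]:
    """Decode %XX-escaped UTF-8 (as it might appear in a URL) to UCS-2 code units.

    Assumption: anything outside the 16-bit range is rejected, since
    UCS-2 cannot represent it.
    """
    raw = unquote_to_bytes(escaped)   # "%C3%A9" -> b"\xc3\xa9"
    text = raw.decode("utf-8")        # raises UnicodeDecodeError on malformed input
    units = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError("U+%X does not fit in UCS-2" % code_point)
        units.append(code_point)
    return units


def describe(escaped: str) -> List[str]:
    """Look up the Unicode character name for each decoded code unit."""
    return [unicodedata.name(chr(u), "<unnamed>") for u in hex_utf8_to_ucs2(escaped)]


if __name__ == "__main__":
    print(utf8_octets("a"), utf8_octets("\u00e9"), utf8_octets("\u6f22"))
    print([hex(u) for u in hex_utf8_to_ucs2("%C3%A9%E6%BC%A2")])
    print(describe("%C3%A9"))
```

Under these assumptions, U+00E9 (one octet in 8859-1) takes two octets in UTF-8, while U+6F22 (two octets in UCS-2) takes three, which is the comparison made above.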
Received on Tuesday, 15 April 1997 16:33:19 UTC