
Re: UTF-8 in URIs

From: Tim Bray <tbray@textuality.com>
Date: Sat, 18 Jan 2014 09:34:21 -0800
Message-ID: <CAHBU6ivDNzJsgtF1N2tnz5Xgx1=7tJVcmF_uGdwRqs8Zz3mWog@mail.gmail.com>
To: Zhong Yu <zhong.j.yu@gmail.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Bjoern Hoehrmann <derhoermi@gmx.net>, Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <OSAMAM@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On Fri, Jan 17, 2014 at 4:53 AM, Zhong Yu <zhong.j.yu@gmail.com> wrote:

> A UTF-16 option would be nice. Let's be honest, UTF-8 is
> English-centric. It may be necessary to interoperate with previous
> ASCII based systems. But going forward, UTF-8 should not be favored
> just because it is the best option for the English language.

UTF-16 is a horrorshow, what with its surrogates, the inability to handle
it in C code as either char* or wchar_t *, and so on.  Yes, I agree that
UTF-8 is sort of bigoted.  But it has a lot of advantages, and actually I
find it hard to worry too much about the somewhat-less-than-50% overhead
(less than 50% due to ASCII markup), because when I am having trouble with
network congestion, the congestion is always due to media files... you need
a *lot* of text to match a few seconds of music or video.
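
For anyone who wants to check the overhead figure, here is a minimal Python
sketch. It is an illustration only: the sample strings and the helper names
(sizes, overhead_percent) are assumptions made up for this example, not
anything from the thread or from a spec. It compares encoded byte counts for
pure CJK text, for the same kind of text wrapped in ASCII markup, and for one
character outside the BMP, which UTF-16 has to encode as a surrogate pair.

```python
def sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for s, with no BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


def overhead_percent(s: str) -> float:
    """Extra bytes UTF-8 needs relative to UTF-16, as a percentage."""
    u8, u16 = sizes(s)
    return 100.0 * (u8 - u16) / u16


# Hypothetical samples. The three CJK characters are all in the BMP,
# so each takes 3 bytes in UTF-8 and 2 bytes in UTF-16.
cjk_only = "統一碼" * 100                                # 300 CJK characters
marked_up = '<p class="body">' + "統一碼" * 10 + "</p>"  # 20 ASCII + 30 CJK
emoji = "\U0001F600"                                     # outside the BMP

for label, sample in [("CJK only", cjk_only),
                      ("CJK in ASCII markup", marked_up),
                      ("non-BMP character", emoji)]:
    u8, u16 = sizes(sample)
    print(f"{label}: UTF-8={u8} bytes, UTF-16={u16} bytes, "
          f"overhead={overhead_percent(sample):.0f}%")
```

With these particular samples, the pure CJK string costs 900 bytes in UTF-8
against 600 in UTF-16 (the 50% worst case), the marked-up string costs 110
against 100, and the non-BMP character costs 4 bytes in both encodings, two
UTF-16 code units being one surrogate pair. Real documents will land
somewhere in between depending on how much ASCII they carry.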

Without expressing an opinion on exactly what to say about URIs, I
definitely think everything should be UTF-8 wherever possible, and note
with pleasure that the Internet is moving steadily in that direction.

> Zhong Yu
Received on Saturday, 18 January 2014 17:34:48 UTC
