Re: UTF-8 in URIs

With all due respect, all of the protocols we use on the Internet use octets as the basis of text strings, and most strings passed over the Internet (headers, header values, URIs, hostnames, etc.) do not even need support beyond US-ASCII.  This is a huge benefit for interoperability, at the tiny expense of larger encoded sizes for some languages.

The place where UTF-16 has the most benefit is in the content that is transferred, not in the fractional protocol overhead used to transfer that content.  (And even there, I would compare the size of your content encoded as UTF-8 and as UTF-16 before making a decision: web pages are often better off as UTF-8 because of the ASCII HTML markup, while email messages in some languages are better off as UTF-16, for example.)
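
A minimal Python sketch of that comparison follows. The encoded_sizes helper and both sample strings are made up for illustration; substitute your own content before drawing any conclusions.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of octets needed for text in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-octet byte order mark is not counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical samples: a markup-heavy page and a plain-text message.
html_page = '<html><body><p class="greeting">你好，世界</p></body></html>'
plain_message = "你好，世界。这是一封纯文本电子邮件。"

for name, sample in (("html_page", html_page), ("plain_message", plain_message)):
    sizes = encoded_sizes(sample)
    smaller = min(sizes, key=sizes.get)
    print(f"{name}: {sizes} (smaller: {smaller})")
```

With these samples the ASCII markup dominates the page, so UTF-8 comes out smaller there, while the message is entirely CJK text and comes out smaller as UTF-16.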


On Jan 17, 2014, at 7:53 AM, Zhong Yu <zhong.j.yu@gmail.com> wrote:
> ...
> A UTF-16 option would be nice. Let's be honest, UTF-8 is
> English-centric. It may be necessary to interoperate with previous
> ASCII-based systems. But going forward, UTF-8 should not be favored
> just because it is the best option for the English language.
> 
> Zhong Yu
> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

Received on Friday, 17 January 2014 13:11:09 UTC