Re: UTF-8 in URIs

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 16 Jan 2014 15:38:34 +0100
Message-ID: <52D7EEEA.7060108@gmx.de>
To: Michael Sweet <msweet@apple.com>
CC: Poul-Henning Kamp <phk@phk.freebsd.dk>, Zhong Yu <zhong.j.yu@gmail.com>, Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <OSAMAM@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <Michael.Bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On 2014-01-16 15:29, Michael Sweet wrote:
> Julian,
> Consider a file named "exposé.html", served by www.example.com. This URI can be encoded in many different ways depending on the character set and (for Unicode) normalization form used, for example:
>      ISO-8859-1      http://www.example.com/expos%E9.html
>      UTF-8 NFD       http://www.example.com/expose%CC%81.html
>      UTF-8 NFC       http://www.example.com/expos%C3%A9.html

Yes. I'm painfully aware of that. It's a big issue with WebDAV, as 
Windows and OS X picked different normalization forms.
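
For illustration only, the three encodings in the quoted example can be reproduced with a few lines of Python. This is a minimal sketch, not something from the original exchange: the helper name encode_path is hypothetical, and the host is the www.example.com placeholder from the quoted text.

```python
import unicodedata
from urllib.parse import quote

# "exposé.html", written with é as the single code point U+00E9
name = "expos\u00e9.html"

def encode_path(filename, charset, form=None):
    """Hypothetical helper: optionally normalize, encode, then percent-encode."""
    if form is not None:
        filename = unicodedata.normalize(form, filename)
    # quote() accepts bytes and percent-encodes every non-unreserved octet
    return "http://www.example.com/" + quote(filename.encode(charset))

latin1 = encode_path(name, "iso-8859-1")
nfd = encode_path(name, "utf-8", "NFD")
nfc = encode_path(name, "utf-8", "NFC")

for label, uri in (("ISO-8859-1", latin1), ("UTF-8 NFD", nfd), ("UTF-8 NFC", nfc)):
    print(f"{label:<12}{uri}")
```

All three strings name the same file to a human reader, yet they differ octet for octet, which is why a server cannot reliably guess which one a client meant.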

> Today, you have no guarantee that typing "http://www.example.com/exposé.html" in your web browser will work since the browser's choice of character set and normalization form may not match the server's, and it may not be possible for the server to correctly guess.

The best way to fix this is to educate people to use UTF-8. One way to 
do this is to let the UAs do what they do already (default to UTF-8).

This is an evangelism issue; changing the protocol doesn't help.

> And an intermediate proxy will have difficulty efficiently/correctly caching the content as well.

It can just treat them as different URIs.
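
As a rough sketch of what that means in practice (an illustrative toy, not how any particular proxy is implemented; the names cache, store and lookup are hypothetical), a cache keyed on the URI exactly as received never needs to know about character sets or normalization forms:

```python
# Toy cache: the key is the request URI exactly as received, with no
# decoding or Unicode normalization applied.
cache = {}

def store(uri, body):
    cache[uri] = body

def lookup(uri):
    # Returns None on a miss; differently encoded URIs are different keys.
    return cache.get(uri)

store("http://www.example.com/expos%C3%A9.html", b"<html>NFC copy</html>")

hit = lookup("http://www.example.com/expos%C3%A9.html")
miss = lookup("http://www.example.com/expose%CC%81.html")
```

The cost is a possible duplicate entry when two encodings happen to reach the same resource; the benefit is that the cache never returns the wrong response because of a bad guess.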

Best regards, Julian
Received on Thursday, 16 January 2014 14:39:10 UTC
