Re: UTF-8 in URIs

On 2014-01-16 15:29, Michael Sweet wrote:
> Julian,
>
> Consider a file named "exposé.html", served by www.example.com. This URI can be encoded in many different ways depending on the character set and (for Unicode) normalization form used, for example:
>
>      ISO-8859-1      http://www.example.com/expos%E9.html
>
>      UTF-8 NFD       http://www.example.com/expose%CC%81.html
>
>      UTF-8 NFC       http://www.example.com/expos%C3%A9.html

Yes. I'm painfully aware of that. It's a big issue with WebDAV, as
Windows and OS X picked different normalization forms.
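For reference, the three spellings quoted above can be reproduced with
Python's standard library. This is only an illustrative sketch, not
something from either message: it assumes urllib.parse.quote with its
default safe="/" set, and encode_path/BASE are hypothetical names.

```python
import unicodedata
from urllib.parse import quote

BASE = "http://www.example.com/"  # hypothetical server prefix
name = "expos\u00e9.html"          # "exposé.html" with a precomposed é (U+00E9)

def encode_path(filename: str, charset: str, form: str = "") -> str:
    """Optionally apply a Unicode normalization form, then percent-encode
    the filename's bytes in the given character set."""
    if form:
        filename = unicodedata.normalize(form, filename)
    return BASE + quote(filename, encoding=charset)

print(encode_path(name, "latin-1"))       # http://www.example.com/expos%E9.html
print(encode_path(name, "utf-8", "NFD"))  # http://www.example.com/expose%CC%81.html
print(encode_path(name, "utf-8", "NFC"))  # http://www.example.com/expos%C3%A9.html
```

Same characters, three different octet sequences on the wire; only the
NFC/UTF-8 form is what a UA defaulting to UTF-8 (with NFC input) would
typically send.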

> Today, you have no guarantee that typing "http://www.example.com/exposé.html" in your web browser will work since the browser's choice of character set and normalization form may not match the server's, and it may not be possible for the server to correctly guess.

The best way to fix this is to educate people to use UTF-8. One way to 
do this is to let the UAs do what they do already (default to UTF-8).

This is an evangelism issue; changing the protocol doesn't help.

> And an intermediate proxy will have difficulty efficiently/correctly caching the content as well.

It can just treat them as different URIs.
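A minimal sketch of what that means for a cache, assuming a
hypothetical cache keyed on the URI after RFC 3986 percent-encoding
normalization (uppercase hex digits, decode octets that are unreserved
ASCII). That normalization merges trivially different spellings, but
the three variants above differ in their actual octets, so they stay
three separate entries. normalize_pct and the cache dict are
illustrative names only.

```python
import re

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz0123456789-._~")

def normalize_pct(uri: str) -> str:
    """Hypothetical cache-key normalizer: uppercase percent-encoding hex
    digits and decode octets that are unreserved ASCII characters."""
    def fix(m: re.Match) -> str:
        octet = int(m.group(1), 16)
        if octet < 0x80 and chr(octet) in UNRESERVED:
            return chr(octet)
        return "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)

variants = [
    "http://www.example.com/expos%E9.html",
    "http://www.example.com/expose%CC%81.html",
    "http://www.example.com/expos%C3%A9.html",
    "http://www.example.com/expos%e9.html",  # lowercase hex: same octets as the first
]

cache: dict = {}
for uri in variants:
    # One entry per distinct normalized key; the body is a placeholder.
    cache.setdefault(normalize_pct(uri), b"...")

print(len(cache))  # 3
```

Only the lowercase-hex spelling collapses into an existing entry; no
charset or Unicode normalization guessing is needed by the proxy.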

Best regards, Julian

Received on Thursday, 16 January 2014 14:39:10 UTC