Re: UTF-8 in URIs from Julian Reschke on 2014-01-16 (ietf-http-wg@w3.org from January to March 2014)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 16 Jan 2014 11:06:45 +0100
To: Nicolas Mailhot <nicolas.mailhot@laposte.net>, Zhong Yu <zhong.j.yu@gmail.com>
CC: Gabriel Montenegro <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <osamam@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <michael.bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
Message-ID: <52D7AF35.4010401@gmx.de>

On 2014-01-16 10:52, Nicolas Mailhot wrote:
>
> On Wed, 15 January 2014 21:46, Zhong Yu wrote:
>> Can you give an example where an intermediary benefits from decoding
>> URI octets into Unicode characters?
>
> Intermediaries cannot perform URL-based filtering if they cannot decode
> URLs reliably. Intermediaries need to normalise URLs to a single encoding
> if they log them (for debugging or policy purposes). Unix-like "just a
> bunch of bytes with no encoding indication" is an i18n disaster supported
> only by users of ASCII scripts.

Well, you could log what you got on the wire. It's ASCII.

> I favour making URLs UTF-8 by default in HTTP/2 (just as it was in XML;
> that's one part of the XML spec that worked very well) and requiring
> HTTP/1-to-2 bridges to translate to the canonical form. Helping clients
> push local 8-bit encodings will just perpetuate the pre-2000 legacy mess.

How do you translate a URI with unknown URI encoding to UTF-8?
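
To illustrate the question, here is a minimal Python sketch of what a bridge would face. The function name `candidate_decodings` and the two sample paths are made up for illustration; the point is only that the octets carry no label saying which charset produced them.

```python
from urllib.parse import unquote_to_bytes

def candidate_decodings(uri_path, encodings=("utf-8", "iso-8859-1")):
    """Percent-decode a URI path to raw octets, then try each charset.

    Returns a dict mapping charset name to the decoded string, or None
    when the octets are not valid in that charset. (Hypothetical helper.)
    """
    raw = unquote_to_bytes(uri_path)  # octets only, no charset implied
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

# Valid in both charsets, but with different meanings:
ambiguous = candidate_decodings("/caf%C3%A9")
# Valid only as ISO-8859-1; a UTF-8-only bridge would have to reject it:
legacy = candidate_decodings("/caf%E9")
```

Here `ambiguous` decodes successfully under both charsets ("/café" as UTF-8, "/cafÃ©" as ISO-8859-1), so validity alone cannot tell the bridge which one the origin server meant, and `legacy` cannot be read as UTF-8 at all.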

> Whenever someone specifies a new, better encoding it will be time for
> HTTP/3. Unicode specs are way more complex than HTTP, so changes won't
> happen quicker than HTTP revisions.

The problem here is that HTTP URIs are octet sequences, not character
sequences. There is no simple way to get from the former to the latter
without breaking a significant number of sites.
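
As a rough sketch of treating the request target purely as octets (for logging, say), something like the following would do. The name `wire_key` is hypothetical, and the only normalisation applied is uppercasing the hex digits of percent-escapes, which does not change which octets are identified.

```python
import re

# Matches one percent-encoded octet such as %c3 or %E9.
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def wire_key(request_target: str) -> str:
    """Return an ASCII log/comparison key without decoding to characters.

    Hypothetical helper: checks that the target is pure ASCII, then
    uppercases the hex digits of each percent-escape and nothing else.
    """
    request_target.encode("ascii")  # raises UnicodeEncodeError otherwise
    return _PCT_ESCAPE.sub(lambda m: m.group(0).upper(), request_target)

same_octets = wire_key("/caf%c3%a9") == wire_key("/caf%C3%A9")
different_octets = wire_key("/caf%E9") != wire_key("/caf%C3%A9")
```

Both `same_octets` and `different_octets` come out true: the two spellings of the UTF-8 escape name the same octets, while the ISO-8859-1 form names a different octet sequence even though a human might read both as the same word.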

Best regards, Julian

Received on Thursday, 16 January 2014 10:07:16 UTC