Re: UTF-8 in URIs

On 2014-01-16 10:52, Nicolas Mailhot wrote:
>
> Le Mer 15 janvier 2014 21:46, Zhong Yu a écrit :
>> Can you give an example where an intermediary benefits from decoding
>> URI octets into Unicode characters?
>
> Intermediaries cannot perform URL-based filtering if they cannot decode
> URLs reliably. Intermediaries need to normalise URLs to a single encoding
> if they log them (for debugging or policy purposes). The Unix-like "just a
> bunch of bytes with no encoding indication" approach is an i18n disaster
> supported only by users of ASCII scripts.

Well, you could log what you got on the wire. It's ASCII.
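
To illustrate the point, here is a minimal Python sketch. The path and the
log_line helper are made up for the example; the assumption is a client
that encoded its path in Latin-1 before percent-encoding it. Whatever
encoding the client used, the request target that arrives is plain ASCII
and can be logged without guessing a charset.

```python
from urllib.parse import quote

# Hypothetical path, as a client using a legacy Latin-1 encoding might build it.
raw_path = "/café".encode("latin-1")   # the octets b'/caf\xe9'

# Percent-encode the octets; this is the form that travels on the wire.
wire_form = quote(raw_path)

def log_line(request_target: str) -> str:
    # Hypothetical logger: record exactly what was received, no decoding.
    return "GET " + request_target

# Raises UnicodeEncodeError if the wire form were not pure ASCII.
wire_form.encode("ascii")

print(log_line(wire_form))
```

The log entry is unambiguous as an octet record. What it does not tell
you is which characters the client meant.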

> I favour making URLs UTF-8 by default in HTTP/2 (just as it was in XML,
> that's one part of the XML spec that worked very well) and requiring
> HTTP/1-to-HTTP/2 bridges to translate to the canonical form. Helping
> clients push local 8-bit encodings will just perpetuate the pre-2000
> legacy mess.

How do you translate a URI with unknown URI encoding to UTF-8?

> Whenever someone specifies a new, better encoding, it will be time for
> HTTP/3. Unicode specs are far more complex than HTTP; changes there won't
> happen more quickly than HTTP revisions.

The problem here is that HTTP URIs are octet sequences, not character
sequences. There is no simple way to get from the former to the latter
without breaking a significant number of sites.
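
A small Python sketch of why the conversion is not simple. The request
target and the try_utf8 helper are invented for the example, and the two
legacy encodings are just candidates picked for illustration. The same
octets are valid in both, yield different characters in each, and are not
valid UTF-8 at all, so a bridge has nothing to base a translation on.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request target received by an intermediary.
target = "/caf%E9"

# Undo the percent-encoding; the result is octets, not characters.
octets = unquote_to_bytes(target)

# Two equally valid readings of the same octets.
as_latin1 = octets.decode("latin-1")   # 0xE9 is U+00E9 in ISO-8859-1
as_cp1251 = octets.decode("cp1251")    # 0xE9 is U+0439 in Windows-1251

def try_utf8(data: bytes):
    # Hypothetical helper: return the UTF-8 decoding, or None if invalid.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(as_latin1 == as_cp1251)   # the two readings disagree
print(try_utf8(octets))         # a lone 0xE9 octet is not valid UTF-8
```

A target that was built from UTF-8 in the first place, such as
"/caf%C3%A9", decodes cleanly with try_utf8. The trouble is every
existing URI that was not.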

Best regards, Julian

Received on Thursday, 16 January 2014 10:07:16 UTC