- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 16 Jan 2014 11:06:45 +0100
- To: Nicolas Mailhot <nicolas.mailhot@laposte.net>, Zhong Yu <zhong.j.yu@gmail.com>
- CC: Gabriel Montenegro <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <osamam@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <michael.bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On 2014-01-16 10:52, Nicolas Mailhot wrote:
>
> On Wed, 15 January 2014 21:46, Zhong Yu wrote:
>> Can you give an example where an intermediary benefits from decoding
>> URI octets into Unicode characters?
>
> Intermediaries cannot perform URL-based filtering if they cannot decode
> URLs reliably. Intermediaries need to normalise URLs to a single
> encoding if they log them (for debugging or policy purposes). The
> Unix-like "just a bunch of bytes with no encoding indication" approach
> is an i18n disaster, workable only for users of ASCII scripts.

Well, you could log what you got on the wire. It's ASCII.

> I favour making URLs UTF-8 by default in HTTP/2 (just as it was in XML;
> that's one part of the XML spec that worked very well) and requiring
> HTTP/1-to-2 bridges to translate to the canonical form. Helping clients
> push local 8-bit encodings will just perpetuate the pre-2000 legacy
> mess.

How do you translate a URI with an unknown encoding to UTF-8?

> Whenever someone specifies a new, better encoding it will be time for
> HTTP/3. The Unicode specs are far more complex than HTTP; changes won't
> happen any quicker than HTTP revisions.

The problem here is that HTTP URIs are octet sequences, not character
sequences. There is no simple way to get from a) octet sequences to
b) character sequences without breaking a significant number of sites.

Best regards, Julian
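An aside on the question raised above: a minimal Python sketch of why
"translate to UTF-8" is not well-defined when the original charset is
unknown. After percent-decoding, a URI path is just octets, and the same
octets read differently (or fail outright) depending on the charset one
assumes. The "/caf%E9" path is an illustrative assumption, not an example
taken from the thread.

    from urllib.parse import unquote_to_bytes

    # A hypothetical path as a legacy Latin-1 client might send it.
    # The octet 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
    octets = unquote_to_bytes("/caf%E9")       # b'/caf\xe9'

    print(octets.decode("latin-1"))            # '/café' -- one plausible reading

    try:
        octets.decode("utf-8")
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err)         # an intermediary can only guess

    # A UTF-8 client would have sent the same name as /caf%C3%A9, which
    # decodes cleanly -- but nothing on the wire says which convention
    # the sender used.
    print(unquote_to_bytes("/caf%C3%A9").decode("utf-8"))  # '/café'

Heuristics such as "try UTF-8, fall back to Latin-1" handle many paths,
but they silently misread Latin-1 octet sequences that happen to form
valid UTF-8, which is the kind of breakage the message above alludes to.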
Received on Thursday, 16 January 2014 10:07:16 UTC