- From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
- Date: Thu, 16 Jan 2014 11:24:01 +0100
- To: "Julian Reschke" <julian.reschke@gmx.de>
- Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Zhong Yu" <zhong.j.yu@gmail.com>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "Osama Mazahir" <osamam@microsoft.com>, "Dave Thaler" <dthaler@microsoft.com>, "Mike Bishop" <michael.bishop@microsoft.com>, "Matthew Cox" <macox@microsoft.com>
On Thu, 16 January 2014 11:06, Julian Reschke wrote:
> On 2014-01-16 10:52, Nicolas Mailhot wrote:
>>
>> On Wed, 15 January 2014 21:46, Zhong Yu wrote:
>>> Can you give an example where an intermediary benefits from decoding
>>> URI octets into unicodes?
>>
>> Intermediaries can not perform URL-based filtering if they can not
>> decode URLs reliably. Intermediaries need to normalise URLs to a single
>> encoding if they log them (for debugging or policy purposes). unix-like
>> "just a bunch of bytes with no encoding indication" is an i18n disaster
>> supported only by users of ASCII scripts
>
> Well, you could log what you got on the wire. It's ASCII.

And it's useless if you can't interpret it reliably. You may as well log the output of /dev/random at the time. Nobody has time to have humans comb through millions of log lines to fix encoding errors.

>> I favour making URLs UTF-8 by default in HTTP/2 (just as it was in XML,
>> that's one part of the XML spec that worked very well) and require
>> http/1 to 2 bridges to translate to the canonical form. Helping clients
>> push local 8bits encodings will just perpetuate pre-2000 legacy mess.
>
> How do you translate a URI with unknown URI encoding to UTF-8?

You treat it as UTF-8. If it fails UTF-8 sanity rules you fail with an error. That will make people fix their encodings quickly.

>> Whenever someone specifies a new better encoding it will be time for
>> HTTP/3. Unicode specs are way more complex than http, changes won't
>> happen quicker than http revisions.
>
> The problem here is that HTTP URIs are octet sequences, not character
> sequences.

The problem is that octet sequences are useless by themselves if you can not decode them.

--
Nicolas Mailhot
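For illustration, the "treat it as UTF-8, fail with an error otherwise" rule proposed above could look roughly like the Python sketch below. The names `decode_uri_strict` and `InvalidUriEncoding` are hypothetical and appear nowhere in the thread or in any HTTP draft. The sketch also simplifies by percent-decoding the whole string at once, which a real intermediary would probably do per URI component so that encoded delimiters such as `%2F` keep their meaning.

```python
from urllib.parse import unquote_to_bytes


class InvalidUriEncoding(ValueError):
    """Hypothetical error: the URI's decoded octets are not valid UTF-8."""


def decode_uri_strict(raw_uri: str) -> str:
    """Percent-decode a URI and interpret the resulting octets as UTF-8.

    Assumption: any octet sequence that is not well-formed UTF-8 is
    rejected outright rather than guessed at or passed through.
    """
    # Undo percent-encoding to recover the raw octets sent on the wire.
    octets = unquote_to_bytes(raw_uri)
    try:
        # Strict decoding raises on any malformed UTF-8 sequence.
        return octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise InvalidUriEncoding(f"not valid UTF-8: {raw_uri!r}") from exc
```

Under these assumptions, `decode_uri_strict("/caf%C3%A9")` yields a string that can be logged or filtered unambiguously, while `decode_uri_strict("/caf%E9")` (the same path in ISO-8859-1) raises `InvalidUriEncoding` instead of being logged as uninterpretable bytes.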
Received on Thursday, 16 January 2014 10:24:31 UTC