- From: Larry Masinter <masinter@adobe.com>
- Date: Fri, 17 Jan 2014 07:50:40 +0000
- To: Nicolas Mailhot <nicolas.mailhot@laposte.net>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>
- CC: Zhong Yu <zhong.j.yu@gmail.com>, Gabriel Montenegro <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <osamam@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <michael.bishop@microsoft.com>, "Matthew Cox" <macox@microsoft.com>
It's a little hard to wade through the rhetoric ("dead trees"?), but I don't see a problem with saying in HTTP/2.0 that a request can be an IRI, or an IRI-path encoded in UTF-8. A "Host" header can likewise carry UTF-8-encoded Unicode for an IDN.

Gateways from HTTP/1 can leave the URI path decoded; they probably should not change any values. Gateways from HTTP/2 to HTTP/1 should percent-hex-encode any non-ASCII character in both host and path.

This is a new HTTP/2 feature, though, and is it worth it? All things considered, it is more complexity for a small saving in space.

Larry
--
http://larry.masinter.net

-----Original Message-----
From: Nicolas Mailhot [mailto:nicolas.mailhot@laposte.net]
Sent: Thursday, January 16, 2014 7:09 AM
To: julian.reschke@gmx.de
Cc: Nicolas Mailhot; Zhong Yu; Gabriel Montenegro; ietf-http-wg@w3.org; Osama Mazahir; Dave Thaler; Mike Bishop; Matthew Cox
Subject: Re: UTF-8 in URIs

On Thu, 16 January 2014 15:41, Julian Reschke wrote:
> On 2014-01-16 15:33, Nicolas Mailhot wrote:
>>
>> On Thu, 16 January 2014 12:25, Zhong Yu wrote:
>>> There is no way to enforce UTF-8 in URIs; we cannot even enforce
>>> %-encoding, since a server can always build a proprietary encoding on
>>> top of ASCII characters (for its own convenience, not to be cryptic
>>> to others).
>>>
>>> URIs have never been meant to be understood by anyone other than the
>>> originating server. I don't see how we can change that, unless we
>>> turn URIs into a full-blown language with structure, semantics, and a
>>> huge vocabulary.
>>
>> Look, that is all nonsense.
>
> Um, no.
>
>> URLs are treated as text in HTML documents. URLs are treated as text
>> in logs and traffic consoles. URLs are treated as text by web site
>> designers (otherwise all accesses would take the form
>> mywebsite.com/opaquenumber, and how many sites actually do that?). Web
>> traffic is not direct
>
> Yes. So?
>
>> end-to-end: it goes through intermediaries that need to decode part of
>> the HTTP envelope, and besides, web sites are more and more
>> interpenetrated (the URL soup, a.k.a. mashups and clouds), so decoding
>> has not been a private web-site affair for a long time.
>
> I still don't understand why intermediaries "need" to "decode" request
> URIs.

Because you want to write intermediary processing rules in text form, just as server sites write their rules in text form and the browser user writes his request in text form, and nobody wants to write his rules in binary just because the encoding of the processed objects is undefined.

Because traffic consoles that display chains of octet values are useless in practical terms.

Because web objects are identified by URLs, and an identifier that changes depending on arbitrary client/server encoding choices raises the complexity far above simply telling everyone "write your URLs in HTTP/2 in UTF-8".

Because there *are* semantics in web site organisation, but they are only apparent in the text encoding the site creator used.

Because all the systems that tried to juggle multiple implicit encodings instead of imposing a single rule have been pathetic failures (they "work" only as long as the actors do not use the multiple-encoding freedom but add back the encoding convention the designer forgot to provide).

--
Nicolas Mailhot
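[Editor's illustration: a minimal sketch of the HTTP/2-to-HTTP/1 downgrade rule Larry describes above, i.e. percent-hex-encoding every non-ASCII byte of a UTF-8 host and path. The function name is illustrative, not taken from any draft; real gateways would also have to consider IDNA/punycode for host names.]

```python
# Sketch only: percent-hex-encode the non-ASCII bytes of a UTF-8 string,
# as Larry's proposed HTTP/2 -> HTTP/1 gateway rule would require.
# ASCII bytes (including any existing %XX escapes) pass through unchanged.

def pct_encode_non_ascii(text: str) -> str:
    """Percent-encode only the non-ASCII bytes of `text` (UTF-8)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))            # ASCII: leave as-is
        else:
            out.append("%{:02X}".format(byte))  # non-ASCII byte -> %XX
    return "".join(out)

# An IRI path and an IDN host as they might appear raw in HTTP/2:
print(pct_encode_non_ascii("/café/menu"))      # -> /caf%C3%A9/menu
print(pct_encode_non_ascii("bücher.example"))  # -> b%C3%BCcher.example
```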
Received on Friday, 17 January 2014 07:51:27 UTC