- From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
- Date: Thu, 16 Jan 2014 16:08:32 +0100
- To: "Julian Reschke" <julian.reschke@gmx.de>
- Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, "Zhong Yu" <zhong.j.yu@gmail.com>, "Gabriel Montenegro" <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "Osama Mazahir" <osamam@microsoft.com>, "Dave Thaler" <dthaler@microsoft.com>, "Mike Bishop" <michael.bishop@microsoft.com>, "Matthew Cox" <macox@microsoft.com>
On Thu, 16 January 2014 15:41, Julian Reschke wrote:
> On 2014-01-16 15:33, Nicolas Mailhot wrote:
>>
>> On Thu, 16 January 2014 12:25, Zhong Yu wrote:
>>> There is no way to enforce UTF-8 on URIs; we cannot even enforce
>>> %-encoding, since the server can always build a proprietary encoding
>>> on top of ASCII chars (for its own convenience, not for being cryptic
>>> to others).
>>>
>>> URIs have never been supposed to be understandable by anyone other
>>> than the original server. I don't see how we can change that, unless
>>> we turn URIs into a full-blown language with structures, semantics,
>>> and a huge vocabulary.
>>
>> Look, that is all nonsense.
>
> Um, no.
>
>> URLs are treated as text in HTML documents. URLs are treated as text
>> in logs and traffic consoles. URLs are treated as text by web site
>> designers (otherwise all accesses would be in the form
>> mywebsite.com/opaquenumber, and how many sites actually do that?).
>> Web traffic is not direct
>
> Yes. So?
>
>> end-to-end; it goes through intermediaries that need to decode part of
>> the HTTP envelope, and besides, web sites are more and more
>> interpenetrated (URL soup, a.k.a. mashups and clouds), so decoding has
>> not been a private web-site affair for a long time.
>
> I still don't understand why intermediaries "need" to "decode" request
> URIs.

Because you want to write intermediary processing rules in text form, just
as server sites write their rules in text form and the browser user writes
his request in text form; nobody wants to write rules in binary when the
encoding of the processed objects is undefined.

Because traffic consoles that display chains of octet values are useless in
practical terms.

Because web objects are identified by URLs, and an identifier that changes
with arbitrary client/server encoding choices raises the complexity far
above simply telling everyone "write your URLs in HTTP/2 in UTF-8" (see the
small sketch after this message).

Because there *are* semantics in web site organisation, but they are only
apparent in the text encoding the site creator used.

Because every system that tried to juggle multiple implicit encodings
instead of imposing a single rule has been a pathetic failure (they "work"
only as long as the actors do not actually use the multiple-encoding
freedom, but instead agree on the encoding convention the designer forgot
to provide).

-- 
Nicolas Mailhot
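A minimal sketch of the identifier-divergence point argued above, assuming
Python's standard urllib.parse.quote (the path segment and the two encodings
are illustrative assumptions, not taken from the thread): the same textual
path segment is percent-encoded to different octet sequences depending on
the encoding the origin happened to choose, so a text-form intermediary rule
cannot match both without out-of-band knowledge of that choice.

    # Hypothetical example: one textual path segment, two possible
    # wire forms, depending only on the server's encoding choice.
    from urllib.parse import quote

    segment = "café"
    print(quote(segment, encoding="utf-8"))    # caf%C3%A9
    print(quote(segment, encoding="latin-1"))  # caf%E9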
Received on Thursday, 16 January 2014 15:09:10 UTC