- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 16 Jan 2014 15:41:52 +0100
- To: Nicolas Mailhot <nicolas.mailhot@laposte.net>, Zhong Yu <zhong.j.yu@gmail.com>
- CC: Gabriel Montenegro <gabriel.montenegro@microsoft.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Osama Mazahir <osamam@microsoft.com>, Dave Thaler <dthaler@microsoft.com>, Mike Bishop <michael.bishop@microsoft.com>, Matthew Cox <macox@microsoft.com>
On 2014-01-16 15:33, Nicolas Mailhot wrote:
>
> On Thu, 16 January 2014 12:25, Zhong Yu wrote:
>> There is no way to enforce UTF-8 on URIs; we cannot even enforce
>> %-encoding, the server can always build proprietary encoding on top of
>> ASCII chars (for its own convenience, not for being cryptic to others)
>>
>> URIs have never been supposed to be understandable by anyone other
>> than the original server. I don't see how we can change that, unless
>> we turn URI into a full-blown language with structures, semantics, and
>> a huge vocabulary.
>
> Look, that is all nonsense.

Um, no.

> URLs are treated as text in html documents. URLs are treated as text in
> logs and traffic consoles. URLs are treated as text by web site designers
> (otherwise all accesses would be in the form mywebsite.com/opaquenumber
> and how many sites actually do that?). Web traffic is not direct

Yes. So?

> end-to-end it goes through intermediaries that need to decode part of the
> http envelope and besides web sites are more and more interpenetrated
> (URL soup aka mashup and clouds) so decoding has not been a private web
> site affair for a long time

I still don't understand why intermediaries "need" to "decode" request URIs.

> All those elements do not manipulate chains of bytes but text and the
> difference between chains of bytes and text is clear encoding rules (I
> know it is a huge understanding leap for most developers that didn't have
> to deal extensively with encoding problem fallouts)

The URI on the wire is indeed a sequence of ASCII characters (well, a
legal one). The fact that non-ASCII characters and delimiters can be
embedded using percent-escaping doesn't change that fact.

> There is a difference between semantics (which are the business of web
> sites) and technical encoding. I don't care a fig about what encoding a
> web server uses on its filesystem or the encoding of web pages. What I
> want is that the on-wire representation, that needs to be decoded by all
> kinds of third parties for things to work smoothly, to be clearly defined
> without the usual "chain of bytes" cop-out.

Again: please clarify why it needs to be "decoded" by anybody except the
origin server.

> ...

Best regards, Julian
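A minimal sketch of the wire-format point discussed above, in Python purely for illustration: the path `/café` and the helper `wire_form` are hypothetical and do not come from the thread. It shows that a percent-escaped URI is plain ASCII whichever character encoding the server chose, and that a third party undoing the escapes gets bytes back, not text, unless it knows (or guesses) that encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def wire_form(path: str, encoding: str) -> str:
    """Percent-escape a path after encoding it with the given charset.

    Hypothetical helper: it stands in for whatever the origin server
    does when it mints a URI for a resource with a non-ASCII name.
    """
    return quote(path, safe="/", encoding=encoding)


# The same human-readable path, minted by two servers with different charsets.
utf8_wire = wire_form("/café", "utf-8")      # '/caf%C3%A9'
latin1_wire = wire_form("/café", "latin-1")  # '/caf%E9'

# On the wire both are ASCII-only strings.
assert utf8_wire.isascii() and latin1_wire.isascii()

# Undoing the percent-escapes only recovers bytes.
raw = unquote_to_bytes(latin1_wire)          # b'/caf\xe9'

# An intermediary that assumes UTF-8 cannot turn those bytes into text.
try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Only a party that knows the server's choice gets the original text back.
original = raw.decode("latin-1")             # '/café'
```

Nothing in either ASCII string says which charset produced it, which is the sense in which only the origin server can reliably interpret the escaped bytes.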
Received on Thursday, 16 January 2014 14:42:32 UTC