- From: Matthew Kerwin <matthew@kerwin.net.au>
- Date: Thu, 24 Jul 2014 10:40:08 +1000
- To: Adrien de Croy <adrien@qbik.com>
- Cc: Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CACweHNC9rP3i=-VxRJQkHPLjkXnjbbqMAoF8TpSWnMobrC-TdQ@mail.gmail.com>
On 24 July 2014 09:06, Adrien de Croy <adrien@qbik.com> wrote:

> a URI is just a construction of several components glued together with
> delimiters, e.g.
>
> ://
> @
> :
> /
> ?
> &
> #
>
> etc.

Technically "&" isn't important in a query (or elsewhere in an HTTP URI). It's a common convention to make the query a sequence of key=value&key=value pairs, but there's nothing in the spec that requires it, so we can't actually break down the query and still be compliant with all possible uses.

> this places constraints on the component values, since you can't use
> structural delimiters inside values. This means if we do want to include
> such things, we have to escape them, and it snowballs from there.

The only ones we have to escape this way are the reserved characters, and really, within a protocol like HTTP, the only ones we absolutely have to escape are the ones that have meaning to the protocol and its applications (e.g. "/" and "?" within the path). Although the gods know what sort of random applications there are out there that absolutely depend on "$" being distinguished from "%24", etc.

I could see some interest in splitting the path into either a single "*" or a list of segments (removing "%2F" from the path string), but for one, how many URLs have %2F in the path? And for two, the rules in RFC 3986 and a couple of decades of experience already have that covered, I think.

I guess you could split the authority into userinfo/host/port if you were desperate for such a thing.

> Imagine if we just sent all individual parts of a URI in different fields,
> where we didn't need to parse them to distinguish the parts. No more %20
> vs +, no more string escape unicode exploits.

The mapping of "+" to a space character is a de facto standard, but it's not codified. We can't assume that all plus signs in a query string are meant to represent space characters.

And, as I said earlier, we can't really make PHP's assumption about the query, and parse it down to a $_GET assocarray, because there are more uses of HTTP than just PHP.

> Sure we might need to aggregate things to create a cache key etc, but
> that's a safe operation.

Not if you've converted both "%20" and "+" to spaces; that's non-reversible.

-- 
Matthew Kerwin
http://matthew.kerwin.net.au/
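A rough illustration of the non-reversibility point above, and of the "%2F" path case, using Python's standard `urllib.parse`. This is only a sketch: the sample strings and variable names are made up for the example, and `unquote_plus` stands in for the form-style convention of reading "+" as a space.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Two different query strings as they might appear on the wire.
query_pct = "q=a%20b"
query_plus = "q=a+b"

# Form-style decoding reads "+" as a space, so both collapse to one value.
collapsed_pct = unquote_plus(query_pct)    # "q=a b"
collapsed_plus = unquote_plus(query_plus)  # "q=a b"

# Plain percent-decoding leaves "+" alone, so the two stay distinct.
plain_pct = unquote(query_pct)    # "q=a b"
plain_plus = unquote(query_plus)  # "q=a+b"

# Re-encoding the collapsed value has to pick one spelling; it cannot
# recover which of the two originals was actually sent.
reencoded_pct = quote(collapsed_pct, safe="=")        # "q=a%20b"
reencoded_plus = quote_plus(collapsed_pct, safe="=")  # "q=a+b"

# The same ordering problem exists for "%2F" in a path: decoding before
# splitting on "/" changes the number of segments.
path = "/a%2Fb/c"
split_then_decode = [unquote(s) for s in path.split("/")[1:]]  # ["a/b", "c"]
decode_then_split = unquote(path).split("/")[1:]               # ["a", "b", "c"]
```

Once both spellings have been mapped to a space, anything derived from the decoded value (a cache key, say) can no longer tell the two requests apart.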
Received on Thursday, 24 July 2014 00:40:37 UTC