- From: Zhong Yu <zhong.j.yu@gmail.com>
- Date: Wed, 23 Jul 2014 19:57:09 -0500
- To: Matthew Kerwin <matthew@kerwin.net.au>
- Cc: Adrien de Croy <adrien@qbik.com>, Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
The &key=value format and the encoding of SPACE as PLUS come from the HTML standard for form submission. They are not part of the HTTP or URI standards.

On Wed, Jul 23, 2014 at 7:40 PM, Matthew Kerwin <matthew@kerwin.net.au> wrote:
> On 24 July 2014 09:06, Adrien de Croy <adrien@qbik.com> wrote:
>>
>> a URI is just a construction of several components glued together with
>> delimiters, e.g.
>>
>> ://
>> @
>> :
>> /
>> ?
>> &
>> #
>>
>> etc.
>>
>
> Technically "&" isn't important in a query (or elsewhere in a http URI).
> It's a common convention to make the query be a sequence of
> key=value&key=value pairs, but there's nothing in the spec, so we can't
> actually break down the query and have it still compliant with all possible
> uses.
>
>> this places constraints on the component values, since you can't use
>> structural delimiters inside values. This means if we do want to include
>> such things, we have to escape them, and it snowballs from there.
>>
>
> The only ones we have to escape this way are the reserved characters, and
> really, within a protocol like HTTP, the only ones we absolutely have to
> escape are the ones that have meaning to the protocol and its applications
> (e.g. "/" and "?" within the path). Although the gods know what sort of
> random applications there are out there that absolutely depend on "$" being
> distinguished from "%24", etc.
>
> I could see some interest in splitting the path into either a single "*" or
> a list of segments (removing "%2F" from the path string), but for one how
> many URLs have %2F in the path? And for two the rules in RFC 3986 and a
> couple of decades of experience already have that covered, I think.
>
> I guess you could split the authority into userinfo/host/port if you were
> desperate for such a thing.
>
>> Imagine if we just sent all individual parts of a URI in different fields,
>> where we didn't need to parse them to distinguish the parts. No more %20 vs
>> +, no more string escape unicode exploits.
>>
>
> The mapping of "+" to a space character is a de facto standard, but it's not
> codified. We can't assume that all plus signs in a query string are meant to
> represent space characters. And, as I said earlier, we can't really make
> PHP's assumption about the query, and parse it down to a $_GET assocarray,
> because there are more uses of HTTP than just PHP.
>
>> Sure we might need to aggregate things to create a cache key etc, but
>> that's a safe operation.
>>
>
> Not if you've converted both "%20" and "+" to spaces; that's non-reversible.
>
> --
> Matthew Kerwin
> http://matthew.kerwin.net.au/
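
To make the "+" versus "%20" point concrete, here is a minimal Python sketch using the standard library's urllib.parse. The helper names decode_uri_style and decode_form_style are hypothetical, chosen only for this illustration. It assumes that "URI-style" means plain percent-decoding and that "form-style" means the HTML form convention of also treating "+" as a space.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def decode_uri_style(component: str) -> str:
    """Percent-decode only; a literal "+" stays a "+" (hypothetical helper)."""
    return unquote(component)


def decode_form_style(component: str) -> str:
    """HTML form convention: "+" becomes a space, then percent-decode."""
    return unquote_plus(component)


if __name__ == "__main__":
    for raw in ("a+b", "a%20b", "a%2Bb"):
        print(raw, repr(decode_uri_style(raw)), repr(decode_form_style(raw)))

    # Form-style decoding maps two different inputs to the same string,
    # so the original spelling cannot be recovered afterwards.
    assert decode_form_style("a+b") == decode_form_style("a%20b")

    # Re-encoding has to pick one spelling; neither choice round-trips both.
    print(quote("a b"), quote_plus("a b"))
```

Under the form-style reading, "a+b" and "a%20b" decode to the same value, which is why a cache key built from decoded components could differ from one built from the bytes actually sent.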
Received on Thursday, 24 July 2014 00:57:40 UTC