Re: consensus on :query ? from Matthew Kerwin on 2014-07-24 (ietf-http-wg@w3.org from July to September 2014)

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Thu, 24 Jul 2014 10:40:08 +1000
To: Adrien de Croy <adrien@qbik.com>
Cc: Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CACweHNC9rP3i=-VxRJQkHPLjkXnjbbqMAoF8TpSWnMobrC-TdQ@mail.gmail.com>

On 24 July 2014 09:06, Adrien de Croy <adrien@qbik.com> wrote:

>
> a URI is just a construction of several components glued together with
> delimiters, e.g.
>
> ://
> @
> :
> /
> ?
> &
> #
>
> etc.
>
>
Technically "&" has no defined meaning in a query (or elsewhere in an http
URI). It's a common convention to make the query a sequence of
key=value&key=value pairs, but nothing in the spec requires it, so we can't
actually break the query down that way and still be compliant with all
possible uses.
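
To illustrate, here is a minimal sketch in Python, assuming the standard library's urllib.parse as a stand-in for any parser that takes the key=value convention for granted. The example.com URLs and the query_of helper are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def query_of(url):
    """Return the raw query component, untouched (hypothetical helper)."""
    return urlsplit(url).query

# A query that follows the key=value&key=value convention.
pairs = query_of("http://example.com/search?a=1&b=2")

# An equally valid query that does not: just an opaque string.
isindex = query_of("http://example.com/search?red+shoes")

print(parse_qs(pairs))    # {'a': ['1'], 'b': ['2']}
print(parse_qs(isindex))  # {} : the pair-oriented parser silently drops it
print(isindex)            # red+shoes : the raw string is all we can rely on
```

The second query is perfectly legal, but a parser that assumes pairs loses it entirely, which is why the query can only be carried as an opaque string.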


> this places constraints on the component values, since you can't use
> structural delimiters inside values.  This means if we do want to include
> such things, we have to escape them, and it snowballs from there.
>
>
The only ones we have to escape this way are the reserved characters, and
really, within a protocol like HTTP, the only ones we absolutely have to
escape are the ones that have meaning to the protocol and its applications
(e.g. "/" and "?" within the path). Although the gods know what sort of
random applications there are out there that absolutely depend on "$" being
distinguished from "%24", etc.

I could see some interest in splitting the path into either a single "*" or
a list of segments (removing "%2F" from the path string), but for one thing,
how many URLs have %2F in the path? And for another, the rules in RFC 3986
and a couple of decades of experience already have that covered, I think.
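
A rough sketch of what splitting into segments means in practice, in Python with the standard urllib.parse (the path and the path_segments helper are invented for illustration): split on the literal "/" first, and only then percent-decode each segment, so an escaped "%2F" stays inside its segment.

```python
from urllib.parse import unquote

def path_segments(raw_path):
    """Split on literal '/' first, then percent-decode each segment."""
    return [unquote(seg) for seg in raw_path.split("/")[1:]]

raw = "/files/a%2Fb/report.txt"

# Split, then decode: the escaped slash stays inside one segment.
correct = path_segments(raw)

# Decode, then split: the escaped slash becomes a structural delimiter.
wrong = unquote(raw).split("/")[1:]

print(correct)  # ['files', 'a/b', 'report.txt']
print(wrong)    # ['files', 'a', 'b', 'report.txt']
```

The order of operations is the whole point: decoding before splitting turns data into structure.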

I guess you could split the authority into userinfo/host/port if you were
desperate for such a thing.


> Imagine if we just sent all individual parts of a URI in different fields,
> where we didn't need to parse them to distinguish the parts.  No more %20
> vs +, no more string escape unicode exploits.
>
>
The mapping of "+" to a space character is a de facto standard, but it's
not codified. We can't assume that all plus signs in a query string are
meant to represent space characters. And, as I said earlier, we can't
really make PHP's assumption about the query, and parse it down to a $_GET
associative array, because there are more uses of HTTP than just PHP.


> Sure we might need to aggregate things to create a cache key etc, but
> that's a safe operation.
>
>
Not if you've converted both "%20" and "+" to spaces; that's non-reversible.
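
As a small sketch of the problem, in Python with the standard urllib.parse (the two query strings are made up): unquote_plus applies the form-style "+ means space" convention, while unquote does plain percent-decoding only.

```python
from urllib.parse import unquote, unquote_plus

q_plus = "q=a+b"       # could mean "a b" (form convention) or a literal "a+b"
q_pct = "q=a%20b"      # unambiguously "a b"

# Plain percent-decoding keeps the two distinct.
print(unquote(q_plus), "|", unquote(q_pct))            # q=a+b | q=a b

# Form-style decoding collapses them to the same string.
print(unquote_plus(q_plus), "|", unquote_plus(q_pct))  # q=a b | q=a b

# Once collapsed, nothing tells us which form the sender used, so a cache
# key built from the decoded value maps two different URIs to one entry.
collides = unquote_plus(q_plus) == unquote_plus(q_pct)
print(collides)  # True
```

Given only the decoded "q=a b", there is no way to reconstruct whether the original carried "+" or "%20".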


-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Received on Thursday, 24 July 2014 00:40:37 UTC