
Re: consensus on :query ?

From: Zhong Yu <zhong.j.yu@gmail.com>
Date: Wed, 23 Jul 2014 19:57:09 -0500
Message-ID: <CACuKZqGmXoe2ZMXMbXXwQdPWASgE+PpggP1Pd5F7Wk-1NkksaA@mail.gmail.com>
To: Matthew Kerwin <matthew@kerwin.net.au>
Cc: Adrien de Croy <adrien@qbik.com>, Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
The key=value&key=value format and the encoding of SPACE as PLUS come from
the HTML standard for form submission. They are not part of the HTTP or URI
standards.
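
To make the distinction concrete, here is a minimal Python sketch. It assumes
the standard library's urllib.parse, where quote/unquote follow generic URI
percent-encoding and quote_plus/unquote_plus follow the HTML form convention.
The sample value is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"  # hypothetical sample value with a space and a literal plus

# Generic URI percent-encoding: space becomes %20, "+" becomes %2B.
uri_encoded = quote(value, safe="")

# HTML form convention: space becomes "+", a literal "+" becomes %2B.
form_encoded = quote_plus(value)

# Decoding the same raw text under the two conventions gives different results.
raw = "a+b"
as_uri = unquote(raw)        # "+" is left alone
as_form = unquote_plus(raw)  # "+" is treated as a space

print(uri_encoded, form_encoded, repr(as_uri), repr(as_form))
```

Which decoding is right depends on what the application behind the URI
expects, which is why a generic HTTP component can't pick one.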

On Wed, Jul 23, 2014 at 7:40 PM, Matthew Kerwin <matthew@kerwin.net.au> wrote:
> On 24 July 2014 09:06, Adrien de Croy <adrien@qbik.com> wrote:
>>
>>
>> a URI is just a construction of several components glued together with
>> delimiters, e.g.
>>
>> ://
>> @
>> :
>> /
>> ?
>> &
>> #
>>
>> etc.
>>
>
> Technically "&" isn't significant in a query (or elsewhere in an http URI).
> It's a common convention to make the query a sequence of
> key=value&key=value pairs, but nothing in the spec requires it, so we can't
> break the query down into pairs and still be compliant with all possible
> uses.
>
>
>> this places constraints on the component values, since you can't use
>> structural delimiters inside values.  This means if we do want to include
>> such things, we have to escape them, and it snowballs from there.
>>
>
> The only ones we have to escape this way are the reserved characters, and
> really, within a protocol like HTTP, the only ones we absolutely have to
> escape are the ones that have meaning to the protocol and its applications
> (e.g. "/" and "?" within the path). Although the gods know what sort of
> random applications there are out there that absolutely depend on "$" being
> distinguished from "%24", etc.
>
> I could see some interest in splitting the path into either a single "*" or
> a list of segments (removing "%2F" from the path string), but for one, how
> many URLs have %2F in the path? And for two, the rules in RFC 3986 and a
> couple of decades of experience already have that covered, I think.
>
> I guess you could split the authority into userinfo/host/port if you were
> desperate for such a thing.
>
>
>> Imagine if we just sent all individual parts of a URI in different fields,
>> where we didn't need to parse them to distinguish the parts.  No more %20 vs
>> +, no more string escape unicode exploits.
>>
>
> The mapping of "+" to a space character is a de facto standard, but it's not
> codified. We can't assume that all plus signs in a query string are meant to
> represent space characters. And, as I said earlier, we can't really make
> PHP's assumption about the query and parse it down to a $_GET associative
> array, because there are more uses of HTTP than just PHP.
>
>
>> Sure we might need to aggregate things to create a cache key etc, but
>> that's a safe operation.
>>
>
> Not if you've converted both "%20" and "+" to spaces; that's non-reversible.
>
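
A rough Python sketch of why that is lossy, assuming form-style decoding via
the standard library's urllib.parse.unquote_plus. The function name lossy_key
and the sample queries are hypothetical, for illustration only.

```python
from urllib.parse import unquote_plus

def lossy_key(raw_query: str) -> str:
    """Hypothetical cache key built from the form-decoded query."""
    return unquote_plus(raw_query)

# Three raw queries that are distinct on the wire.
q_pct20 = "q=a%20b"
q_plus = "q=a+b"
q_pct2b = "q=a%2Bb"

# "%20" and "+" collapse to the same decoded text, so the original
# spelling cannot be recovered from the decoded form.
collapsed = lossy_key(q_pct20) == lossy_key(q_plus)

# An escaped plus decodes to a literal "+", which looks exactly like
# the raw text of q_plus.
decoded_pct2b = lossy_key(q_pct2b)

# Keying on the raw, undecoded strings keeps all three distinct.
raw_keys_distinct = len({q_pct20, q_plus, q_pct2b}) == 3

print(collapsed, decoded_pct2b, raw_keys_distinct)
```
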
>
> --
>   Matthew Kerwin
>   http://matthew.kerwin.net.au/
Received on Thursday, 24 July 2014 00:57:40 UTC
