Re: Interpreting "+" in query component of https:// scheme URIs.

On 10 September 2016 at 08:35, Matt Randall <matthew.a.randall@gmail.com>
wrote:

> Hopefully this is a quick question with a straightforward answer.  The
> https URI scheme (RFC7230) denotes that it simply follows the definition of
> the query component from the base URI RFC (RFC3986).  Query seems to allow
> for all reserved and unreserved characters (with some caveats around "?"
> and "/") in the value, and reserves none of the reserved characters as
> delimiters.
>
> From purely a specifications perspective, my assumption (absent de-facto
> legacy behaviors of certain clients and www-form-urlencoded query string
> behaviors) would be to treat the plus sign literally, just as if I would in
> the path component.  Would this be a correct interpretation given the
> following statement in section 2.2?:
>
> If a reserved character is found in a URI component and
> no delimiting role is known for that character, then it must be
> interpreted as representing the data octet corresponding to that
> character's encoding in US-ASCII.
>
> I couldn't find anything in the current specifications that would indicate that "+" has a
> defined delimiting role for the https:// URI scheme.
>
> Thank you in advance,
>
> Matt Randall
>

From [1]:

   ... other
   subcomponents may be defined by a URI scheme's specification, or *by
   the implementation-specific syntax of a URI's dereferencing
   algorithm*, provided that such subcomponents are delimited by
   characters in the reserved set allowed within that component.

The plus sign is used in application/x-www-form-urlencoded data[2][3],
which -- by design -- can be used directly in the query component of a
URI.  So if your application follows the HTML specs, it falls under the
implementation-specific category: "+" keeps its reserved sub-delim role
(in form data it stands for a space), and so is different from, say,
"%2B" (which is a literal plus sign).

And if you don't care about HTML, then yeah, it's just a plus sign.
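To make the two readings concrete, here is a minimal sketch using Python's
standard urllib.parse module. The query string "q=a+b%2Bc" is a made-up
example, not something from the original question, and the variable names
are only illustrative.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Hypothetical query component, chosen only for illustration.
query = "q=a+b%2Bc"
value = query.split("=", 1)[1]

# Generic RFC 3986 reading: only percent-escapes are decoded,
# so "+" stays a literal plus sign.
generic = unquote(value)

# HTML form reading (application/x-www-form-urlencoded):
# "+" means a space, while "%2B" is a literal plus sign.
form = unquote_plus(value)

# parse_qs applies the form reading to the whole query string.
form_parsed = parse_qs(query)

print(generic)      # a+b+c
print(form)         # a b+c
print(form_parsed)  # {'q': ['a b+c']}
```

Note that under the generic reading "+" and "%2B" end up as the same data
octet, whereas under the form reading they mean different things.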

It also depends on what you're doing; if you're writing an HTTP middleware
then sure, ignore the plus sign (the higher-level application will deal
with it).  If you're writing a cache, then you have choices to make.
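As one possible choice for a cache, here is a rough sketch of a conservative
key normalization: it decodes only percent-escapes of unreserved characters
and uppercases the remaining escapes (the normalizations described in RFC
3986 section 6.2.2), so "+" and "%2B" stay distinct keys. The function name
normalize_query is hypothetical, and a real cache might reasonably choose
differently.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def normalize_query(query: str) -> str:
    """Hypothetical helper: normalize a query string for use as a cache key."""
    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            # Escapes of unreserved characters can always be decoded safely.
            return ch
        # Everything else stays escaped; only the hex digits are uppercased.
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, query)

print(normalize_query("q=a+b"))      # q=a+b
print(normalize_query("q=a%2bb"))    # q=a%2Bb
print(normalize_query("q=%7Euser"))  # q=~user
```

With this choice the cache never has to guess whether the application behind
it uses the form reading of "+"; the cost is that two URIs the application
considers equivalent may be cached separately.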

Unless I've misunderstood something.

Cheers

[1]: https://tools.ietf.org/html/rfc3986#section-2.2
[2]: https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
[3]: https://www.w3.org/TR/html5/forms.html#url-encoded-form-data
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Received on Wednesday, 14 September 2016 10:21:05 UTC