- From: Matthew Kerwin <matthew@kerwin.net.au>
- Date: Wed, 14 Sep 2016 20:12:17 +1000
- To: Matt Randall <matthew.a.randall@gmail.com>
- Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Message-ID: <CACweHNB15af2NUr4kNCqzsdxE+B8CAyhy7g10_GvdBpd-pjVWw@mail.gmail.com>
On 10 September 2016 at 08:35, Matt Randall <matthew.a.randall@gmail.com> wrote:

> Hopefully this is a quick question with a straightforward answer. The
> https URI scheme (RFC7230) denotes that it simply follows the definition of
> the query component from the base URI RFC (RFC3986). Query seems to allow
> for all reserved and unreserved characters (with some caveats around "?"
> and "/") in the value, and reserves none of the reserved characters as
> delimiters.
>
> From purely a specifications perspective, my assumption (absent de-facto
> legacy behaviors of certain clients and www-form-urlencoded query string
> behaviors) would be to treat the plus sign literally, just as if I would in
> the path component. Would this be a correct interpretation given the
> following statement in section 2.2?:
>
>    If a reserved character is found in a URI component and
>    no delimiting role is known for that character, then it must be
>    interpreted as representing the data octet corresponding to that
>    character's encoding in US-ASCII.
>
> I couldn't find anything in the current specifications that would
> indicate that "+" has a defined delimiting role for the https:// URI
> scheme.
>
> Thank you in advance,
>
> Matt Randall

From [1]:

   ... other subcomponents may be defined by a URI scheme's
   specification, or *by the implementation-specific syntax of a URI's
   dereferencing algorithm*, provided that such subcomponents are
   delimited by characters in the reserved set allowed within that
   component.

The plus sign is used in application/x-www-form-urlencoded data[2][3], which -- by design -- can be used directly in the query component of a URI.

So if your application follows the HTML specs, it falls under the implementation-specific category, so "+" is treated according to its reserved sub-delim status, and so is different from, say, "%2B".

And if you don't care about HTML, then yeah, it's just a plus sign.
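To make the two readings concrete, here is a small sketch using Python's standard urllib.parse module. The query string "q=a+b%2Bc" is a made-up example, not something from the thread; the point is only that the same bytes decode differently depending on whether the application applies the form-urlencoded convention.

```python
from urllib.parse import unquote, unquote_plus, parse_qs

# Hypothetical query component, chosen only for illustration.
raw_query = "q=a+b%2Bc"
raw_value = raw_query.split("=", 1)[1]

# Generic RFC 3986 reading: only percent-escapes are decoded,
# so "+" stays a literal plus sign.
generic = unquote(raw_value)

# application/x-www-form-urlencoded reading: "+" stands for a space,
# while "%2B" decodes to a literal plus sign.
form = unquote_plus(raw_value)

# parse_qs applies the form-urlencoded convention to the whole query.
form_parsed = parse_qs(raw_query)

print(generic)      # a+b+c
print(form)         # a b+c
print(form_parsed)  # {'q': ['a b+c']}
```

Under the generic reading "+" and "%2B" end up as the same character; under the form reading they do not, which is why the two are not interchangeable once HTML forms are involved.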
It also depends what you're doing; if you're writing a HTTP middleware then sure, ignore the plus sign (the higher-level application will deal with it.) If you're writing a cache, then you have choices to make.

Unless I've misunderstood something.

Cheers

[1]: https://tools.ietf.org/html/rfc3986#section-2.2
[2]: https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
[3]: https://www.w3.org/TR/html5/forms.html#url-encoded-form-data

-- 
Matthew Kerwin
http://matthew.kerwin.net.au/
Received on Wednesday, 14 September 2016 10:21:05 UTC