- From: Mikaël Geljić <notifications@github.com>
- Date: Wed, 08 Aug 2018 09:24:45 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/411@github.com>
Hi,

I've been scratching my head trying to figure out where square brackets (e.g. U+005B) get exempted from [percent-encoding](https://url.spec.whatwg.org/#percent-encode) in the [query state](https://url.spec.whatwg.org/#query-state).

Compare with the [path state](https://url.spec.whatwg.org/#path-state):

> 3. [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode) [c](https://url.spec.whatwg.org/#c) using the [path percent-encode set](https://url.spec.whatwg.org/#path-percent-encode-set), and append the result to _buffer_.

I understand that, since U+005B ([) is not present in that set, it is returned _as is_ (as per [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode)).

In the [query state](https://url.spec.whatwg.org/#query-state), encoding is expressed on a _byte_ basis:

> 1. If one of the following is true
>    * _byte_ is less than 0x21 (!)
>    * _byte_ is greater than 0x7E (~)
>    * _byte_ is 0x22 ("), 0x23 (#), 0x3C (<), or 0x3E (>)
>    * _byte_ is 0x27 (') and url is special
>
>    then append _byte_, [percent encoded](https://url.spec.whatwg.org/#percent-encode), to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).
>
> 2. Otherwise, append a code point whose value is _byte_ to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).

Wouldn't it be equivalent to express a query percent-encode set as follows? That would make it much less convoluted, imho.

> 💡 The **query percent-encode set** is the [C0 control percent-encode set](https://url.spec.whatwg.org/#c0-control-percent-encode-set) and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

Only the [special](https://url.spec.whatwg.org/#is-special) case for U+0027 (') would have to remain; I'm not sure what to do about that. Any thoughts?
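The equivalence proposed above can be checked mechanically. The following is a minimal Python sketch, assuming the query-state byte rule exactly as quoted and the C0 control percent-encode set defined as the C0 controls plus all code points greater than U+007E. The function names (`byte_rule_encodes`, `in_query_set`, `encode_query`) are hypothetical, and `encode_query` assumes UTF-8 as the output encoding. It compares both formulations over every byte value, with and without the special-scheme U+0027 case, and shows that `[` and `]` pass through unencoded.

```python
# Hypothetical sketch: compare the query state's byte rule (as quoted in the
# issue) with the proposed "query percent-encode set" formulation.

def byte_rule_encodes(byte: int, special: bool) -> bool:
    """Byte-based condition from the query state, as quoted above."""
    return (
        byte < 0x21
        or byte > 0x7E
        or byte in (0x22, 0x23, 0x3C, 0x3E)
        or (byte == 0x27 and special)
    )

def in_c0_control_set(cp: int) -> bool:
    """C0 control percent-encode set: C0 controls and code points > U+007E."""
    return cp <= 0x1F or cp > 0x7E

def in_query_set(cp: int, special: bool) -> bool:
    """Proposed query percent-encode set, plus the special-only U+0027 case."""
    return (
        in_c0_control_set(cp)
        or cp in (0x20, 0x22, 0x23, 0x3C, 0x3E)
        or (cp == 0x27 and special)
    )

def encode_query(query: str, special: bool = True) -> str:
    """Percent-encode a query with the byte rule (assumes UTF-8 encoding)."""
    out = []
    for byte in query.encode("utf-8"):
        if byte_rule_encodes(byte, special):
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)

# Both formulations agree on every byte value, for special and non-special URLs.
for special in (True, False):
    for b in range(256):
        assert byte_rule_encodes(b, special) == in_query_set(b, special)

print(encode_query("a[0]=x y<z>'é"))  # → a[0]=x%20y%3Cz%3E%27%C3%A9
```

Because every byte of a UTF-8 encoded non-ASCII code point is at least 0x80, the "greater than U+007E" clause of the C0 control set covers the same bytes as the "greater than 0x7E" clause of the byte rule, at least when the encoding is UTF-8.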
Secondary question: in various states there is early validation:

> If [c](https://url.spec.whatwg.org/#c) is not a [URL code point](https://url.spec.whatwg.org/#url-code-points) and not U+0025 (%), [validation error](https://url.spec.whatwg.org/#validation-error).

U+005B ([) doesn't fall within the definition of [URL code points](https://url.spec.whatwg.org/#url-code-points). Does this mean that URLs containing such a character in their path or query components are considered invalid, despite being commonly accepted and parsing successfully?

Cheers,
Mika

-- 
View it on GitHub: https://github.com/whatwg/url/issues/411
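The early-validation check from the secondary question can be sketched as follows. This is a minimal Python illustration, assuming the spec's definition of URL code points (ASCII alphanumerics, `!$&'()*+,-./:;=?@_~`, and U+00A0 to U+10FFFD excluding surrogates and noncharacters). The helper names are hypothetical. The sketch only collects the characters that would be reported; as the spec treats them, validation errors are reported but do not by themselves stop parsing, so the character is still appended.

```python
# Hypothetical sketch of the "not a URL code point and not U+0025 (%)"
# validation check used in the path and query states.
import string

_ASCII_URL_CODE_POINTS = set(
    string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~"
)

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus code points ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_url_code_point(c: str) -> bool:
    cp = ord(c)
    if c in _ASCII_URL_CODE_POINTS:
        return True
    if 0xA0 <= cp <= 0x10FFFD:
        # Exclude surrogates and noncharacters.
        return not (0xD800 <= cp <= 0xDFFF) and not is_noncharacter(cp)
    return False

def validation_errors(component: str) -> list:
    """Characters that would trigger a (non-fatal) validation error."""
    return [c for c in component if c != "%" and not is_url_code_point(c)]

print(validation_errors("/a[0]?b=[1]"))  # → ['[', ']', '[', ']']
```

Under these assumptions, `[` and `]` are flagged by the check even though, per the query-state rule quoted earlier, they are appended to the query without percent-encoding.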
Received on Wednesday, 8 August 2018 16:25:08 UTC