- From: Mikaël Geljić <notifications@github.com>
- Date: Wed, 08 Aug 2018 09:24:45 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/411@github.com>
Hi,
I've been scratching my head trying to figure out where square brackets (e.g. U+005B) are exempted from [percent-encoding](https://url.spec.whatwg.org/#percent-encode) in the [query state](https://url.spec.whatwg.org/#query-state).
For comparison, here is the [path state](https://url.spec.whatwg.org/#path-state):
> 3. [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode) [c](https://url.spec.whatwg.org/#c) using the [path percent-encode set](https://url.spec.whatwg.org/#path-percent-encode-set), and append the result to _buffer_.
I understand that, since U+005B ([) is not in that set, it is returned _as is_ (per [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode)).
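To make the path-state behaviour concrete, here is a minimal Python sketch. It assumes the path percent-encode set is the C0 control percent-encode set plus U+0020 SPACE, ", #, <, >, ?, U+0060 (backtick), {, and }, modeled as a predicate on code points. The function names are hypothetical, not taken from the spec.

```python
def in_c0_control_set(cp: int) -> bool:
    """C0 control percent-encode set: C0 controls and code points above U+007E (~)."""
    return cp <= 0x1F or cp > 0x7E


def in_path_set(cp: int) -> bool:
    """Assumed path percent-encode set: C0 control set plus space, ", #, <, >, ?, `, {, }."""
    return in_c0_control_set(cp) or chr(cp) in ' "#<>?`{}'


def utf8_percent_encode(c: str, in_set) -> str:
    """Return c as is if it is not in the set, else percent-encode its UTF-8 bytes."""
    if not in_set(ord(c)):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


print(utf8_percent_encode("[", in_path_set))  # [
print(utf8_percent_encode(" ", in_path_set))  # %20
print(utf8_percent_encode("é", in_path_set))  # %C3%A9
```

Because "[" is absent from the assumed set, the early return hands it back unchanged, which is exactly the "returned as is" branch.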
In the [query state](https://url.spec.whatwg.org/#query-state), encoding is expressed on a _byte_ basis:
> 1. If one of the following is true
> * _byte_ is less than 0x21 (!)
> * _byte_ is greater than 0x7E (~)
> * _byte_ is 0x22 ("), 0x23 (#), 0x3C (<), or 0x3E (>)
> * _byte_ is 0x27 (') and url is special
>
> then append _byte_, [percent encoded](https://url.spec.whatwg.org/#percent-encode), to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).
>
> 2. Otherwise, append a code point whose value is _byte_ to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).
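The quoted per-byte rule can be sketched in Python as follows. This assumes the query buffer has already been encoded to bytes (UTF-8 here). The function name and the `url_is_special` flag are hypothetical stand-ins for the spec's state. Since 0x5B ([) and 0x5D (]) match none of the conditions, they fall through to step 2.

```python
def query_state_encode(data: bytes, url_is_special: bool) -> str:
    """Apply the quoted query-state rule to already-encoded query bytes."""
    out = []
    for byte in data:
        if (byte < 0x21                                # C0 controls and space
                or byte > 0x7E                         # DEL and every non-ASCII byte
                or byte in (0x22, 0x23, 0x3C, 0x3E)    # " # < >
                or (byte == 0x27 and url_is_special)):  # ' only for special URLs
            out.append(f"%{byte:02X}")                 # step 1: percent-encode the byte
        else:
            out.append(chr(byte))                      # step 2: append byte as a code point
    return "".join(out)


query = "a=[1]&b='x y'".encode("utf-8")
print(query_state_encode(query, url_is_special=True))   # a=[1]&b=%27x%20y%27
print(query_state_encode(query, url_is_special=False))  # a=[1]&b='x%20y'
```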
Wouldn't it be equivalent to define a query percent-encode set as follows? That would make it much less convoluted, imho.
> 💡 The **query percent-encode set** is the [C0 control percent-encode set](https://url.spec.whatwg.org/#c0-control-percent-encode-set) and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
Only the [special](https://url.spec.whatwg.org/#is-special) case for U+0027 (') would have to remain; I'm not sure what to do about that.
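One way to sanity-check the proposed equivalence is a brute-force comparison in Python. This is a sketch under stated assumptions: UTF-8 is the query encoding (encoding overrides are not covered), only the Basic Multilingual Plane is checked, and U+0027 (') is simply added to the proposed set when the URL is special. All names are hypothetical.

```python
def byte_rule(data: bytes, special: bool) -> str:
    """The quoted query-state rule, applied byte by byte."""
    return "".join(
        f"%{b:02X}"
        if b < 0x21 or b > 0x7E or b in (0x22, 0x23, 0x3C, 0x3E) or (b == 0x27 and special)
        else chr(b)
        for b in data
    )


def in_proposed_query_set(cp: int, special: bool) -> bool:
    """Proposed set: C0 control percent-encode set plus space, ", #, <, >;
    U+0027 (') is added only for special URLs (assumed handling)."""
    in_c0_control_set = cp <= 0x1F or cp > 0x7E
    return in_c0_control_set or chr(cp) in ' "#<>' or (special and cp == 0x27)


def set_rule(c: str, special: bool) -> str:
    """UTF-8 percent encode a single code point using the proposed set."""
    if not in_proposed_query_set(ord(c), special):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


def find_mismatches(limit: int = 0x10000) -> list:
    """Code points below limit where the two formulations disagree."""
    mismatches = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates have no UTF-8 encoding
        c = chr(cp)
        for special in (False, True):
            if set_rule(c, special) != byte_rule(c.encode("utf-8"), special):
                mismatches.append((cp, special))
    return mismatches


print(find_mismatches())  # []
```

The agreement follows from two facts: "less than 0x21" is the C0 controls plus U+0020 SPACE, and every UTF-8 byte of a non-ASCII code point is at least 0x80, so it is caught by both "greater than 0x7E" and the C0 control percent-encode set.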
Any thoughts?
Secondary question: several states perform an early validation check:
> If [c](https://url.spec.whatwg.org/#c) is not a [URL code point](https://url.spec.whatwg.org/#url-code-points) and not U+0025 (%), [validation error](https://url.spec.whatwg.org/#validation-error).
U+005B ([) is not one of the [URL code points](https://url.spec.whatwg.org/#url-code-points). Does this mean that URLs containing such a character in the path or query are considered invalid, even though they are commonly accepted and parse successfully?
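A minimal Python sketch of that check follows. It assumes the spec's URL code point definition: ASCII alphanumerics, the punctuation !$&'()*+,-./:;=?@_~, and U+00A0 to U+10FFFD excluding surrogates and noncharacters. The helper names are hypothetical. The sketch only collects the code points that would trigger the quoted step. It does not model what happens to the parse afterwards, since the quoted step itself only reports a validation error.

```python
import string

# ASCII URL code points: alphanumerics plus these punctuation characters.
ASCII_URL_CODE_POINTS = set(string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def is_url_code_point(cp: int) -> bool:
    if cp < 0x80:
        return chr(cp) in ASCII_URL_CODE_POINTS
    # Non-ASCII: U+00A0..U+10FFFD, excluding surrogates and noncharacters.
    return (0xA0 <= cp <= 0x10FFFD
            and not 0xD800 <= cp <= 0xDFFF
            and not is_noncharacter(cp))


def flagged_code_points(component: str) -> list:
    """Code points that hit the quoted step: not a URL code point and not %."""
    return [c for c in component if not is_url_code_point(ord(c)) and c != "%"]


print(flagged_code_points("/a[1]?q=[x]"))  # ['[', ']', '[', ']']
print(flagged_code_points("/a%5B1%5D"))    # []
```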
Cheers,
Mika
--
https://github.com/whatwg/url/issues/411
Received on Wednesday, 8 August 2018 16:25:08 UTC