[whatwg/url] Clarify query state percent-encoding (parsing) (#411)

Hi,

I've been scratching my head trying to figure out where square brackets (e.g. U+005B) are exempted from [percent-encoding](https://url.spec.whatwg.org/#percent-encode) in the [query state](https://url.spec.whatwg.org/#query-state).

To compare with the [path state](https://url.spec.whatwg.org/#path-state):

> 3. [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode) [c](https://url.spec.whatwg.org/#c) using the [path percent-encode set](https://url.spec.whatwg.org/#path-percent-encode-set), and append the result to _buffer_.

I understand that, since U+005B ([) is not present in that set, it is returned _as is_ (as per [UTF-8 percent encode](https://url.spec.whatwg.org/#utf-8-percent-encode)).
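To illustrate that reading, here is a minimal Python sketch of set-based encoding. The function names and the `PATH_EXTRA` constant are made up for this example, and the set membership is my own reading of the spec's path percent-encode set, so treat it as an assumption rather than a reference implementation.

```python
# Assumption: members of the path percent-encode set beyond the C0 control
# percent-encode set, as I read the spec at the time of writing.
PATH_EXTRA = set(' "#<>?`{}')


def in_path_percent_encode_set(ch: str) -> bool:
    cp = ord(ch)
    # C0 control percent-encode set: C0 controls and everything above U+007E.
    return cp <= 0x1F or cp > 0x7E or ch in PATH_EXTRA


def utf8_percent_encode(ch: str) -> str:
    """Return ch unchanged unless it is in the set; else %XX per UTF-8 byte."""
    if not in_path_percent_encode_set(ch):
        return ch
    return "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
```

Under these assumptions, `utf8_percent_encode("[")` gives back `[` unchanged, while `utf8_percent_encode("{")` gives `%7B`.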

In the [query state](https://url.spec.whatwg.org/#query-state), encoding is expressed on a _byte_ basis:

> 1. If one of the following is true
>     * _byte_ is less than 0x21 (!)
>     * _byte_ is greater than 0x7E (~)
>     * _byte_ is 0x22 ("), 0x23 (#), 0x3C (<), or 0x3E (>)
>     * _byte_ is 0x27 (') and url is special
>
>     then append _byte_, [percent encoded](https://url.spec.whatwg.org/#percent-encode), to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).
> 
> 2. Otherwise, append a code point whose value is _byte_ to _url_’s [query](https://url.spec.whatwg.org/#concept-url-query).
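For comparison, the two quoted steps could be sketched in Python roughly as follows. The name `query_state_encode` and its parameters are hypothetical. The sketch covers only the quoted per-byte decision, not the rest of the query state (encoding the buffer, state transitions on `#`, and so on).

```python
def query_state_encode(data: bytes, is_special: bool) -> str:
    """Apply the two quoted query-state steps to each already-encoded byte."""
    out = []
    for byte in data:
        must_encode = (
            byte < 0x21                              # C0 controls and space
            or byte > 0x7E                           # DEL and all non-ASCII bytes
            or byte in (0x22, 0x23, 0x3C, 0x3E)      # " # < >
            or (byte == 0x27 and is_special)         # ' only for special URLs
        )
        if must_encode:
            out.append(f"%{byte:02X}")               # step 1: percent-encode
        else:
            out.append(chr(byte))                    # step 2: append as code point
    return "".join(out)
```

With this sketch, `query_state_encode(b"a[0]=1 2", True)` yields `a[0]=1%202`: the square brackets pass through simply because no condition in step 1 matches them.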

Wouldn't it be equivalent to express a query percent-encode set as follows? That would make the query state much less convoluted, imho.

> 💡 The **query percent-encode set** is the [C0 control percent-encode set](https://url.spec.whatwg.org/#c0-control-percent-encode-set) and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

Only the [special](https://url.spec.whatwg.org/#is-special) case for U+0027 (') would have to remain; I'm not sure what to do about that.
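As a quick sanity check of the claimed equivalence, the following Python sketch compares the byte-based rule with the proposed set over the ASCII range. The names `byte_rule` and `set_rule` are hypothetical, the U+0027 special case is deliberately left out, and UTF-8 is assumed; a non-UTF-8 encoding in the query state is not modeled.

```python
def byte_rule(byte: int) -> bool:
    """Byte-based conditions quoted from the query state (without the ' case)."""
    return byte < 0x21 or byte > 0x7E or byte in (0x22, 0x23, 0x3C, 0x3E)


# Proposed additions on top of the C0 control percent-encode set.
QUERY_EXTRA = {0x20, 0x22, 0x23, 0x3C, 0x3E}


def set_rule(cp: int) -> bool:
    """Proposed query percent-encode set, tested per code point."""
    in_c0_control_set = cp <= 0x1F or cp > 0x7E
    return in_c0_control_set or cp in QUERY_EXTRA


# For ASCII, one code point is one byte, so the rules can be compared directly.
# Non-ASCII code points are always in the set, and every byte of their UTF-8
# form is above 0x7E, so both formulations percent-encode them in full.
mismatches = [b for b in range(0x80) if byte_rule(b) != set_rule(b)]
```

If the two formulations agree, `mismatches` is an empty list.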

Any thoughts?

Secondary question: in various states there is early validation:

> If [c](https://url.spec.whatwg.org/#c) is not a [URL code point](https://url.spec.whatwg.org/#url-code-points) and not U+0025 (%), [validation error](https://url.spec.whatwg.org/#validation-error).

U+005B ([) does not fall under the definition of [URL code points](https://url.spec.whatwg.org/#url-code-points). Does this mean that URLs containing such a character in their path or query components are considered invalid, despite being commonly accepted and parsing successfully?
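To make the question concrete, here is a small Python sketch of the quoted check, restricted to ASCII input. The helper name `ascii_validation_errors` is hypothetical, the punctuation list is my reading of the URL code points definition, and non-ASCII code points are skipped rather than checked.

```python
import string

# Assumption: ASCII portion of the URL code points, as I read the definition.
URL_ASCII_PUNCT = set("!$&'()*+,-./:;=?@_~")
ASCII_URL_CODE_POINTS = set(string.ascii_letters + string.digits) | URL_ASCII_PUNCT


def ascii_validation_errors(text: str) -> list[str]:
    """List ASCII characters that are neither URL code points nor U+0025 (%)."""
    return [
        ch
        for ch in text
        if ord(ch) < 0x80 and ch != "%" and ch not in ASCII_URL_CODE_POINTS
    ]
```

For `a[0]=1` this flags `[` and `]`, even though the percent-encoding steps leave both characters untouched.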

Cheers,
Mika


Received on Wednesday, 8 August 2018 16:25:08 UTC