[whatwg/url] Express validity, error correction and percent encode sets in a single table (Issue #855)

alwinb created an issue (whatwg/url#855)

### What is the issue with the URL Standard?

This is a proposal to include a single, small-ish table in the standard,
either as a clarification or (my preference) as a full replacement for the current descriptions of:

* Percent-encode sets,
* Valid vs. invalid individual code points per component, and
* Error correction behaviour for the above.

For each component of a URL that contains a percent-encoded string,
we can describe its validity, error correction and encoding _per code point_.

In a given component, a single code point is one of:

- v: (Valid) valid and included verbatim in the output URL.
- E: (Escape) valid, but nonetheless percent-encoded.
- T: (Tolerate) invalid, but nonetheless left untouched by the parser, resulting in an invalid URL as output.
- F: (Fixed) invalid, and fixed by the parser (and setters) by percent-encoding the occurrence.
- R: (Reject) invalid, and causes a hard error, so that it does not end up in output URLs.
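
To make the five categories concrete, here is a minimal Python sketch of how a parser could apply one column of such a table to a component. The names `Action`, `process_component`, `TOY_COLUMN` and `toy_classify` are hypothetical, and the toy column is a placeholder rather than a transcription of the table in this issue.

```python
from enum import Enum
from typing import Callable, Tuple


class Action(Enum):
    VALID = "v"      # valid, copied verbatim
    ESCAPE = "E"     # valid, but percent-encoded anyway
    TOLERATE = "T"   # invalid, copied verbatim (output stays invalid)
    FIXED = "F"      # invalid, repaired by percent-encoding
    REJECT = "R"     # invalid, hard error


def percent_encode(ch: str) -> str:
    """Percent-encode the UTF-8 bytes of a single code point."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


def process_component(
    text: str, classify: Callable[[str], Action]
) -> Tuple[str, bool]:
    """Return (output, input_was_valid); raise ValueError on a Reject."""
    out = []
    input_valid = True
    for ch in text:
        action = classify(ch)
        if action is Action.REJECT:
            raise ValueError(f"code point U+{ord(ch):04X} is rejected")
        if action in (Action.TOLERATE, Action.FIXED):
            input_valid = False
        if action in (Action.ESCAPE, Action.FIXED):
            out.append(percent_encode(ch))
        else:
            out.append(ch)
    return "".join(out), input_valid


# Placeholder column for illustration only (assumption, not the real table).
TOY_COLUMN = {" ": Action.FIXED, "|": Action.TOLERATE, "\x00": Action.REJECT}


def toy_classify(ch: str) -> Action:
    return TOY_COLUMN.get(ch, Action.VALID)
```

With this toy column, `process_component("a b|c", toy_classify)` returns `("a%20b|c", False)`: the space is fixed by encoding, the `|` is tolerated as-is, and the input is reported as invalid.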

<img width="603" alt="Image" src="https://github.com/user-attachments/assets/31303696-587a-4aa1-a726-c34c01dc753a" />

Notes:
- 'Other control' here is control-c0 ∪ del-c1 ∪ surrogate ∪ non-char.
- The apostrophe in the query is special-cased, hence the superscript: for 'non-special' URLs it is left untouched (i.e. v: Valid). The special query could also be broken out into a separate column.
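
As a rough illustration of the first note, the union can be written as a predicate over code points. This sketch assumes the shorthand names map to the usual Unicode ranges (C0 controls U+0000 to U+001F, DEL plus C1 controls U+007F to U+009F, surrogates U+D800 to U+DFFF, and the Unicode noncharacters). The function names are hypothetical, and any controls the table lists in their own rows would need to be carved out separately.

```python
def is_c0_control(cp: int) -> bool:
    return 0x00 <= cp <= 0x1F


def is_del_or_c1(cp: int) -> bool:
    return 0x7F <= cp <= 0x9F


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_other_control(cp: int) -> bool:
    """Full union from the note, before any per-row carve-outs."""
    return (
        is_c0_control(cp)
        or is_del_or_c1(cp)
        or is_surrogate(cp)
        or is_noncharacter(cp)
    )
```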

(If these sets have changed in the last year or so, the table might be slightly out of date.)


Received on Saturday, 15 February 2025 07:51:34 UTC