Re: [whatwg/url] Parser generates invalid URLs (#379)

> I think that is already documented quite well through some of the examples right at the top of the URLs section (see also #595): https://url.spec.whatwg.org/#example-url-parsing

The valid column there explains when the input is valid. It doesn't explain when the output is invalid.

---

I still think my comment from https://github.com/whatwg/url/issues/379#issuecomment-380327563 applies. Let me try rephrasing it.

- If the idea of "valid URL string" is to have any meaning at all, separate from "string that is parseable as a URL", I think it means "software and humans should try to produce valid URL strings for exchange with other software and with humans, and not produce invalid URL strings".
- In particular, I don't think web browsers should send invalid URL strings to servers.
- Currently, it is entirely possible for web browsers to send invalid URL strings to servers. An example is when the user clicks on `<a href="https://example.com/?}">click me</a>`.
- I also think it's bad that, if we want to encourage people to use the string `"https://example.com/?%7D"` instead of `"https://example.com/?}"`, we have not given people an easy tool for producing the former string from user input, database entries, or similar sources. We have only given them the `(new URL(input)).href` tool, which produces the latter string.
  - This is very different from HTML, where in most cases a parse-serialize roundtrip produces valid HTML. E.g. if you input `<i><b>misnested</i></b>` (invalid) you'll get back `<i><b>misnested</b></i>`.
  - Or if you input `<i a=1 a=2>x</i>` you'll get back `<i a="1">x</i>`, which I guess is still technically not valid because the `<i>` element has no `a` attribute, but the "obvious" "syntactic" errors have been fixed.
- Thus, I think it doesn't make sense to call `"https://example.com/?}"` an invalid URL string. I cannot see what purpose it serves to have tools call out such strings as invalid, if we have lots of them flying around the web all the time between all sorts of software, and we have provided no tools or algorithm for creating valid URLs.
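
To make the roundtrip point concrete, here is a minimal sketch that can be run in Node.js or a browser console. The helper name `roundTrip` is only for illustration and is not part of any API; the sketch assumes an implementation of the `URL` class that follows the current URL Standard.

```javascript
// Hypothetical helper: parse a string and serialize it again,
// i.e. the `(new URL(input)).href` tool mentioned above.
function roundTrip(input) {
  return new URL(input).href;
}

// "}" is not percent-encoded in the query, so the invalid URL string
// survives the parse-serialize roundtrip unchanged.
console.log(roundTrip("https://example.com/?}"));   // https://example.com/?}

// The valid spelling also roundtrips unchanged; the parser does not
// convert either form into the other.
console.log(roundTrip("https://example.com/?%7D")); // https://example.com/?%7D
```

So the only way to get the `%7D` form today is to write it yourself before handing the string to the parser.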

---

I don't think any of this detracts from the desire to declare URL parsing solved, or the increasing progress on aligning parsers. From my perspective, this is purely about changing the definition of validity.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/379#issuecomment-2513839193

Message ID: <whatwg/url/issues/379/2513839193@github.com>

Received on Tuesday, 3 December 2024 08:22:30 UTC