Re: [whatwg/url] Basic URL parse requires stripping tabs before host state is entered, allowing bad hosts (Issue #829)

> I'm not sure what you mean by differently?

I was referring to how browsers handle spaces differently from tabs: Firefox won't follow a link containing spaces, while other browsers encode them as `%20`. However, re-reading the spec, I see that I missed the fact that space is not included in this step, so the point is irrelevant. My apologies for misreading this step; please disregard my point about browsers.

Returning to the original point about tabs in domains, let me ask you this question. Given steps 2 and 3 of the basic URL parser:

> If input contains any [ASCII tab or newline](https://infra.spec.whatwg.org/#ascii-tab-or-newline), [invalid-URL-unit](https://url.spec.whatwg.org/#invalid-url-unit) [validation error](https://url.spec.whatwg.org/#validation-error).

> Remove all [ASCII tab or newline](https://infra.spec.whatwg.org/#ascii-tab-or-newline) from input.
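As a rough illustration, the two quoted steps could be sketched in Python as below. The helper name `strip_tab_newline` and its "return the errors as a list" design are hypothetical, not taken from the spec or any library; how (or whether) a parser surfaces validation errors is up to the implementation.

```python
# Sketch of basic URL parser steps 2 and 3 (tab/newline handling).
# Hypothetical helper: not part of the WHATWG spec text or any library.

# "ASCII tab or newline" per the Infra spec: U+0009 TAB, U+000A LF, U+000D CR.
ASCII_TAB_OR_NEWLINE = "\t\n\r"


def strip_tab_newline(url_input: str) -> tuple[str, list[str]]:
    """Return (cleaned input, validation errors) for steps 2 and 3."""
    errors: list[str] = []
    # Step 2: any tab or newline is an invalid-URL-unit validation error.
    if any(ch in ASCII_TAB_OR_NEWLINE for ch in url_input):
        errors.append("invalid-URL-unit")
    # Step 3: remove every tab or newline; U+0020 SPACE is left untouched.
    cleaned = "".join(ch for ch in url_input if ch not in ASCII_TAB_OR_NEWLINE)
    return cleaned, errors


if __name__ == "__main__":
    print(strip_tab_newline("https://exa\tmple.com/"))
    # -> ('https://example.com/', ['invalid-URL-unit'])
    print(strip_tab_newline("https://exa mple.com/"))
    # -> ('https://exa mple.com/', [])
```

The first call shows the problem this issue is about: the tab disappears and the host silently becomes `example.com`, with only the (optional) validation error recording that anything changed.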

I read these two steps as saying that conforming parsers should (or are encouraged to) indicate a validation error if an ASCII tab or newline (or another non-URL code point) is encountered.

Would this in turn suggest that parsers, like the one in Python, should provide some way to request a validation failure when any of those characters are encountered, or at least provide some feedback indicating that the returned parsed URL components may have been modified because of this validation error?
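For the Python case, one possible shape for such an option is sketched below as a wrapper around `urllib.parse.urlsplit`. The names `checked_urlsplit`, `URLValidationError`, and the `strict` parameter are hypothetical; nothing like them exists in the standard library. Recent CPython releases of `urlsplit` already remove ASCII tab and newline characters without any signal to the caller, so this sketch does the removal itself to behave the same on every Python version.

```python
from urllib.parse import SplitResult, urlsplit

# Hypothetical API sketch: neither URLValidationError nor checked_urlsplit
# exists in the standard library.

ASCII_TAB_OR_NEWLINE = "\t\n\r"  # U+0009, U+000A, U+000D


class URLValidationError(ValueError):
    """Raised in strict mode when the input contains a tab or newline."""


def checked_urlsplit(url: str, strict: bool = False) -> tuple[SplitResult, bool]:
    """Split `url`, reporting whether tab/newline removal changed it.

    Returns (parts, modified). With strict=True, a tab or newline raises
    URLValidationError instead of being silently removed.
    """
    found = [ch for ch in url if ch in ASCII_TAB_OR_NEWLINE]
    if found and strict:
        raise URLValidationError(f"invalid-URL-unit: input contains {found[0]!r}")
    # Strip explicitly so the result does not depend on whether this
    # Python version's urlsplit also strips these characters.
    cleaned = "".join(ch for ch in url if ch not in ASCII_TAB_OR_NEWLINE)
    return urlsplit(cleaned), bool(found)


if __name__ == "__main__":
    parts, modified = checked_urlsplit("https://exa\tmple.com/path")
    print(parts.netloc, modified)  # example.com True
    try:
        checked_urlsplit("https://exa\tmple.com/path", strict=True)
    except URLValidationError as exc:
        print("rejected:", exc)
```

Returning a `modified` flag by default, and raising only on request, would keep existing callers working while still giving them the feedback described in the question.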

-- 
https://github.com/whatwg/url/issues/829#issuecomment-2293628200

Message ID: <whatwg/url/issues/829/2293628200@github.com>

Received on Friday, 16 August 2024 14:36:10 UTC