Re: [whatwg/url] Basic URL parse requires stripping tabs before host state is entered, allowing bad hosts (Issue #829) from Andre on 2024-08-13 (public-webapps-github@w3.org from August 2024)

From: Andre <notifications@github.com>
Date: Mon, 12 Aug 2024 17:01:48 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/829/2285096883@github.com>

@annevk 
> I'm not sure I understand how abc<tab>xyz.test is fed into the URL parser

If the parser only ever saw valid URLs, there would be no need for error states. That is not the case in the real world.

Spammers and hackers are always looking for ways to inject bad content, and this one fits that purpose well: a preliminary scanner may not see a known spam URL because of a tab character, while the Python parser will produce a URL that may bypass that scanner.
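
A minimal sketch of that scenario follows. The blocklist and the `naive_scanner_flags` helper are hypothetical, invented purely for illustration. The sketch also assumes a recent CPython in which `urllib.parse.urlsplit` removes ASCII tab and newline characters from its input; older releases may behave differently.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist, for illustration only.
BLOCKED_HOSTS = {"abcxyz.test"}


def naive_scanner_flags(text: str) -> bool:
    """Return True if a blocked host appears verbatim in the raw text."""
    return any(host in text for host in BLOCKED_HOSTS)


# The same input is seen by the scanner and, later, by the URL parser.
raw = "http://abc\txyz.test/"

# The scanner compares raw text, so the embedded tab hides the host.
flagged = naive_scanner_flags(raw)

# The parser removes the tab first (assumed behavior, see above).
parsed_host = urlsplit(raw).hostname

print("scanner flagged:", flagged)
print("parsed host:", parsed_host)
```

Whether a real scanner is affected depends on whether it normalizes its input the same way the parser does. The sketch only shows the mismatch when it does not.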

The point of the whole sequence described in the spec is to reject bad URLs, which in this case is circumvented by the order of operations in which tabs are stripped before they can be validated. In other words, why would the spec even identify the tab as an invalid character when a tab can never reach the host parser?
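
The ordering problem can be sketched as below. This is not the spec's algorithm, only a reduction of the two steps in question: `strip_tab_newline` stands in for the up-front removal of ASCII tab and newline, and `host_is_rejected` stands in for the forbidden host code point check. Both function names and the partial `FORBIDDEN_SUBSET` set are assumptions made for illustration.

```python
# Partial subset of forbidden host code points, for illustration only.
FORBIDDEN_SUBSET = {"\t", "\n", "\r", " "}


def strip_tab_newline(value: str) -> str:
    """Stand-in for removing ASCII tab and newline before state processing."""
    return "".join(ch for ch in value if ch not in "\t\n\r")


def host_is_rejected(host: str) -> bool:
    """Stand-in for the host check: reject if any forbidden code point remains."""
    return any(ch in FORBIDDEN_SUBSET for ch in host)


host = "abc\txyz.test"

# Validating the host as given: the tab is seen and the host is rejected.
rejected_if_checked_first = host_is_rejected(host)

# Stripping first, then validating: the tab is gone before the check runs.
rejected_in_spec_order = host_is_rejected(strip_tab_newline(host))

print(rejected_if_checked_first, rejected_in_spec_order)
```

With stripping done first, the tab entry in the forbidden set can never match in this sketch, which is the inconsistency the question above is pointing at.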

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/829#issuecomment-2285096883

Received on Tuesday, 13 August 2024 00:01:52 UTC