Re: [whatwg/url] Record whether the URL parser removed newlines. (#284)

> I think that mitigation of an HTML injection attack should go in the HTML spec.

Let's say that we go this route, and alter https://html.spec.whatwg.org/#resolving-urls to scan the input string for characters we don't like. That seems fine in itself, and we could add a note to implementers about doing this work in parallel with whitespace removal if they feel like it (because, as we've both noted, scanning through every URL string is expensive). We could then look at the scheme of the resulting URL record and abort with a parse error for URLs that contain both kinds of disallowed character and have an HTTP(S) scheme.
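
To make the shape of that concrete, here is a rough sketch of such a wrapper, in Python only for brevity. It is not spec text: `resolve_url_with_check` is a hypothetical name, `ValueError` stands in for the parse error, `urljoin` stands in for the real URL parser, and I'm assuming the two character types are ASCII newlines and `<`.

```python
from urllib.parse import urljoin, urlsplit


def resolve_url_with_check(url: str, base: str) -> str:
    """Hypothetical stand-in for HTML's "parse a URL" wrapper plus the scan."""
    # Scan the raw input before the parser strips anything.
    # Assumption: the two character types are ASCII newlines and "<".
    has_newline = "\n" in url or "\r" in url
    has_less_than = "<" in url

    # The URL parser removes ASCII tab and newline from its input; mimic that
    # here, then resolve against the base with urljoin (a rough approximation).
    stripped = url.translate({ord(c): None for c in "\t\r\n"})
    resolved = urljoin(base, stripped)

    # Only look at the scheme once we have the resulting URL.
    scheme = urlsplit(resolved).scheme.lower()
    if has_newline and has_less_than and scheme in ("http", "https"):
        raise ValueError("URL contained both a newline and '<'")
    return resolved
```

With this sketch, a relative URL containing only a newline still resolves against an `https:` base, while one containing both a newline and `<` is rejected; a `data:` URL with both passes, since its scheme is not HTTP(S).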

I think that would be roughly equivalent to the current set of patches in the main set of cases that I care about (with the caveat that the errors would look different: `TypeError` in a few places as opposed to network errors), but it has the strange side effect that APIs begin to behave differently depending on where they're defined. APIs like `Worker` are in HTML, and use HTML's "parse a URL" wrapper, so they'd exhibit the new behavior. APIs like `fetch()` and XHR do not use HTML's wrapper, so they wouldn't.

And actually, looking at HTML again, not everything there uses the wrapper: `WebSocket` uses the URL parser directly, for example. So I suppose we'd need to audit all the parsing usage to see what we think the behavior ought to be (and ensure that we do the same as new parsing usage is added). I think that would leave us with a fairly confusing story for developers.
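
A toy model of that split, again in Python and purely illustrative: the function names and the `ENTRY_POINTS` table are hypothetical, the check assumes the characters in question are ASCII newlines and `<`, and the model only tracks which parsing path each API takes (it ignores each API's own later checks, such as `WebSocket`'s scheme requirements).

```python
from urllib.parse import urlsplit


def url_parser(url: str) -> str:
    """Stand-in for the URL Standard's parser: strips ASCII tab and newline."""
    return url.translate({ord(c): None for c in "\t\r\n"})


def html_parse_wrapper(url: str) -> str:
    """Stand-in for HTML's "parse a URL" wrapper with the proposed check."""
    parsed = url_parser(url)
    # Assumption: the check targets URLs with both a newline and "<".
    suspicious = ("\n" in url or "\r" in url) and "<" in url
    if suspicious and urlsplit(parsed).scheme.lower() in ("http", "https"):
        raise ValueError("rejected by HTML's wrapper")
    return parsed


# Which parsing path each API takes, per the discussion above.
ENTRY_POINTS = {
    "Worker": html_parse_wrapper,   # defined in HTML, uses the wrapper
    "fetch": url_parser,            # does not use HTML's wrapper
    "XMLHttpRequest": url_parser,   # does not use HTML's wrapper
    "WebSocket": url_parser,        # in HTML, but calls the URL parser directly
}


def outcome(api: str, url: str) -> str:
    """Report whether the given API's parsing path accepts the URL."""
    try:
        ENTRY_POINTS[api](url)
        return "accepted"
    except ValueError:
        return "rejected"


results = {api: outcome(api, "https://example.com/?q=\n<b") for api in ENTRY_POINTS}
```

Here `results` marks only `Worker` as rejected; the other three accept the same string, which is the inconsistency I'm worried about.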

-- 
View it on GitHub:
https://github.com/whatwg/url/pull/284#issuecomment-304122278

Received on Thursday, 25 May 2017 20:57:41 UTC