Re: Invalid Characters in URLs

On 2024-09-20, at 17:25, Daniel Stenberg <daniel@haxx.se> wrote:
> 
> On Fri, 20 Sep 2024, Tim Bray wrote:
> 
>> AFAIK browsers, whatever WHAT iteration they follow, will correctly interoperate on 3986-conforming URLs, no?  So protocol implementors should expect to get good results if they stick with that?
>> 
>> WHAT’s strength (or weakness, according to some) is its very precise specification of browser behavior in the presence of various flavors of error.
> 
> Then come the humans. They will use what works. Browsers accept URLs quite liberally.

Yes, and there is nothing wrong with you writing a tool that accepts WHATWG-URLs.

But please read RFC 9413 for a wider perspective.

> Slowly the web becomes more and more WHATWG URLs that cannot be parsed by RFC 3986 parsers. (I'm thinking spaces, braces, number of slashes, etc.)

s/the web/the browser web/g
FTFY

> So, when we write a parser today, do we want to parse the URLs that are in active use out there, or do we want to be a purist and tell the users they are wrong when they provide URLs that the browsers are fine with?

Again, you can write a tool that happily accepts http:\\ “URLs” etc.
But you can’t impose that lenience on other tools, and we are not obliged to spend the same amount of energy in our tools that a browser does on assigning interpretations to invalid URIs.
One can think a lot of things about STD 66 (RFC 3986), but it is a useful bulwark against this kind of deterioration.
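For illustration only, here is a minimal sketch in Python of the kind of strictness meant here. The helper name `looks_rfc3986_clean` is hypothetical. It does a coarse, character-level check against the RFC 3986 (Appendix A) character sets and requires a scheme. It rejects the backslash in `http:\\`, as well as spaces, braces and a stray `%`. It is not a full RFC 3986 parser: it does not check where `[` and `]` may appear, it does not validate authority or path structure, and it does not accept relative references.

```python
import re

# Characters RFC 3986 (Appendix A) permits literally in a URI:
#   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
#   sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
# "%" is allowed only as the start of a pct-encoded triplet ("%" HEXDIG HEXDIG).
_ALLOWED = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*:")


def looks_rfc3986_clean(candidate: str) -> bool:
    """Hypothetical helper: a coarse, character-level check only.

    Returns False if the string lacks a scheme or contains any character
    that RFC 3986 never allows literally (space, brace, backslash, ...),
    or a "%" not followed by two hex digits. Passing this check does NOT
    mean the string matches the full RFC 3986 grammar.
    """
    if not _SCHEME.match(candidate):
        return False
    return _ALLOWED.fullmatch(candidate) is not None
```

Against the examples in this thread, such a check accepts `https://example.com/a%20b` and rejects `http:\\example.com`, `https://example.com/a b` and `https://example.com/{x}`. A tool is free to be more lenient than this, but nothing obliges it to be.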

> And slowly things fall apart.

That is exactly the phenomenon that is called “Protocol Decay” in RFC 9413.

Grüße, Carsten

Received on Friday, 20 September 2024 16:22:30 UTC