Re: Invalid Characters in URLs from Daniel Stenberg on 2024-09-20 (ietf-http-wg@w3.org from July to September 2024)

From: Daniel Stenberg <daniel@haxx.se>
Date: Fri, 20 Sep 2024 17:25:32 +0200 (CEST)
To: Tim Bray <tbray@textuality.com>
cc: HTTP Working Group <ietf-http-wg@w3.org>, Biren Roy <birenroy@google.com>, Ryan Hamilton <rch@google.com>
Message-ID: <88587s52-331n-6669-678r-p442219s1p0r@unkk.fr>

On Fri, 20 Sep 2024, Tim Bray wrote:

> AFAIK browsers, whatever WHAT iteration they follow, will correctly 
> interoperate on 3986-conforming URLs, no?  So protocol implementors should 
> expect to get good results if they stick with that?
>
> WHAT’s strength (or weakness, according to some) is its very precise 
> specification of browser behavior in the presence of various flavors of 
> error.

Then come the humans. They will use what works. Browsers accept URLs quite 
liberally. Slowly the web fills up with more and more WHATWG URLs that cannot 
be parsed by RFC 3986 parsers. (I'm thinking spaces, braces, number of 
slashes, etc.)
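
To make the spaces and braces case concrete, here is a rough sketch of a 
character-level check. It is not a parser and not how any particular library 
does it; the function name has_only_rfc3986_chars is made up for this example. 
It only asks whether every character is in the set RFC 3986 permits 
(unreserved, reserved, and "%" for percent-encoding), so it says nothing about 
the number-of-slashes cases or about where in the URL a character appears.

```python
import re

# Characters RFC 3986 permits somewhere in a URI:
# unreserved (ALPHA / DIGIT / "-" / "." / "_" / "~"),
# reserved (gen-delims and sub-delims), and "%" for percent-encoding.
# Assumption: this is a character-set check only, not a grammar check.
_RFC3986_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def has_only_rfc3986_chars(url: str) -> bool:
    """Return True if url contains only characters RFC 3986 allows."""
    return _RFC3986_CHARS.fullmatch(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs for illustration.
    for candidate in (
        "http://example.com/a%20b",   # percent-encoded space
        "http://example.com/a b",     # raw space
        "http://example.com/{id}",    # raw braces
    ):
        print(candidate, has_only_rfc3986_chars(candidate))
```

A strict parser would reject the last two; a browser takes them without 
complaint.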

So, when we write a parser today, do we want to parse the URLs that are in 
active use out there, or do we want to be a purist and tell the users they are 
wrong when they provide URLs that the browsers are fine with?

And slowly things fall apart.

-- 

  / daniel.haxx.se

Received on Friday, 20 September 2024 15:25:37 UTC