- From: David Benjamin <davidben@chromium.org>
- Date: Wed, 25 Sep 2024 12:33:09 -0400
- To: Greg Wilkins <gregw@webtide.com>
- Cc: Carsten Bormann <cabo@tzi.org>, Daniel Stenberg <daniel@haxx.se>, Tim Bray <tbray@textuality.com>, HTTP Working Group <ietf-http-wg@w3.org>, Biren Roy <birenroy@google.com>, Ryan Hamilton <rch@google.com>
- Message-ID: <CAF8qwaAwayn-MmS5Xmt58a2eEjbbU72Xvy=JXknMrGQmNQ_mPg@mail.gmail.com>
Alas, while I can't speak to your particular CVEs, it is sadly true that when two implementations both accept some input, but interpret it in different ways, there is often a security consequence, particularly when it comes to URLs. :-( (A concrete sketch of one such divergence is in the postscript below.)

I hope therefore it is uncontroversial to say that *if* two implementations, for whatever reason, find they both need to accept things outside of RFC 3986, they need some way to agree on what those things mean.

I hope it's also uncontroversial to say that, for better or worse, there are real-world compatibility needs such that *some* implementations indeed need to accept things outside of RFC 3986. (There are also implementations that do not need this! I'm just talking about the ones that do.)

It follows then that *someone* should write down the de facto lenient flavor of URL parsing, so we can collectively avoid the security consequences. The status quo is that this someone is the WHATWG URL standard.

That status quo means systems that, for better or worse, need the lenient flavor indeed should consult the WHATWG document and ignore RFC 3986. Sounds like you and Ryan fall into that category.

It also means HTTP lives with an interesting matrix of compatibility pressures between strict URL consumers, lenient URL consumers, and URL producers. It also means that someone implementing HTTP from the RFCs may be missing context they care about, because there's no pointer from RFC 9110 to the WHATWG URL standard.

If folks are happy with that status quo, great. If folks are unhappy with that status quo, there's probably room for some work here, possibly starting by reaching out to WHATWG folks. Either way, I think that is the decision tree here.

David

On Fri, Sep 20, 2024 at 7:03 PM Greg Wilkins <gregw@webtide.com> wrote:

> On Sat, 21 Sept 2024 at 02:24, Carsten Bormann <cabo@tzi.org> wrote:
>
>> On 2024-09-20, at 17:25, Daniel Stenberg <daniel@haxx.se> wrote:
>>
>> > On Fri, 20 Sep 2024, Tim Bray wrote:
>> >
>> > So, when we write a parser today, do we want to parse the URLs that are in active use out there, or do we want to be a purist and tell the users they are wrong when they provide URLs that the browsers are fine with?
>>
>> Again, you can write a tool that happily accepts http:\\ “URLs” etc. But you can’t impose that lenience on other tools, and we are not obliged to spend the same amount of energy in our tools that a browser does on assigning interpretations to invalid URIs.
>
> As a server developer, a situation that is happening with increasing frequency is that we receive CVEs against our server because we have a different interpretation of URIs than the common browsers. We implement the RFC as written, but the browsers mostly follow WhatWG, so there are differences, especially around invalid or deprecated aspects of URIs (e.g. an authority with user info). If a server interprets a URI flexibly and differently to the browsers, then security researchers ping you with potential security bypass vulnerabilities.
>
> We are thus forced to either ignore the RFC and adopt WhatWG's spec, or to just strictly follow the RFC and show no flexibility at all (400 Bad Request for the slightest violation).
>
> So I do not think leniency is a way out of this mess.
>
> cheers
>
> --
> Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
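
P.S. For concreteness, here is a minimal sketch of one divergence of the kind I mean, using Python's urllib.parse as a stand-in for an RFC 3986-flavored parser; the hostnames are purely illustrative. The WHATWG result noted in the comments is what a browser's URL class produces for the same string.

    # The same URL string, two accepting parsers, two different hosts.
    from urllib.parse import urlsplit

    # The Python literal "\\" is a single backslash, i.e. the input is
    # https://evil.example\@good.example/
    url = "https://evil.example\\@good.example/"

    # An RFC 3986-flavored parser treats the backslash as an ordinary
    # character, so everything before the "@" is userinfo and the host
    # is good.example.
    print(urlsplit(url).hostname)  # -> good.example

    # A WHATWG parser (e.g. a browser's URL class) treats "\" like "/"
    # in special schemes, so the authority ends at the backslash: the
    # host is evil.example and "/@good.example/" becomes the path.
    # When a URL checker and a URL fetcher land on opposite sides of
    # that split, you get exactly the sort of bypass Greg describes.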
Received on Wednesday, 25 September 2024 16:33:31 UTC