Incompleteness of reference resolution algorithm in RFC 3986

Hi all,

It has been noted that, following Section 5 of RFC 3986, resolving the
relative reference `.///bar` against the absolute URI `foo:bar` (or `.//bar`
against `foo:/bar`) yields the URI `foo://bar`. The resolved path component
starts with `//`, which Section 3.3 of RFC 3986 does not allow when there is
no authority, so `bar` is effectively read back as an authority component.
This behavior has caused issues in several implementations of RFC 3986 [1].
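
For anyone who wants to reproduce this, here is a minimal Python sketch of
Sections 5.2.2 to 5.3 in non-strict-free form (no validation, no
normalization beyond dot-segment removal). The function names are
illustrative only and are not taken from any of the libraries in [1].

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment); undefined parts are None."""
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

def remove_dot_segments(path):
    """RFC 3986, Section 5.2.4. Each output item keeps its leading "/"."""
    out = []
    while path:
        if path.startswith("../"):                        # step A
            path = path[3:]
        elif path.startswith("./"):                       # step A
            path = path[2:]
        elif path.startswith("/./"):                      # step B
            path = path[2:]
        elif path == "/.":                                # step B
            path = "/"
        elif path.startswith("/../") or path == "/..":    # step C
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):                         # step D
            path = ""
        else:                                             # step E
            i = path.find("/", 1)
            i = len(path) if i == -1 else i
            out.append(path[:i])
            path = path[i:]
    return "".join(out)

def merge(base_authority, base_path, ref_path):
    """RFC 3986, Section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path

def resolve(base, ref):
    """RFC 3986, Section 5.2.2 (strict parsing) followed by Section 5.3."""
    bs, ba, bp, bq, _ = split_uri(base)
    rs, ra, rp, rq, rf = split_uri(ref)
    if rs is not None:
        s, a, p, q = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        s, a, p, q = bs, ra, remove_dot_segments(rp), rq
    elif rp == "":
        s, a, p, q = bs, ba, bp, (rq if rq is not None else bq)
    else:
        if rp.startswith("/"):
            p = remove_dot_segments(rp)
        else:
            p = remove_dot_segments(merge(ba, bp, rp))
        s, a, q = bs, ba, rq
    result = ""
    if s is not None:
        result += s + ":"
    if a is not None:
        result += "//" + a
    result += p
    if q is not None:
        result += "?" + q
    if rf is not None:
        result += "#" + rf
    return result

print(resolve("foo:bar", ".///bar"))   # foo://bar
print(resolve("foo:/bar", ".//bar"))   # foo://bar
```

Splitting either result again with `split_uri` gives an authority of `bar`
and an empty path, which is the round-trip problem described above.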

Prior to this report, the WHATWG URL Standard had been revised to fix a
similar issue by prepending `/.` to the path when necessary [2]. There was
a recent attempt to adapt the WHATWG solution to an RFC 3986 implementation,
but without much success due to its limited applicability [3].
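
To make the idea concrete, here is a rough Python sketch of what prepending
`/.` could look like if it were applied at the recomposition step of
RFC 3986 (Section 5.3). This is only an illustration of the idea, not the
WHATWG algorithm itself and not a proposed fix; the function name
`recompose_guarded` and the choice of where to apply the guard are
assumptions.

```python
def recompose_guarded(scheme, authority, path):
    """Recompose scheme, authority and path, guarding a leading "//".

    Hypothetical helper: when there is no authority and the path starts
    with "//", "/." is prepended so the path cannot be mistaken for an
    authority when the result is parsed again.
    """
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    elif path.startswith("//"):
        result += "/."          # guard borrowed from the WHATWG approach
    return result + path

print(recompose_guarded("foo", None, "//bar"))   # foo:/.//bar
print(recompose_guarded("foo", "host", "/bar"))  # foo://host/bar
```

The guarded result `foo:/.//bar` parses back to a path of `/.//bar` with no
authority, at the cost of leaving a dot segment in the output.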

Another potential issue I found is that resolving `../bar` against `foo:bar/`
gives `foo:/bar`, in which a root emerges out of nowhere. Not sure if this is
a real problem, but IMHO it may be more correct for the `remove_dot_segments`
algorithm to preserve the relativity of paths, i.e., not to output an absolute
path when the input is relative.
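
To show where the root comes from, here is a small Python sketch that runs
`remove_dot_segments` (RFC 3986, Section 5.2.4) on the merged path
`bar/../bar` and records both buffers after every step. The tracing helper
is illustrative only; its name is an assumption.

```python
def trace_remove_dot_segments(path):
    """Return (output buffer, input buffer) pairs, one per step of 5.2.4."""
    out = []                     # each item keeps its leading "/"
    steps = [("", path)]
    while path:
        if path.startswith("../"):                        # step A
            path = path[3:]
        elif path.startswith("./"):                       # step A
            path = path[2:]
        elif path.startswith("/./"):                      # step B
            path = path[2:]
        elif path == "/.":                                # step B
            path = "/"
        elif path.startswith("/../") or path == "/..":    # step C
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):                         # step D
            path = ""
        else:                                             # step E
            i = path.find("/", 1)
            i = len(path) if i == -1 else i
            out.append(path[:i])
            path = path[i:]
        steps.append(("".join(out), path))
    return steps

for output, remaining in trace_remove_dot_segments("bar/../bar"):
    print(repr(output), repr(remaining))
# ''     'bar/../bar'
# 'bar'  '/../bar'
# ''     '/bar'
# '/bar' ''
```

Step C removes `bar` from the output buffer but leaves the `/` that used to
separate it from `..` at the front of the input buffer, and that `/` then
becomes the leading slash of the result.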

I'm not much of an expert in URIs, but I wonder if it is worth an errata
report or an update to the RFC. Any thoughts on this?

[1] https://github.com/lo48576/iri-string/issues/8
    https://github.com/sgodwincs/uriparse-rs/issues/20
    https://github.com/python-hyper/rfc3986/issues/85
[2] https://github.com/whatwg/url/pull/505
[3] https://github.com/lo48576/iri-string/issues/29


Regards,
Shang

Received on Wednesday, 24 August 2022 20:51:11 UTC