- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Thu, 25 Aug 2022 10:45:14 -0700
- To: Shang Ye <yesh25@mail2.sysu.edu.cn>
- Cc: uri <uri@w3.org>
> On Aug 24, 2022, at 7:31 AM, Shang Ye <yesh25@mail2.sysu.edu.cn> wrote:
>
> Hi all,
>
> It has been noted that according to Section 5 of RFC 3986, resolving the
> relative reference `.///bar` against the absolute URI `foo:bar` (or `.//bar`
> against `foo:/bar`) results in a URI `foo://bar`, in which the resolved path
> component starts with `//` (not allowed as per RFC 3986) and effectively
> becomes an authority component. This behavior has caused issues in several
> implementations of RFC 3986 [1].

Those all seem to be speculative issues from the same reporter.

https://datatracker.ietf.org/doc/html/rfc3986/#section-1.2.3

   A relative reference (Section 4.2) refers to a resource by describing
   the difference within a hierarchical name space between the reference
   context and the target URI. The reference resolution algorithm,
   presented in Section 5, defines how such a reference is transformed to
   the target URI. As relative references can only be used within the
   context of a hierarchical URI, designers of new URI schemes should use
   a syntax consistent with the generic syntax's hierarchical components
   unless there are compelling reasons to forbid relative referencing
   within that scheme.

A base URI of `foo:bar` does not use a hierarchical syntax and thus cannot be used for relative references other than same-document fragments.

This does not prevent a parser from taking that base URI and any reference string, turning the crank on the input, and generating a syntactically valid URI string as a result. It only means that a preconception about what such a result is supposed to contain is not supported by the algorithms. IOW, a result that looks like an authority component is just as valid as any other result.

> Prior to this report, the WHATWG URL Standard has been revised to fix a
> similar issue, by prepending `/.` to the path when necessary [2]. There was
> a recent attempt to fit the WHATWG solution into an RFC 3986 implementation,
> but without much success due to limited applicability [3].

It appears to be a limited patch to change one meaningless result into a different meaningless result when the relative resolution algorithm is being used with a base URI that doesn't have a hierarchical syntax. That seems to be a reasonable workaround to support their specific test harness, but it's far outside the scope of the existing standard.

Relative resolution is not supposed to be "idempotent". My guess is that they expect the resolved components to round-trip into the same components when the output reference is parsed again, which should be the case for all valid uses of relative references.

It would also be fine to accept the output as defined by the RFC, resulting in a URI that may or may not fit within the syntax of that scheme. It is not the resolution parser's job to enforce scheme-specific syntax. In this case, it looks rather arbitrary that making the resulting path absolute ought to be preferred to letting the new string contain what looks like an authority component.

It would also be fine for a future RFC to change the algorithm such that the base path is checked for an expected hierarchical syntax before attempting to merge paths, and then enumerate all of the potential ways that error can be handled, but I don't think we could require one over the others. Any choice we make here would result in most parsers being non-conformant, just to support an invalid and irrelevant use case.

> Another potential issue I found is that resolving `../bar` against `foo:bar/`
> gives `foo:/bar`, in which a root emerges out of nowhere. Not sure if this is
> a real problem, but IMHO it may be more correct for the `remove_dot_segments`
> algorithm to preserve the relativity of paths, i.e., not to output an absolute
> path when the input is relative.

If relativity of paths is desirable, then the base URI path has to be hierarchical according to the RFC. If it isn't, then any assumption about which should be preferred is equally wrong.

> I'm not much of an expert in URIs, but I wonder if it is worth an errata
> report or an update to the RFC. Any thoughts on this?
>
> [1] = https://github.com/lo48576/iri-string/issues/8
>     ; https://github.com/sgodwincs/uriparse-rs/issues/20
>     ; https://github.com/python-hyper/rfc3986/issues/85
> [2] = https://github.com/whatwg/url/pull/505
> [3] = https://github.com/lo48576/iri-string/issues/29
>
> Regards,
> Shang

Well, it isn't an errata. This was an intentional result of the standards process, specifically because a group of people did not want relative processing to be defined for schemes that chose not to use the hierarchical syntax reserved by "/". Whether that's a good idea or not is a different issue.

The current RFC correctly defines the result of relative resolution to be a string, not the set of components that happen to be in the target before the string is output. Hence, the RFC's output is as intended and there is no expectation that the result can be re-parsed into the same components.

However, we could make a choice (the next time around) that all of the path processing is distinct from the other components, in which case we would need to specifically handle the case of a non-hierarchical base URI (or at least one that doesn't have an absolute or empty [abempty] path component) just to keep things from getting weird. Such choices are likely to result in unexpected consequences.

Cheers,

....Roy
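
For anyone who wants to reproduce the three results discussed in this thread, here is a minimal sketch of the RFC 3986 Section 5.2 algorithm (non-strict parsing, using the Appendix B regular expression). The names `parse`, `merge`, `remove_dot_segments`, `resolve`, and `Parts` are hypothetical helpers written for this illustration; they are not taken from any of the implementations cited above.

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None means "undefined", "" means "empty"
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse(uri: str) -> Parts:
    return Parts(*_URI_RE.match(uri).groups())


def remove_dot_segments(path: str) -> str:
    """Section 5.2.4; each entry of `out` is one segment with its leading '/'."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            cut = path.find("/", 1)
            cut = len(path) if cut == -1 else cut
            out.append(path[:cut])
            path = path[cut:]
    return "".join(out)


def merge(base: Parts, ref_path: str) -> str:
    """Section 5.2.3."""
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    """Sections 5.2.2 and 5.3: transform the reference, then recompose a string."""
    b, r = parse(base), parse(ref)
    if r.scheme is not None:
        t = Parts(r.scheme, r.authority, remove_dot_segments(r.path), r.query, r.fragment)
    elif r.authority is not None:
        t = Parts(b.scheme, r.authority, remove_dot_segments(r.path), r.query, r.fragment)
    elif r.path == "":
        query = r.query if r.query is not None else b.query
        t = Parts(b.scheme, b.authority, b.path, query, r.fragment)
    else:
        path = r.path if r.path.startswith("/") else merge(b, r.path)
        t = Parts(b.scheme, b.authority, remove_dot_segments(path), r.query, r.fragment)
    result = "" if t.scheme is None else t.scheme + ":"
    if t.authority is not None:
        result += "//" + t.authority
    result += t.path
    if t.query is not None:
        result += "?" + t.query
    if t.fragment is not None:
        result += "#" + t.fragment
    return result


print(resolve("foo:bar", ".///bar"))  # foo://bar
print(resolve("foo:/bar", ".//bar"))  # foo://bar
print(resolve("foo:bar/", "../bar"))  # foo:/bar
print(parse("foo://bar"))             # authority is "bar", path is ""
```

The last line shows the point made above: the output is defined as a string, so re-parsing `foo://bar` yields an authority of `bar` and an empty path rather than the path `//bar` that the algorithm held internally.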
Received on Thursday, 25 August 2022 17:45:35 UTC