- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 19 Aug 2021 07:59:55 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hello,

after tightening up the :path parser in haproxy to strictly comply with both RFC7540 and the latest draft, one user of a large hosting platform reported breakage of at least one hosted site which contains a few HTML links whose path begins with two slashes, resulting from the concatenation of a base URL ending with a slash and a prefix. E.g.:

    <img src="https://site.example.org//static/image.jpg">

At first I responded "that's expected, as it is explicitly forbidden by the H2 spec (RFC7540)", which says:

    "The ":path" pseudo-header field includes the path and query parts
    of the target URI (the "path-absolute" production and optionally a
    '?' character followed by the "query" production (see Sections 3.3
    and 3.4 of [RFC3986])."

And RFC3986#3.3:

    path-absolute   ; begins with "/" but not "//"

    path-absolute = "/" [ segment-nz *( "/" segment ) ]
    segment-nz    = 1*pchar
    segment       = *pchar

Then I wondered why the request was processed by the HTTP/1.1 backend server before this change: had the server been too lenient, or was there a difference in the protocol spec? The answer is the latter. In RFC7230#2.7, a purposely different absolute-path is defined:

    An "absolute-path" rule is defined for protocol elements that can
    contain a non-empty path component. (This rule differs slightly
    from the path-abempty rule of RFC 3986, which allows for an empty
    path to be used in references, and path-absolute rule, which does
    not allow paths that begin with "//".)

    request-line   = method SP request-target SP HTTP-version CRLF
    request-target = origin-form / absolute-form / authority-form /
                     asterisk-form
    origin-form    = absolute-path [ "?" query ]
    absolute-path  = 1*( "/" segment )

This version is the one that was adopted by the HTTP core spec, but the H2 spec still differs by using path-absolute, which cannot start with "//", even in the latest draft. This use of "path-absolute" was introduced into the H2 spec between draft 04 and draft 05 when trying to make the definition of :path more precise.
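
To make the difference between the two rules concrete, here is a minimal Python sketch that transcribes both ABNF productions into regular expressions. It is only an illustration and not haproxy's actual parser: the names PCHAR, PATH_ABSOLUTE, ABSOLUTE_PATH and the two helper functions are made up for this example, and pchar is taken from RFC3986 (unreserved, pct-encoded, sub-delims, ":" and "@").

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"   (RFC3986)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = PCHAR + "*"      # segment    = *pchar
SEGMENT_NZ = PCHAR + "+"   # segment-nz = 1*pchar

# RFC3986: path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_ABSOLUTE = re.compile(rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?")

# RFC7230: absolute-path = 1*( "/" segment )
ABSOLUTE_PATH = re.compile(rf"(?:/{SEGMENT})+")


def is_path_absolute(path: str) -> bool:
    """True if the whole string matches RFC3986 path-absolute."""
    return PATH_ABSOLUTE.fullmatch(path) is not None


def is_absolute_path(path: str) -> bool:
    """True if the whole string matches RFC7230 absolute-path."""
    return ABSOLUTE_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    for p in ("/static/image.jpg", "//static/image.jpg", "/a//b", "/"):
        print(f"{p!r}: path-absolute={is_path_absolute(p)} "
              f"absolute-path={is_absolute_path(p)}")
```

With these definitions only the leading "//" case differs: "//static/image.jpg" is rejected by path-absolute but accepted by absolute-path, while an empty segment later in the path (as in "/a//b") is accepted by both.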

I think that when this use of path-absolute was introduced, the difference between HTTP/1's absolute-path and RFC3986's path-absolute was simply overlooked.

Given that in the report above the browsers happily sent the request using the HTTP definition of absolute-path and not RFC3986's definition of path-absolute (thus violating RFC7540), that sites *are* written to rely on this, that this seems to be how other H2 implementations are currently handling it, and that the new HTTP spec defines the format of a request-target in origin form as an absolute-path as well, I think we should fix the latest H2 draft to adopt the common definition of absolute-path (which explicitly permits "//") and stop keeping a non-interoperable exception here.

Does anyone disagree?

Thanks,
Willy
Received on Thursday, 19 August 2021 06:00:11 UTC