- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 19 Aug 2021 07:59:55 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hello,
after tightening up the :path parser in haproxy to strictly comply with
both RFC7540 and the latest draft, one user of a large hosting platform
reported breakage of at least one hosted site which contains a few HTML
links with the path beginning with two slashes, resulting from the
concatenation of a base URL ending with a slash and a prefix. E.g.:
<img src="https://site.example.org//static/image.jpg">
At first I responded "that's expected as it is explicitly forbidden by
the H2 spec (RFC7540), which says":
"The ":path" pseudo-header field includes the path and query parts
of the target URI (the "path-absolute" production and optionally a
'?' character followed by the "query" production (see Sections 3.3
and 3.4 of [RFC3986])."
And RFC3986#3.3:
path-absolute ; begins with "/" but not "//"
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment-nz = 1*pchar
segment = *pchar
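For illustration, the production above can be sketched as a regular expression. This is only a sketch in Python, not haproxy's actual parser: the names PCHAR, PATH_ABSOLUTE and is_path_absolute are made up for the example, the pchar character set is taken from RFC3986, and only the path part of :path is covered (no "?" query).

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# path-absolute = "/" [ segment-nz *( "/" segment ) ]
# The first segment, if present, must be non-empty, so "//" cannot match.
PATH_ABSOLUTE = re.compile(rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?")


def is_path_absolute(path: str) -> bool:
    """Return True if the whole string matches RFC 3986 path-absolute."""
    return PATH_ABSOLUTE.fullmatch(path) is not None
```

With this check, "/static/image.jpg" is accepted while "//static/image.jpg" is rejected, which is the breakage described above.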
Then I wondered why, before this change, the request had been processed by
the HTTP/1.1 backend server: had the server been too lenient, or was there a
difference in the protocol spec? The answer is the latter. In RFC7230 #2.7, a
purposely different absolute-path is defined:
An "absolute-path" rule is defined for protocol elements that can
contain a non-empty path component. (This rule differs slightly from
the path-abempty rule of RFC 3986, which allows for an empty path to
be used in references, and path-absolute rule, which does not allow
paths that begin with "//".)
request-line   = method SP request-target SP HTTP-version CRLF
request-target = origin-form
               / absolute-form
               / authority-form
               / asterisk-form
origin-form    = absolute-path [ "?" query ]
absolute-path  = 1*( "/" segment )
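The effect of the absolute-path rule can be sketched the same way. Again this is just an illustrative Python sketch under the same assumptions (hypothetical names PCHAR, ABSOLUTE_PATH and is_absolute_path, pchar as defined in RFC3986, path part only), not code from any real implementation.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# absolute-path = 1*( "/" segment ), with segment = *pchar
# Every segment may be empty, including the first one, so "//" is allowed.
ABSOLUTE_PATH = re.compile(rf"(?:/{PCHAR}*)+")


def is_absolute_path(path: str) -> bool:
    """Return True if the whole string matches RFC 7230 absolute-path."""
    return ABSOLUTE_PATH.fullmatch(path) is not None
```

Here "//static/image.jpg" is accepted, since the first segment is simply empty. The only inputs this rule rejects that matter here are the empty path and paths not starting with "/".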
And this version is the one that was adopted by the HTTP core spec, but
the H2 spec, even in the latest draft, still differs by using path-absolute,
which cannot start with "//".
This use of "path-absolute" was introduced into the H2 spec between draft
04 and draft 05, when trying to make the definition of :path more precise.
And I think that at the time the difference between HTTP/1's absolute-path
and RFC3986's path-absolute was simply overlooked.
Given that in the report above the browsers happily sent the request using
the HTTP definition of absolute-path and not RFC3986's definition of
path-absolute (thus violating RFC7540), that sites *are* written to rely
on this, that this seems to be how other H2 implementations are currently
handling it, and that the new HTTP spec defines the format of a request-target
in origin form as an absolute-path as well, I think we should fix the latest
H2 draft to adopt the common definition of absolute-path (which explicitly
permits "//") and stop keeping a non-interoperable exception here.
Does anyone disagree?
Thanks,
Willy
Received on Thursday, 19 August 2021 06:00:11 UTC