Subtle incompatibility between H2 and H1's :path

Hello,

after tightening up the :path parser in haproxy to strictly comply with
both RFC7540 and the latest draft, one user of a large hosting platform
reported breakage of at least one hosted site which contains a few HTML
links with the path beginning with two slashes, resulting from the
concatenation of a base URL ending with a slash and a prefix. E.g.:

    <img src="https://site.example.org//static/image.jpg">

At first I responded "that's expected as it is explicitly forbidden by
the H2 spec (RFC7540), which says":

     "The ":path" pseudo-header field includes the path and query parts
      of the target URI (the "path-absolute" production and optionally a
      '?' character followed by the "query" production (see Sections 3.3
      and 3.4 of [RFC3986])."

   And RFC3986#3.3:

      path-absolute   ; begins with "/" but not "//"
      path-absolute = "/" [ segment-nz *( "/" segment ) ]
      segment-nz    = 1*pchar
      segment       = *pchar
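
For illustration only, the grammar above can be transcribed into a small
Python check. This is just a sketch: the names PCHAR, PATH_ABSOLUTE and
is_path_absolute are made up for the example, it only covers the path part
of :path (not the optional "?" query), and it is not how haproxy's parser
is written.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986, 3.3)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# path-absolute = "/" [ segment-nz *( "/" segment ) ]
# The first segment, if present, must be non-empty, which rules out "//".
PATH_ABSOLUTE = re.compile(rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?")


def is_path_absolute(path: str) -> bool:
    """Return True if path matches RFC 3986 path-absolute (hypothetical helper)."""
    return PATH_ABSOLUTE.fullmatch(path) is not None


print(is_path_absolute("/static/image.jpg"))   # True
print(is_path_absolute("//static/image.jpg"))  # False: begins with "//"
print(is_path_absolute("/a//b"))               # True: only the first segment must be non-empty
```

Under this reading, the path from the <img> link above is rejected.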

Then I wondered why, before this change, the request was being processed by
the HTTP/1.1 backend server: had it been too lenient, or was there a
difference in the protocol spec? The answer is the latter. In RFC7230 #2.7, a
purposely different absolute-path is defined:

  An "absolute-path" rule is defined for protocol elements that can
  contain a non-empty path component.  (This rule differs slightly from
  the path-abempty rule of RFC 3986, which allows for an empty path to
  be used in references, and path-absolute rule, which does not allow
  paths that begin with "//".)

     request-line   = method SP request-target SP HTTP-version CRLF
     request-target = origin-form
                    / absolute-form
                    / authority-form
                    / asterisk-form

     origin-form    = absolute-path [ "?" query ]
     absolute-path  = 1*( "/" segment )
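
To make the difference concrete, here is a minimal Python sketch of the
absolute-path rule. The names PCHAR, ABSOLUTE_PATH and is_absolute_path are
invented for the example, the optional "?" query is left out, and this is
not taken from any real implementation.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986, 3.3)
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# absolute-path = 1*( "/" segment ), with segment = *pchar
# Every segment may be empty, including the first one, so "//" is allowed.
ABSOLUTE_PATH = re.compile(rf"(?:/{PCHAR}*)+")


def is_absolute_path(path: str) -> bool:
    """Return True if path matches RFC 7230 absolute-path (hypothetical helper)."""
    return ABSOLUTE_PATH.fullmatch(path) is not None


print(is_absolute_path("/static/image.jpg"))   # True
print(is_absolute_path("//static/image.jpg"))  # True: an empty first segment is fine
print(is_absolute_path(""))                    # False: at least one "/" is required
```

So the same "//static/image.jpg" path is a valid HTTP/1.1 origin-form path
under this rule.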

This is the definition that was adopted by the HTTP core spec, but the H2
spec still differs from it by using path-absolute, which cannot start with
"//", even in the latest draft.

This use of "path-absolute" was introduced into the H2 spec between draft
04 and draft 05, in an attempt to make the definition of :path more precise.
I think that at the time the difference between RFC3986's path-absolute and
HTTP/1's absolute-path was simply overlooked.

Given that in the report above the browsers happily sent the request using
the HTTP definition of absolute-path and not RFC3986's definition of
path-absolute (thus violating RFC7540), that sites *are* written to rely
on this, that this seems to be how other H2 implementations are currently
handling it, and that the new HTTP spec defines the format of a request-target
in origin form as an absolute-path as well, I think we should fix the latest
H2 draft to adopt the common definition of absolute-path (which explicitly
permits "//") and stop keeping a non-interoperable exception here.

Does anyone disagree?

Thanks,
Willy

Received on Thursday, 19 August 2021 06:00:11 UTC