- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 30 Jun 2022 09:01:23 +0200
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, HTTP <ietf-http-wg@w3.org>
Hi Roy,

On Wed, Jun 29, 2022 at 10:49:58AM -0700, Roy T. Fielding wrote:
> > With HTTP/1.1 there are fewer ambiguities since Host is mandatory, but
> > the distinction between "proxy requests" and origin requests is still
> > relevant, especially when you don't know whether or not the origin
> > server supports HTTP/1.1 or only 1.0 (and may be confused by the
> > presence of an authority in the request line). For example, if a
> > client sends:
> >
> >    GET / HTTP/1.1
> >    Host: example.com
> >
> > to an HTTP/1.0 server that parses Host, it will work. If it sends
> >
> >    GET http://example.com/ HTTP/1.1
> >    Host: example.com
> >
> > to an HTTP/1.1 server, it will work as well, but it may fail with an HTTP/1.0
> > server (or worse, loop over itself if it supports proxying requests and
> > resolves itself as example.com).
>
> Well, this ship has sailed, but I must have missed that original discussion.
>
> The premise is incorrect in all respects, since all of those HTTP/1.1
> requests are also valid HTTP/1.0 requests (even with an absolute URI)
> and so is the presence of Host in those requests.

That's what I mentioned as well (sorry if I was not clear); it's just that
the expectations are not the same, in that HTTP/1.0 is more lenient.

> Host is an HTTP/1.x field that was used in HTTP/1.0 requests (in 1995)
> as soon as we reached consensus on the field name. That was long before
> 1.1 was finished and 1.0 obsoleted.

Oh, I'm well aware of this as well, and indeed, 1.1 was mostly an update to
write down what was being practised in the field.

> Host is a required part of HTTP/1.0 now just by virtue of the Internet as
> deployed, regardless of the informational RFC.
>
> [The idea was originally proposed in 1994 by John Franks
>
> https://lists.w3.org/Archives/Public/ietf-http-wg-old/1994SepDec/0019.html
>
> but it took a long time to converge on a single syntax
>
> https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0067.html
> https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0084.html
> https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0130.html
> https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995SepDec/0291.html

Indeed.

> and while we still talk about it as an important addition of HTTP/1.1 (because
> that's where we chose to document it), the feature is required for 1.0 to
> work with deployed servers.]

That's one point I disagree with. Actually, the *vast* majority of servers
I'm seeing do not require a Host on HTTP/1.0 requests. And I'm pretty sure
that it doesn't change much over time, because most of our users continue to
use HTTP/1.0 to send health checks to servers, precisely because it doesn't
require configuring a host. Thus you just send "HEAD / HTTP/1.0" and nothing
more, and if you get a response it indicates the server is not dead. E.g.:

  $ telnet www 80
  Trying 10.x.x.x...
  Connected to www.
  Escape character is '^]'.
  HEAD / HTTP/1.0

  HTTP/1.1 200 OK
  Date: Thu, 30 Jun 2022 06:15:39 GMT
  Server: Apache
  Last-Modified: Wed, 18 Nov 2015 19:41:20 GMT
  Accept-Ranges: bytes
  Content-Length: 15019
  Cache-Control: max-age=28800
  Expires: Thu, 30 Jun 2022 14:15:39 GMT
  Vary: Accept-Encoding
  Connection: close
  Content-Type: text/html

I quite often use this to find the site name from an IP address that doesn't
resolve, based on a redirect present in the response or the domain of a
set-cookie field, for example. And one could think that it's only for
internal hosts, but you can still find plenty of them on the net, probably
in order to satisfy scripts or low-quality tools:

  $ telnet google.com 80
  Trying 216.xx.xxx.xxx...
  Connected to google.com.
  Escape character is '^]'.
  HEAD / HTTP/1.0

  HTTP/1.0 200 OK
  Content-Type: text/html; charset=ISO-8859-1
  Date: Thu, 30 Jun 2022 06:18:14 GMT
  Server: gws
  X-XSS-Protection: 0
  X-Frame-Options: SAMEORIGIN
  Expires: Thu, 30 Jun 2022 06:18:14 GMT
  Cache-Control: private
  Set-Cookie: AEC=...; expires=Tue, 27-Dec-2022 06:18:14 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax

And there is a wide variety of small and convenient servers such as thttpd
which admins still use to distribute packages or deliver status pages from
their monitoring solution, as well as many trivial servers coming from
generic language frameworks that continue to support only 1.0 and sometimes
do not even care about Host (and in any case do not require it). E.g.:

  $ python -m SimpleHTTPServer 8080 &
  Serving HTTP on 0.0.0.0 port 8080 ...
  $ telnet 0 8080
  Connected to 0.
  Escape character is '^]'.
  HEAD / HTTP/1.0

  127.0.0.1 - - [30/Jun/2022 08:24:53] "HEAD / HTTP/1.0" 200 -
  HTTP/1.0 200 OK
  Server: SimpleHTTP/0.6 Python/2.7.17
  Date: Thu, 30 Jun 2022 06:24:53 GMT
  Content-type: text/html; charset=ISO-8859-1
  Content-Length: 4882

  Connection closed by foreign host.
  $ telnet 0 8080
  Trying 0.0.0.0...
  Connected to 0.
  Escape character is '^]'.
  HEAD / HTTP/1.0
  Host: _this isn't even a valid host field_

  127.0.0.1 - - [30/Jun/2022 08:25:48] "HEAD / HTTP/1.0" 200 -
  HTTP/1.0 200 OK
  Server: SimpleHTTP/0.6 Python/2.7.17
  Date: Thu, 30 Jun 2022 06:25:48 GMT
  Content-type: text/html; charset=ISO-8859-1
  Content-Length: 4882

  Connection closed by foreign host.

> So, an HTTP proxy recipient that receives any form of authority/host
> information must forward that information in either Host or :authority,
> no matter what version it is using.

That's not what I've been used to seeing from early HTTP proxies, and I can
even still verify it right here on an old squid that I no longer use but
that is still available here:

  $ telnet proxy 3128
  Trying 10.x.x.x...
  Connected to proxy.
  Escape character is '^]'.
  HEAD / HTTP/1.0
  Host: google.com

  HTTP/1.0 400 Bad Request
  Server: squid/2.6.STABLE13
  Date: Thu, 30 Jun 2022 06:29:59 GMT
  Content-Type: text/html
  Content-Length: 1204
  Expires: Thu, 30 Jun 2022 06:29:59 GMT
  X-Squid-Error: ERR_INVALID_REQ 0
  X-Cache: MISS from px.home.local
  Via: 1.0 px.home.local:3128 (squid/2.6.STABLE13)
  Proxy-Connection: close

Conversely, with a full request and no Host:

  $ telnet proxy 3128
  Trying 10.x.x.x...
  Connected to proxy.
  Escape character is '^]'.
  HEAD http://google.com/ HTTP/1.0

  HTTP/1.0 301 Moved Permanently
  Location: http://www.google.com/
  Content-Type: text/html; charset=UTF-8
  Date: Thu, 30 Jun 2022 06:31:29 GMT
  Expires: Sat, 30 Jul 2022 06:31:29 GMT
  Cache-Control: public, max-age=2592000
  Server: gws
  Content-Length: 219
  X-XSS-Protection: 0
  X-Frame-Options: SAMEORIGIN
  X-Cache: MISS from px.home.local
  Via: 1.0 px.home.local:3128 (squid/2.6.STABLE13)
  Proxy-Connection: close

Of course, modern proxies get this right. But based on what you see above,
it's extremely important to preserve the distinction between these
respective fields (i.e. :authority goes to :authority and Host goes to
Host), and here I agree that it's irrelevant to the HTTP version.

> Failure to do so introduces a
> security bypass because L7 routers act on that information whether
> or not the client/server pair is aware of their presence.

Normally the L7 routers will decide what to do when that info is absent, or
pick it from the authority field, and they must reject requests which carry
both with mismatching values. But I'm extremely careful never to pick one
field and move it to the other, or conversely.

> Hence, an HTTP/1.0 proxy that receives your first example should forward
> that as
>
>    GET / HTTP/1.0
>    Host: example.com
>    Proxy-connection: keep-alive
>
> because the routing doesn't work otherwise due to name-based hosts
> being deployed before HTTP/1.1.
A proxy aware of HTTP/1.1 will likely do that because it knows about such
rules, but an older proxy will not necessarily do so (as seen above). If you
remember, these were among the issues we were all facing in the late 90s
when chaining proxies or starting to mix proxies and servers. And when
you're developing a gateway that can be placed in front of any type of
agent, you have to be extremely careful not to alter the messages that pass
through.

> And, no, there is absolutely no reason to concern ourselves with proxies
> that loop over their own hostnames, since that is a self-correcting error
> whenever a full URI is received as the request target.

In fact you're right here. I remember the exact case where I was facing this
recurring problem: it was when configuring a component to act both as a
forward and a reverse proxy, precisely because, due to the reverse-proxy
case, it was allowed to forward requests back to itself. I still remember
blocking requests carrying a Via from itself to break such loops, and
insisting on not deploying forward and reverse together... painful times if
you ask me. With 1.1, the requirement that origin servers accept absolute
requests and that Host be mandatory, plus proxies routing regular requests,
solved everything, but it took quite some time to spread fully.

> > What we're
> > doing in haproxy is that both Host and :authority are used interchangeably
> > after having been checked for proper matching, and are modified at the
> > same time if needed, and we have a flag indicating if an authority was
> > present in the incoming request to know if we have to produce one on
> > output or not. That's in the end what seems to preserve the most accurate
> > representation along a chain of multiple versions. This allows us to emit
> > a Host field only if one was present, and an authority only if one was
> > present, regardless of the HTTP version.
> > I don't think that RFC9113 brings
> > any changes regarding this, it might only be a matter of what constitutes
> > "control data".
>
> Sorry, that is a broken implementation. You need to send Host regardless
> of the original request version.

I can guarantee you that each time we accidentally failed to do this because
of a tiny change or some strengthening of the Host vs authority checks, we
got instant reports of various 1.0 applications getting broken. And I
actually did verify carefully that the updated set of RFCs continued to
cover that compatibility requirement with these old components, i.e. Host
remains Host and :authority remains :authority along the whole chain, and
only when both are set must they match, in which case we can simplify (e.g.
drop the authority when passing to an HTTP/1.x server).

And there's a reason why HTTP/1.0 remains quite popular for internal tools:
it has the benefit of requiring zero processing after the end of the headers.
This is extremely convenient for scripts: you read until the empty line and
stream the rest into the file until the connection closes (or to a pipe or
whatever; you just need "sed '1,/^$/d'" to strip the headers). You can also
find plenty of simple update scripts that download and install a package
using just netcat, or even just /dev/tcp in bash or zsh. As soon as you start
to speak HTTP/1.1 there, you run the risk that the server responds with
chunked encoding, and then you need curl or wget.

Thus, as much as I would like it to disappear, I regularly discover new
implementations of it :-/

Regards,
Willy
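The Host/authority handling described in this message (check both fields for a match when both are present, never synthesize one from the other, and drop the authority only when both agree and the next hop speaks HTTP/1.x) can be sketched in Python as follows. This is a hypothetical illustration, not haproxy's code: the `Request` type, the `check_authority()` and `to_http1()` names, and the assumption that an absolute-form target uses the `http` scheme are all invented for the example.

```python
# Hypothetical sketch, not haproxy's implementation: Host and the request
# authority are kept as separate, optional fields and never copied into
# each other.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    method: str
    path: str                 # origin-form path, e.g. "/"
    authority: Optional[str]  # absolute-form authority or :authority, if present
    host: Optional[str]       # Host header field value, if present


def check_authority(req: Request) -> None:
    """Reject the request if both fields are present but disagree."""
    if req.authority is not None and req.host is not None:
        # Host names compare case-insensitively.
        if req.authority.lower() != req.host.lower():
            raise ValueError("Host and authority mismatch")


def to_http1(req: Request, version: str = "HTTP/1.0") -> List[str]:
    """Build the request line (and Host field, if any) for an HTTP/1.x hop."""
    check_authority(req)
    if req.host is not None:
        # Host was present: forward it as-is. Any authority also present has
        # just been checked to match, so it can be dropped (origin-form).
        return [f"{req.method} {req.path} {version}", f"Host: {req.host}"]
    if req.authority is not None:
        # Only an authority was present: keep it in the request line
        # (absolute-form; the "http" scheme is an assumption here) and do
        # not synthesize a Host field.
        return [f"{req.method} http://{req.authority}{req.path} {version}"]
    # Neither was present, e.g. a bare "HEAD / HTTP/1.0" health check.
    return [f"{req.method} {req.path} {version}"]


if __name__ == "__main__":
    print(to_http1(Request("HEAD", "/", authority="google.com", host=None)))
    # ['HEAD http://google.com/ HTTP/1.0']
```

Under Roy's reading, the authority-only branch would additionally emit a `Host` field; the sketch deliberately follows the preserve-what-was-received behaviour argued for in the message instead, which is what lets the old squid shown above accept the forwarded request.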
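The Host-less HTTP/1.0 health check and the "read until the empty line, keep the rest" workflow described in this message can be sketched with Python's standard `socket` module. `health_check()` and `split_response()` are hypothetical helper names; the parser accepts both CRLF and bare LF line endings, and it refuses a chunked body rather than trying to decode it, since that is the HTTP/1.1 case where a real client such as curl or wget is needed.

```python
# Hypothetical sketch of a Host-less HTTP/1.0 health check and of stripping
# the response header block, roughly what "sed '1,/^$/d'" does in a script.
import socket
from typing import Dict, List, Tuple


def split_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw response into status line, header fields and body."""
    lines: List[str] = []
    pos = 0
    while True:
        nl = raw.find(b"\n", pos)
        if nl < 0:
            raise ValueError("incomplete header block")
        line = raw[pos:nl].rstrip(b"\r")  # accept CRLF or bare LF
        pos = nl + 1
        if not line:  # the empty line ends the header block
            break
        lines.append(line.decode("latin-1"))
    if not lines:
        raise ValueError("missing status line")
    headers: Dict[str, str] = {}
    for field in lines[1:]:
        name, _, value = field.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "chunked" in headers.get("transfer-encoding", "").lower():
        # HTTP/1.1 framing: the body would need de-chunking, not just copying.
        raise ValueError("chunked body: use a real HTTP/1.1 client")
    # With HTTP/1.0 the body is simply everything after the empty line.
    return lines[0], headers, raw[pos:]


def health_check(host: str, port: int = 80, timeout: float = 3.0) -> str:
    """Send a bare "HEAD / HTTP/1.0" (no Host) and return the status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        chunks: List[bytes] = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after the response
                break
            chunks.append(data)
    return split_response(b"".join(chunks))[0]
```

Run against the SimpleHTTPServer instance from the transcript above, `health_check("127.0.0.1", 8080)` would be expected to return the status line `HTTP/1.0 200 OK`. Reading until the peer closes is only valid because HTTP/1.0 without keep-alive delimits the response by connection close.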
Received on Thursday, 30 June 2022 07:01:46 UTC