- From: Willy Tarreau <w@1wt.eu>
- Date: Thu, 3 Mar 2016 19:44:18 +0100
- To: Joe Touch <touch@isi.edu>
- Cc: ietf-http-wg@w3.org
On Thu, Mar 03, 2016 at 10:00:12AM -0800, Joe Touch wrote:
> > This point is important because it means some proxies often should
> > better wait for a passive close from a server than deciding to
> > close themselves.
>
> Transparent proxies don't have that choice - they're governed by the
> semantics of the connection (whether EOF == close or not).
>
> Non-transparent proxies shouldn't be opening one connection per
> transaction anyway; they ought to use one or more persistent connections
> and leave them open while they are interacting with the proxy. If they
> do this, there won't be an issue with who closes the connection because
> the close frequency should be very low.

It's not that black and white unfortunately, and in practice it's very
common to see proxies fail in the field above 500 connections per second
because their TCP stack was not appropriately tuned, and with the default
60s TIME_WAIT timeout of their OS, they exhaust the default 28k source
ports. The first thing admins do in this case is to enable tcp_tw_recycle
(which basically causes timewaits to be killed when needed), and this
appears to solve the situation while it actually makes it worse.

Among the solutions, we can count on:

  - putting back idle connections into pools hoping that they will be
    reusable. Connection reuse rate still remains low on average.

  - keeping a high enough keep-alive idle timeout on the proxy and a
    smaller one on the server (when the proxy is a gateway installed on
    the server side) hoping for the server to close first

  - appropriately adding "connection: close" to outgoing requests to ask
    the server to close after the response.
  - disabling lingering before closing when the HTTP state indicates the
    proxy has received all data

  - doing whatever is imaginable to avoid closing first

These are just general principles and many derivatives may exist in
various contexts, but these are definitely important points that HTTP
implementors have to be aware of before falling into the same traps as
those who came before them.

> >> In the bulk of HTTP connections, the server closes the connection,
> >> either to drop a persistent connection or to indicate "EOF" for a transfer.
> >
> > Yes.
> >
> >> Clients generally don't enter TIME-WAIT, so reducing the time they spend
> >> in a state they don't enter has no effect.
> >
> > They can if they close first and that's exactly the problem we absolutely
> > want to avoid.
>
> TW buildup has two effects:
>
> 1) limits the number connection rate to a given IP address

Exactly.

> 2) consumes memory space (and potentially CPU resources)

This one is very cheap. A typical TW connection is just a few tens of
bytes.

> Neither is typically an issue for HCI-based clients.

I don't know what you call HCI here, I'm sorry.

> Servers have much
> higher rate requirements for a given address when they act as a proxy
> and consume more memory overall because they interact with a much larger
> set of addresses.

Servers are not penalized at all with the connection rate since it only
limits the *outgoing* connection rate and not the incoming one. There's
never any ambiguity when a SYN is received regarding the possibility that
the connection still exists on the other side, which is why TW connections
are recycled when receiving a new SYN. Regarding the memory usage, it
remains very low compared to the memory used by the application itself.
My personal record was at 5.5 million timewaits on a server at 90000
connections per second. It was only 300 MB of RAM on a server having
something like 64 GB.
And not everyone needs 90k conns/s but everyone needs more than 500/s
nowadays in any infrastructure.

> > There are certain cases where we had to put warnings in
> > rfc7230/7540, especially in relation with proxies. The typical case is
> > when a client closes a connection to a proxy (eg: a CONNECT tunnel) and
> > the proxy is supposed to in turn close the connection to the server. In
> > this case the proxy is the connection initiator, and it can very quickly
> > condemn all of its source ports by accumulating TIME_WAITs there.
>
> That speaks to a mismanagement of port resources. If they are allocated
> on a per-IP basis, they won't run out.

Yes they do, that's the problem everyone running a load balancer faces!
The highest connection rate you can reach per server is around 1000
connections per second with 64k ports! That started not being enough
15 years ago!

> The error is in treating the pool
> of source ports as global across all IP addresses, which TW does not
> require.

No, the problem is keeping a TW which blocks a precious resource, the
source port, which is only addressed on 16 bits!

> > I'm saying that by all means the
> > server must close first to keep the TIME_WAIT on its side and never
> > on the client side. A TIME_WAIT on a server is very cheap (a few tens
> > of bytes of memory at worst)
>
> It costs exactly the same on the client and the server when implemented
> correctly.

It costs the same except that in one case it prevents a connection from
being established while in the other case it does not. I've seen people
patch their kernels to lower the TIME_WAIT down to 2 seconds to address
such shortcomings! Quite frankly, this workaround *is* causing trouble!

> > and can be recycled when a new valid SYN
> > arrives.
>
> The purpose of TW is to inhibit new SYNs involving the same port. When a
> new SYN arrives on another port, that has no impact on existing TWs.

I'm always talking about the same port.
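
The rate ceilings quoted in this message (about 500 connections per second
with 28k source ports, about 1000 with 64k) follow from simple arithmetic.
The sketch below is only a back-of-the-envelope check under stated
assumptions: every closed outgoing connection keeps its source port in
TIME_WAIT for the full 60 seconds, all connections go to one destination
IP:port, and "300 MB" means 300e6 bytes. The function names are
illustrative and do not come from any library.

```python
# Back-of-the-envelope TIME_WAIT arithmetic. Function names are
# illustrative; the inputs are the figures quoted in this thread.


def max_outgoing_rate(source_ports: int, time_wait_s: float) -> float:
    """Sustained new connections per second that one client IP can open
    to a single destination IP:port when every closed connection parks
    its source port in TIME_WAIT for time_wait_s seconds."""
    return source_ports / time_wait_s


def time_wait_population(conn_rate: float, time_wait_s: float) -> float:
    """Steady-state number of TIME_WAIT entries at a given close rate."""
    return conn_rate * time_wait_s


if __name__ == "__main__":
    # Default ephemeral port range of about 28k ports, 60 s TIME_WAIT.
    print(round(max_outgoing_rate(28_000, 60)))
    # Nearly the whole 16-bit port space.
    print(round(max_outgoing_rate(64_000, 60)))
    # Entries held by the side closing 90k connections per second.
    print(time_wait_population(90_000, 60))
    # Bytes per entry, assuming 300 MB = 300e6 bytes for 5.5M entries.
    print(round(300e6 / 5.5e6))
```

Under these assumptions the ceilings come out at roughly 467 and 1067
connections per second, the 90k conns/s case holds about 5.4 million
entries (close to the 5.5 million reported), and each entry costs around
55 bytes, in line with "a few tens of bytes".
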
On today's hardware and real world workloads, source ports can be reused
every second (60k conns/s). Only the server with TIME_WAIT can tell
whether an incoming SYN is a retransmit or a new one. The client knows
it's a new one but doesn't know if the server is still in LAST_ACK or has
really closed, and due to this uncertainty it refrains from connecting.

> > A TIME_WAIT on the client is not recyclable. That's why
> > TIME_WAIT is a problem for the client and not for the server.
>
> See above; TW is *never* recyclable.

Yes it definitely is on the server side, which is the point. When you
receive a SYN whose ISN is higher than the end of the current window,
it's a new one by definition (as indicated in RFC1122).

> > The problem is that in some cases it's suggested that the client
> > closes first and this causes such problems.
>
> That actually helps the server (see our 99 Infocom paper).

Sure, since the server doesn't receive any more traffic from this client,
that definitely helps, but the point is to ensure traffic flows between
the two hosts, not that one of them refrains from connecting.

> > The only workaround for
> > the client is to close with an RST by disabling lingering,
>
> That's not what SO_LINGER does. See:
> http://man7.org/linux/man-pages/man7/socket.7.html

But in practice it's used for this. When you disable lingering before
closing, you purge any pending data, which has the benefit that the data
you just received from the server, carrying an ACK for data you don't
have anymore, triggers a reset. Yes it's absolutely ugly but you have no
other option when you are a client and are forced to close first due to
the protocol. Don't forget that we're discussing a document whose outcome
should be that protocols are designed in the future to avoid such
horrible workarounds.
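
For readers unfamiliar with the workaround, a minimal sketch follows. It
assumes that "disabling lingering" means setting SO_LINGER with
l_onoff=1 and l_linger=0, which on Linux makes close() on a connected TCP
socket discard unsent data and send an RST instead of a FIN, so the
closing side does not enter TIME_WAIT. The helper names are hypothetical,
and the struct layout (two C ints) is the one used on Linux.

```python
import socket
import struct


def enable_abortive_close(sock: socket.socket) -> None:
    """Set SO_LINGER to l_onoff=1, l_linger=0 (struct linger as two C ints),
    so that a later close() aborts the connection instead of lingering."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))


def close_with_rst(sock: socket.socket) -> None:
    """Close the socket abortively. Only sensible once the HTTP state
    shows that all expected data has already been received."""
    enable_abortive_close(sock)
    sock.close()
```

As the message stresses, this is a last resort: the RST can be lost, and
anything still queued for sending is thrown away.
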
> > but that's
> > really ugly and unreliable : if the RST is lost while the server is
> > in LAST_ACK (and chances are that it will happen if the ACK was lost
> > already), the new connection will not open until this connection
> > expires.
>
> TCP has a significant error regarding RSTs; the side that throws a RST
> on an existing connection should really go into TW - for all the same
> reasons that TW exists in the first place, to protect new connections
> from old data still in the network.

There are many other issues regarding RST. When you send an RST through a
firewall, you'd better cross your fingers for it not to be lost between
the firewall and the destination, otherwise chances are that you won't
get a second chance. That's one of the reasons why I'd love to live in a
world where a client never has to close first.

> > Also, there are people who face this issue and work around them using
> > some OS-specific tunables which allow to blindly recycle some of these
> > connections and these people don't understand the impacts of doing so.
>
> They really ought to read the literature. It's been out there so long it
> can probably apply for a driver's license by now.
When people see their production servers stall at 5% CPU because their
LBs or proxies can't open new connections while full of TIME_WAIT, what
they do is ask their preferred search engine, which simply offers them
advice such as:

  - https://ihazem.wordpress.com/2012/02/07/reducing-time_wait-socket-connections-recyclereuse/
  - http://serverfault.com/questions/212093/how-to-reduce-number-of-sockets-in-time-wait
  - http://kaivanov.blogspot.fr/2010/09/linux-tcp-tuning.html
  - http://www.linuxbrigade.com/reduce-time_wait-socket-connections/
  - http://www.stolk.org/debian/timewait.html

Yes, they all involve the wrong and nasty workaround of allowing outgoing
TIME_WAIT connections to be recycled, which is the worst thing to do
(except the last one, which explains how to modify the TW timeout in the
kernel).

This is a *real* problem in the field, and it has been for a while,
because some protocols were designed for lower loads without imagining
that one day source ports would be reused that often. While we have to
deal with this the best we can, it's important to ensure the same mistake
is not made again in the future.

Regards,
Willy
Received on Thursday, 3 March 2016 18:44:55 UTC