Re: Comments on Section 6.1 (Persistent Connections) of HTTPbis Part 1, version 17 from Willy Tarreau on 2011-12-21 (ietf-http-wg@w3.org from October to December 2011)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 21 Dec 2011 08:04:32 +0100
To: Jonathan Billington <Jonathan.Billington@unisa.edu.au>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, Sonya Arnold <sonya.arnold@unisa.edu.au>
Message-ID: <20111221070432.GA7664@1wt.eu>

Hello Jonathan,

On Wed, Dec 21, 2011 at 11:50:24AM +1030, Jonathan Billington wrote:
> You wrote:
> >In fact it depends who closes first and how the close is performed. If the
> >server closes first, you have the usual 3-way handshake and there is no
> >delay imposed by the TIME_WAIT state because while the socket remains in
> >this state on the server, it can immediately be reopened when the client
> >provides a new SYN above the end of the last window.
> 
> This last sentence does not correspond to TCP in RFC793.
> 1. When closing you have two relatively independent two way handshakes (FIN then ACK) as described above, rather than a 3-way handshake.

There are multiple ways of closing a connection. The most common one in
HTTP, when the server closes first, is a 3-way handshake:
   - server sends FIN
   - client acks and sends FIN
   - server acks client's FIN and puts itself in TIME_WAIT state.

> 2. On page 69 of RFC 793 it describes the conditions under which a segment is acceptable in the TIME-WAIT state. It is not acceptable if it is outside the window, which is the case if the sequence number of the new SYN is "above the end of the last window".

This rule applies to a segment which belongs to an active session. That
is not what I was talking about. I was talking about re-opening a session
which was in TIME_WAIT. Those are two completely different things.

(...)
> This is further backed up in the case of SYNs on the bottom of page 71. Thus on receipt of a SYN in this case, the server would return an ACK and remain in the TIME-WAIT state, until the 2MSL timer expires.

No, not at all. A server would return an ACK to a SYN in TIME_WAIT only if
the SYN was within the last window, because it would then look like a
retransmit. That's precisely why the new SYN needs to be above the window.
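
To make the distinction concrete, here is a minimal sketch of the decision
as I describe it, not the code of any particular stack. The function name,
the three result labels and the plain window comparison are assumptions
made for illustration. Real implementations add further checks (timestamps
for example), and RFC 1122 words the condition in terms of the largest
sequence number used by the previous incarnation.

```python
SEQ_MOD = 1 << 32   # TCP sequence numbers are 32 bits wide and wrap around
HALF = 1 << 31      # "ahead of" vs "behind" boundary in serial arithmetic


def classify_syn_in_time_wait(syn_seq: int, rcv_nxt: int, rcv_wnd: int) -> str:
    """Classify a SYN received for a connection sitting in TIME_WAIT.

    Hypothetical helper: rcv_nxt is the next sequence number expected on
    the old connection, rcv_wnd the size of its last receive window.
    """
    # Distance from rcv_nxt to the SYN, computed modulo 2^32.
    offset = (syn_seq - rcv_nxt) % SEQ_MOD
    if offset < rcv_wnd:
        # Inside the last window: looks like a retransmit, so ACK it
        # and stay in TIME_WAIT.
        return "ack_and_stay"
    if offset < HALF:
        # At or above the end of the last window: a new incarnation,
        # the connection may be reopened directly.
        return "reopen"
    # Behind rcv_nxt: an old duplicate, TIME_WAIT is kept.
    return "old_duplicate"


if __name__ == "__main__":
    for seq in (1200, 5000, 900):
        print(seq, classify_syn_in_time_wait(seq, rcv_nxt=1000, rcv_wnd=500))
```

The modulo arithmetic is there only so that the comparison keeps working
when the sequence space wraps around.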

> There is also a clarification of this situation in RFC 1122 on page 88.
> 
>             When a connection is closed actively, it MUST linger in
>             TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
>             However, it MAY accept a new SYN from the remote TCP to
>             reopen the connection directly from TIME-WAIT state, if it:
> 
>             (1)  assigns its initial sequence number for the new
>                  connection to be larger than the largest sequence
>                  number it used on the previous connection incarnation,
>                  and

It's exactly this rule I'm talking about.

>             (2)  returns to TIME-WAIT state if the SYN turns out to be
>                  an old duplicate.

It's this one you're confusing it with.

> Note the mandatory requirement (MUST) to stay in the TIME-WAIT state for 2MSL.

Look at the "However" term above :-)

> The additional text allows (MAY) a SYN to be received (and not dropped) in TIME-WAIT, with presumably a transition to SYN_RCVD (and sending of the SYNACK) after the 2MSL timer expires (to obey the mandatory requirement). Thus TCP does not have to return to LISTEN, it can transition directly from TIME-WAIT to SYN_RCVD. Ignoring the 2MSL requirement is done at your peril, as this could lead to old duplicate segments being accepted instead of the new ones (not good for financial transactions or safety critical applications).

I can assure you that you will not find a single TCP stack which does not
apply the rule above, otherwise it would basically not be usable. Also,
when you read RFC793 you have to understand that it was written with
single connections in mind (hence the "return to LISTEN"). Implementations
do not "return to LISTEN", they have one socket always in LISTEN state
from which they instantiate as many sessions as they need upon each new
SYN. The rule above makes perfect sense if you're using TCP on top of a
serial line with a single connection at a time, but this is not what we're
doing here.

Please run a test if you don't believe me. The web as it is today would not
work with the rule as you state it. 2 MSL is 240 seconds by default, so you
would exhaust the 64k source ports of a given client address at only about
260 connections per second! I commonly work with servers which I push up to
200000 connections per second. That is on the order of 1000 times above the
limit which would be imposed by the rule if implementations did not apply
the "MAY" above.
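
As a rough check of the orders of magnitude, the arithmetic can be sketched
as below. Using the whole 65536-port range is an assumption and an upper
bound (real ephemeral port ranges are smaller), and the function name is
only illustrative.

```python
TWO_MSL_SECONDS = 240   # 2 * MSL, with the RFC 793 MSL of 2 minutes
SOURCE_PORTS = 65536    # assumption: every 16-bit source port is usable


def max_connection_rate(ports: int = SOURCE_PORTS,
                        hold_seconds: int = TWO_MSL_SECONDS) -> float:
    """Sustained connections per second between one client address and one
    server address/port if each source port stays blocked for hold_seconds
    after use."""
    return ports / hold_seconds


if __name__ == "__main__":
    limit = max_connection_rate()
    print(f"limit: about {limit:.0f} connections per second")
    print(f"200000 conn/s is about {200_000 / limit:.0f} times that limit")
```

With these assumptions the limit comes out slightly under 300 connections
per second, the same order as the figure quoted above.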

> >If the client closes first using the 3-way handshake, then what you describe
> >happens and in practice the client is stuck. For this reason, when clients
> >absolutely needs to close, they usually close with an RST which saves them
> >from the TIME_WAIT delay and at the same time saves bandwidth since only one
> >packet is sent.
> 
> The use of RST to close is not mentioned in Part 1 as far as I am aware. On the contrary, one gets the impression in section 6.1.4 Practical Considerations, that the graceful close should be used, at least for timeouts:
> 
>    When a client or server wishes to time-out it SHOULD issue a graceful
>    close on the transport connection.

But if the client does this, it is the one which gets the TIME_WAIT state
and this time it is forced to respect the 2*MSL timer. Doing this on a
browser is not critical, but doing this on a proxy stops its operations
after 64k connections. Closing with an RST is not clean at all but sometimes
you have no other choice.
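
For reference, the usual way to obtain such an RST on close from user space
is to enable SO_LINGER with a zero timeout before calling close(). The
sketch below assumes a POSIX-like system where struct linger is two ints
(the layout differs on Windows), and the helper names are made up for the
example.

```python
import socket
import struct

LINGER_FMT = "ii"   # assumption: struct linger { int l_onoff; int l_linger; }


def set_abortive_close(sock: socket.socket) -> None:
    """Ask the stack to reset the connection on close().

    With l_onoff=1 and l_linger=0, close() discards unsent data and sends
    an RST instead of a FIN, so the closing side does not go through
    TIME_WAIT.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, 1, 0))


def get_linger(sock: socket.socket) -> tuple:
    """Read back (l_onoff, l_linger) to verify the setting."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    return struct.unpack(LINGER_FMT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_abortive_close(s)
    print(get_linger(s))
    s.close()   # on a connected socket this close would emit an RST
```

This must only be done once nothing more has to be delivered to the peer,
since pending data is dropped.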

> This seems to relate to active ticket #176 where the use of graceful close (half-close) is also suggested, to avoid loss of responses.

No, this is different. The half-close to avoid loss of response is for the
server: if the server closes too fast, the last pending data may not
linger long enough in the TCP stack for the client to retrieve them. The
client can then end up with a truncated response, especially if it uses
pipelining and tries to send another request before it has received the
full response.
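
A minimal sketch of such a server-side half-close follows, assuming a
blocking socket. The function name, the 2 second timeout and the 4096 byte
read size are arbitrary choices for the example, not values taken from any
specification.

```python
import socket


def graceful_close(conn: socket.socket, timeout: float = 2.0) -> int:
    """Half-close, drain what the peer still sends, then close.

    Returns the number of bytes that were drained and ignored.
    """
    # Send our FIN while keeping the receive side open, so the response
    # already queued is still delivered to the client.
    conn.shutdown(socket.SHUT_WR)
    conn.settimeout(timeout)
    drained = 0
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:        # peer closed its side: safe to close now
                break
            drained += len(chunk)
    except OSError:
        # Timeout or reset while draining: give up and close anyway.
        pass
    finally:
        conn.close()
    return drained


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    client_side.sendall(b"GET /next HTTP/1.1\r\n\r\n")  # pipelined request
    client_side.shutdown(socket.SHUT_WR)
    print(graceful_close(server_side))
    client_side.close()
```

Draining matters because closing a socket which still holds unread data
makes many stacks send an RST, and the RST can cause the client to lose
the end of the response.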

Regards,
Willy
Received on Wednesday, 21 December 2011 07:05:03 UTC