Re: Fwd: Re: [tcpm] FW: Call for Adoption: TCP Tuning for HTTP from Patrick McManus on 2016-03-03 (ietf-http-wg@w3.org from January to March 2016)

From: Patrick McManus <mcmanus@ducksong.com>
Date: Thu, 3 Mar 2016 15:43:38 -0500
To: Willy Tarreau <w@1wt.eu>
Cc: Joe Touch <touch@isi.edu>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAOdDvNokUDxmfy87VrQNLoQvQknP6L3h6fLbuFeVpOiDN4szAQ@mail.gmail.com>

Hi Willy, Joe,

This message is a bit of a diversion from the discussion so far. Sorry about
that.

On Thu, Mar 3, 2016 at 1:44 PM, Willy Tarreau <w@1wt.eu> wrote:

> I've seen people
> patch their kernels to lower the TIME_WAIT down to 2 seconds to address
> such shortcomings! Quite frankly, this workaround *is* causing trouble!
>

Really? That's fascinating to me. Can you provide background or citations
on what kind of trouble has been attributed to this, and the scenario in
which it was done?

You don't need to go through the theoretical - I know what TW could
conceptually catch - but the assertion about the shorter timeout causing
field problems is something I'd love to understand better.

For TW to be useful protection, it also has to coincide with a
retransmission and with application state that would actually be affected
by the mix-up, which reduces its utility, particularly for HTTP. I read the
above statement as saying that TW is indeed useful in the field at the
application layer - but maybe it is referencing some side effect I'm not
thinking of rather than the vulnerability of not using it.

Those kinds of post-mortem war stories, where the problem is seen in the
field, are pretty interesting and help inform the discussion about whether
the cure is worse than the disease. My inclination has generally been that
TW doesn't help a lot in practice, has some limitations, and causes pain (as
is well documented in this thread). So it would be interesting to look at
the fallout of a situation it could have helped with.

This feels a bit like the musing over the subpar utility of the TCP
checksum on high bandwidth networks. For that one, the answer, at least in
the HTTP space, is 'use HTTPS for integrity and sort out the rare error at
the application level'. I'm wondering if that's the right advice in the
TIME_WAIT space for HTTP as well; we're really still talking about
integrity. Go ahead and turn it off - just make sure you're running a
higher level protocol that won't confuse old data with new data.
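
To make "won't confuse old data with new data" concrete, here is a minimal
sketch, assuming a toy record format in which every record carries an HMAC
tag computed with a key that is fresh for each connection. This is not TLS
and not anything proposed in this thread; the helper names
(new_connection_key, seal, open_record) are hypothetical. It only shows the
general idea that a stale record from an earlier connection fails
verification on a later one.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # SHA-256 digest size in bytes


def new_connection_key() -> bytes:
    """Hypothetical helper: a fresh random key for each connection."""
    return os.urandom(32)


def seal(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag bound to this connection's key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_record(key: bytes, record: bytes):
    """Return the payload if the tag verifies under `key`, otherwise None."""
    tag, payload = record[:TAG_LEN], record[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # stale or corrupted record: reject instead of delivering
    return payload


# Two successive connections that happen to reuse the same address/port tuple.
old_key = new_connection_key()
new_key = new_connection_key()

stale = seal(old_key, b"old response body")  # delayed data from the old connection
fresh = seal(new_key, b"new response body")  # data belonging to the new connection

print(open_record(new_key, stale))  # rejected
print(open_record(new_key, fresh))  # accepted
```

The design point is that the check lives above TCP, so it does not depend
on how long the old connection sat in TIME_WAIT.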

(Fun paper: http://conferences2.sigcomm.org/imc/2015/papers/p303.pdf showed
that at the tail of a ping survey, 1% of replies from 1% of addresses
needed >= 145 seconds to arrive. And that's just delay - not
retransmission. A truly protective TW is a very large number.)
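
As a rough back-of-the-envelope check on that last point, the sketch below
compares a few TIME_WAIT durations against the 145-second tail. The
2-second value is the patched setting quoted above and the 145-second
figure is from the paper; the other two values are added here for
comparison and are not from this thread: 60 seconds is the Linux
TCP_TIMEWAIT_LEN constant, and 240 seconds is 2*MSL with the 2-minute MSL
given in RFC 793. The comparison is deliberately crude: it sets a
round-trip ping delay directly against the timer and ignores
retransmission entirely.

```python
# Tail delay from the cited ping survey, in seconds.
OBSERVED_TAIL_DELAY_S = 145

# TIME_WAIT durations in seconds. Only the first entry comes from the thread;
# the other two are comparison points added for this sketch.
time_wait_settings_s = {
    "patched kernel (quoted above)": 2,
    "Linux TCP_TIMEWAIT_LEN": 60,
    "RFC 793 2*MSL (MSL = 2 min)": 240,
}


def covers_tail(time_wait_s: int, tail_delay_s: int = OBSERVED_TAIL_DELAY_S) -> bool:
    """True if the TIME_WAIT period is at least as long as the observed tail delay."""
    return time_wait_s >= tail_delay_s


results = {name: covers_tail(value) for name, value in time_wait_settings_s.items()}

for name, ok in results.items():
    verdict = "outlasts" if ok else "expires before"
    print(f"{name}: {time_wait_settings_s[name]} s {verdict} the 145 s tail")
```

Only the full 2*MSL value outlasts the observed tail, and that is before
any allowance for retransmission.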

Received on Thursday, 3 March 2016 20:44:03 UTC