Re: Some clarifications

From: Jim Gettys (
Date: Tue, Dec 15 1998

Date: Tue, 15 Dec 1998 09:36:33 -0800
From: (Jim Gettys)
Message-Id: <>
To: Kacheong Poon <>
Cc: Henrik Frystyk Nielsen <>,
Subject: Re: Some clarifications

> From: Kacheong Poon <>
> Date: Mon, 14 Dec 1998 19:18:19 -0800 (PST)
> To: Henrik Frystyk Nielsen <>
> Cc:,
> Subject: Re: Some clarifications
> -----
> > Maybe your question is really whether we need to be able to prioritize
> > streams either from the client to the server or the other way. In fact, we
> > have been discussing this as there are several scenarios where this comes
> > up in typical Web applications. My feeling is that it should be done at a
> > higher layer and not try to deal with it in either TCP or WebMux.
> I misunderstood what you guys mentioned in the BOF...  I was wondering what
> Jim's callback channel was.  Since a TCP stream is bidirectional, I thought
> Jim was asking for some channel over which expedited data could be sent,
> outside the original TCP stream.  So I guess what he was asking for was a mux.

No, I wasn't asking for expedited data, though a mux layer can be used
that way if the multiplexing code has some notion of priority and
schedules a high-priority message ahead of data on other mux sessions.

> > Note that there is no reason why bidirectionality can't be done in TCP
> > already and if peers can reuse state then this may be just as efficient as
> > what we are attempting to do in WebMux.
> I don't think it is very easy.  For one thing, unless you use T/TCP, you
> need to pay the initial 3-way handshake cost when opening multiple TCP
> connections to the same host.  Sharing window (congestion and send) info
> among multiple TCP connections needs a lot of thought.  Note that a lot of
> existing implementations already have some form of state sharing.  RTT, RTT
> variance, cong window threshold, and MSS info are shared and are used to
> initialize TCP connections.  But this does not solve the problem you are
> talking about.
> BTW, a mux over TCP means that if a packet of one session is dropped, all
> packets from other sessions, depending on the window size, may have to be
> delayed until the dropped packet is recovered.  With multiple TCP streams,
> this is not a problem.  But to me, a mux over TCP is easier to understand
> and implement than sharing TCP states (correctly) and let TCP do the work
> (correctly again).

Yes, MUX is a form of fate sharing, so that other channels may be blocked
waiting for the recovery of the underlying TCP. 

On the other hand, by putting everything into a single (or few) TCP
connections, you are much more likely to get the congestion information
"right", and be running TCP in the steady-state, rather than searching
for the point at which congestion is encountered, causing the initial
packet drop.

The big problem I see is if a second packet gets dropped, and so you do
a long timeout before trying to restart.  This "stalled connection" is
so painful that human behavior is to abort the connection and restart it
(the infamous "stop" button on a browser, followed by "reload" on the
browser).  Again, better to throw away and reestablish a single TCP connection
than N of them, the current situation.

It is the second packet drop which is a killer, not the first (due to fast
retransmit).  If something could be done to avoid such a long timeout to
recover after a second packet drop, this would be goodness.

Without real experience with deployed applications, it isn't clear what
the consequences of this fate sharing are in practice.

> > Yes, this is what we have in mind. When we implemented HTTP/1.1 with
> > persistent connections, pipelining, and output buffering, it would have
> > been extremely useful for us to have had access to segment sizes, timers,
> > etc. The best we can do for now is to guess these parameters.
> So I guess you guys want an API which gets info like netstat does.  BTW,
> the TCP_MAXSEG socket option gives you the segment size.  It is implemented
> in many TCP stacks.  Maybe people on this mailing list can write a document
> which lists the information needed and propose adding it to the socket API.

Thanks for the pointer; there is other information we'd like as well.

Henrik and I hope to get something together for the end-to-end list by
early in January (time between now and the holidays is short).
				- Jim