W3C home > Mailing lists > Public > www-lib@w3.org > October to December 2001

Re: Broken pipes & lost requests

From: Michel Philip <philipm@altern.org>
Date: Thu, 20 Dec 2001 09:34:49 +0100
Message-ID: <3C21A2A9.5CE313@altern.org>
To: www-lib@w3.org
Cc: Azzurra Pantella <azzurra.pantella@netikos.com>


Hi Azzurra.

Thanks for your answer.

>  > Then every time a request goes unanswered, you fail to decrement
>  > the counter.
> 
>  Hi Michel,
>    I dare say this is not true because usually there's an HTTimer in the
>  sockevent object that, in case of an unanswered request and after an
>  EventTimeout, calls HostEvent() (in HTHost.c) with type TIMEOUT.
>  This leads to a kill of the pipeline, with a subsequent call of the
>  HTNetDelete() function, scheduling of the Terminate_handlers and at
>  last a decrement of the HTNetCount.
>  This is how libwww guarantees that any submitted request will in some way
>  be "answered".

True. I realized it shortly after I sent it: anyway, once the request is
done, the terminate_handler is normally called.

>  I insist on saying that under certain conditions (frequent broken pipes),
>  when a broken pipe return state is caught by HTWriter_write() (in
>  HTWriter.c), the above mechanism doesn't work.
>  In such a case the pipeline remains "unanswered" (and I suspect subsequent
>  answers may even be mismatched).

I have an issue with the lib. It looks as if a Net object can stay in the
pipeline and prevent any further requests to the targeted host from
succeeding. For me it happens mostly with https requests.
It does not crash (crashes are program dependent and I have some
small patches).

I had called HTNet_setRawBytesCount(YES) and registered a read_handler
alert callback.
Before I patched HTMime.c I had crashes in HTTCP.c.
So I don't mean there are no bugs in the lib, but I'm very careful about
where they are.

>  > Register a user timer for each request you start (e.g. 30 sec.).
>  > Remove the timer when you get the answer.
>  > NetKill the request if the timer triggers.
> 
>  We had been considering a similar solution, but then what happens if an
>  answer (i.e. a late http response to our request) arrives after the
>  timeout? Which Net object will it be associated with?
>  As far as I understand, the first answer in the incoming buffer (from Net)
>  is bound to the oldest object in the pipeline, and if we get more answers
>  than expected then there won't be a correct request-response match.
>  Don't you think so?

I've worried about this. I thought that killing the pipe would close
the channel.

In the timeout_handler I do:

  HTNet *net = HTRequest_net(req);
  HTRequest_setContext (req, NULL);  
  HTNet_killPipe(net);

Closing the channel was not ideal for my needs.
Now I'm not sure whether the channel is closed when there are pending
requests.
This is very hard to check with my program, for I don't control when
the requests start.
Concretely it does not seem to cause problems with http.
I will reconsider that for https.

>  > > [...]
>  > > This is the bug-fix I'm proposing (HTTP.c line 1236):
>  > >
>  > > /* Now check the status code */
>  > > if (status == HT_WOULD_BLOCK)
>  > >     return HT_OK;
>  > > else if (status == HT_PAUSE || status == HT_LOADED) {
>  > >     type = HTEvent_READ;
>  > > } else if (status == HT_ERROR) {
>  > >     http->state = HTTP_KILL_PIPE;
>  > > } else if (status == HT_CLOSED)
>  > >     http->state = HTTP_RECOVER_PIPE;
>  > >
>  > > instead of:
>  > >
>  > > /* Now check the status code */
>  > > if (status == HT_WOULD_BLOCK)
>  > >     return HT_OK;
>  > > else if (status == HT_PAUSE || status == HT_LOADED) {
>  > >     type = HTEvent_READ;
>  > > } else if (status == HT_ERROR)
>  > >     http->state = HTTP_RECOVER_PIPE;

>  > > And now 2 questions to the libwww community:
>  > > 1) Is what I have noticed really a bug, or is there any reason not to
>  > > recover after a sigpipe in the write branch?
>  >
>  > No bug. (Or at least not here ;-)
>  > For me there is an obvious reason not to recover when the stream has
>  > been closed:
>  > there is nothing to recover.
>  > Recovering is for pipelining, and the pipe is closed.

Killing the pipe still looks to me like the right choice when (status ==
HT_ERROR).
If I have a problem with that, I would rather wonder why status has this
value and try to change things earlier so that another value is returned.

>  > If you start multiple requests toward the same HTTP 1.1 host they will
>  > pipeline:
>  >
>  > out> GET req_1
>  > out> GET req_2
>  > out> ...
>  >
>  > in<  page_1
>  > in<  page_2
>  > in<  ...
>  >

I'm not sure about that.
This was my initial vision, but once I stepped through it I wondered
whether it could be:
 
open
out> GET req_1
in<  page_1 chunk_1
in<  page_1 chunk_2
in<  ...
out> GET req_2
in<  page_2 chunk_1
in<  page_2 chunk_2
in<  ...
close

>  In fact we start multiple requests to the same HTTP 1.1 host but, as I
>  tried to explain before, having the connection closed by the server side
>  sometimes seems to cause a certain number of requests not to have their
>  terminate_handlers activated. You are probably right, it can't be called
>  a bug, but still there might be something undesired.
>  Let me explain why we felt quite confident in trying to force the
>  recovery after a NETWRITE returning with (socerrno == EPIPE).
>  In 1999 Olga Antropova suggested treating a broken pipe in the write
>  branch just like in the read case
>  (cf. the mailing list, Mon. Aug 23 1999), and the proposed patch was
>  accepted and appears in the last release.

Good idea to check the history.
Interesting. 1999 means it has not been tested that much.

Now I must indicate how I use the lib.
The program is tested on NT4 SP6, W2K and Solaris 7/8.
I compile with WWW_WIN_ASYNC undefined.

Checking the history, I've decided to change the #define condition
of the "2000/08/02 Jens Meggers (jens@meggers.com)" patch in lib 5.3.2,
because it seems there has been confusion between _WINSOCKAPI_ and
WWW_WIN_ASYNC.
Checking Jens's own special version of HTHost
(allowing use of different sockets to the same host)
shows that he has cleaned that up as well, but it is neither in the
release nor in the CVS.

This matters only on the Win32 platform.
Which platform are you running on?

>  But, whereas the HT_CLOSED return code from HTHost_read() (meaning broken
>  pipe) is checked in HTTPEvent() (HTTP.c) and causes a recovery, when a
>  write issued in HTTPEvent() returns HT_CLOSED, nothing is done to
>  recover. We imagined that she meant to recover in both cases.
> 
>  > > 2) The buffer toward the network is flushed also when an HTTimer bound
>  > > to an HTNet object is dispatched. In that case
>  > >     within the FlushEvent() in HTBufWrt.c  the return  value from the
>  > > HTBufferWriterFlush() is NOT CHECKED AT ALL !
>  >
>  > Probably checked another way.
> 
>   Don't think so. Look at the comment in FlushEvent() (HTBufWrite.c) just
>  before the HTWriter_write() call !


Other people before me had patched the lib.

There was a patch in HTWriter_write():

    while (wrtp < limit) {
        if ((b_write = NETWRITE(soc, wrtp, len)) < 0) {
            if (socerrno == EWOULDBLOCK) {
                return HT_WOULD_BLOCK;
            } else if (socerrno == EINTR) {
                continue;
            } else {
                host->broken_pipe = YES;
                if (socerrno == EPIPE) {
                    return HT_CLOSED;
                }
                /* all errors that aren't EPIPE */
                /* PATCH */
                host->broken_pipe = YES;
                /* PATCH */
                ...
                return HT_ERROR;
            }
        }

I've left it in. It is far too subtle for me!

As for me, I've patched HTBufWrite.c because it went into an infinite loop
while recovering a closed channel.

End of HTBufferWriter_write is now:

    if (status == HT_OK) {
        me->read = me->data;
    } else if (status == HT_WOULD_BLOCK) {
        HTBufferWriter_addBuffer(me, len);
        memcpy(me->read, buf, len);
        me->read += len;
        return HT_OK;
    }
    /* PATCH philipm@altern.org
     * Avoid an infinite loop.
     */
    else {
        return HT_ERROR;
    }

I believe I already wrote this to the list,
but nobody told me whether it is good or not.
Received on Thursday, 20 December 2001 03:33:12 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 23 April 2007 18:18:40 GMT