Re: Broken pipes & lost requests from Azzurra Pantella on 2001-12-18 (www-lib@w3.org from October to December 2001)

From: Azzurra Pantella <azzurra.pantella@netikos.com>
Date: Tue, 18 Dec 2001 15:01:12 +0100
To: "Michel Philip" <philipm@altern.org>
Cc: <www-lib@w3.org>
Message-ID: <01b701c187cc$73ae3780$b00516ac@netikos.com>
 ----- Original Message -----
 From: "Michel Philip" <philipm@altern.org>
 To: <www-lib@w3.org>
 Sent: Tuesday, December 18, 2001 6:27 AM
 Subject: Re: Broken pipes & lost requests


 >
 > > Azzurra Pantella wrote:
 > >
 > > Hi all,
 > > I am working on a robot-like client application using libwww to submit
 > > large numbers of GET requests to the server side.
 > > Some time ago we came across a memory growth problem due to the HTAnchor
 > > structures' lifetime (once allocated they are NEVER freed till the end
 > > of the program). Trying to solve this problem with a periodic "garbage
 > > collection" of old HTAnchor objects every N submitted requests, we
 > > noticed that the library counter of active HTNet objects "HTNetCount"
 > > could remain indefinitely positive, as if some requests had got lost.
 > > Consider that the HTNetCount is incremented every time an HTNet object
 > > is added to the NetTable (Hash table containing all the HTNet) and
 > > decremented when, after the reception of the response to the submitted
 > > request, the matching HTNet object is deleted and removed from the
 > > NetTable.
 >

 > Then every time a request goes unanswered you fail to decrement
 > the counter.

 Hi Michel,
   I dare say this is not true, because usually there is an HTTimer in the
 sockevent object that, in case of an unanswered request and after an
 EventTimeout, calls HostEvent() (in HTHost.c) with type TIMEOUT.
   This leads to a kill of the pipeline, with a subsequent call of the
 HTNetDelete() function, scheduling of the Terminate_handlers and, at last,
 a decrement of the HTNetCount.
 This is how libwww guarantees that any submitted request will in some way be
 "answered".
 I insist on saying that under certain conditions (frequent broken pipes),
 when a broken pipe return state is caught by HTWriter_write() (in
 HTWriter.c), the above mechanism doesn't work.
 In that case the pipeline remains "unanswered" (and I suspect subsequent
 answers may even be mismatched).
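
 To make the accounting concrete, here is a minimal toy model in C. It is
 not libwww code: ToyNetTable, toy_dispatch() and the event names are
 hypothetical stand-ins, and it simply assumes that a closed write is
 ignored and that no timeout fires afterwards, which is the behaviour
 suspected above.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's table of active Net objects. */
typedef struct {
    int net_count;              /* plays the role of HTNetCount */
} ToyNetTable;

/* Hypothetical events that can end the life of a request. */
typedef enum {
    TOY_EV_RESPONSE,            /* the response was received */
    TOY_EV_TIMEOUT,             /* the event timer expired */
    TOY_EV_WRITE_CLOSED         /* a write hit a broken pipe */
} ToyEvent;

/* A request is registered: the counter goes up. */
static void toy_net_add(ToyNetTable *t)
{
    t->net_count++;
}

/* A request is deleted: the counter goes down. */
static void toy_net_delete(ToyNetTable *t)
{
    assert(t->net_count > 0);
    t->net_count--;
}

/* Dispatch one event. If handle_closed_write is zero, a broken pipe on
   write triggers no cleanup (the assumption under discussion). */
static void toy_dispatch(ToyNetTable *t, ToyEvent ev, int handle_closed_write)
{
    switch (ev) {
    case TOY_EV_RESPONSE:
    case TOY_EV_TIMEOUT:
        toy_net_delete(t);
        break;
    case TOY_EV_WRITE_CLOSED:
        if (handle_closed_write)
            toy_net_delete(t);
        break;
    }
}

/* Three requests end in three different ways; returns the final count. */
static int toy_remaining(int handle_closed_write)
{
    ToyNetTable t = {0};
    toy_net_add(&t);
    toy_net_add(&t);
    toy_net_add(&t);
    toy_dispatch(&t, TOY_EV_RESPONSE, handle_closed_write);
    toy_dispatch(&t, TOY_EV_TIMEOUT, handle_closed_write);
    toy_dispatch(&t, TOY_EV_WRITE_CLOSED, handle_closed_write);
    return t.net_count;
}
```

 In this model toy_remaining(0) leaves one request counted forever, while
 toy_remaining(1) brings the counter back to zero.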



 > Register a user timer for each request you start (e.g. 30 sec.).
 > Remove the timer when you get the answer.
 > NetKill the request if the timer triggers.

 We had been considering a similar solution, but then what happens if an
 answer (i.e. a late HTTP response to our request) arrives after the timeout?
 Which Net object will it be associated with?
 As far as I understand, the first answer in the incoming buffer (from Net)
 is bound to the oldest object in the pipeline, and if we get more answers
 than those expected then there won't be a correct request-response match.
 Don't you think so?
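
 The mismatch can be illustrated with a small self-contained sketch. It is
 a toy model, not libwww: ToyPipeline and the toy_* functions are
 hypothetical, and the sketch assumes that responses are bound strictly in
 FIFO order and that a timed-out request is dropped from the queue while
 the connection stays open.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PIPE_MAX 8

/* Hypothetical pipelined connection: request ids kept in send order. */
typedef struct {
    int ids[TOY_PIPE_MAX];
    size_t head;                /* index of the oldest pending request */
    size_t count;               /* number of pending requests */
} ToyPipeline;

static void toy_init(ToyPipeline *p)
{
    p->head = 0;
    p->count = 0;
}

/* A request has been written to the connection: queue its id. */
static void toy_send(ToyPipeline *p, int id)
{
    assert(p->count < TOY_PIPE_MAX);
    p->ids[(p->head + p->count) % TOY_PIPE_MAX] = id;
    p->count++;
}

/* Client-side timeout: forget the oldest request, keep the connection. */
static void toy_timeout_oldest(ToyPipeline *p)
{
    assert(p->count > 0);
    p->head = (p->head + 1) % TOY_PIPE_MAX;
    p->count--;
}

/* A response arrives and is bound to the oldest pending request.
   Returns that request's id, or -1 if nothing is pending. */
static int toy_receive(ToyPipeline *p)
{
    int id;
    if (p->count == 0)
        return -1;
    id = p->ids[p->head];
    p->head = (p->head + 1) % TOY_PIPE_MAX;
    p->count--;
    return id;
}

/* Requests 1 and 2 are sent, request 1 times out on the client, then the
   server's late answer to request 1 arrives. Returns the id of the
   request that this late answer gets bound to. */
static int toy_late_response_scenario(void)
{
    ToyPipeline p;
    toy_init(&p);
    toy_send(&p, 1);
    toy_send(&p, 2);
    toy_timeout_oldest(&p);
    return toy_receive(&p);
}
```

 Under these assumptions the late answer to request 1 is handed to request
 2, which is exactly the request-response mismatch in question.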

 > > We noticed that this request loss took place only if there had
 > > previously been some SIGPIPE signal reception while
 > > writing to the network (HTWriter_write() in HTWriter.c).
 > > That led us to suspect the presence of a BUG, since in the HTTP state
 > > machine implemented in the HTTPEvent() function (HTTP.c module) the
 > > reception of a SIGPIPE after a write does NOT cause a recovery. In
 > > fact, in case of broken pipe the returned value HT_CLOSED is never
 > > checked. I suggest a behaviour similar to that after a HTHost_read
 > > (l.1249 in HTTP.c, HTTPEvent() function): in case of HT_ERROR kill the
 > > pipeline, in case of broken pipe (HT_CLOSED return value) try to
 > > recover the pipeline.
 > >
 > > This is the bug-fix I'm proposing (HTTP.c line 1236):
 > >
 > > /* Now check the status code */
 > > if (status == HT_WOULD_BLOCK)
 > >     return HT_OK;
 > > else if (status == HT_PAUSE || status == HT_LOADED) {
 > >     type = HTEvent_READ;
 > > } else if (status == HT_ERROR) {
 > >     http->state = HTTP_KILL_PIPE;
 > > } else if (status == HT_CLOSED)
 > >     http->state = HTTP_RECOVER_PIPE;
 > >
 > > instead of:
 > >
 > > /* Now check the status code */
 > > if (status == HT_WOULD_BLOCK)
 > >     return HT_OK;
 > > else if (status == HT_PAUSE || status == HT_LOADED) {
 > >     type = HTEvent_READ;
 > > } else if ( status == HT_ERROR)
 > >     http->state = HTTP_RECOVER_PIPE;
 > >
 > >
 > >  And now 2 questions to the libwww community:
 > > 1) Is what I have noticed really a bug or is there any reason not to
 > > recover after a sigpipe in the write branch ?
 >
 > No bug. (Or at least not here ;-)
 > For me there is an obvious reason not to recover when the stream has been
 > closed:
 > there is nothing to recover.
 > Recovering is for pipelining, and the pipe is closed.
 >
 > If you start multiple requests toward the same HTTP 1.1 host they will
 > pipeline:
 >
 > out> GET req_1
 > out> GET req_2
 > out> ...
 >
 > in<  page_1
 > in<  page_2
 > in<  ...
 >
 > If you decide page_1 takes too long to load you could decide to stop it.
 > (Register your own socket in the select to signal this,
 >  because the lib is not multithread-safe, or register a timer.)
 >
 > When you kill the request (in fact the loading request of the Net
 > structure), recovering will stop loading page_1 but will not close the
 > pipe, and loading of page_2 will start without the need to repeat the
 > remaining requests, because the server has already received them.
 >
 > out> GET req_2
 > out> ...
 >
 In fact we do start multiple requests to the same HTTP 1.1 host but, as I
 tried to explain before, having the connection closed by the server side
 sometimes seems to cause a certain number of requests not to have their
 terminate_handlers activated. You are probably right, it can't be called a
 bug, but still there might be something undesired.
 Let me explain why we felt quite confident in trying to force the recovery
 after a NETWRITE returning with (socerrno == EPIPE).
 In 1999 Olga Antropova suggested treating a broken pipe in the write branch
 just like in the read case (cf. mailing list, Mon. Aug 23 1999), and the
 proposed patch was accepted and appears in the last release.
 But whereas the HT_CLOSED return code from HTHost_read() (meaning broken
 pipe) is checked in HTTPEvent() (HTTP.c) and causes a recovery, when a
 write issued in HTTPEvent() returns HT_CLOSED nothing is done to recover.
 We imagined that she meant to recover in both cases.
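
 The difference between the two write-branch variants quoted earlier can
 be laid out as a small decision function. This is only a sketch: the
 TOY_* enums are hypothetical placeholders that mirror the names used in
 this thread (their values are arbitrary, not libwww's), and the two
 functions merely transcribe the two snippets.

```c
/* Placeholder status codes; names follow the thread, values are arbitrary. */
typedef enum {
    TOY_WOULD_BLOCK,
    TOY_PAUSE,
    TOY_LOADED,
    TOY_ERROR,
    TOY_CLOSED
} ToyStatus;

/* What the write branch does next. */
typedef enum {
    TOY_RETURN_OK,              /* return HT_OK and wait for the next event */
    TOY_SWITCH_TO_READ,         /* type = HTEvent_READ */
    TOY_KILL_PIPE,              /* http->state = HTTP_KILL_PIPE */
    TOY_RECOVER_PIPE,           /* http->state = HTTP_RECOVER_PIPE */
    TOY_NO_ACTION               /* status is not checked at all */
} ToyAction;

/* Transcription of the snippet quoted as the existing code. */
static ToyAction toy_write_current(ToyStatus s)
{
    if (s == TOY_WOULD_BLOCK)
        return TOY_RETURN_OK;
    else if (s == TOY_PAUSE || s == TOY_LOADED)
        return TOY_SWITCH_TO_READ;
    else if (s == TOY_ERROR)
        return TOY_RECOVER_PIPE;
    return TOY_NO_ACTION;       /* TOY_CLOSED ends up here */
}

/* Transcription of the proposed fix. */
static ToyAction toy_write_proposed(ToyStatus s)
{
    if (s == TOY_WOULD_BLOCK)
        return TOY_RETURN_OK;
    else if (s == TOY_PAUSE || s == TOY_LOADED)
        return TOY_SWITCH_TO_READ;
    else if (s == TOY_ERROR)
        return TOY_KILL_PIPE;
    else if (s == TOY_CLOSED)
        return TOY_RECOVER_PIPE;
    return TOY_NO_ACTION;
}
```

 In the first variant a closed write falls through without any state
 change; in the second it leads to a recovery, and a plain error kills the
 pipeline instead of recovering it.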


 > > 2) The buffer toward the network is flushed also when an HTTimer bound
 > > to an HTNet object is dispatched. In that case
 > >     within the FlushEvent() in HTBufWrt.c  the return  value from the
 > > HTBufferWriterFlush() is NOT CHECKED AT ALL !
 >
 > Probably checked another way.

  I don't think so. Look at the comment in FlushEvent() (HTBufWrt.c) just
 before the HTWriter_write() call!

 Regards,
        Azzurra
Received on Tuesday, 18 December 2001 09:01:49 UTC