W3C home > Mailing lists > Public > www-lib@w3.org > October to December 2001

Re: Broken pipes & lost requests

From: Michel Philip <philipm@altern.org>
Date: Thu, 20 Dec 2001 09:34:56 +0100
Message-ID: <3C21A2B0.FC98CC@altern.org>
To: www-lib@w3.org
Cc: Azzurra Pantella <azzurra.pantella@netikos.com>
I ran into the same problem, and I have commented this out in HTHost.

But let's give back to Sven Laaks what belongs to Sven Laaks!

see "Subject: Robot. Crash or memory problems after 12 hours"

When I read his first mail, I found it funny.

A little later, my own program also stopped after exactly 12h.

I wrote to the list from work, but my mail never appeared on the list.
I can no longer post to the list from work, only from home.
I don't know why.
Sven was in copy; I don't know if he ever got my answer.

Here it is:

Hi Sven , hi all.
Sven Laaks wrote:
> I modified the example robot, so that it checks out a webpage 
> at given intervals (e.g. every 10 minutes). 
> My first problem is that the robot terminates after exactly 12h 
> with "segmentation fault (core dump)". 
> I work on a program that does the same thing but which hasn't 
> been designed from the robot example.

I'm now tracking this 'exactly 12h' issue.
[...]
The 12h limit lives in the library and is expressed in seconds.
Search for 43200 (12 × 3600) in the source files and you will find what happens.
I'm going to try disabling this feature first, since I'm short on time.

Maybe adding an 'HTHost_clearChannel()' call when the host timeout expires 
could help.

> On other pages it runs 24h and more. 

It depends on whether the host is idle or not.
With multiple requests to an HTTP 1.1 host you can keep it from ever going idle, 
and then you don't hit this problem.

Michel. 


Azzurra Pantella wrote:
> 
>  ----- Original Message -----
>  From: "Michel Philip" <philipm@altern.org>
>  To: <www-lib@w3.org>
>  Sent: Tuesday, December 18, 2001 6:27 AM
>  Subject: Re: Broken pipes & lost requests
> 
>  >
>  > > Azzurra Pantella wrote:
>  > >
>  > > Hi all,
>  > > I am working on a robot-like client application using libwww to submit
>  > > large numbers of GET requests to the server side.
>  > > Some time ago we came across a memory-growth problem due to the HTAnchor
>  > > structures' lifetime (once allocated they are NEVER freed till the end
>  > > of the program). Trying to solve this problem with a periodic "garbage
>  > > collection" of old
>  > > HTAnchor objects every N submitted requests, we noticed that the
>  > > library counter of active HTNet objects, "HTNetCount", could
>  > > remain indefinitely positive, as if some requests had got lost.
>  > > Consider that the HTNetCount is incremented every time an HTNet object
>  > > is added to the NetTable (Hash table containing all the HTNet) and
>  > > decremented when, after the reception of the response to the submitted
>  > > request, the matching HTNet object is deleted and removed from the
>  > > NetTable.
>  >
> 
>  > Then every time a request goes unanswered, you fail to decrement
>  > the counter.
> 
>  Hi Michel,
>  I dare say this is not true, because usually there's an HTTimer in the
>  sockevent object that, in case of an unanswered request and after an
>  EventTimeout, calls HostEvent() (in HTHost.c) with type TIMEOUT.
>     This leads to a kill of the pipeline, with a subsequent call of the
>  HTNetDelete() function, scheduling of the Terminate_handlers, and finally
>  a decrement of HTNetCount.
>  This is how libwww guarantees that any submitted request will in some way be
>  "answered".
>  I maintain that under certain conditions (frequent broken pipes),
>  when a broken-pipe return state is caught by HTWriter_write() (in
>  HTWriter.c), the above mechanism doesn't work.
>  In that case the pipeline remains "unanswered" (and I suspect it even causes
>  subsequent mismatched answers).
> 
>  > Register a user timer for each request you start (e.g. 30 sec.).
>  > Remove the timer when you get the answer.
>  > NetKill the request if the timer triggers.
> 
>  We had been considering a similar solution, but then what happens if an
>  answer (i.e. a late HTTP response to our request) arrives after the timeout?
>  Which Net object will it be associated with?
>  As far as I understand, the first answer in the incoming buffer (from the
>  Net) is bound to the oldest object in the pipeline, and if we get more
>  answers than expected then there won't be a correct request-response match.
>  Don't you think so?
> 
>  > > We noticed that this request loss took place only if there had
>  > > previously been some SIGPIPE signal reception while
>  > > writing to the network (HTWriter_write() in HTWriter.c).
>  > > That led us to suspect the presence of a BUG, as in the HTTP state
>  > > machine implemented in the HTTPEvent() function (HTTP.c module) the
>  > > reception of a SIGPIPE after a write does NOT cause a recovery. In
>  > > fact, in case of a broken pipe the returned value HT_CLOSED is never
>  > > checked. I suggest a behaviour similar to that after an HTHost_read
>  > > (l.1249 in HTTP.c, HTTPEvent() function): in case of HT_ERROR kill the
>  > > pipeline; in case of a broken pipe (HT_CLOSED return value) try to
>  > > recover the pipeline.
>  > >
>  > > This is the bug-fix I'm proposing (HTTP.c line 1236):
>  > >
>  > > /* Now check the status code */
>  > > if (status == HT_WOULD_BLOCK)
>  > >     return HT_OK;
>  > > else if (status == HT_PAUSE || status == HT_LOADED) {
>  > >     type = HTEvent_READ;
>  > > } else if (status == HT_ERROR) {
>  > >     http->state = HTTP_KILL_PIPE;
>  > > } else if (status == HT_CLOSED)
>  > >     http->state = HTTP_RECOVER_PIPE;
>  > >
>  > > instead of:
>  > >
>  > > /* Now check the status code */
>  > > if (status == HT_WOULD_BLOCK)
>  > >     return HT_OK;
>  > > else if (status == HT_PAUSE || status == HT_LOADED) {
>  > >     type = HTEvent_READ;
>  > > } else if (status == HT_ERROR)
>  > >     http->state = HTTP_RECOVER_PIPE;
>  > >
>  > >
>  > >  And now 2 questions to the libwww community:
>  > > 1) Is what I have noticed really a bug or is there any reason not to
>  > > recover after a sigpipe in the write branch ?
>  >
>  > No bug. (Or at least not here ;-)
>  > For me there is an obvious reason not to recover when the stream has been
>  > closed:
>  > there is nothing to recover.
>  > Recovering is for pipelining, and the pipe is closed.
>  >
>  > If you start multiple requests toward the same HTTP 1.1 host they will
>  > pipeline:
>  >
>  > out> GET req_1
>  > out> GET req_2
>  > out> ...
>  >
>  > in<  page_1
>  > in<  page_2
>  > in<  ...
>  >
>  > If you decide page_1 is taking too long to load, you can stop it.
>  > (Register your own socket in the select to signal this,
>  >  because the lib is not multithread safe; or register a timer.)
>  >
>  > When you kill the request (in fact the loading request of the Net
>  > structure), recovering will stop loading page_1 but will not close the
>  > pipe, and loading of page_2 will start without the need to repeat the
>  > remaining requests, because the server has already received them:
>  >
>  > out> GET req_2
>  > out> ...
>  >
>  In fact we start multiple requests to the same HTTP 1.1 host but, as I tried
>  to explain before, having the connection closed by the server side sometimes
>  seems to cause a certain number of requests not to have their
>  terminate_handlers activated. You are probably right, it can't be called a
>  bug, but still there might be something undesired.
>  Let me explain why we felt quite confident in trying to force the recovery
>  after a NETWRITE returning with (socerrno == EPIPE).
>  In 1999 Olga Antropova suggested treating a broken pipe in the write branch
>  just like in the read case
>  (cf. the mailing list, Mon, Aug 23 1999), and the proposed patch was accepted
>  and appears in the latest release.
>  But, whereas the HT_CLOSED return code from an HTHost_read() (meaning broken
>  pipe) is checked in HTTPEvent() (HTTP.c) and causes a recovery, when a
>  write issued in HTTPEvent() returns HT_CLOSED,
>  nothing is done to recover. We imagined that she meant to recover in both
>  cases.
> 
>  > > 2) The buffer toward the network is also flushed when an HTTimer bound
>  > > to an HTNet object is dispatched. In that case,
>  > > within FlushEvent() in HTBufWrt.c, the return value from
>  > > HTBufferWriterFlush() is NOT CHECKED AT ALL!
>  >
>  > Probably checked another way.
> 
>   I don't think so. Look at the comment in FlushEvent() (HTBufWrite.c) just
>  before the HTWriter_write() call!
> 
>  Regards,
>         Azzurra
Received on Thursday, 20 December 2001 03:33:17 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 23 April 2007 18:18:40 GMT