Re: Broken pipes & lost requests from Michel Philip on 2001-12-18 (www-lib@w3.org from October to December 2001)

From: Michel Philip <philipm@altern.org>
Date: Tue, 18 Dec 2001 06:23:21 +0100
To: www-lib@w3.org
Message-ID: <3C1ED2C9.A1A5D286@altern.org>
Nelson Spessard wrote:
> 
> Ok thanks for getting back to me.  Let me tell you
> what I am seeing and what I am trying to do...
> I believe that I have identified part of my problem
> but not completely all of the issues.
> 
> The Segmentation fault is occurring in
> terminate_handler.  (As I am new to this library I am
> using the Robot as a model....)  I am checking several
> queues for processing within the system.  ( unless
> someone can tell me how to redirect to a file at the
> same time as to the parser...)
> One of the queues calls HTLoad_toFile
> One of the Queues will recreate a new Anchor to walk
> One of the queues handles the parsing of the requests
> as they exist from the main loop.  It is in no way
> elegant but I am very pressed to complete this.

I'm afraid you will hardly succeed this way.
You can't have one thread doing HTLoad_toFile and another creating a new Anchor.
Or maybe I didn't understand well. Are you single-threaded and "polling"
non-blocking queues?

The lib has been designed so that you don't need multithreading.
What are you parsing from the requests?
The lib contains parsing modules...

"parsing of the requests as they exist from the main loop."

Which main loop... The lib's one? And do you enter the loop again afterwards?
If yes, then I'm not sure this is a good way.
As far as I understand, a program needs just one call to the main loop
and processes the requests in the terminate_handler.
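
To illustrate the "enter the loop once, do the work in the terminate handler" idea, here is a minimal sketch in plain C. It does not use the libwww API: `Loop`, `loop_submit`, `loop_run` and `follow_links` are hypothetical names, and a "request" is reduced to an integer depth that completes immediately.

```c
#include <stddef.h>

#define MAX_PENDING 32

typedef struct Loop Loop;
typedef void (*TerminateHandler)(Loop *loop, int depth);

/* Hypothetical stand-in for an event loop; not a libwww type. */
struct Loop {
    int pending[MAX_PENDING];      /* depths of requests waiting to finish */
    size_t npending;               /* number of entries in pending[]       */
    int handled;                   /* how many requests have terminated    */
    TerminateHandler on_terminate; /* called once per finished request     */
};

/* Queue a request; returns -1 if the loop is full. */
int loop_submit(Loop *loop, int depth)
{
    if (loop->npending == MAX_PENDING)
        return -1;
    loop->pending[loop->npending++] = depth;
    return 0;
}

/* Hypothetical terminate handler: count the request and, up to
 * depth 3, submit one follow-up request from inside the handler. */
void follow_links(Loop *loop, int depth)
{
    loop->handled++;
    if (depth < 3)
        loop_submit(loop, depth + 1);
}

/* Entered exactly once; returns when no request is left. */
void loop_run(Loop *loop)
{
    while (loop->npending > 0) {
        int depth = loop->pending[--loop->npending];
        loop->on_terminate(loop, depth);
    }
}
```

The caller submits the first request, calls `loop_run` once, and all later requests are started by the handler itself, so the program never re-enters the loop.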

If you need multiple threads or multiple processes to spread the CPU load,
there is only one way:
- register your own inter-thread or inter-process communication socket
  in the Event list and forward the data from the requests via this socket
  using the read handler (callback).

This makes sense only if you have heavy processing or IO to do during
your "parsing".
Otherwise, with today's CPUs, the benefit will be null (or negative).
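
The general shape of that approach can be sketched with POSIX calls only. This is an assumption-laden illustration, not libwww code: a pipe stands in for the communication socket, a bare `select()` loop stands in for the lib's Event list, and `process_in_worker`, `on_worker_readable` and `count_links` are hypothetical names. The "heavy" job is just a crude count of `<a` substrings.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "heavy" job: crude count of "<a" substrings. */
static int count_links(const char *body)
{
    int n = 0;
    for (const char *p = strstr(body, "<a"); p != NULL; p = strstr(p + 2, "<a"))
        n++;
    return n;
}

/* Read handler: called by the loop when the worker's fd is readable. */
static int on_worker_readable(int fd, int *result)
{
    ssize_t got = read(fd, result, sizeof *result);
    return got == (ssize_t)sizeof *result ? 0 : -1;
}

/* Forks a worker process, then waits for its answer with select().
 * Returns the link count, or -1 on error. */
int process_in_worker(const char *body)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* worker: compute, report, exit */
        close(fds[0]);
        int n = count_links(body);
        ssize_t w = write(fds[1], &n, sizeof n);
        _exit(w == (ssize_t)sizeof n ? 0 : 1);
    }

    close(fds[1]);                      /* event-loop side keeps the read end */
    int result = -1;
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fds[0], &readable);      /* "register" the worker's fd */
        int ready = select(fds[0] + 1, &readable, NULL, NULL, NULL);
        if (ready < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: wait again */
            break;
        }
        if (FD_ISSET(fds[0], &readable)) {
            if (on_worker_readable(fds[0], &result) < 0)
                result = -1;
            break;
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);              /* reap the worker */
    return result;
}
```

In a real program the same `select()` call would also watch the network sockets, so the worker's answer arrives as just another read event and no lock is needed on the loop side.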

> As I see it the segmentation fault occurs when 2
> incoming streams complete at the same time.  And there
> is only one entry in the queue.
> Walla...I have a race condition....
> The first tests the queue and removes an entry.
> The second one comes in behind ... tests the
> queue...still positive... removes entry....bang.
> Well As I am getting back into C programming after a
> hiatus I am trying to figure out a way to clear this
> queue. A basic integer flag does not work (it appears
> that the parse routine can interject and timeout
> preventing links from being added).. perhaps a
> semaphore or a mutex the implementation of how I can
> do this is eluding me I think the former might work
> but need to see if the blocking will interfere with
> the eventloop....  Any Ideas anyone? I'm sure someone
> has done something like this.

If you use a mutex you will really lose performance.
I believe the lib has been designed as 'pseudo multithreaded',
using select as the basis of synchronization for efficiency reasons.
Synchronization matches the IOs, and only the IOs.
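
If the handlers really do run one after another in a single thread, as assumed here, the queue needs no lock at all: it is enough that each handler tests and removes in one step and copes with an empty queue. A minimal sketch, with `UrlQueue`, `queue_push`, `queue_pop` and `on_request_done` as hypothetical names that are not part of the lib:

```c
#include <stddef.h>

#define QUEUE_CAP 16

/* Fixed-size ring buffer of URLs; hypothetical, not a libwww type. */
typedef struct {
    const char *items[QUEUE_CAP];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of queued entries  */
} UrlQueue;

/* Append a URL; returns -1 if the queue is full. */
int queue_push(UrlQueue *q, const char *url)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = url;
    q->count++;
    return 0;
}

/* Test and remove in one step; returns NULL when the queue is empty. */
const char *queue_pop(UrlQueue *q)
{
    if (q->count == 0)
        return NULL;
    const char *url = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return url;
}

/* Hypothetical completion handler: returns 1 if it took an entry,
 * 0 if another handler had already emptied the queue. */
int on_request_done(UrlQueue *q)
{
    const char *next = queue_pop(q);
    if (next == NULL)
        return 0;          /* nothing left: return quietly, no crash */
    /* a real program would start loading `next` here */
    return 1;
}
```

With one entry queued and two requests completing back to back, the first handler takes the entry and the second simply finds the queue empty.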

> As to the lost requests.  Yes I do see it as I
> described earlier. Usually the system will hang.  It
> appears to be when 2 requests are being made to the
> same URI.  when one completes all of the timers appear
> to be cleared.  need to investigate this further.  

Look into pipelining in HTTP 1.1.

> As I am for now reusing the robot code I am not sure why
> this is occurring at this time.  I am not creating
> timers of my own at this time.

> As to the memory leak I have not seen a significant
> one at this time...however I can maybe get 200
> requests processed before the system either hangs
> without timers or dies from the sigfault.  However the
> HTAnchor problem makes sense to me from what I have
> seen...Something I will be aware of.

If you specify "text/plain" as the format for the request,
the memory leak is small.
There is a leak with redirected pages, in particular... :-)

If you use "text/html", the pages are parsed and
child anchors are created for the set of links in the page
(one for each different "<a>" tag).
Then the leak is really obvious.


Michel.
Received on Tuesday, 18 December 2001 00:21:47 UTC