Re: Revisiting the local-file-fetching hang

Hello Wayne,

> Even with the latest CVS source, I continue to experience a problem
> with libwww hanging when fetching nested local files.  If you want
> to see the hang in action, run webbot with a depth of at least 2.
>
> % cd ~/src/libwww
> % webbot -depth 2 -prefix file:$PWD file:$PWD/Library/src/HTCache.html
>
> It should fetch 23 documents; however, it stops after about 7 (with 6
> outstanding requests) and loops infinitely.
>
> If I apply the age-old fix from Kinuko Yasuda, however, it works
> fine.  I have not attempted to look into the details of what is
> going on here, though, so is there a reason that this fix has not
> been committed?  Is it not quite right somehow?

Thanks for submitting an example where the Robot hangs. I was able to
work from it and from Kinuko's patch to (mostly) understand why we had
the problem and then solve it.

I hadn't applied Kinuko's patch before because, as I said at the time,
we were not sure whether it was the best solution or whether it would
break something. Kinuko's patch was modeled on the HTCache code, so it
was more a patch by comparison than by analysis, but a very good step
towards the solution.

After an extensive debugging session, I found out that the problem is
the messy way libwww nests its function calls. In short, in this
particular case, libwww was starving itself of sockets for no reason.

At greater length: requests were getting marked as pending and were only
released when the previous local file access had ended... but this nests
everything on the function stack, because the release call was made
before control ever came back to the event handler.
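
To make that concrete, here is a minimal sketch of the pattern in plain
C (hypothetical names throughout, not the actual libwww sources): each
completed local-file access releases the next pending request before
returning, so the whole chain of documents piles up on a single
function stack instead of unwinding back to the event loop.

    #include <stdio.h>

    #define N_REQUESTS 8

    static int pending[N_REQUESTS];   /* hypothetical queue of request ids */
    static int n_pending = 0;
    static int depth = 0;             /* how deep the call stack has grown */

    static void process_request(int id);

    /* Completion handler.  The bug pattern: it releases the next pending
     * request *here*, before returning to the event loop, so each
     * completion nests another process_request() call on the stack.    */
    static void request_done(int id)
    {
        printf("request %d done at depth %d\n", id, depth);
        if (n_pending > 0)
            process_request(pending[--n_pending]);
    }

    static void process_request(int id)
    {
        depth++;
        printf("request %d starts at depth %d\n", id, depth);
        /* ... synchronous local file access ... */
        request_done(id);             /* fires on the same stack */
        depth--;
    }

    int main(void)
    {
        for (int i = N_REQUESTS - 1; i > 0; i--)
            pending[n_pending++] = i;
        process_request(0);           /* the whole chain runs nested */
        return 0;
    }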

Kinuko's solution works, but I confess I don't understand why we have to
initialize the host structure twice. That is what was happening when we
didn't set up the file->state variable. I suppose that by the second
time around, libwww has come out of the deep function stack and this has
cleared things up nicely :)

So, I added a new state called FD_PENDING to the FileEvent state machine,
just to take this special case into account. I found out that the other
things the function was doing were redundant the second time through...
and I fixed a memory leak that Kinuko's patch had.
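
For the curious, the shape of the fix looks roughly like this sketch
(again with hypothetical names and states, not the code that went into
CVS): a request parked in the pending state just returns to the event
loop, and when it is resumed it skips the redundant setup instead of
running it a second time.

    #include <stdio.h>

    typedef enum {
        FD_BEGIN,     /* first entry: do the one-time setup           */
        FD_PENDING,   /* parked: waiting for the event loop to resume */
        FD_OPEN,      /* ready to read the file                       */
        FD_DONE
    } FileState;

    typedef struct {
        FileState state;
        const char *name;
    } FileCtx;

    /* Sketch of a FileEvent-style handler.  The point of the extra
     * pending state is that a resumed request falls through to the
     * read stage instead of re-running the host setup a second time. */
    static int FileEvent_sketch(FileCtx *file)
    {
        switch (file->state) {
        case FD_BEGIN:
            printf("%s: setting up host structure (once)\n", file->name);
            file->state = FD_PENDING;
            return 0;                 /* back to the event loop */
        case FD_PENDING:
            printf("%s: resumed, skipping redundant setup\n", file->name);
            file->state = FD_OPEN;
            /* fall through */
        case FD_OPEN:
            printf("%s: reading file\n", file->name);
            file->state = FD_DONE;
            return 1;
        case FD_DONE:
            return 1;
        }
        return 1;
    }

    int main(void)
    {
        FileCtx file = { FD_BEGIN, "HTCache.html" };
        while (!FileEvent_sketch(&file))  /* event loop drives the states */
            ;
        return 0;
    }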

The final patch is now committed to CVS.

I ran the robot through Insure and it didn't detect any problems, apart
from an unrelated 44-byte memory leak coming from HTTextImp_new(), which
I will kindly leave to someone else :)

Give it a try, and if something else breaks, we'll see what to do then.

-Jose

Received on Thursday, 10 August 2000 12:17:27 UTC