RE: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken

Fred,

You are right, a new state is needed for a 100% clean solution.
Moving the select above "if (NETCALL_WOULDBLOCK(socerrno))" is a good idea.
However, keep in mind that socerrno is a macro that calls WSAGetLastError(),
so after the move it would return the last error from the select() call
rather than the one from connect().
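
A minimal sketch of that pitfall, under stated assumptions: sketch_last_error,
SKETCH_SOCERRNO, sketch_connect() and sketch_select() are hypothetical
stand-ins (not libwww or Winsock names) that imitate a shared last-error slot.
It only shows why the connect() error has to be copied into a local variable
before select() is called.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not libwww or Winsock names. */
static int sketch_last_error = 0;            /* the slot behind WSAGetLastError() */
#define SKETCH_SOCERRNO (sketch_last_error)  /* plays the role of socerrno */

/* Pretend connect(): always fails and records 10035 (WSAEWOULDBLOCK). */
static int sketch_connect(void)
{
    sketch_last_error = 10035;
    return -1;
}

/* Pretend select(): always fails and records 10038 (WSAENOTSOCK),
   overwriting whatever connect() left in the slot. */
static int sketch_select(void)
{
    sketch_last_error = 10038;
    return -1;
}

/* Copy connect()'s error into a local before select() runs. */
static int connect_error_saved_first(void)
{
    int connect_err;
    int rc = sketch_connect();

    assert(rc < 0);                 /* the stand-in always fails */
    connect_err = SKETCH_SOCERRNO;  /* capture immediately */
    (void)sketch_select();
    return connect_err;             /* still connect()'s error */
}

/* Read the macro only after select(): connect()'s error is gone. */
static int connect_error_read_late(void)
{
    (void)sketch_connect();
    (void)sketch_select();
    return SKETCH_SOCERRNO;         /* select()'s error, not connect()'s */
}
```

In HTDoConnect() terms, this would mean storing socerrno in a local right
after connect() and testing that local in the later "if" statements.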

Regards,

Jens


-----Original Message-----
From: Fred Covely [mailto:fcovely@covely.com]
Sent: Thursday, 3 May 2001 21:03
To: Jens Meggers; www-lib@w3.org
Subject: RE: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken


Jens:

Based on the MS site doc, I'd say that MAY get the job done, but maybe not.

Unless I am reading the doc wrong (which is quite possible), I think
the fundamental problem is that MS doesn't want that connect call
going off more than once.  I'm worried that this may fix what we are
seeing now, but that another variation will turn up.

Here is the MS comment:
"
As a result, it is not recommended that applications use multiple calls to
connect to detect connection completion. If they do, they must be prepared
to handle WSAEINVAL and WSAEWOULDBLOCK error values the same way that they
handle WSAEALREADY, to assure robust execution.
"

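The Microsoft guidance quoted above can be sketched as a small classifier for
the error returned by a repeated connect() call. This is only an illustration:
the CE_ names and ce_classify() are hypothetical, and the numeric values are
the Winsock error codes redefined locally so the example compiles anywhere.

```c
#include <assert.h>

/* Winsock error values, redefined under sketch names (assumption: the
   real code would use the constants from the Winsock headers instead). */
enum {
    CE_WSAEINVAL      = 10022,
    CE_WSAEWOULDBLOCK = 10035,
    CE_WSAEALREADY    = 10037,
    CE_WSAEISCONN     = 10056
};

typedef enum { CE_PENDING, CE_DONE, CE_FAILED } ce_result;

/* Classify the error from a second or later connect() on the same socket. */
static ce_result ce_classify(int err)
{
    switch (err) {
    case CE_WSAEISCONN:
        return CE_DONE;       /* connection has completed */
    case CE_WSAEALREADY:
    case CE_WSAEINVAL:        /* handled the same way as WSAEALREADY */
    case CE_WSAEWOULDBLOCK:   /* handled the same way as WSAEALREADY */
        return CE_PENDING;    /* connection attempt still in progress */
    default:
        return CE_FAILED;     /* anything else is a failure in this sketch */
    }
}

/* Usage example: the three "in progress" codes are interchangeable. */
static void ce_demo(void)
{
    assert(ce_classify(CE_WSAEINVAL) == ce_classify(CE_WSAEALREADY));
    assert(ce_classify(CE_WSAEWOULDBLOCK) == ce_classify(CE_WSAEALREADY));
}
```
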
What I like about your approach is using the select, rather than the connect
and its error code, to deduce what happened.  Using just your code, I'd move
the select logic above the
	if (NETCALL_WOULDBLOCK(socerrno))
line.  That way, any unsuccessful status out of connect would yield an
immediate check for completion via the select.  (FYI, for fast connections
that will actually improve performance.)  For slow connections it will add
some overhead, because the select is called rather than just WSAGetLastError
as is the case now.  But I think the reliability is more important.

Finally, in looking at this before, I feel that what would really be useful
is a new state "TCP_CONNECTING", so you always issue the connect and then go
to connecting.  The select would then be put in the TCP_CONNECTING logic.
That way the select is the do or die point.  From there you either up the
state to TCP_CONNECTED or back down to TCP_NEED_CONNECT on a failure.
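
A rough sketch of that state idea, as a pure transition function. The ST_
names, st_probe and st_next_state() are made up for illustration and are not
libwww code; the probe value stands for whatever the select reports.

```c
#include <assert.h>

/* Hypothetical mirror of the states discussed above. */
typedef enum {
    ST_TCP_NEED_CONNECT,
    ST_TCP_CONNECTING,   /* the proposed new state */
    ST_TCP_CONNECTED
} st_tcp_state;

/* Hypothetical result of the select in the connecting state. */
typedef enum { ST_PROBE_PENDING, ST_PROBE_WRITABLE, ST_PROBE_FAILED } st_probe;

static st_tcp_state st_next_state(st_tcp_state state, st_probe probe)
{
    switch (state) {
    case ST_TCP_NEED_CONNECT:
        /* connect() is issued once here; its error code is not used
           to decide completion, so the probe value is ignored. */
        return ST_TCP_CONNECTING;
    case ST_TCP_CONNECTING:
        /* The select is the only place that decides the outcome. */
        if (probe == ST_PROBE_WRITABLE)
            return ST_TCP_CONNECTED;
        if (probe == ST_PROBE_FAILED)
            return ST_TCP_NEED_CONNECT;
        return ST_TCP_CONNECTING;
    case ST_TCP_CONNECTED:
        return ST_TCP_CONNECTED;
    }
    return state;
}

/* Usage example: one successful connection attempt. */
static void st_demo(void)
{
    st_tcp_state s = ST_TCP_NEED_CONNECT;
    s = st_next_state(s, ST_PROBE_PENDING);   /* connect issued */
    s = st_next_state(s, ST_PROBE_PENDING);   /* not ready yet */
    s = st_next_state(s, ST_PROBE_WRITABLE);  /* select says done */
    assert(s == ST_TCP_CONNECTED);
}
```

With this shape, connect is never called a second time on the same socket
while an attempt is in flight.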

All that being said, it's quite possible your proposed fix will do the job
forever and ever.

Did that help?

Regards and thanks so much for the outstanding effort on this.

Fred Covely
fcovely@covely.com
(B)760-631-8157
(C)760-717-9689

-----Original Message-----
From: Jens Meggers [mailto:jens.meggers@firepad.com]
Sent: Thursday, May 03, 2001 6:09 PM
To: 'Fred Covely'; www-lib@w3.org
Subject: RE: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken


Fred,

during my latest experiments, I finally found an NT 4.0 Service Pack 4
installation that shows the problem you described. After connect() has been
called twice, it returns with an error, but WSAGetLastError() returns 0
although the connection is already set up.
There is a quite clean workaround for that. We can use the select() call
to check whether there is a connection or not before throwing back an error
code. This can be done right before the "if (socerrno == EISCONN) {"
statement in the TCP_NEED_CONNECT state of HTDoConnect(). I've inserted the
following code:

#ifdef WWW_WIN_ASYNC
		/* Check whether the socket is connected. If yes, enter the next stage. */
		{
		fd_set writefds;
		struct timeval timeout;
		int select_ret;

		FD_ZERO(&writefds);
		FD_SET(HTChannel_socket(host->channel), &writefds);
		/* Zero timeout: poll once and return immediately. */
		timeout.tv_sec = 0;
		timeout.tv_usec = 0;
		/* The first argument is ignored by Winsock. */
		select_ret = select(0, NULL, &writefds, NULL, &timeout);
		if (select_ret == 1) {
		    host->tcpstate = TCP_CONNECTED;
		    HTTRACE(PROT_TRACE, "HTHost %p going to state TCP_CONNECTED.\n" _ host);
		    break;
		}
		}
#endif


I have also attached my HTTCP.c to this mail. Please note that it also
contains the code for checking the event error code.

What do you think?

Regards,

Jens



-----Original Message-----
From: Fred Covely [mailto:fcovely@covely.com]
Sent: Tuesday, 24 April 2001 17:29
To: Jens Meggers; www-lib@w3.org
Subject: RE: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken


Congratulations, that's the fastest bug fix I ever saw (8>).

I'm real curious as to your solution.

I'm not seeing any issues with this on win2k, only NT.  But the
user I have on NT SP6 fails hard every time.  I think I'll go
ahead and set up an NT/SP6 box and see if I can reproduce
with and without your patch when it comes out.

regards,

Fred Covely
fcovely@covely.com
(B)760-631-8157
(C)760-717-9689

-----Original Message-----
From: Jens Meggers [mailto:jens.meggers@firepad.com]
Sent: Tuesday, April 24, 2001 5:08 PM
To: 'Fred Covely'; www-lib@w3.org
Subject: RE: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken


Fred,

I solved the problem within the last few weeks. I had to pass the error code
carried by the socket's async event message, along with the event object, to
the HTDoConnect method. It actually works. Unfortunately, a lot of files are
involved. I will send a patch description asap.
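
A small sketch of what extracting that error code could look like, assuming
the event messages are WSAAsyncSelect notifications (the mail does not say so
explicitly). In Winsock the notification's lParam carries the event in its low
16 bits and the error in its high 16 bits, which the WSAGETSELECTEVENT and
WSAGETSELECTERROR macros extract. The AE_ names below are hypothetical
portable re-creations, not the actual patch.

```c
#include <assert.h>

/* Winsock values redefined under sketch names so this compiles anywhere. */
enum { AE_FD_CONNECT = 0x10, AE_WSAECONNREFUSED = 10061 };

typedef struct {
    unsigned event;   /* which network event fired */
    unsigned error;   /* 0 on success, else a Winsock error code */
} ae_net_event;

/* Pack event and error the way WSAMAKESELECTREPLY does. */
static unsigned long ae_make_reply(unsigned event, unsigned error)
{
    assert(event <= 0xFFFFu && error <= 0xFFFFu);  /* 16 bits each */
    return ((unsigned long)error << 16) | (unsigned long)event;
}

/* Split an lParam-style value into event (low word) and error (high word). */
static ae_net_event ae_decode(unsigned long lparam)
{
    ae_net_event ev;
    ev.event = (unsigned)(lparam & 0xFFFFu);
    ev.error = (unsigned)((lparam >> 16) & 0xFFFFu);
    return ev;
}

/* 1: connected, -1: connect failed, 0: not a connect notification. */
static int ae_connect_outcome(ae_net_event ev)
{
    if (ev.event != AE_FD_CONNECT)
        return 0;
    return ev.error == 0 ? 1 : -1;
}
```

The decoded error would then travel with the event to the connect routine, so
that it does not depend on the result of a second connect() call.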

Jens


-----Original Message-----
From: Fred Covely [mailto:fcovely@covely.com]
Sent: Tuesday, 24 April 2001 17:09
To: www-lib@w3.org
Subject: NT SP6 and WouldBlock in HTTCP.C - Seemingly broken


I have run into an interesting problem on NT SP 6 on at
least one machine.  I've done a detailed trace on the box
that is failing and on several Win2K boxes that work.

Clearly based on the source comments there has been a lot
of work done in this area, so I would request your input
on this problem.

Here is the scenario:

We are doing a simple httpget on a public site (yahoo.com, or whatever).
On a win2K machine the sequence in HTTCP.C looks like this:

1224  15:50:44        Event....... Registering socket for HTEvent_CONNECT
1224  15:50:44        HTDoConnect. rcode `10035'
1224  15:50:44        HTDoConnect. status `-1'
1224  15:50:44        HTDoConnect. WOULD BLOCK `www.yahoo.com'
1224  15:50:44      Host Event.. WRITE passed to `http://www.yahoo.com'
1224  15:50:44      HTDoConnect. rcode `10056'
1224  15:50:44      HTDoConnect. status `-1'
1224  15:50:44      HTHost 01099988 going to state TCP_CONNECTED.
1224  15:50:44      Event....... Socket 456 unregistered for HTEvent_CONNECT
...
1224  15:50:44      DNS weight.. Home 5 has weight 0.00
1224  15:50:44      HTHost 01099988 connected.
1224  15:50:44      Host connect Unlocking Host 01099988
1224  15:50:44      StreamStack. Constructing stream stack for text/x-http to */*

The Win2K request then proceeds normally.

On the NT SP 6 machine the same request looks like this:

286  14:40:54        Event....... Register socket 248, request 0082B640
...
286  14:40:54        HTDoConnect. rcode `10035'
286  14:40:54        HTDoConnect. status `-1'
286  14:40:54        HTDoConnect. WOULD BLOCK `www.yahoo.com'
286  14:41:39      Host Event.. WRITE passed to `http://www.yahoo.com'
286  14:41:39      HTDoConnect. rcode `10035'
286  14:41:39      HTDoConnect. status `-1'
286  14:41:39      HTDoConnect. WOULD BLOCK `www.yahoo.com'
286  14:42:24      Host Event.. WRITE passed to `http://www.yahoo.com'
286  14:42:24      HTDoConnect. rcode `10035'
286  14:42:24      HTDoConnect. status `-1'
286  14:42:24      HTDoConnect. WOULD BLOCK `www.yahoo.com'
286  14:43:09      Host Event.. WRITE passed to `http://www.yahoo.com'
286  14:43:09      HTDoConnect. rcode `10035'
286  14:43:09      HTDoConnect. status `-1'

Looking at the Microsoft web site's documentation for connect, they clearly
state that the preferred implementation is not to use multiple
connect calls.  I don't have enough familiarity with the libwww
code to venture a guess as to what is wrong.  Could it be related
to the multiple connect strategy?  If so has anyone looked at
the best way to do this?  I see a comment of about a year ago
from jens@meggers.com indicating there were known ms problems
in this area.  I have a hard time believing someone has not
figured out how to do an absolutely bulletproof connect in Windows.

Any input greatly appreciated.


Fred Covely
fcovely@covely.com
(B)760-631-8157
(C)760-717-9689

Received on Friday, 4 May 2001 11:32:13 UTC