- From: Dennis Gallagher <galron@seanet.com>
- Date: Tue, 20 Aug 1996 11:40:33 -0700
- To: "'WWW Library discussion group'" <www-lib@w3.org>
Hello to all. I've just joined this discussion group, and I have some questions I hope someone might answer. I'm writing a Win32 program in C using VC++ 4.x which downloads HTML pages into local buffers where I parse data out of them. As part of this, I've read the HTTP 1.0 document from end to end.

My first effort was successful: I was able to open a TCP/IP connection to a remote host and then use the HEAD and GET methods to determine a web page's size and download it.

My next effort was to access a web page behind a Basic Authentication barrier, and here I've run into problems that I don't understand. I send a request with the HEAD method and the remote responds, as expected, that I am not authorized. I then prompt my user for the ID and password, encode them with the Base64 method, and resubmit my request as a GET with the credentials attached. The remote responds with 200 (OK) and I read the page in. The problem is that I always get only part of the page. In its original response, the server tells me how big the page will be as well as telling me I'm not authorized, and it is this page size that always comes up short. Sometimes it is only 200 to 300 bytes, other times 1600 or so, but it is always only part of what I'm expecting.

Many times, right after receiving one of these partial pages, I request the same page with NS or IE so I can look at the source and compare what I got with what they got. Sometimes it looks like I have exactly what they got, but cut short. Other times it looks like I have a page which is similar, but not identical, to what I expected. Needless to say, all of this is quite baffling.

The site I'm trying to access provides real-time stock quotes, and the page I'm trying to download is:

    http://mw.dbc.com/cgi-bin/htx.exe/mw/main.html

Is there something in this URL that might be a clue as to why I'm having problems? For a while I thought my problem was that I was not escaping unacceptable characters in the URL, but I'm doing that now and it has made no difference.

This is getting long, so I'll wrap up. My method is basically:

    connect to server
    escape unacceptable chars in the page path
    form HEAD request
    send HEAD request
    alloc small buffer for response
    read response
    check status code for OK or unauthorized
    if (unauthorized)
        get ID & password
    endif
    get page size from response
    free small buffer
    reconnect to server
    form GET request (with credentials if nec)
    send request
    alloc buffer for incoming page
    do first read
    check status code for OK
    if (buffer full)
        exit
    endif
    loop
        read more page in (advancing pointers)
        if (chars read == 0)
            exit
        endif
    endloop

I realize that libwww is supposed to do much of this low-level stuff for me, but I had begun this project before I discovered it. I may switch over (I have some questions I'll post in a separate message), but first I'd like to know why this isn't working.

Thanks,

Dennis Gallagher
galron@seanet.com
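
P.S. In case it helps to see the idea spelled out, here is a rough C sketch of the GET-with-credentials step, written against Winsock 1.1 (link with wsock32.lib). The user name and password are placeholders, the Base64 helper just uses the standard alphabet, and error handling is stripped to the bone; this is only meant to illustrate the sequence described above, not the code I'm actually running.

    /* Sketch: fetch a page behind Basic Authentication over a raw
     * Winsock connection.  The host and path below are taken from the
     * URL in this message; the user name and password are placeholders. */
    #include <winsock.h>
    #include <stdio.h>
    #include <string.h>

    /* Encode "user:password" with the standard Base64 alphabet. */
    static void base64(const char *in, char *out)
    {
        static const char tbl[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789+/";
        size_t len = strlen(in), i;
        for (i = 0; i < len; i += 3) {
            unsigned long n = (unsigned long)(unsigned char)in[i] << 16;
            if (i + 1 < len) n |= (unsigned long)(unsigned char)in[i + 1] << 8;
            if (i + 2 < len) n |= (unsigned char)in[i + 2];
            *out++ = tbl[(n >> 18) & 0x3F];
            *out++ = tbl[(n >> 12) & 0x3F];
            *out++ = (i + 1 < len) ? tbl[(n >> 6) & 0x3F] : '=';
            *out++ = (i + 2 < len) ? tbl[n & 0x3F] : '=';
        }
        *out = '\0';
    }

    int main(void)
    {
        const char *host = "mw.dbc.com";
        const char *path = "/cgi-bin/htx.exe/mw/main.html";
        const char *userpass = "myuser:mypassword";   /* placeholder */
        char cred[200], request[512], buf[2048];
        WSADATA wsa;
        struct hostent *he;
        struct sockaddr_in sa;
        SOCKET s;
        int n;

        WSAStartup(MAKEWORD(1, 1), &wsa);
        he = gethostbyname(host);
        if (he == NULL) return 1;

        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port = htons(80);
        memcpy(&sa.sin_addr, he->h_addr, he->h_length);

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(s, (struct sockaddr *)&sa, sizeof sa) != 0) return 1;

        /* Form the GET request with the Base64-encoded credentials attached. */
        base64(userpass, cred);
        sprintf(request,
                "GET %s HTTP/1.0\r\n"
                "Host: %s\r\n"
                "Authorization: Basic %s\r\n"
                "\r\n",
                path, host, cred);
        send(s, request, (int)strlen(request), 0);

        /* Read until the server closes the connection (recv returns 0).
         * The status line and headers arrive in this same stream, ahead
         * of the page body. */
        while ((n = recv(s, buf, sizeof buf - 1, 0)) > 0) {
            buf[n] = '\0';
            fputs(buf, stdout);   /* or append to a growing local buffer */
        }

        closesocket(s);
        WSACleanup();
        return 0;
    }

Note that in this sketch the read loop keys off recv() returning zero rather than a byte count carried over from the earlier HEAD exchange; that is just how I chose to write the illustration.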