W3C home > Mailing lists > Public > www-lib@w3.org > July to September 2005

Re: please help me with the libwww

From: Vic Bancroft <bancroft@america.net>
Date: Sun, 14 Aug 2005 08:50:11 -0400
Message-ID: <42FF3E03.1050107@america.net>
To: "ctorres@unsa.edu.pe" <ctorres@unsa.edu.pe>, libwww <www-lib@w3.org>

Aloha Cesar,

Often it is better to post to the entire list; you never know who is
likely to have a handy answer!

In any case, I would approach your program with head.c as the starting
point: "This small example sets up a request and performs a HEAD method
on the resource provided on the command line".

  [bancroft@hilbert Examples]$ ./head 
http://www.w3.org/Library/Examples/head.c
  Looking up www.w3.org
  Looking up www.w3.org
  Contacting www.w3.org
  Reading...
  Done!
  Done!
  Load resulted in status 200

Checking for the resource first via the HEAD method lets one use the
server status code to test existence, e.g., 200 versus 404.  Also, if
some records are kept of the archived files, say in a MySQL database,
one can compare expiration and modification dates, e.g.,

  MIME header. Expires: Sun, 14 Aug 2005 18:38:42 GMT
  MIME header. Last-Modified: Thu, 29 Jan 2004 13:50:09 GMT

more,
l8r,
v

ctorres@unsa.edu.pe wrote:

>Hello Mr. Vic Bancroft,
>
>I am new to using libwww... and
>I saw your question on www-lib and I'd like to know if you could help me
>solve some of my problems with connections.
>
>I wrote my program "prueba.c"; this program is a modification of "libapp_4.c",
> and I don't know how to check whether a URL exists or not.
> I tried using:
> 
> BOOL flag= HTLoadAbsolute (url,request);
> BOOL flag= HTLoadAnchor (anchor, request); 	and
> BOOL flag = HTLoadRules (url);
>
> //////////////////////////////////////////////////
> //	if(flag)	//	is url_exist
> //	   then save into a file
> /////////////////////////////////////////////
>
> 
> but none of them returns a value different from "1" when a URL doesn't exist.
> I need this because I want to save to a file only the URLs that exist.
> ///////////////////////////////////////////////////////////////////////
> 
> On the other hand, I also modified the program "showlinks.c" into a program
> "repetition.c" that receives many URLs.
> Could you show me how I can keep my program from stopping after connecting to
> the first URL?
> 
> I'm looking forward to your answer... please.
> 
> Cesar Torres.
> 
> PS.- Sorry for my English, it's not so good; I'm still learning :p.
> 
>  
> The files are:
> //////////////////////////////////////////////////////////////////////////
> //prueba.c
> #include "WWWLib.h"
> #include "WWWInit.h"
> 
> void principal (int argc, char ** argv)
> {
>     HTRequest * request;
>     HTProfile_newHTMLNoCacheClient ("Check_urls", "1.0");
>     request = HTRequest_new();
> 
>     if (argc == 3)
>     {
>     	char * url = argv[1];
> 	char * filename = argv[2];
> 
> 	url = HTParse(argv[1], NULL, PARSE_ALL);
> 	printf("\n Url Parseado -->  %s  \n",url);
> 	HTAnchor * anchor = NULL;
> 	anchor = HTAnchor_findAddress(url);
> 
> 	BOOL flag= HTLoadAbsolute (url,request);
> 	printf("The value of the flag>>%d\n", flag);
> 
> //	BOOL
> 	flag= HTLoadAnchor (anchor, request);
> 	printf("The value of the flag>>%d\n", flag);
> 
> //	BOOL
> 	flag = HTLoadRules (url);
> 	printf("The value of the flag>>%d\n", flag);
> 
> //////////////////////////////////////////////////
> //	if(flag)	//	is url_exist
> //	   then save into a file
> /////////////////////////////////////////////
>     }
>     else
>     {	printf("Type the URL and the name of the local file to put it in\n");
> 	printf("\t%s <url> <filename>\n", argv[0]);
>     }
>     HTRequest_delete(request);		/* Delete the request object*/
>     //HTProfile_delete();
> }
>  
>
> int main(int argc, char **argv)
> {
> 	principal(argc,argv);
>	argv[1] = "http://www.xxxx.edu.pe";
> 	argv[2] = "xxxx.edu.pe.txt";
> 	principal(argc,argv);
> 
> 	argv[1] = "http://www.ucsm.edu.pe";
> 	argv[2] = "unsa.ucsm.pe.txt";
> 	principal(argc,argv);
> 
> 	argv[1] = "http://www.hotmail.com";
> 	argv[2] = "hotmail.txt";
> 	principal(argc,argv);
> 
> }
> //<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>//
> ///////////////////////////////////////////////////////////////////////
> /*start the test*/
> 
> Url Parseado -->  http://www.unsa.edu.pe/
> Looking up www.unsa.edu.pe
> Looking up www.unsa.edu.pe
> Contacting www.unsa.edu.pe
> The value of the flag>>1
> The value of the flag>>1
> The value of the flag>>1
> 
> Url Parseado -->  http://www.xxxx.edu.pe/
> Looking up www.xxxx.edu.pe
> Looking up www.xxxx.edu.pe
> Fatal Error: Can't locate remote host (www.xxxx.edu.pe)
> Reason: gethostbyname operation failed (Operation now in progress)
> The value of the flag>>1
> Looking up www.xxxx.edu.pe
> Looking up www.xxxx.edu.pe
> Fatal Error: Can't locate remote host (www.xxxx.edu.pe)
> Reason: gethostbyname operation failed (Operation now in progress)
> The value of the flag>>1
> Looking up www.xxxx.edu.pe
> Looking up www.xxxx.edu.pe
> Fatal Error: Can't locate remote host (www.xxxx.edu.pe)
> Reason: gethostbyname operation failed (Operation now in progress)
> The value of the flag>>1
> 
> Url Parseado -->  http://www.ucsm.edu.pe/
> Looking up www.ucsm.edu.pe
> Looking up www.ucsm.edu.pe
> Contacting www.ucsm.edu.pe
> The value of the flag>>1
> The value of the flag>>1
> The value of the flag>>1
>
> Url Parseado -->  http://www.hotmail.com/
> Looking up www.hotmail.com
> Looking up www.hotmail.com
> Contacting www.hotmail.com
> The value of the flag>>1
> The value of the flag>>1
> The value of the flag>>1
> 
> /*end of the test*/
> 
>/////////////////////////////////////////////////////////////////////////////
>///repetition.c
>
>
>#include "WWWLib.h"
>#include "WWWInit.h"
>#include "WWWHTML.h"
>
>PRIVATE int printer (const char * fmt, va_list pArgs)
>{
>    return (vfprintf(stdout, fmt, pArgs));
>}
>
>PRIVATE int tracer (const char * fmt, va_list pArgs)
>{
>    return (vfprintf(stderr, fmt, pArgs));
>}
>
>PRIVATE int terminate_handler (HTRequest * request, HTResponse * response,
>			       void * param, int status)
>{
>    /* We are done with this request */
>    HTRequest_delete(request);
>
>    /* Terminate libwww */
>    HTProfile_delete();
>
>    exit(0);
>}
>
>PRIVATE void foundLink (HText * text,
>			int element_number, int attribute_number,
>			HTChildAnchor * anchor,
>			const BOOL * present, const char ** value)
>{
>    if (anchor) {
>	/*
>	**  Find out which link we got. The anchor we are passed is
>	**  a child anchor of the anchor we are current parsing. We
>	**  have to go from this child anchor to the actual destination.
>	*/
>	HTAnchor * dest = HTAnchor_followMainLink((HTAnchor *) anchor);
>	char * address = HTAnchor_address(dest);
>	HTPrint("Found link `%s\'\n", address);
>//	HT_FREE(address);
>    }
>}
>
>int principal (int argc, char ** argv)
>{
>    char * uri = NULL;
>
>    /* Create a new preemptive client */
>    HTProfile_newHTMLNoCacheClient ("ShowLinks", "1.0");
>
>    /* Need our own trace and print functions */
>    HTPrint_setCallback(printer);
>    HTTrace_setCallback(tracer);
>
>    /* Set trace messages and alert messages */
>#if 0
>    HTSetTraceMessageMask("sop");
>#endif
>
>    /* Add our own termination filter */
>    HTNet_addAfter(terminate_handler, NULL, NULL, HT_ALL, HT_FILTER_LAST);
>
>    /*
>    ** Register our HTML link handler. We don't actually create a HText
>    ** object as this is not needed. We only register the specific link
>    ** callback.
>    */
>    HText_registerLinkCallback(foundLink);
>
>    /* Setup a timeout on the request for 15 secs */
>    HTHost_setEventTimeout(15000);
>
>    /* Handle command line args */
>    if (argc >= 2)
>	uri = HTParse(argv[1], NULL, PARSE_ALL);
>
>    if (uri) {
>	HTRequest * request = NULL;
>	HTAnchor * anchor = NULL;
>	BOOL status = NO;
>
>	/* Create a request */
>	request = HTRequest_new();
>
>	/* Get an anchor object for the URI */
>	anchor = HTAnchor_findAddress(uri);
>
>	/* Issue the GET and store the result in a chunk */
>	status = HTLoadAnchor(anchor, request);
>
>	/* Go into the event loop... */
>	if (status == YES) HTEventList_loop(request);
>
>    } else {
>	HTPrint("Type the URI to print out a list of embedded links\n");
>	HTPrint("\t%s <uri>\n", argv[0]);
>	HTPrint("For example:\n");
>	HTPrint("\t%s http://www.w3.org\n", argv[0]);
>    }
>
>    return 0;
>}
>
>int main(int argc, char **argv)
> {
> 	principal(argc,argv);
>	argv[1] = "http://www.xxxx.edu.pe";
> 	argv[2] = "xxxx.edu.pe.txt";
> 	principal(argc,argv);
>
> 	argv[1] = "http://www.ucsm.edu.pe";
> 	argv[2] = "unsa.ucsm.pe.txt";
> 	principal(argc,argv);
>
> 	argv[1] = "http://www.hotmail.com";
> 	argv[2] = "hotmail.txt";
> 	principal(argc,argv);
>
> }
>///////////////////////////////////////////////
>//start test of the repetition.
>Looking up www.unsa.edu.pe
>Looking up www.unsa.edu.pe
>Contacting www.unsa.edu.pe
>Reading...
>Found link `http://www.unsa.edu.pe/men_sup.htm'
>Found link `http://www.unsa.edu.pe/noticias/principal.htm'
>Done!


-- 
"The future is here. It's just not evenly distributed yet."
 -- William Gibson, quoted by Whitfield Diffie
Received on Sunday, 14 August 2005 12:50:29 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 23 April 2007 18:18:45 GMT