- From: John Kieti <jkieti@yahoo.com>
- Date: Tue, 22 May 2001 10:12:35 -0700 (PDT)
- To: www-lib@w3.org
Hi,

I am trying to collect and parse HTML pages, mainly to gather forward links, link text and document titles. I am using a link callback function, a text callback function and two element callback functions, as seen below in my main parser function. All seems to work fine until a page that is not properly constructed (HTML-wise) is reached; at that point I get a segmentation fault. Is there any way I can recover from such bad documents and just ignore them, or does someone have a solution to the segfault problem?

Below is my main function. (I am trying to avoid sending a very big email full of code; will that be necessary?)

    bool RobotDoc::parse()
    {
        // Request declared as a member of RobotDoc
        _request = HTRequest_new();

        // Register callback functions for extracting info
        HText_registerLinkCallback(foundlink);
        HText_registerTextCallback(foundtext);
        HText_registerElementCallback(beginElement, endElement);

        // To stop the loop
        HTNet_addAfter(stoplinks, NULL, NULL, HT_ALL, HT_FILTER_LAST);

        // Load the document (_url is a member of the class)
        BOOL status = HTLoadAbsolute(_url, _request);

        /* Go into the event loop... */
        if (status == YES)
            HTEventList_loop(_request);

        HTRequest_setFlush(_request, YES);
        return (status == YES);
    }

Someone please assist me.

Thanks,
Kieti
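P.S. In case it helps to see what I mean, here is a minimal sketch of the kind of NULL guard I could add to my foundlink callback. I am assuming the HText_foundLink typedef from HText.h, and guessing that the crash comes from dereferencing a NULL anchor on badly formed pages; that is only a guess, not something I have confirmed.

    /* Minimal sketch of a NULL-guarded link callback.  Assumes the
     * HText_foundLink typedef from HText.h; the guards are only a
     * guess at what blows up on malformed pages. */
    void foundlink (HText * text, int element_number, int attribute_number,
                    HTChildAnchor * anchor, const BOOL * present,
                    const char ** value)
    {
        if (!text || !anchor)
            return;                          /* skip incomplete links */

        /* Resolve the child anchor to an absolute address, if any.
         * HTAnchor_address returns a newly allocated string. */
        char * address = HTAnchor_address((HTAnchor *) anchor);
        if (address) {
            /* ... record the forward link here ... */
            HT_FREE(address);
        }
    }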
Received on Tuesday, 22 May 2001 13:12:42 UTC