- From: John Kieti <jkieti@yahoo.com>
- Date: Tue, 22 May 2001 10:12:35 -0700 (PDT)
- To: www-lib@w3.org
Hi,

I am trying to collect and parse HTML pages, mainly to
extract forward links, link text and document titles.
I am using a link callback function, a text callback
function and two element callback functions, registered
in my main parser function below (the callbacks
themselves are sketched after it).

Everything works fine until the robot reaches a page
whose HTML is not properly constructed; at that point I
get a segmentation fault. Is there any way I can recover
from such bad documents and simply skip them, or does
someone have a solution to the seg-fault problem?

Below is my main function. (I am trying to avoid sending
a very big email full of code - will that be necessary?)
bool RobotDoc::parse(){
    // _request is declared as a member of RobotDoc
    _request = HTRequest_new();

    // Register callback functions for extracting info
    HText_registerLinkCallback(foundlink);
    HText_registerTextCallback(foundtext);
    HText_registerElementCallback(beginElement, endElement);

    // After-filter to stop the event loop when the request is done
    HTNet_addAfter(stoplinks, NULL, NULL, HT_ALL, HT_FILTER_LAST);

    // Load the document (_url is a member of the class)
    BOOL status = HTLoadAbsolute(_url, _request);

    /* Go into the event loop... */
    if (status == YES)
        HTEventList_loop(_request);

    HTRequest_setFlush(_request, YES);
    return (status == YES);
}
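
In case it helps, here is the rough shape of the after-filter
and the callbacks registered above, with the bodies trimmed to
keep the mail short. The signatures are what I believe HText.h
and HTNet.h declare, so please correct me if I have any of them
wrong:

/* After-filter: stop the event loop once the request finishes.
 * A negative status here would mean the load failed, but I do
 * not check it yet. */
int stoplinks (HTRequest * request, HTResponse * response,
               void * param, int status)
{
    HTEventList_stopLoop();
    return HT_OK;
}

/* Link callback: record the forward link behind each anchor */
void foundlink (HText * text, int element_number,
                int attribute_number, HTChildAnchor * anchor,
                const BOOL * present, const char ** value)
{
    if (anchor) {
        char * address =
            HTAnchor_address(HTAnchor_followMainLink((HTAnchor *) anchor));
        /* ...store address as a forward link of the current page... */
        HT_FREE(address);
    }
}

/* Text callback: accumulate link text and the document title */
void foundtext (HText * text, const char * str, int length)
{
    /* ...append str (length bytes, not NUL-terminated) to the
       current link-text or title buffer... */
}

/* Element callbacks: note when we enter/leave <A> and <TITLE> */
void beginElement (HText * text, int element_number,
                   const BOOL * present, const char ** value)
{
    /* ... */
}

void endElement (HText * text, int element_number)
{
    /* ... */
}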
Could someone please assist me?

Thanks,
Kieti