- From: Vic Bancroft <bancroft@zvolve.com>
- Date: Mon, 2 Dec 2002 10:53:17 -0500 (EST)
- To: Emmanuel Saracco <esaracco@noos.fr>
- cc: www-lib@w3.org
On Mon, 2 Dec 2002, Emmanuel Saracco wrote:

> once again: could anybody send me a simple recursive uri load with depth
> control using libwww?

If you are looking for simple, why use libwww =;>

[bancroft@res:/usr/local/src/w3c-libwww-5.4.0/Robot/src]$ grep HTQueue *.c
HTQueue.c:** @(#) $Id: HTQueue.c,v 1.1 1998/10/26 22:45:34 frystyk Exp $
HTQueue.c:#include "HTQueue.h"
HTQueue.c:HTList * HTQueue_new(void)
HTQueue.c:BOOL HTQueue_delete(HTList *me)
HTQueue.c:BOOL HTQueue_enqueue(HTList *me,void *newObject)
HTQueue.c:BOOL HTQueue_append(HTList *me,void *newObject)
HTQueue.c:BOOL HTQueue_dequeue(HTList *me)
HTQueue.c:BOOL HTQueue_isEmpty(HTList *me)
HTQueue.c:void * HTQueue_headOfQueue(HTList *me)
HTQueue.c:int HTQueue_count(HTList *me)
HTRobot.c:#include "HTQueue.h"
HTRobot.c:    me->queue = HTQueue_new();
HTRobot.c:    if (mr->queue) HTQueue_delete(mr->queue);
HTRobot.c:    HTQueue_append(mr->queue, (void *) nhd);
HTRobot.c:    HTQueue_append(mr->queue, (void *)hd); (mr->cq)++;
HTRobot.c:    if(!HTQueue_isEmpty(mr->queue))
HTRobot.c:    HyperDoc *nhd = (HyperDoc *)HTQueue_headOfQueue(mr->queue);
HTRobot.c:    HTQueue_dequeue(mr->queue); (mr->cq)--;
HTRobot.c:    HTQueue_enqueue(mr->queue, (void *) nhd);

The basic idea is to queue URLs that match the pattern rather than
following them recursively. This lets the robot separate the tasks of
fetching URLs, deciding whether to follow the links, and actually
processing the queue (all without a runaway stack).

Perhaps if you turn the HT_MYSQL definition on it will be easier to
follow . . .

more,
l8r,

-------------------------------------------------------------------
Victor Bancroft, Principal Engineer, Zvolve Systems [v]770.551.4505
1050 Crown Pointe Pkwy, Suite 300, Atlanta GA 30338 [f]770.551.4509

Fellow, Artificial Intelligence Center              [v]706.542-0358
Athens, Georgia 30602, U.S.A            http://ai.uga.edu/~bancroft
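
The queue-instead-of-recursion idea described in the message can be sketched in plain C. This is only an illustration under stated assumptions, not the HTRobot.c code: it does not call libwww at all, the "site" is a hypothetical in-memory table of links standing in for fetched pages, and the names Page, Job and crawl are made up for the example. Each queue entry carries the depth at which its page was found, which is where depth control comes from.

```c
#include <stdlib.h>

#define MAX_LINKS 4

/* Hypothetical stand-in for a fetched page: indices of the pages it links to. */
typedef struct {
    int nlinks;
    int links[MAX_LINKS];
} Page;

/* One queue entry: a page to visit and the depth at which it was discovered. */
typedef struct {
    int page;
    int depth;
} Job;

/*
 * Visit pages breadth-first from `start`, following links at most
 * `max_depth` hops away. The visit order is written to `order`, which
 * must have room for `npages` entries. Returns the number of pages
 * visited, 0 for invalid arguments, or -1 if allocation fails.
 */
int crawl(const Page *site, int npages, int start, int max_depth, int *order)
{
    if (npages <= 0 || start < 0 || start >= npages || max_depth < 0)
        return 0;

    /* Each page is enqueued at most once, so npages slots are enough. */
    Job *queue = malloc((size_t)npages * sizeof *queue);
    char *seen = calloc((size_t)npages, 1);
    if (!queue || !seen) {
        free(queue);
        free(seen);
        return -1;
    }

    int head = 0, tail = 0, visited = 0;
    queue[tail++] = (Job){ start, 0 };
    seen[start] = 1;

    while (head < tail) {
        Job job = queue[head++];          /* dequeue the oldest entry */
        order[visited++] = job.page;      /* the "fetch" step */

        if (job.depth == max_depth)
            continue;                     /* depth limit: do not follow links */

        for (int i = 0; i < site[job.page].nlinks; i++) {
            int next = site[job.page].links[i];
            if (next < 0 || next >= npages || seen[next])
                continue;                 /* bad link or already queued */
            seen[next] = 1;
            queue[tail++] = (Job){ next, job.depth + 1 };
        }
    }

    free(queue);
    free(seen);
    return visited;
}
```

To try it, build a small Page array, call crawl(site, npages, 0, 2, order) from a main(), and read back the first `visited` entries of order. Because the pending work lives in the queue rather than on the call stack, a deep or cyclic link structure cannot overflow the stack, and the seen table keeps a page from being fetched twice.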
Received on Monday, 2 December 2002 10:54:10 UTC