- From: Henrik Frystyk Nielsen <frystyk@w3.org>
- Date: Mon, 20 May 1996 15:46:42 -0400
- To: www-lib-bugs@w3.org
------- Forwarded Message Return-Path: mikef@easy.regenisys.com Return-Path: <mikef@easy.regenisys.com> Received: from www10.w3.org by www18.w3.org (5.0/NSCS-1.0S) id AA03041; Mon, 22 Apr 1996 20:36:58 +0500 Received: from scottsdale.regenisys.com by www10.w3.org (5.0/NSCS-1.0S) id AA01926; Mon, 22 Apr 1996 20:36:55 +0500 Received: from recker.regenisys.com (recker.regenisys.com [204.212.61.5]) by scottsdale.regenisys.com (8.6.12/8.6.9) with SMTP id RAA05427 for <libwww@w3.org>; Mon, 22 Apr 1996 17:31:57 -0700 Received: from easy.regenisys.com by recker.regenisys.com (5.x/SMI-SVR4.1.01d) id AA27322; Mon, 22 Apr 1996 17:32:20 -0700 Received: by easy.regenisys.com (5.x/SMI-SVR4.1.03) id AA13869; Mon, 22 Apr 1996 17:32:13 -0700 Date: Mon, 22 Apr 1996 17:32:13 -0700 From: mikef@easy.regenisys.com (Mike Farrar) Message-Id: <9604230032.AA13869@easy.regenisys.com> To: libwww@w3.org Subject: Modifications of HTAnchor for libwww 4.0 X-Sun-Charset: US-ASCII Content-Type: text Content-Length: 2737 Folks, We are using the CERN libraries to process machine generated HTML files. These files can have thousands of anchors (some file have 15000+). The transfer of these files was taking several minutes. Most the process time was spent in HTAnchor_findChild in HTAnchor.c. The simple linked list approach to store the anchors is very inefficient when there are a large number of anchor. Our solution was to build a hashed index into the anchor list. The anchor tag is hashed and the value is used as an index into the linked list of anchors. This greatly decreased the transfer time for files with a large number of anchors. Currently we are using version 4.0 of the CERN WWW libraries. Enclosed is a diff of the changes made and the release to use them. Michael Farrar mikef@regenisys.com P.S. MIT is hereby permitted to distribute these contributions as part of its W3C software distributions at no cost to MIT or its licensed users. I the undersigned represent that I either own the copyright, or have the authority to grant these rights: Name Michael Farrar Title Software Engineer Signature Michael Farrar Date April 22, 1996 ========================================================================== diff -e -r Implementation/HTAncMan.h WWW/Library/Implementation/HTAncMan.h 121a int tag_hash; . 73a HTList ** anchor_index; . diff -e -r Implementation/HTAnchor.c WWW/Library/Implementation/HTAnchor.c 678d 596a FREE(parent->anchor_index); . 546a FREE(me->anchor_index); . 545a . 123c . 104a child->tag_hash = hash; . 102c if (!parent->anchor_index[hash]) { HTList_addObject(parent->children, child); *(parent->anchor_index+hash) = parent->children->next; } else { HTList_addObject(*(parent->anchor_index+hash), child); } /* HTList_addObject(parent->children, child); */ . 98a parent->anchor_index = (HTList **) calloc (ANCHOR_SIZE, sizeof(HTList *)); if (!parent->anchor_index) outofmem(__FILE__, "HTAnchor_findChild"); } . 97c } else { /* Create new list of children */ . 94a kids = kids->next; . 87c kids = parent->anchor_index[hash]; /* while ((child = (HTChildAnchor *) HTList_nextObject(kids))) { */ while (kids) { child = (HTChildAnchor *) kids->object; if (child->tag_hash != hash) break; . 85c if ((parent->children)) { . 83a /* hash the tag name */ if (tag && *tag) for (p=tag; *p; p++) hash = (int) ((hash * 13 + (*(unsigned char*)p)) % ANCHOR_SIZE); . 77c . 74a const char *p; int hash = 0; . 27a #define ANCHOR_SIZE 1013 . ------- End of Forwarded Message -- Henrik Frystyk Nielsen, <frystyk@w3.org> World-Wide Web Consortium, MIT/LCS NE43-356 545 Technology Square, Cambridge MA 02139, USA
Received on Monday, 20 May 1996 15:46:46 UTC