[Prev][Next][Index][Thread]
Modifications of HTAnchor for libwww 4.0
-
To: www-lib-bugs@w3.org
-
Subject: Modifications of HTAnchor for libwww 4.0
-
From: Henrik Frystyk Nielsen <frystyk@w3.org>
-
Date: Mon, 20 May 1996 15:46:42 -0400
-
From frystyk@w3.org Mon May 20 15: 46:46 1996
-
Message-Id: <199605201946.PAA20530@www20.w3.org>
-
Reply-To: Henrik Frystyk Nielsen <frystyk@w3.org>
-
X-Mailer: exmh version 1.6.7 5/3/96
------- Forwarded Message
Return-Path: mikef@easy.regenisys.com
Return-Path: <mikef@easy.regenisys.com>
Received: from www10.w3.org by www18.w3.org (5.0/NSCS-1.0S)
id AA03041; Mon, 22 Apr 1996 20:36:58 +0500
Received: from scottsdale.regenisys.com by www10.w3.org (5.0/NSCS-1.0S)
id AA01926; Mon, 22 Apr 1996 20:36:55 +0500
Received: from recker.regenisys.com (recker.regenisys.com [204.212.61.5]) by
scottsdale.regenisys.com (8.6.12/8.6.9) with SMTP id RAA05427 for
<libwww@w3.org>; Mon, 22 Apr 1996 17:31:57 -0700
Received: from easy.regenisys.com by recker.regenisys.com
(5.x/SMI-SVR4.1.01d)
id AA27322; Mon, 22 Apr 1996 17:32:20 -0700
Received: by easy.regenisys.com (5.x/SMI-SVR4.1.03)
id AA13869; Mon, 22 Apr 1996 17:32:13 -0700
Date: Mon, 22 Apr 1996 17:32:13 -0700
From: mikef@easy.regenisys.com (Mike Farrar)
Message-Id: <9604230032.AA13869@easy.regenisys.com>
To: libwww@w3.org
Subject: Modifications of HTAnchor for libwww 4.0
X-Sun-Charset: US-ASCII
Content-Type: text
Content-Length: 2737
Folks,
We are using the CERN libraries to process machine generated HTML files.
These files can have thousands of anchors (some file have 15000+). The
transfer of these files was taking several minutes. Most the process
time was spent in HTAnchor_findChild in HTAnchor.c. The simple linked
list approach to store the anchors is very inefficient when there are
a large number of anchor.
Our solution was to build a hashed index into the anchor list. The anchor
tag is hashed and the value is used as an index into the linked list of
anchors. This greatly decreased the transfer time for files with a large
number of anchors.
Currently we are using version 4.0 of the CERN WWW libraries.
Enclosed is a diff of the changes made and the release to use them.
Michael Farrar
mikef@regenisys.com
P.S.
MIT is hereby permitted to distribute these contributions as part of its
W3C
software distributions at no cost to MIT or its licensed users.
I the undersigned represent that I either own the copyright, or have the
authority to grant these rights:
Name Michael Farrar
Title Software Engineer
Signature Michael Farrar
Date April 22, 1996
==========================================================================
diff -e -r Implementation/HTAncMan.h WWW/Library/Implementation/HTAncMan.h
121a
int tag_hash;
.
73a
HTList ** anchor_index;
.
diff -e -r Implementation/HTAnchor.c WWW/Library/Implementation/HTAnchor.c
678d
596a
FREE(parent->anchor_index);
.
546a
FREE(me->anchor_index);
.
545a
.
123c
.
104a
child->tag_hash = hash;
.
102c
if (!parent->anchor_index[hash]) {
HTList_addObject(parent->children, child);
*(parent->anchor_index+hash) = parent->children->next;
} else {
HTList_addObject(*(parent->anchor_index+hash), child);
}
/* HTList_addObject(parent->children, child); */
.
98a
parent->anchor_index = (HTList **) calloc (ANCHOR_SIZE, sizeof(HTList *));
if (!parent->anchor_index) outofmem(__FILE__, "HTAnchor_findChild");
}
.
97c
} else { /* Create new list of children */
.
94a
kids = kids->next;
.
87c
kids = parent->anchor_index[hash];
/* while ((child = (HTChildAnchor *) HTList_nextObject(kids))) {
*/
while (kids) {
child = (HTChildAnchor *) kids->object;
if (child->tag_hash != hash) break;
.
85c
if ((parent->children)) {
.
83a
/* hash the tag name */
if (tag && *tag)
for (p=tag; *p; p++)
hash = (int) ((hash * 13 + (*(unsigned char*)p)) %
ANCHOR_SIZE);
.
77c
.
74a
const char *p;
int hash = 0;
.
27a
#define ANCHOR_SIZE 1013
.
------- End of Forwarded Message
--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA