Modifications of HTAnchor for libwww 4.0

------- Forwarded Message

Return-Path: mikef@easy.regenisys.com 
Return-Path: <mikef@easy.regenisys.com>
Received: from www10.w3.org by www18.w3.org (5.0/NSCS-1.0S) 
	id AA03041; Mon, 22 Apr 1996 20:36:58 +0500
Received: from scottsdale.regenisys.com by www10.w3.org (5.0/NSCS-1.0S) 
	id AA01926; Mon, 22 Apr 1996 20:36:55 +0500
Received: from recker.regenisys.com (recker.regenisys.com [204.212.61.5]) by 
scottsdale.regenisys.com (8.6.12/8.6.9) with SMTP id RAA05427 for 
<libwww@w3.org>; Mon, 22 Apr 1996 17:31:57 -0700
Received: from easy.regenisys.com by recker.regenisys.com 
(5.x/SMI-SVR4.1.01d)
	id AA27322; Mon, 22 Apr 1996 17:32:20 -0700
Received: by easy.regenisys.com (5.x/SMI-SVR4.1.03)
	id AA13869; Mon, 22 Apr 1996 17:32:13 -0700
Date: Mon, 22 Apr 1996 17:32:13 -0700
From: mikef@easy.regenisys.com (Mike Farrar)
Message-Id: <9604230032.AA13869@easy.regenisys.com>
To: libwww@w3.org
Subject: Modifications of HTAnchor for libwww 4.0
X-Sun-Charset: US-ASCII
Content-Type: text
Content-Length: 2737

Folks,

We are using the CERN libraries to process machine generated HTML files.
These files can have thousands of anchors (some file have 15000+).  The
transfer of these files was taking several minutes.  Most the process
time was spent in HTAnchor_findChild in HTAnchor.c.  The simple linked
list approach to store the anchors is very inefficient when there are
a large number of anchor.

Our solution was to build a hashed index into the anchor list.  The anchor
tag is hashed and the value is used as an index into the linked list of
anchors.  This greatly decreased the transfer time for files with a large
number of anchors.

Currently we are using version 4.0 of the CERN WWW libraries.

Enclosed is a diff of the changes made and the release to use them.

Michael Farrar
mikef@regenisys.com

P.S.

   MIT is hereby permitted to distribute these contributions as part of its 
W3C
   software distributions at no cost to MIT or its licensed users.
   
   I the undersigned represent that I either own the copyright, or have the
   authority to grant these rights:
   
Name            Michael Farrar
Title           Software Engineer

Signature       Michael Farrar
Date            April 22, 1996

 
==========================================================================

diff -e -r Implementation/HTAncMan.h WWW/Library/Implementation/HTAncMan.h
121a
  int           tag_hash;
.
73a
  HTList **     anchor_index;

.
diff -e -r Implementation/HTAnchor.c WWW/Library/Implementation/HTAnchor.c
678d
596a
    FREE(parent->anchor_index);
.
546a
    FREE(me->anchor_index);

.
545a

.
123c

.
104a
    child->tag_hash = hash;
.
102c
    if (!parent->anchor_index[hash]) {
       HTList_addObject(parent->children, child);
       *(parent->anchor_index+hash) = parent->children->next;
    } else {
       HTList_addObject(*(parent->anchor_index+hash), child);
    }
/*    HTList_addObject(parent->children, child); */
.
98a
	parent->anchor_index = (HTList **) calloc (ANCHOR_SIZE, sizeof(HTList *));
	if (!parent->anchor_index) outofmem(__FILE__, "HTAnchor_findChild");
        
    }
.
97c
    } else {				      /* Create new list of children */
.
94a
                kids = kids->next;
.
87c
            kids = parent->anchor_index[hash];
            /* while ((child = (HTChildAnchor *) HTList_nextObject(kids))) { 
*/
	    while (kids) {
                child = (HTChildAnchor *) kids->object;
                if (child->tag_hash != hash) break;
.
85c
    if ((parent->children)) {
.
83a
    /* hash the tag name */
    if (tag && *tag)
        for (p=tag; *p; p++)
               hash = (int) ((hash * 13 + (*(unsigned char*)p)) % 
ANCHOR_SIZE);

.
77c

.
74a
    const char *p;
    int  hash = 0;

.
27a
#define ANCHOR_SIZE 1013
.

------- End of Forwarded Message


-- 
Henrik Frystyk Nielsen, <frystyk@w3.org>
World-Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA

Received on Monday, 20 May 1996 15:46:46 UTC