Re: libwww

>Need this be such a large task as it seems?


>These differences between browsers may look big, but in fact it
>only takes a few lines of code to adapt the libwww machinery to
>fit into them.
>If there are other problems then please list them here.

Perhaps I should be specific.  Here is an incomplete list of the
kinds of things we have changed in our 2.15-derived library:

Replace all memory allocation functions with macros which can
expand to something other than malloc on systems where malloc
is not usable (like the Mac).  This means using W3_MALLOC, W3_FREE,
W3_CALLOC and so on.
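
To make the idea concrete, here is a minimal sketch of such macros.  It is
not the code from our library: the W3_USE_PLATFORM_ALLOC switch, the Plat_*
routines and W3_CopyString() are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical switch: a port that cannot use malloc defines
 * W3_USE_PLATFORM_ALLOC and supplies the Plat_* routines itself.
 */
#ifdef W3_USE_PLATFORM_ALLOC
void *Plat_Alloc(size_t size);
void *Plat_AllocClear(size_t count, size_t size);
void  Plat_Free(void *ptr);
#define W3_MALLOC(size)         Plat_Alloc(size)
#define W3_CALLOC(count, size)  Plat_AllocClear((count), (size))
#define W3_FREE(ptr)            Plat_Free(ptr)
#else
#define W3_MALLOC(size)         malloc(size)
#define W3_CALLOC(count, size)  calloc((count), (size))
#define W3_FREE(ptr)            free(ptr)
#endif

/* Library code names only the macros, never malloc/free directly. */
char *W3_CopyString(const char *src)
{
        size_t len = strlen(src);
        char *copy = (char *) W3_MALLOC(len + 1);

        if (copy == NULL)
                return NULL;    /* the caller decides how to report it */
        memcpy(copy, src, len + 1);
        return copy;
}
```

A string obtained this way has to be released with W3_FREE, not free(), so
that the pairing still holds on a platform with its own allocator.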

Use the same idea for all the sockets calls, which also vary on
non-UNIX platforms.

Change all the source file names to be legal under the MSDOS 8.3 filename
conventions.

Remove all the calls to fprintf(stderr, ...), which are not usable
for error reporting on Mac or Windows.

Add support for a different error reporting API which will work on
all platforms.  Something like ERR_ReportError(...) where the ERR_
API is implemented outside the library, probably in platform-specific
code.

Remove all uses of the outofmem() macro, which basically calls fprintf
and then exit()!  Commercial software requires much cleaner error
handling than this, particularly on Mac/Windows platforms.
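
A rough sketch of the shape this can take follows.  Only the name
ERR_ReportError comes from the description above; the argument list, the
error codes and W3_AllocBuffer() are assumptions made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes and signature. */
typedef enum { ERR_NONE = 0, ERR_OUT_OF_MEMORY, ERR_NETWORK } ERR_Code;

/* Declared by the library, implemented by each platform. */
void ERR_ReportError(ERR_Code code, const char *detail);

/*
 * Library side: report the failure and return a status to the caller,
 * where outofmem() would have printed to stderr and called exit().
 */
int W3_AllocBuffer(char **out, size_t size)
{
        *out = (char *) malloc(size);
        if (*out == NULL) {
                ERR_ReportError(ERR_OUT_OF_MEMORY, "W3_AllocBuffer");
                return -1;
        }
        return 0;
}

/*
 * Platform side (one possibility): remember the last error so that a
 * GUI front end can present it in a dialog at a convenient moment.
 */
static ERR_Code last_error = ERR_NONE;
static char     last_detail[64];

void ERR_ReportError(ERR_Code code, const char *detail)
{
        last_error = code;
        snprintf(last_detail, sizeof last_detail, "%s", detail);
}
```

The library never decides what the user sees; it only reports and unwinds.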

Define an API for progress indicators, adding calls throughout the
library back into a platform-specific library of routines which keep
the user informed of when things are happening.  Our API includes
support for thermometers which show percentage completion of a task,
as well as a spinning NCSA-like globe which simply shows that something
is happening.  The actual presentation of the information could vary.
That's simply the way we implemented the API.  This same API polls
for user aborts, so all operations are abortable and the termination
of the network transfer (or whatever is happening) is handled cleanly.
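
As an illustration only, a copy loop with this kind of hook might look like
the sketch below.  WAIT_ComeUpForAir() is the name we use, but the signature
shown here, the chunk size and W3_CopyWithProgress() are assumptions for the
sake of the example, not our actual interface.

```c
#include <string.h>

/* Hypothetical signature: returns nonzero when the user asked to abort. */
int WAIT_ComeUpForAir(long done, long total);

#define W3_CHUNK 4      /* tiny on purpose, to make the polling visible */

/*
 * Copy in small chunks and surface after each one.  Returns the number
 * of bytes copied, or -1 if the user aborted part way through.
 */
long W3_CopyWithProgress(char *dst, const char *src, long total)
{
        long done = 0;

        while (done < total) {
                long n = total - done;

                if (n > W3_CHUNK)
                        n = W3_CHUNK;
                memcpy(dst + done, src + done, (size_t) n);
                done += n;
                if (WAIT_ComeUpForAir(done, total))
                        return -1;      /* stop cleanly, no exit() */
        }
        return done;
}

/* Platform side (sample): drive a thermometer and check an abort flag. */
static int wait_percent;
static int wait_abort_requested;

int WAIT_ComeUpForAir(long done, long total)
{
        wait_percent = (total > 0) ? (int) (done * 100 / total) : 0;
        return wait_abort_requested;
}
```

A real platform layer would redraw its thermometer or globe and pump its
event queue inside WAIT_ComeUpForAir() rather than set two variables.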

Remove all calls to getenv(), which is not portable to anything but
UNIX.  Our library references an externally defined structure which
corresponds to user "Preferences".
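
A sketch of the arrangement, with an invented layout: the field names,
gPrefs and W3_ProxyHost() below are hypothetical and only show where the
structure takes the place of getenv().

```c
#include <stddef.h>

/* Hypothetical layout of the externally defined structure. */
typedef struct {
        const char *home_page;
        const char *proxy_host;         /* NULL when no proxy is set */
        int         load_images;
} W3_Preferences;

/* Defined by the application, outside the library. */
extern W3_Preferences gPrefs;

/* Library side: read the structure where getenv() used to be called. */
const char *W3_ProxyHost(void)
{
        return gPrefs.proxy_host;
}

/* Application side: filled in from a preferences dialog or file. */
W3_Preferences gPrefs = { "http://www.example.com/", NULL, 1 };
```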

The same goes for system().  The implementation of "external viewers"
is totally system dependent.

Remove/fix all the places where a static local variable is malloc-ed
but not freed until the next time the function is executed.  For example:

        int foo(void)
        {
                static char * mem;

                /* the previous call's block is only released here */
                if (mem)
                {
                        free(mem);
                        mem = NULL;
                }
                mem = malloc(100);

                /* use mem for something, but don't free it */
                return 0;
        }

On Windows, this causes a memory leak, since Windows platforms do
not release a process' memory when the process is killed.
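
One possible fix, sketched under the assumption that the application can
call a cleanup routine at shutdown (foo_cleanup() is a hypothetical name):
move the pointer to file scope so that something other than foo() can
release it.

```c
#include <stdlib.h>
#include <string.h>

static char *foo_mem = NULL;    /* was a static local inside foo() */

int foo(void)
{
        free(foo_mem);          /* free(NULL) is a harmless no-op */
        foo_mem = (char *) malloc(100);
        if (foo_mem == NULL)
                return -1;
        strcpy(foo_mem, "scratch");     /* use mem for something */
        return 0;
}

/* Hypothetical hook: called once by the application at shutdown. */
void foo_cleanup(void)
{
        free(foo_mem);
        foo_mem = NULL;
}
```

Where the buffer is not needed between calls, freeing it before foo()
returns is simpler still and needs no cleanup hook at all.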

Fix the other memory leaks, including the free-ing of the anchors,
atoms, suffix structures, and so on, which are allocated but never
freed.

Toss out HTHistory or rewrite it so it can be used with a multi-window
browser.

If HTML.c is to be shared (and it could be), remove its assumptions
about styles.  In fact, none of the styles stuff in the library was
useful for us.  HTML.c now references styles by integer index, not
by pointer, so the index can be used to find a style within a current
style sheet, and the style sheet can be swapped with another easily.
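
The indexing scheme can be sketched as follows.  Every name here
(W3_Style, W3_StyleSheet, the STYLE_ constants and the two functions) is
hypothetical; the point is only that the parser holds an integer and the
style sheet behind it can change.

```c
#include <stddef.h>

/* Hypothetical style indices, shared by the parser and the sheets. */
enum { STYLE_NORMAL = 0, STYLE_HEADING1, STYLE_PRE, STYLE_COUNT };

typedef struct {
        const char *font_name;
        int         point_size;
} W3_Style;

typedef struct {
        W3_Style styles[STYLE_COUNT];
} W3_StyleSheet;

static const W3_StyleSheet *current_sheet = NULL;

/* Swapping sheets is a single pointer assignment. */
void W3_SetStyleSheet(const W3_StyleSheet *sheet)
{
        current_sheet = sheet;
}

/* Resolve an index against whichever sheet is current. */
const W3_Style *W3_LookupStyle(int index)
{
        if (current_sheet == NULL || index < 0 || index >= STYLE_COUNT)
                return NULL;
        return &current_sheet->styles[index];
}
```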

Don't define HTStream differently in multiple files.  Most debuggers
can't cope.

Rearrange the include files so they don't #include each other so
much.

Add support for redirection and forms post.

Make MIME type matching case-insensitive, as per RFC 1521.
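
A minimal sketch of such a comparison, using only standard C;
W3_MimeTypeEqual() is a hypothetical name, not a function from the library.

```c
#include <ctype.h>

/* Returns nonzero when two type names match, ignoring ASCII case. */
int W3_MimeTypeEqual(const char *a, const char *b)
{
        while (*a != '\0' && *b != '\0') {
                if (tolower((unsigned char) *a) !=
                    tolower((unsigned char) *b))
                        return 0;
                a++;
                b++;
        }
        /* equal only if both strings ended at the same place */
        return *a == '\0' && *b == '\0';
}
```

With this, "text/html" and "TEXT/HTML" compare equal, while "text/html"
and "text/htm" do not.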

Cache the result of the last call to gethostbyname().
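
A one-entry cache of this kind could be sketched as below.  To keep the
example independent of the network, the lookup goes through a function
pointer; W3_ResolveFn, W3_ResolveCached() and Fake_Resolve() are
hypothetical, and a real port would wrap gethostbyname() in the resolver.

```c
#include <string.h>

/* Hypothetical resolver type: returns 0 and fills in an IPv4 address. */
typedef int (*W3_ResolveFn)(const char *host, unsigned char addr[4]);

static char          cached_host[256];
static unsigned char cached_addr[4];
static int           cache_valid = 0;

/* Remember only the most recent successful lookup. */
int W3_ResolveCached(const char *host, unsigned char addr[4],
                     W3_ResolveFn resolve)
{
        if (cache_valid && strcmp(host, cached_host) == 0) {
                memcpy(addr, cached_addr, 4);
                return 0;               /* hit: no lookup needed */
        }
        if (resolve(host, addr) != 0)
                return -1;
        if (strlen(host) < sizeof cached_host) {
                strcpy(cached_host, host);
                memcpy(cached_addr, addr, 4);
                cache_valid = 1;
        }
        return 0;
}

/* Stand-in resolver for demonstration; counts how often it is called. */
static int fake_calls = 0;

static int Fake_Resolve(const char *host, unsigned char addr[4])
{
        (void) host;
        fake_calls++;
        addr[0] = 192; addr[1] = 0; addr[2] = 2; addr[3] = 1;
        return 0;
}
```

Since a page and its inline images usually come from the same host, even a
single-entry cache avoids most repeated lookups.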

In SGML.c, add support for capturing the HTML source as it comes through,
for supporting dialogs which allow the user to see the underlying HTML
behind a page.

#ifdef all the code which assumes all filenames are UNIX format.
These sections have to be rewritten for Mac and Windows.


Phew, that's an ugly list.  As I look back on it, I notice that some
of those things are not totally done yet.  Some of them are simply
bugs in the library which have been fixed in CERN's current releases.
Some of them are rather nitpicky things that we did just because one day
we got religious about some particular issue, like include files.

Nonetheless, this is the scope of the changes we've made, and most of
those changes were necessary.  Feeding those "changes" back to CERN
is certainly an option.  (In fact, some code has already been sent back
to CERN, so it is only a matter of time before those things are
integrated into the official CERN releases.)  But, right now, the
diffs from 2.15 would be larger than the library itself.

Another issue looming on the horizon is SECURITY.  If we have to integrate
S-HTTP into our libwww, the code will diverge even more.  Will CERN
want those kinds of changes too?

>Have we got contacts for EINET people? On the list?

John Hardin is involved with MacWeb, and he posts on the newsgroups
a lot.  I don't know if he's on the list or not.

>There is no need to repeat effort *if the changes are folded in*.

Agreed, but now that I've revealed the scale of the changes, I
suspect that you may not WANT all the changes we've made.  This is
not a matter of our reluctance to release the code.  The problem
is that our version of libwww really improves *our* situation, but it
may not improve CERN's.

>The problem with the NCSA Mosaic libwwws is that there was no folding
>in, and little effort to make the hooks needed fit in with a common
>library -- witness the lack of commonality even within NCSA.
>If CERN had had the manpower to go mine for the diffs and put them in
>retrospectively then things might have been different, but it doesn't
>work unless there is some two-way communication: there are constraints
>from both the app side and the lib side, and these have to be discussed.

Also agreed.  But hindsight is 20-20, and looking back, this did not
happen as I had hoped it would.  When we started work on Mosaic, we abandoned
NCSA's libwww and started with CERN's then-current 2.15.  We really wanted
more commonality with CERN and we wanted all our internal versions
to be the same.  I tried, from the beginning, to minimize the changes
to libwww, and tried to make the hooks fit in with a common library.

I resolved to submit the diffs to CERN, but also to wait until I understood
more of the library before doing so.  I didn't want to burden the CERN
staff with my own lack of knowledge of the code.

The situation snuck up on me, and it got out of hand.  Before I knew it,
our library had so many changes that submitting the diffs would really be
asking a lot of the CERN staff.  Also, we had to add a number of "portable"
calls to "non-portable" code.  For example, our implementation of HTCopy()
has a call to WAIT_ComeUpForAir() for user progress indication and abort
polling.  I can't very well ask CERN to put stuff like that in the library
unless we're all going to agree that our WAIT_ API is the way to go,
and provide a sample implementation of it.  It would be arrogant to assume
that all libwww users will be so thrilled with our particular strategy
that it could be integrated into the library without discussion.

>CERN had manpower problems, but with W3O that will
>be relieved.  And our attitude has always been to fold in anything
>which people need (unless it is really dirty!) so that anyone who
>has helped us fold in things can take future versions with zero changes.

I hope that the above disclosure has been useful.  I remain motivated to
pursue collaboration on this library, but I think that simply mailing
an enormous diff to CERN would be rather unfair.  I believe that for
our participation in W3O's development of this code to be most beneficial
for us and for others, we need a more proactive strategy, involving the
kind of two-way communication you speak of.

Feel free to correct me where my assessment of the situation is inaccurate.

Eric W. Sink, Software Engineer --  eric@spyglass.com 217-355-6000 ext 237
All opinions expressed are mine, and may not be those of my employer.
        "Only academic people put cheese in their pocket."
            -SW, 24 May 1994