Re: libwww

I can add some comments about the current state of the library:

> Replace all memory allocation functions with macros which can
> expand to something other than malloc on systems where malloc
> is not usable (like the Mac).  This means using W3_MALLOC, W3_FREE,
> W3_CALLOC and so on.

This is still not done.
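
For what it is worth, here is a minimal sketch of how such macros could
look. It is not code from the library: the W3_* names come from the
suggestion above, W3_REALLOC and w3_strdup are made up for illustration,
and the default expansion to the standard allocator is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Default expansion for platforms where the standard allocator is fine.
   A Mac or Windows port could define these differently before this point. */
#ifndef W3_MALLOC
#define W3_MALLOC(size)         malloc(size)
#define W3_CALLOC(count, size)  calloc((count), (size))
#define W3_REALLOC(ptr, size)   realloc((ptr), (size))
#define W3_FREE(ptr)            free(ptr)
#endif

/* Hypothetical caller: copies a string using only the macros. */
static char *w3_strdup(const char *src)
{
    size_t len = strlen(src) + 1;            /* include the terminator */
    char *copy = (char *) W3_MALLOC(len);

    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;                             /* NULL if allocation failed */
}
```

The point is only that no library file would call malloc() or free()
by name, so a port would change one header instead of every source file.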

> Use the same idea for all the sockets calls, which also vary on
> non-UNIX platforms.

All accept() and connect() calls now go through a single function in
HTTCP.c. READ, WRITE and CLOSE are used throughout the library, but they
are macros, so they should be `transformable'.
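
A rough sketch of what `transformable' could mean in practice. The
UNIX branch simply maps the macros onto the POSIX calls; the
W3_CUSTOM_NET switch and the my_net_* functions are hypothetical
and only mark where a non-UNIX port would plug in its own routines.

```c
#include <unistd.h>

#ifdef W3_CUSTOM_NET
/* Hypothetical platform layer supplied by the port, not by the library. */
extern int my_net_read(int sock, void *buf, unsigned len);
extern int my_net_write(int sock, const void *buf, unsigned len);
extern int my_net_close(int sock);
#define READ(sock, buf, len)   my_net_read((sock), (buf), (len))
#define WRITE(sock, buf, len)  my_net_write((sock), (buf), (len))
#define CLOSE(sock)            my_net_close(sock)
#else
/* UNIX: sockets are ordinary file descriptors. */
#define READ(sock, buf, len)   read((sock), (buf), (len))
#define WRITE(sock, buf, len)  write((sock), (buf), (len))
#define CLOSE(sock)            close(sock)
#endif
```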

> Change all the source files names to be legal with MSDOS 8.3 filename
> limitations.

This should be a minor task...
 
> Remove all the calls to fprintf(stderr, ...), which are not usable
> for error reporting on Mac or Windows.

This is more serious, as there are 1.000.000's of them right now. However,
again I think a macro substitution would solve the problem.
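
One possible shape for such a substitution, as a sketch only. All names
here (W3_ERRPRINT, W3_SetErrorHook, w3_error_printf, remember_message)
are invented for illustration and are not part of the library. The
double parentheses let the macro take a variable argument list; the
output goes to a hook that the application installs, and falls back to
stderr when there is none.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook type: the application decides how to show messages. */
typedef void (*W3ErrorHook)(const char *message);

static W3ErrorHook w3_error_hook = NULL;

static void W3_SetErrorHook(W3ErrorHook hook)
{
    w3_error_hook = hook;
}

/* Formats the message once, then hands it to the hook or to stderr. */
static void w3_error_printf(const char *format, ...)
{
    char message[512];
    va_list args;

    va_start(args, format);
    vsnprintf(message, sizeof message, format, args);
    va_end(args);

    if (w3_error_hook != NULL)
        w3_error_hook(message);
    else
        fputs(message, stderr);
}

/* Call as W3_ERRPRINT(("format %d\n", value)); note the double parentheses. */
#define W3_ERRPRINT(args) w3_error_printf args

/* Example hook: keeps the last message so a GUI could show it in a dialog. */
static char last_message[512];

static void remember_message(const char *message)
{
    strncpy(last_message, message, sizeof last_message - 1);
    last_message[sizeof last_message - 1] = '\0';
}
```

Each existing fprintf(stderr, "...", x) would then become
W3_ERRPRINT(("...", x)), which is a mechanical edit.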

> Add support for a different error reporting API which will work on
> all platforms.  Something like ERR_ReportError(...) where the ERR_
> API is implemented outside the library, probably in platform specific
> code.

Is this the same as the fprintf(stderr, ...) problem? Have you seen the new
error/information parsing module that gives the user information on what's
going on?

> Remove all uses of the outofmem() macro, which basically calls fprintf
> and then exit()!  Commercial software requires much cleaner error
> handling than this, particularly on Mac/Windows platforms.

Right! The library was basically designed to work on a platform with
lots of memory...

> Define an API for progress indicators, adding calls throughout the
> library back into a platform-specific library of routines which keep
> the user informed of when things are happening.  Our API includes
> support for thermometers which show percentage completion of a task,
> as well as a spinning NCSA-like globe which simply shows that something
> is happening.  The actual presentation of the information could vary.
> That's simply the way we implemented the API.  This same API polls
> for user aborts, so all operations are abortable and the termination
> of the network transfer (or whatever is happening) is handled cleanly.

This is on my working list and should be easy to do. Basically it would
mean adding HTProgress() calls throughout the library.
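
To make the idea concrete, here is a sketch under my own assumptions.
Only the name HTProgress() is from the plan above; its signature, the
registration function, the toy transfer loop and the example hook are
all hypothetical. The hook returns non-zero when the user wants to
abort, so the same call covers both progress and cancellation.

```c
#include <stddef.h>

/* Hypothetical hook: report progress, return non-zero to request an abort. */
typedef int (*HTProgressHook)(long done, long total, void *context);

static HTProgressHook progress_hook = NULL;
static void *progress_context = NULL;

static void HTProgress_Register(HTProgressHook hook, void *context)
{
    progress_hook = hook;
    progress_context = context;
}

/* Called from inside the library; a no-op when nothing is registered. */
static int HTProgress(long done, long total)
{
    if (progress_hook == NULL)
        return 0;
    return progress_hook(done, total, progress_context);
}

/* Toy stand-in for a network transfer: moves `chunk` bytes per step and
   stops early if the hook asks for an abort. Returns the bytes handled. */
static long transfer_with_progress(long total, long chunk)
{
    long done = 0;

    while (done < total) {
        long step = (total - done < chunk) ? total - done : chunk;
        done += step;
        if (HTProgress(done, total))
            break;                      /* user abort: stop cleanly */
    }
    return done;
}

/* Example hook: counts calls and aborts once half the data has arrived. */
static int abort_at_half(long done, long total, void *context)
{
    int *calls = (int *) context;

    (*calls)++;
    return done * 2 >= total;
}
```

A thermometer or a spinning globe would just be different hooks; the
library itself would not need to know which one is in use.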

> Remove all calls to getenv(), which is not portable to anything but
> UNIX.  Our library references an externally defined structure which
> corresponds to user "Preferences".
> 
> The same goes for system().  The implementation of "external viewers"
> is totally system dependent.

Definitely, we have a problem with system calls. It could be solved using
#ifdefs, but I don't think this is satisfactory for DOS people. I am not
sure what to do.

> Remove/fix all the places where a static local variable is malloc-ed
> but not freed until the next time the function is executed.  For example:
> 
>         int foo(void)
>         {
>                 static char * mem;
> 
>                 if (mem)
>                 {
>                         free(mem);
>                         mem = NULL;
>                 }
>                 mem = malloc(100);
> 
>                 /* use mem for something, but don't free it */
>         }
> 
> On Windows, this causes a memory leak, since Windows platforms do
> not release a process' memory when the process is killed.

I was not aware of this problem... sure it would take some work, but it
is possible...
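
One way it could be done, sketched on the foo() example above. This is
an assumption about a possible fix, not something that exists in the
library: the buffer moves to file scope, and a matching cleanup routine
(foo_cleanup, a made-up name) releases it when the application shuts
the library down.

```c
#include <stdlib.h>
#include <string.h>

/* File scope instead of function scope, so cleanup code can reach it. */
static char *foo_mem = NULL;

int foo(void)
{
    char *fresh = (char *) malloc(100);

    if (fresh == NULL)
        return -1;                  /* report failure instead of exiting */

    free(foo_mem);                  /* free(NULL) is a harmless no-op */
    foo_mem = fresh;
    strcpy(foo_mem, "scratch data");   /* use mem for something */
    return 0;
}

/* To be called once from a library-wide shutdown routine. */
void foo_cleanup(void)
{
    free(foo_mem);
    foo_mem = NULL;
}
```

The work is mostly in finding every such function and collecting the
cleanup calls in one place.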

> Fix the other memory leaks, including the free-ing of the anchors,
> atoms, suffix structures, and so on, which are allocated but never
> released.

Again - on UNIX they are not really `leaks' ;-)

> Toss out HTHistory or rewrite it so it can be used with a multi-window
> browser.

Remember it should also be usable in a character-based, single-window
implementation, like the Line Mode Browser.

> If HTML.c is to be shared (and it could be), remove its assumptions
> about styles.  In fact, none of the styles stuff in the library was
> useful for us.  HTML.c now references styles by integer index, not
> by pointer, so the index can be used to find a style within a current
> style sheet, and the style sheet can be swapped with another easily.
> 
> Don't define HTStream differently in multiple files.  Most debuggers
> can't cope.
> 
> Rearrange the include files so they don't #include each other so
> often.
> 
> Add support for redirection and forms post.

You mean clean up the code a bit...

> Make MIME type matching case-insensitive, as per RFC 1521.

We have a new and better MIME parser on the working list. For the moment
there are 1.000's of MIME-parsers in the library. One general parser
would be nice. Then optimizations like case insensitivity also become
more appropriate.
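
The comparison itself is small. A sketch, with mime_type_equal being a
made-up name rather than a library function; it compares only the
"type/subtype" string and does not look at parameters such as charset.

```c
#include <ctype.h>

/* Returns 1 if the two type/subtype strings match ignoring case, else 0. */
static int mime_type_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';    /* both must end together */
}
```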

> Cache the last call to gethostbyname()

This is done in the 2.16 version. It also supports multi-homed hosts, and
on each connect it measures the connect time for the given IP address. The
next connect then picks the fastest IP address.
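
The selection step can be illustrated as follows. This is a simplified
sketch, not the 2.16 code: the structure layout, the names and the
fixed-size arrays are all assumptions, and it presumes every address
already has a measured time.

```c
#define MAX_HOMES 8

/* Hypothetical cache entry for one host name with several IP addresses. */
typedef struct {
    const char *hostname;
    int         home_count;                 /* addresses actually in use */
    const char *addresses[MAX_HOMES];
    double      connect_secs[MAX_HOMES];    /* last measured connect time */
} HostCacheEntry;

/* Store the time the latest connect to address `home` took. */
static void record_connect_time(HostCacheEntry *entry, int home, double secs)
{
    if (home >= 0 && home < entry->home_count)
        entry->connect_secs[home] = secs;
}

/* Index of the address with the smallest connect time, or -1 if none. */
static int fastest_home(const HostCacheEntry *entry)
{
    int best = -1;
    int i;

    for (i = 0; i < entry->home_count; i++)
        if (best < 0 || entry->connect_secs[i] < entry->connect_secs[best])
            best = i;
    return best;
}
```

Because the time is recorded again on every connect, an address that
becomes slow loses its place the next time around.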

> In SGML.c, add support for capturing the HTML source as it comes through,
> for supporting dialogs which allow the user to see the underlying HTML
> behind a page.

The SGML/HTML code needs to be looked at. It should also be upgraded to HTML+.

> #ifdef all the code which assumes all filenames are UNIX format.
> These sections have to be rewritten for Mac and Windows.

The library hasn't been compiled on a PC/Mac for a long time. A general
port is necessary.

> ----------
> 
> Phew, that's an ugly list.  As I look back on it, I notice that some
> of those things are not totally done yet.  Some of them are simply
> bugs in the library which have been fixed in CERN's current releases.
> Some of them are rather nitpicky things that we did just because one day
> we got religious about some particular issue, like include files.
> 
> Nonetheless, this is the scope of the changes we've made, and most of
> those changes were necessary.  Feeding those "changes" back to CERN
> is certainly an option.  (In fact, some code has already been sent back
> to CERN, so it is only a matter of time before those things are
> integrated into the official CERN releases.)  But, right now, the
> diffs from 2.15 would be larger than the library itself.

Quite a lot has happened from version 2.15 to the current version. I
don't expect people to just throw their version of the library away
and start using the current CERN version. It sure lacks some
functionality! My idea with this mailing list is to start the process
of converging the different versions. In your list above I find nothing
that can't be done, and it should only need to be done once.
In other words, I am very interested in getting feedback on new
features and diffs.

In return, I hope that some of the new features now implemented in the
library make it attractive for you to use as a basis for new
development.

You are right about the difficulties supporting all platforms. In
practice I think it is unrealistic. However, I believe that all
platforms have a common core of functionality that can be shared
between them, but this requires that the library stays as general as
possible and doesn't try to do any fancy things.

-- cheers --

Henrik Frystyk

Received on Wednesday, 13 July 1994 19:31:08 UTC