CERN Common World-Wide Web Library Version 2.16pre1 Available from Henrik Frystyk Nielsen on 1994-06-17 (www-lib@w3.org from April to June 1994)

From: Henrik Frystyk Nielsen <frystyk@ptsun00.cern.ch>
Date: Fri, 17 Jun 94 02:35:03 +0200
To: www-annouce@www0.cern.ch, www-lib@www0.cern.ch
Message-Id: <9406170035.AA01609@ptsun03.cern.ch>
			   *  *  *  *  *

  The CERN Common WWW Library is a general code base that can be used
  to build clients and servers. It contains code for accessing HTTP, FTP,
  Gopher, News, WAIS, Telnet servers, and the local file system.
  Furthermore it provides modules for parsing, managing and presenting
  hypertext objects to the user and a wide spectrum of generic programming
  utilities.

			   *  *  *  *  *

CERN Common Code Library 2.16pre1 is available, source code:

	ftp://info.cern.ch/pub/www/src/WWWLibrary_2.16.tar.Z

It is known to compile on Sun4, Solaris, HP, NeXT, NeXT-386,
Decstation Ultrix and DEC OSF/1.

Diffs and old versions are available at

	ftp://info.cern.ch/pub/www/src/old
	ftp://info.cern.ch/pub/www/src/diffs

Documentation is available at 

	http://info.cern.ch/hypertext/WWW/Library/Status.html and
	http://info.cern.ch/hypertext/WWW/Library/User/Guide.html

Programmer's Guide is available at

	http://info.cern.ch/hypertext/WWW/Library/Implementation/Overview.html

The current address to send email about the CERN Library is:

	www-bug@info.cern.ch

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

CERN Library 2.16 Prerelease Notes

New Features and Changed Interfaces

HTTP Client

The HTTP module contains the code for the HTTP client. The module has
been reorganized and made more modular.

Automatic Redirection 
   Now supported by the HTTP module. The new URL is passed back to
   the client via the error_stack as an ERR_INFO message, see the
   HTError module. The maximum number of redirections is set by the
   variable HTMaxRedirections.

Referer Field in HTTP request 
   Clients can now send a Referer field in an HTTP request. This is
   done by filling out the HTRequest->parentAnchor field.

From field in HTTP Request
   Clients can now send the full email address of the current user in
   the HTTP From field. The feature is turned off by default, as it
   can get a bit tricky through a proxy.

204 Response 
   Support of return code `204 No Response' 
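
The redirection limit mentioned above can be illustrated with a small
sketch. This is not the Library's code: the fake_web table,
fake_fetch() and follow_redirects() are hypothetical names invented
for the example, and max_redirections merely plays the role that
HTMaxRedirections has in the HTTP module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a network fetch: maps a URL to a status
   code and, for 301/302, to the new location. Not Library code. */
typedef struct {
    const char *url;
    int         status;
    const char *location;
} FakeResponse;

static const FakeResponse fake_web[] = {
    { "http://a.example/",    301, "http://b.example/" },
    { "http://b.example/",    302, "http://c.example/" },
    { "http://c.example/",    200, NULL },
    { "http://loop.example/", 301, "http://loop.example/" }
};

static const FakeResponse *fake_fetch(const char *url)
{
    size_t i;
    for (i = 0; i < sizeof fake_web / sizeof fake_web[0]; i++)
        if (strcmp(fake_web[i].url, url) == 0)
            return &fake_web[i];
    return NULL;
}

/* Follow redirections, giving up after max_redirections hops.
   Returns the final status code, or -1 if the limit is exceeded or
   the URL is unknown. *final_url receives the last URL visited. */
int follow_redirects(const char *url, int max_redirections,
                     const char **final_url)
{
    int hops = 0;
    assert(url != NULL && final_url != NULL);
    for (;;) {
        const FakeResponse *r = fake_fetch(url);
        *final_url = url;
        if (r == NULL)
            return -1;
        if (r->status != 301 && r->status != 302)
            return r->status;
        if (hops++ >= max_redirections)
            return -1;              /* too many redirections */
        url = r->location;
    }
}
```

A client would use the URL left in *final_url to update its reference,
which is what the ERR_INFO message is meant to make possible.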

FTP Client

The HTFTP module contains the code for the FTP client. The FTP client
has changed a lot in this release. It is now a complete state machine
where the action executed is a function of the current state.

The client now follows the suggestions given in RFC 1123: "Requirements
for Internet Hosts -- Application and Support".

Establishment of the data connection now complies with RFC 1579:
"Firewall-friendly FTP", so that the procedure is

    1. try PASV 
    2. if that fails, try PORT 
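
A minimal sketch of the decision in the two steps above, assuming the
reply to PASV has already been read into a string. The 227 reply
layout (h1,h2,h3,h4,p1,p2) is the one defined by the FTP
specification; the names parse_pasv_reply() and choose_data_mode() are
hypothetical and are not functions in HTFTP.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { DATA_PASSIVE, DATA_ACTIVE } DataMode;

/* Parse "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
   Returns 0 and fills host and port on success, -1 otherwise. */
int parse_pasv_reply(const char *reply, char host[16], int *port)
{
    unsigned h1, h2, h3, h4, p1, p2;
    const char *paren;

    assert(reply != NULL && host != NULL && port != NULL);
    if (strncmp(reply, "227", 3) != 0)
        return -1;
    paren = strchr(reply, '(');
    if (paren == NULL)
        return -1;
    if (sscanf(paren, "(%u,%u,%u,%u,%u,%u)",
               &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 ||
        p1 > 255 || p2 > 255)
        return -1;
    snprintf(host, 16, "%u.%u.%u.%u", h1, h2, h3, h4);
    *port = (int)(p1 * 256 + p2);   /* port is sent as two bytes */
    return 0;
}

/* Step 1: use PASV if the server gave a usable 227 reply.
   Step 2: otherwise fall back to PORT (active mode). */
DataMode choose_data_mode(const char *pasv_reply, char host[16],
                          int *port)
{
    if (parse_pasv_reply(pasv_reply, host, port) == 0)
        return DATA_PASSIVE;
    return DATA_ACTIVE;
}
```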

The URL is now parsed according to the (latest) specifications: 

           url     : f t p: / / login / path [  ftptype ]
           login   : [ user [ : password ] @ ] hostport
           hostport: host [ : port ]  
           ftptype : A formcode | E formcode | I | L digits
           formcode:  N | T | C
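
As an illustration of the login and hostport productions in the
grammar above, here is a small parser sketch. The FtpLogin structure
and parse_ftp_login() are hypothetical names made up for this example,
and defaulting to port 21 when no port is given is an assumption of
the sketch, not something the grammar states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char user[64];
    char password[64];
    char host[128];
    int  port;              /* 21 when the URL gives no port */
} FtpLogin;

/* Copy len bytes of src into dst, truncating to fit, and terminate. */
static void copy_span(char *dst, size_t dst_size,
                      const char *src, size_t len)
{
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse "[ user [ : password ] @ ] host [ : port ]".
   Returns 0 on success, -1 on an empty host or a bad port. */
int parse_ftp_login(const char *login, FtpLogin *out)
{
    const char *at, *hostport, *colon;

    assert(login != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->port = 21;

    /* Use the last '@' so that a password containing '@' still works. */
    at = strrchr(login, '@');
    if (at != NULL) {
        colon = (const char *)memchr(login, ':', (size_t)(at - login));
        if (colon != NULL) {
            copy_span(out->user, sizeof out->user,
                      login, (size_t)(colon - login));
            copy_span(out->password, sizeof out->password,
                      colon + 1, (size_t)(at - colon - 1));
        } else {
            copy_span(out->user, sizeof out->user,
                      login, (size_t)(at - login));
        }
        hostport = at + 1;
    } else {
        hostport = login;
    }

    colon = strchr(hostport, ':');
    if (colon != NULL) {
        copy_span(out->host, sizeof out->host,
                  hostport, (size_t)(colon - hostport));
        out->port = atoi(colon + 1);
        if (out->port <= 0 || out->port > 65535)
            return -1;
    } else {
        copy_span(out->host, sizeof out->host,
                  hostport, strlen(hostport));
    }
    return out->host[0] != '\0' ? 0 : -1;
}
```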

Both directory listings and file retrieval use the same procedure: 

    1. First try to go to the location directly, as we are often
       talking to a UNIX server or one that 'understands' UNIX syntax 
    2. If it fails, then go to the location step by step using CWD. In
       that way we should not have any problems on any platform, and
       thus it is not necessary to make special hacks for VMS, etc. 
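
The two-step procedure above could be sketched as follows. The CWD
command itself is abstracted behind a callback, so nothing here talks
to a real server; split_path(), change_directory() and the fake server
fake_cwd_no_slashes() are hypothetical names for this example only.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEGMENTS    32
#define MAX_SEGMENT_LEN 128

/* Callback that issues one CWD; returns 1 on success, 0 on failure. */
typedef int (*CwdFn)(const char *dir, void *ctx);

/* Split "/pub/www/src" into "pub", "www", "src".
   Returns the number of segments, or -1 if there are too many
   segments or one of them is too long. */
int split_path(const char *path, char segments[][MAX_SEGMENT_LEN],
               int max_segments)
{
    int count = 0;
    assert(path != NULL && segments != NULL);
    while (*path != '\0') {
        size_t len;
        while (*path == '/')
            path++;
        len = strcspn(path, "/");
        if (len == 0)
            break;
        if (count >= max_segments || len >= MAX_SEGMENT_LEN)
            return -1;
        memcpy(segments[count], path, len);
        segments[count][len] = '\0';
        count++;
        path += len;
    }
    return count;
}

/* Step 1: try the whole path. Step 2: walk it one CWD at a time. */
int change_directory(const char *path, CwdFn cwd, void *ctx)
{
    char segments[MAX_SEGMENTS][MAX_SEGMENT_LEN];
    int i, n;

    if (cwd(path, ctx))
        return 1;
    n = split_path(path, segments, MAX_SEGMENTS);
    if (n < 0)
        return 0;
    for (i = 0; i < n; i++)
        if (!cwd(segments[i], ctx))
            return 0;
    return 1;
}

/* Fake server for testing: refuses any argument containing '/' and
   counts the CWDs it accepts in the int that ctx points to. */
int fake_cwd_no_slashes(const char *dir, void *ctx)
{
    int *accepted = (int *)ctx;
    if (strchr(dir, '/') != NULL)
        return 0;
    (*accepted)++;
    return 1;
}
```

Against the fake server, the direct attempt fails and the path is then
walked in three separate steps.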

Long directory listings are supported for Unix-like systems and VMS.
This includes NetWare and Windows NT. See Future Plans and Long
Directory Listings for more.

Information from the FTP server is by default presented to the client
using the following rules:

    1. If you are connecting to the root directory at an FTP site, we
       show the 'login' message (might be a concatenation of several
       messages) just like in a normal FTP session.
    2. If you have a more specific URL, then you probably already know
       the site and are less interested in the login message. Instead
       we show any local message when making a CWD to the right location. 

Gopher Client

The Gopher client has been revised, and improved error handling has
been implemented.

Information Messages 
   Some Gopher servers send back information messages in a line containing
   "error.host". This information is treated like login information
   from FTP servers so that it is represented as a message before or after
   the actual listing. 

Iconized Listings 
   Listings now contain icons in the same way as the other listings. 

CSO Name Server 
   The CSO Name Server client outputs in HTML and not only <PRE> as before. 

Content Type Recognition 
   The Gopher module uses its own content-type recognition, inherited
   from HTTP, when handling Gopher text and Gopher binary files. This
   means that, e.g., PostScript files get handled correctly.

Local File Access

The new version of the HTFile module is a lot smaller, as all the
directory listing code has moved to the HTDirBrw module. New error
handling has been implemented.

Passive and Active Connection Establishment

Calls to connect() and accept() now go through the functions
HTDoConnect() and HTDoAccept() respectively.

Cache of Host Names and Addresses

HTParseInet(), which is called from within HTDoConnect(), now has an
internal cache of the names and (possibly multiple) IP addresses of
visited hosts. This minimizes access to the file /etc/hosts and the
Domain Name Server, even though aliases are not recognized in the
cache.

The default cache size is 500 entries, and a host stays in the cache
as long as connect() succeeds. That is, if a connection is refused for
some reason, the host is taken out of the cache.

The time to make a connection to a multihomed host is measured every
time, and a mean access time is calculated so that HTDoConnect() always
takes the fastest IP address, see Future Plans.
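
One way to keep such a mean per address is an incremental average, as
in the sketch below. This is an assumption about how it could be done,
not a description of the cache in the Library; HostTimes,
record_connect_time() and fastest_address() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADDRS 8

/* Per-host timing record for a multihomed host (sketch only). */
typedef struct {
    int    naddrs;                  /* number of known addresses  */
    double mean_ms[MAX_ADDRS];      /* running mean connect time  */
    long   samples[MAX_ADDRS];      /* measurements per address   */
} HostTimes;

/* Fold one measured connect time into the running mean. */
void record_connect_time(HostTimes *h, int idx, double ms)
{
    assert(h != NULL && idx >= 0 && idx < h->naddrs);
    h->samples[idx]++;
    h->mean_ms[idx] += (ms - h->mean_ms[idx]) / (double)h->samples[idx];
}

/* Index of the address with the lowest mean connect time. Addresses
   that have never been measured are skipped; if none has been
   measured yet, the first address is returned. */
int fastest_address(const HostTimes *h)
{
    int i, best = -1;
    assert(h != NULL);
    for (i = 0; i < h->naddrs; i++) {
        if (h->samples[i] == 0)
            continue;
        if (best < 0 || h->mean_ms[i] < h->mean_ms[best])
            best = i;
    }
    return best < 0 ? 0 : best;
}
```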

Improved Functionality of DNS requests

The Library now provides functionality for obtaining the full mail
address of the user, full domain name of the host and also the
possibility for setting both values. This means that the user can use
his official email address, e.g. in the HTTP request. 

Long Directory Listings

Long directory listings for HTTP, FTP and files on the local file
system are supported. For the moment only a part of the functionality,
e.g. sorting, which columns to show etc., is exploited, see Future Plans.

Icon Management

Icons in directory listings are bound to MIME content-types and
encoding. They can be found in the HTIcons module. The default set of
icons is set up using HTStdIconInit(), and new icons can be added
dynamically using HTAddIcon().
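
The idea of binding icons to content-types can be sketched with a
plain lookup table. This does not show the HTIcons interface: the
functions icon_add() and icon_lookup(), the exact-match rule and the
icon paths in the usage are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_ICONS 64

typedef struct {
    const char *content_type;   /* e.g. "text/html"            */
    const char *icon_url;       /* where the icon can be found */
} IconBinding;

static IconBinding icon_table[MAX_ICONS];
static int icon_count = 0;

/* Bind an icon to a content-type, replacing any earlier binding.
   The strings are not copied, so they must outlive the table.
   Returns 0 on success, -1 if the table is full. */
int icon_add(const char *content_type, const char *icon_url)
{
    int i;
    assert(content_type != NULL && icon_url != NULL);
    for (i = 0; i < icon_count; i++) {
        if (strcmp(icon_table[i].content_type, content_type) == 0) {
            icon_table[i].icon_url = icon_url;
            return 0;
        }
    }
    if (icon_count >= MAX_ICONS)
        return -1;
    icon_table[icon_count].content_type = content_type;
    icon_table[icon_count].icon_url = icon_url;
    icon_count++;
    return 0;
}

/* Return the icon bound to content_type, or fallback if none is. */
const char *icon_lookup(const char *content_type, const char *fallback)
{
    int i;
    assert(content_type != NULL);
    for (i = 0; i < icon_count; i++)
        if (strcmp(icon_table[i].content_type, content_type) == 0)
            return icon_table[i].icon_url;
    return fallback;
}
```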

File Descriptions in Directory Listings

File descriptions are supported for long HTTP directory listings. The
default is to peek at the title of the HTML files.

Error and Information Message Management

A new error handling module is introduced in HTError. It uses the
error_stack entry in the HTRequest structure.
It handles nested error messages so that we can give a reason for the
error, e.g.

    Error in ...
        This error occurred because ...
            This is caused by ...
                etc.

It also makes it possible for the Library to pass information back to
the client so that the Library doesn't act like a `black hole'.
An example is HTTP redirection with status code `Moved 301'. Now the
new URL is passed back to the client via the error_stack so that the
client can update the reference when possible.

The function that generates and outputs the error messages to the user
is put into the HTErrorMsg module so that it can be overridden by a
smart client or server.
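
To make the nesting concrete, here is a small sketch of a message
stack that is printed with the most general message first and each
underlying reason indented one level further. It is not the HTError
implementation; ErrorStack, error_push() and error_format() are
hypothetical names, and the four-space indent is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRORS 16

typedef struct {
    const char *messages[MAX_ERRORS];
    int         count;
} ErrorStack;

/* Push a message. The lowest-level cause is pushed first and each
   caller adds its own, more general, message on top.
   Returns 0 on success, -1 if the stack is full. */
int error_push(ErrorStack *s, const char *msg)
{
    assert(s != NULL && msg != NULL);
    if (s->count >= MAX_ERRORS)
        return -1;
    s->messages[s->count++] = msg;
    return 0;
}

/* Write the stack into buf, top of stack first, indenting each
   deeper reason by four more spaces. Returns the bytes written
   (excluding the terminator); stops early if buf is too small. */
size_t error_format(const ErrorStack *s, char *buf, size_t size)
{
    size_t used = 0;
    int i;

    assert(s != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    for (i = s->count - 1; i >= 0; i--) {
        int indent = (s->count - 1 - i) * 4;
        int n = snprintf(buf + used, size - used, "%*s%s\n",
                         indent, "", s->messages[i]);
        if (n < 0 || (size_t)n >= size - used) {
            buf[used] = '\0';       /* drop the truncated line */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```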

Guessing the Content Type of a Stream

The HTGuess module reads a part of a stream and determines the content
type with the highest probability from a statistical analysis.
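
The kind of analysis meant here can be hinted at with a very crude
sketch: count how many of the sampled bytes look like text. The
threshold of 10% and the function name guess_content_type() are
assumptions for this example and are not taken from HTGuess; the only
fixed fact used is that PostScript files start with "%!".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess a content type from the first len bytes of a stream.
   Sketch only: a "%!" prefix means PostScript; otherwise the data is
   called binary if more than 10% of the bytes are neither printable
   ASCII nor common whitespace. */
const char *guess_content_type(const unsigned char *data, size_t len)
{
    size_t i, binary = 0;

    assert(data != NULL || len == 0);
    if (len >= 2 && memcmp(data, "%!", 2) == 0)
        return "application/postscript";

    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        int is_text = (c == '\n' || c == '\r' || c == '\t' ||
                       (c >= 0x20 && c < 0x7f));
        if (!is_text)
            binary++;
    }
    if (binary * 10 > len)
        return "application/octet-stream";
    return "text/plain";
}
```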

Minor Stuff

tmpnam() 
   Because of problems on NeXT platforms the tmpnam() function is now 
   replaced by HTFWriter_filename() in HTFWriter.c. The function has
   two modes: Give back a hash name or the last part of the URL (which
   normally is more readable). 

HTMLPutImg() 
   New function to make it easier to put out an HTML <IMG> tag. 

HTParseInet() 
   Added one more parameter to tell whether it is a multihomed host or
   not. (This is used in the host cache).

HTInetStatus() 
   Should no longer be used directly, but is called from HTErrorAdd so
   that the message goes all the way back to the user.

HTError 
   This typedef is now obsolete and will be removed in future releases 

HTLoad() 
   Added new parameter to HTLoad: BOOL keep_error_stack. If YES then
   the error_stack is not cleared. This is used in redirection etc. 

HTLoadError() 
   Because of the new HTError module, this function in HTML.c is not
   needed anymore. 
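
The two naming modes described under tmpnam() above might look roughly
like the sketch below. It does not reproduce HTFWriter_filename(): the
name make_filename(), the NameMode enum, the hash function and the
8-digit hex format are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { NAME_HASH, NAME_URL_TAIL } NameMode;

/* Simple multiplicative string hash; an arbitrary choice here. */
static unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Build a file name for url in out. In NAME_URL_TAIL mode the part
   after the last '/' is used when it is non-empty; otherwise, and in
   NAME_HASH mode, an 8-digit hex hash of the whole URL is used.
   Returns 0 on success, -1 if out is too small. */
int make_filename(const char *url, NameMode mode, char *out, size_t size)
{
    const char *tail;
    int n;

    assert(url != NULL && out != NULL && size > 0);
    tail = strrchr(url, '/');
    tail = (tail != NULL) ? tail + 1 : url;

    if (mode == NAME_URL_TAIL && *tail != '\0')
        n = snprintf(out, size, "%s", tail);
    else
        n = snprintf(out, size, "%08lx",
                     hash_string(url) & 0xffffffffUL);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```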

Bug Fixes

This is a list of fixed bugs from earlier versions. 

* Memory faults in HTSimplify() in HTParse.c have been fixed
* README files in directory listings now know how to handle '<', '>'
  and '&' correctly, though the file still has to be ASCII. See Future
  Plans for handling this file.
* tmpnam is no longer used in the Library because of problems on the
  NeXT platform. Instead, a new function called HTFWriter_filename()
  has been written in HTFWriter.c.
* HTInputSocket_getCharacter now returns an int and not a char so that
  EOF is no longer a member of the char set.
* HTMLGen_start_element() is only allowed to put extra '\n' in <PRE>
  mode if it is between parameters in a tag 
* Changed type of <IMG> into SGML_EMPTY so that it doesn't expect end
  tag </IMG>
* Nested <PRE> is no longer a problem in HTMLGen_start_element.
* Removed all #elif as not all compilers on HP-UX like it.
* Changed HTChunk such that chunk->data is '\0' terminated at any
  time. This actually makes HTChunkTerminate less needed but be aware
  that HTChunk->size changes. 
* Removed non-portable d_namlen field in HTMulti. 
* Moved definition of NO_GROUPS to tcp.html
* Moved definition of HT_MAX_PATH to tcp.html
* Proxy server now closes connection in HTTP.c. This was only a problem
  in non-forking servers (VMS).
* Definition of HT_NO_DATA moved to HTUtils.html where the other
  return codes are placed. 
* Functions from HTAlert Module that prompt the user don't get
  confused about ctrl-D anymore. 
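
The HTInputSocket_getCharacter fix in the list above is the classic
reason for returning int from a character reader: with a char return
type, the byte 0xFF can be mistaken for EOF. The sketch below shows
the pattern on an in-memory buffer; ByteSource and next_character()
are hypothetical names and not part of the Library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>      /* for EOF */

/* An in-memory byte source standing in for a socket buffer. */
typedef struct {
    const unsigned char *data;
    size_t               len;
    size_t               pos;
} ByteSource;

/* Return the next byte as a value in 0..255, or EOF at the end.
   Because the return type is int and EOF is negative, every byte
   value, including 0xFF, stays distinguishable from EOF. */
int next_character(ByteSource *src)
{
    assert(src != NULL);
    if (src->pos >= src->len)
        return EOF;
    return src->data[src->pos++];
}
```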

On the Working List

This is what we are working on right now!

MIME-parser 
   A new MIME-parser that can be used as a general module. For the
   moment there is a large number of individual MIME-parsers, and
   there is a lot of redundant coding. 

Multi-threaded HTTP Module. 
   The implementation is currently in its test phase, but as the module
   has been turned completely upside down it still needs some heavy
   testing. Look here for more information on the implementation.

Multihomed hosts 
   If a connect fails on a multihomed host, then automatically try
   another IP address.

Whois++ 
   Actually a WhoIs++ module has been implemented (thanks to Michael Mealling,
   ccoprmm@oit.gatech.edu) in the library, but it is not in this
   release as I haven't found many WhoIs++ servers, and the port
   chosen is 43, just like the old WhoIs protocol, which makes it a
   bit tricky.

Future Plans

This is what we are going to implement. If somebody should get the idea of 
writing some of the modules mentioned, it will be appreciated a lot ;-). 
Contact www-bug@info.cern.ch for further coordination. 

README File in directory listings 
   Make it possible to have both ASCII files (using <PRE>...</PRE>)
   and HTML files.

OS/2 listings 
   Implement long FTP directory listings for OS/2 platforms. 

Multipart retrieval in HTTP 
   This will make the transmission time for documents containing
   inlined images much shorter. Some implementation ideas have been
   discussed, but a final design has not been chosen yet.

Ideas for New Features

This is what we have not started yet but what we would like to implement. 

Virtual Documents 
   Pass virtual documents as objects instead of HTML files. Then the
   client can choose the best way to represent the data and reorganize
   it without consulting the server again. 

Separation of Protocol Modules 
   The protocol modules should be separated completely from the HTML
   machinery so that it is possible to, e.g., get raw FTP directory
   listings through to the user. 

--
 Henrik Frystyk		| Ari Luotonen		  | Mark Donszelmann
 frystyk@dxcern.cern.ch	| luotonen@dxcern.cern.ch | duns@vxdeop.cern.ch
 + 41 22 767 8265	| + 41 22 767 8583	  | + 41 22 767 3555

-------- World-Wide Web Project, CERN, CH-1211 Geneve 23, Switzerland --------
Received on Friday, 17 June 1994 02:32:44 UTC