CERN Common World-Wide Web Library Version 2.16pre1 Available
* * * * *
The CERN Common WWW Library is a general code base that can be used
to build clients and servers. It contains code for accessing HTTP, FTP,
Gopher, News, WAIS, Telnet servers, and the local file system.
Furthermore, it provides modules for parsing, managing and presenting
hypertext objects to the user and a wide spectrum of generic programming
* * * * *
CERN Common Code Library 2.16pre1 is available, source code:
It is known to compile on Sun4, Solaris, HP, NeXT, NeXT-386,
Decstation Ultrix and DEC OSF/1.
Diffs and old versions are available at
Documentation is available at
Programmer's Guide is available at
The current address to send email about CERN Library is:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CERN Library 2.16 Prerelease Notes
New Features and Changed Interfaces
HTTP module contains the code for the HTTP client. The module is now
reorganized and made more modular.
Now supported by the HTTP Module. The name of the new URL is passed
to the client via the error_stack as an ERR_INFO message, see
HTError module. The maximum number of redirections is set by the
Referer Field in HTTP request
Clients can now send a Referer field in an HTTP request. This is done
by filling out the HTRequest->parentAnchor
From field in HTTP Request
Clients can now send the full email address of the current user in
the HTTP From field. The feature is turned off by default as it
might get a bit tricky through a Proxy.
Support of return code `204 No Response'
HTFTP module contains the code for the FTP client. The FTP client has
changed a lot in this release. It is now a complete state machine
where the actual action executed is a function of the current state.
The client now follows the suggestions given in RFC 1123: "Requirements
for Internet Hosts -- Application and Support".
Establishment of the data connection now complies with RFC 1579:
"Firewall-friendly FTP", so that the procedure is
1. try PASV
2. if that fails, try PORT
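The fallback order above can be sketched as follows. This is only an
illustration and not the HTFTP code: the callback type, the enum and the
stub functions are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical callback: tries to set up a data connection,
   returns 0 on success and nonzero on failure. */
typedef int (*data_conn_fn)(void *ctx);

enum data_mode { DATA_NONE = 0, DATA_PASV, DATA_PORT };

/* Try passive mode first; fall back to active (PORT) mode
   only if the PASV attempt fails. */
enum data_mode open_data_connection(data_conn_fn try_pasv,
                                    data_conn_fn try_port,
                                    void *ctx)
{
    if (try_pasv != NULL && try_pasv(ctx) == 0)
        return DATA_PASV;
    if (try_port != NULL && try_port(ctx) == 0)
        return DATA_PORT;
    return DATA_NONE;
}

/* Stubs for demonstration only. */
int stub_ok(void *ctx)   { (void)ctx; return 0; }
int stub_fail(void *ctx) { (void)ctx; return -1; }
```

With the stubs, a server that refuses PASV ends up in PORT mode, and a
server that refuses both leaves the caller without a data connection.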
The URL is now parsed according to the (latest) specifications:
url : f t p: / / login / path [ ftptype ]
login : [ user [ : password ] @ ] hostport
hostport: host [ : port ]
ftptype : A formcode | E formcode | I | L digits
formcode: N | T | C
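As an illustration of the ftptype rule in the grammar above, here is a
minimal parser sketch. It is not taken from HTFTP; the struct and function
names are hypothetical, and it assumes the type letters are upper case and
that the byte size after L is a small decimal number.

```c
#include <ctype.h>
#include <stddef.h>

struct ftp_type {
    char type;      /* 'A', 'E', 'I' or 'L' */
    char formcode;  /* 'N', 'T' or 'C' for A and E, otherwise '\0' */
    int  bytesize;  /* the digits after 'L', otherwise 0 */
};

/* Returns 0 if s matches
     ftptype : A formcode | E formcode | I | L digits
   and -1 otherwise. */
int parse_ftptype(const char *s, struct ftp_type *out)
{
    if (s == NULL || out == NULL)
        return -1;
    out->type = '\0';
    out->formcode = '\0';
    out->bytesize = 0;

    switch (s[0]) {
    case 'A':
    case 'E':
        if ((s[1] != 'N' && s[1] != 'T' && s[1] != 'C') || s[2] != '\0')
            return -1;
        out->type = s[0];
        out->formcode = s[1];
        return 0;
    case 'I':
        if (s[1] != '\0')
            return -1;
        out->type = 'I';
        return 0;
    case 'L': {
        const char *p = s + 1;
        int n = 0;
        if (!isdigit((unsigned char)*p))
            return -1;              /* at least one digit is required */
        while (isdigit((unsigned char)*p)) {
            if (n > 1000)
                return -1;          /* assumed sanity limit */
            n = n * 10 + (*p - '0');
            p++;
        }
        if (*p != '\0')
            return -1;
        out->type = 'L';
        out->bytesize = n;
        return 0;
    }
    default:
        return -1;
    }
}
```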
Both directory listings and file retrieval use the same procedure:
1. First try to go to the location directly, as we are often
talking to a UNIX server or one that 'understands' UNIX syntax
2. If it fails, then go to the location step by step using CWD. In
that way we should not have any problems on any platform, and
thus it is not necessary to make special hacks for VMS, etc.
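The two-step procedure can be sketched like this. The code is an
illustration under assumptions, not the HTFTP implementation: the callback
stands in for sending a CWD command, and the picky_cwd stub simulates a
server that does not understand UNIX-style paths.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback that sends "CWD dir"; returns 0 on success. */
typedef int (*cwd_fn)(const char *dir, void *ctx);

/* Walk the path one segment at a time.
   Returns 0 on success and -1 on the first failing segment. */
int cwd_stepwise(const char *path, cwd_fn cwd, void *ctx)
{
    char seg[256];
    const char *p = path;

    for (;;) {
        size_t len;
        while (*p == '/')           /* skip separators */
            p++;
        len = strcspn(p, "/");
        if (len == 0)
            return 0;               /* no segments left */
        if (len >= sizeof seg)
            return -1;              /* segment too long for this sketch */
        memcpy(seg, p, len);
        seg[len] = '\0';
        if (cwd(seg, ctx) != 0)
            return -1;
        p += len;
    }
}

/* Step 1: try the whole path. Step 2: fall back to segment by segment. */
int change_directory(const char *path, cwd_fn cwd, void *ctx)
{
    if (cwd(path, ctx) == 0)
        return 0;
    return cwd_stepwise(path, cwd, ctx);
}

/* Demonstration stub: rejects any argument containing '/'. */
struct cwd_log { int calls; char last[256]; };

int picky_cwd(const char *dir, void *ctx)
{
    struct cwd_log *log = ctx;
    log->calls++;
    if (strchr(dir, '/') != NULL)
        return -1;
    strncpy(log->last, dir, sizeof log->last - 1);
    log->last[sizeof log->last - 1] = '\0';
    return 0;
}
```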
Long directory listings are supported for UNIX-like systems and VMS.
This includes NetWare and Windows NT. See Future plans for more and
Information from the FTP server is by default presented to the client
using the following rules:
1. If you are connecting to the root directory at an FTP site, we
show the 'login' message (might be a concatenation of several
messages) just like in a normal ftp session.
2. If you have a more specific URL, then you probably already know
the site and are less interested in the login message. Instead
we show any local message when making a CWD to the right location.
The Gopher module has been revised and its error handling has been improved.
Some Gopher servers send back information messages in a line containing
"error.host". This information is treated like login information
from FTP servers so that it is represented as a message before or after
the actual listing.
Listings now contain icons in the same way as the other listings.
CSO Name Server
The CSO Name Server client outputs in HTML and not only <PRE> as before.
Content Type Recognition
The Gopher module uses its own content-type recognition inherited
from HTTP when handling gopher text and gopher binary files. This
means that, e.g., PostScript files are handled correctly.
Local File Access
The new version of the HTFile module is a lot smaller, as all directory
listing code has moved to the HTDirBrw module. New error handling has
Passive and Active Connection Establishment
Calls to connect() and accept() now go through the functions
HTDoConnect() and HTDoAccept() respectively.
Cache of Host Names and Addresses
HTInetParse(), which is called from within HTDoConnect(), now has an
internal cache of the names and (possibly multiple) IP addresses of
visited hosts. This minimizes access to the file /etc/hosts and
the Domain Name Server, even though aliases are not recognized in the cache.
The default cache size is 500 entries, and a host stays in the cache as
long as connect() succeeds. That is, if a connection is refused for some
reason, the host is taken out of the cache.
The time to make a connection to a multihomed host is measured every
time and a mean access time is calculated so that HTDoConnect always
takes the fastest IP-address, see Future plans.
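A minimal sketch of the bookkeeping for a multihomed host might look as
follows. The struct, the function names and the limit of eight addresses
are assumptions made for this example; the actual cache layout in the
Library is not described here.

```c
#include <stddef.h>

#define MAX_ADDRS 8     /* assumed limit for this sketch */

struct host_entry {
    int    naddrs;                /* number of known IP addresses */
    double mean_ms[MAX_ADDRS];    /* running mean connect time */
    int    samples[MAX_ADDRS];    /* measurements taken so far */
};

/* Fold one measured connect time into the running mean. */
void record_connect_time(struct host_entry *h, int idx, double ms)
{
    if (h == NULL || idx < 0 || idx >= h->naddrs)
        return;
    h->samples[idx]++;
    h->mean_ms[idx] += (ms - h->mean_ms[idx]) / h->samples[idx];
}

/* Return the index of the address with the lowest mean connect time,
   or -1 if the host has no addresses. Addresses that have never been
   measured keep a mean of 0 and are therefore tried first. */
int fastest_address(const struct host_entry *h)
{
    int i, best = -1;
    if (h == NULL)
        return -1;
    for (i = 0; i < h->naddrs; i++) {
        if (best < 0 || h->mean_ms[i] < h->mean_ms[best])
            best = i;
    }
    return best;
}
```

Updating the mean incrementally avoids storing every sample, which keeps
each cache entry small.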
Improved Functionality of DNS requests
The Library now provides functionality for obtaining the full mail
address of the user, full domain name of the host and also the
possibility for setting both values. This means that the user can use
his official email address, e.g. in the HTTP request.
Long Directory Listings
Long directory listings are supported for HTTP, FTP and files on the
local file system. For the moment only a part of the functionality
(e.g., sorting and selecting which columns to show) is exploited, see
Future Plans.
Icons in directory listings are bound to MIME content-types and
encoding. They can be found in the HTIcons module. The default set of
icons is set up using HTStdIconInit() and new icons can be added
dynamically using HTAddIcon().
File Descriptions in Directory Listings
File descriptions are supported for long HTTP directory listings. The
default is to peek at the title of HTML files.
Error and Information Message Management
A new error handling module is introduced in HTError. It uses the
error_stack entry in the HTRequest structure.
It handles nested error messages so that we can give a reason for the
Error in ...
This error occurred because ...
This is caused by ...
It also makes it possible for the Library to pass information back to
the client so that the Library doesn't act like a `black hole'.
An example is HTTP redirection with status code `301 Moved'. Now the
new URL is passed back to the client via the error_stack so that the
client can update the reference when possible.
The function that generates and outputs the error messages to the user
is put into the HTErrorMsg module so that it can be overridden by a smart
client or server.
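The idea of nested messages can be illustrated with a small stack. This is
a sketch only: the struct, the function names and the "Error:" and
"Reason:" prefixes are made up for the example and do not reflect the
actual HTError interface.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_ERRORS 16   /* assumed limit for this sketch */

struct error_stack {
    int depth;
    const char *msg[MAX_ERRORS];
};

/* Push a message; low-level causes are pushed first, and the
   modules above them add context as the error propagates. */
int error_push(struct error_stack *s, const char *msg)
{
    if (s == NULL || msg == NULL || s->depth >= MAX_ERRORS)
        return -1;
    s->msg[s->depth++] = msg;
    return 0;
}

/* Write the messages into buf, most recent first, so that each
   following line gives the reason for the line before it.
   Returns the number of characters written. */
size_t error_format(const struct error_stack *s, char *buf, size_t size)
{
    size_t used = 0;
    int i;

    if (s == NULL || buf == NULL || size == 0)
        return 0;
    buf[0] = '\0';
    for (i = s->depth - 1; i >= 0; i--) {
        const char *prefix = (i == s->depth - 1) ? "Error: " : "Reason: ";
        int n = snprintf(buf + used, size - used, "%s%s\n",
                         prefix, s->msg[i]);
        if (n < 0 || (size_t)n >= size - used)
            break;                  /* output would not fit */
        used += (size_t)n;
    }
    return used;
}
```

A client that wants a different presentation would replace only the
formatting function and leave the stack untouched.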
Guessing the Content Type of a Stream
The HTGuess module reads part of a stream and determines the content
type with the highest probability from a statistical analysis.
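To give an idea of what such a guess can look like, here is a deliberately
simple heuristic. It is not the HTGuess algorithm: the function name, the
"%!" check for PostScript and the 5% threshold for control characters are
all assumptions chosen for this sketch.

```c
#include <stddef.h>

/* Guess a content type from the first len bytes of a stream. */
const char *guess_content_type(const unsigned char *buf, size_t len)
{
    size_t i, ctrl = 0;

    if (buf == NULL || len == 0)
        return "application/octet-stream";

    /* PostScript files conventionally start with "%!". */
    if (len >= 2 && buf[0] == '%' && buf[1] == '!')
        return "application/postscript";

    /* Count control characters that rarely occur in plain text. */
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 32 && c != '\t' && c != '\n' && c != '\r' &&
            c != '\f' && c != 27)
            ctrl++;
    }

    /* Assumed threshold: more than 5% control characters means binary. */
    if (ctrl * 20 > len)
        return "application/octet-stream";
    return "text/plain";
}
```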
Because of problems on NeXT platforms the tmpnam() function is now
replaced by HTFWriter_filename() in HTFWriter.c. The function has
two modes: Give back a hash name or the last part of the URL (which
normally is more readable).
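The two modes could be sketched as below. The signature of
HTFWriter_filename() is not given in these notes, so the function here
uses a hypothetical name and interface, and the hash is an arbitrary
string hash picked for the example.

```c
#include <stdio.h>
#include <string.h>

/* Simple string hash (djb2 variant), used only for illustration. */
unsigned long url_hash(const char *url)
{
    unsigned long h = 5381;
    while (*url != '\0')
        h = h * 33 + (unsigned char)*url++;
    return h;
}

/* Build a file name for url in out.
   use_hash != 0: eight hex digits derived from the whole URL.
   use_hash == 0: the last part of the URL; if the URL ends in '/',
                  fall back to the hash name.
   Returns 0 on success, -1 if out is too small. */
int make_filename(const char *url, int use_hash, char *out, size_t size)
{
    int n;

    if (url == NULL || out == NULL || size == 0)
        return -1;
    if (!use_hash) {
        const char *last = strrchr(url, '/');
        last = (last != NULL) ? last + 1 : url;
        if (*last != '\0') {
            if (strlen(last) >= size)
                return -1;
            strcpy(out, last);
            return 0;
        }
    }
    n = snprintf(out, size, "%08lx", url_hash(url) & 0xffffffffUL);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The sketch ignores query strings and characters that are not valid in
file names; a real implementation would have to deal with both.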
New function to make it easier to put out an HTML <IMG> tag.
Added one more parameter to tell whether it is a multihomed host or
not. (This is used in the host cache).
Should no longer be used directly, but is called from HTErrorAdd so
that the message goes all the way back to the user.
This typedef is now obsolete and will be removed in future releases
Added new parameter to HTLoad: BOOL keep_error_stack. If YES then
the error_stack is not cleared. This is used in redirection etc.
Because of the new HTError module, this function in HTML.c is not
This is a list of fixed bugs from earlier versions.
* Memory faults in HTSimplify() in HTParse.c have been fixed
* README files in directory listings now know how to handle '<', '>'
and '&' correctly. Though the file still has to be ASCII. See future
plans for handling this file.
* tmpnam is no longer used in the Library because of problems on the NeXT
platform. Instead a new function called HTFWriter_filename() in
HTFWriter.c is written.
* HTInputSocket_getCharacter now returns an int and not a char so that
EOF is no longer a member of the char set.
* HTMLGen_start_element() is only allowed to put extra '\n' in <PRE>
mode if it is between parameters in a tag
* Changed type of <IMG> into SGML_EMPTY so that it doesn't expect end
* Nested <PRE> is no longer a problem in HTMLGen_start_element.
* Removed all #elif as not all compilers on HPUX like it.
* Changed HTChunk such that chunk->data is '\0' terminated at all
times. This actually makes HTChunkTerminate less needed but be aware
that HTChunk->size changes.
* Removed non-portable d_namlen field in HTMulti.
* Moved definition of NO_GROUPS to tch.html
* Moved definition of HT_MAX_PATH to tch.html
* Proxy server now closes connection in HTTP.c. This was only a problem
in non-forking servers (VMS).
* Definition of HT_NO_DATA moved to HTUtils.html where the other
return codes are placed.
* Functions from HTAlert Module that prompt the user don't get
confused about ctrl-D anymore.
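The HTInputSocket_getCharacter fix in the list above follows the usual C
convention for character input. The sketch below shows why the return type
matters; the byte_source struct and the function names are hypothetical
and are not part of the Library.

```c
#include <stdio.h>      /* EOF */
#include <stddef.h>

/* Hypothetical in-memory input source. */
struct byte_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Return the next byte as a value in 0..255, or EOF at the end.
   Because the result is an int, the byte 0xFF (255) stays distinct
   from EOF, which is a negative value. */
int next_char(struct byte_source *s)
{
    if (s->pos >= s->len)
        return EOF;
    return s->data[s->pos++];
}

/* Count the remaining bytes. The loop variable must be an int:
   stored in a char, 0xFF could compare equal to EOF and end the
   loop too early. */
size_t count_bytes(struct byte_source *s)
{
    size_t n = 0;
    int c;
    while ((c = next_char(s)) != EOF)
        n++;
    return n;
}
```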
On the Working List
This is what we are working on right now!
A new MIME-parser that can be used as a general module. For the
moment there is a large number of individual MIME-parsers, and
there is a lot of redundant coding.
Multi-threaded HTTP Module.
The implementation is currently in its test phase but as the module
has been turned completely upside down it still needs some heavy
testing. Look here for more information on the implementation.
If a connect fails on a multihomed host then automatically try
Actually, a WhoIs++ module has been implemented (thanks to Michael Mealling,
firstname.lastname@example.org) in the library, but it is not in this
release as I haven't found many WhoIs++ servers, and the port
chosen is 43 just like the old WhoIs protocol, which makes it a
This is what we are going to implement. If somebody should get the idea of
writing some of the modules mentioned, it will be appreciated a lot ;-).
Contact email@example.com for further coordination.
README File in directory listings
Make it possible to have both ASCII files (using <PRE>...</PRE>)
and HTML files.
Implement long FTP directory listings for OS/2 platforms.
Multipart retrieval in HTTP
This will make the transmission time for documents containing
inlined images much shorter. Some implementation ideas have been
discussed but a final design has not been chosen yet.
Ideas for New Features
This is what we have not started yet but what we would like to implement.
Pass virtual documents as objects instead of HTML files. Then the
client can choose the best way to represent the data and reorganize
it without consulting the server again.
Separation of Protocol Modules
The protocol modules should be separated completely from the HTML
machinery so that it is possible to, e.g., get raw FTP directory
listings through to the user.
Henrik Frystyk | Ari Luotonen | Mark Donszelmann
firstname.lastname@example.org | email@example.com | firstname.lastname@example.org
+ 41 22 767 8265 | + 41 22 767 8583 | + 41 22 767 3555
-------- World-Wide Web Project, CERN, CH-1211 Geneve 23, Switzerland --------