W3C

A brief technical overview of Hypermess:
the hypertext mailing list archives at W3C

by José Kahan


Why Hypermess?

I thought of this name when trying to merge a messaging library called mess822 into hypermail. Although I abandoned this track when I joined the open source hypermail effort, I decided to stick with the name. It represents either the current system (Perl scripts + hypermail + smartlist) or the problems I have always had finding time to work on it.

Previous names of the system were DEISS (1997-1998; a Breton word meaning archive) and HMPP (Hypermail Plus Plus), corresponding to different evolution stages.

In the beginning...

It all started in January 1997, when Dan Connolly got bored with maintaining the www-html mailing list and asked for a replacement. Someone pointed a finger at me and, as I had never maintained a list, I thought it'd be an interesting experience... if only I had known.

One of my www-html maintainer duties is to provide a hypertext mailing list archive. However, the system we were using at that time (Mhonarc) broke down often and it took too much time to recover from its crashes. Moreover, the archives weren't very practical, as they weren't divided into periods. (N.B., Mhonarc has evolved since that time: it has become more robust and offers many new features. I'm only describing the state of things in January 1997.)

So, I started building a more resistant and easier to maintain system (zero maintenance). It also gave me an excuse to learn Perl. This rain project came out nicely and my W3C teammates asked me if I could install this system on their lists too. As this wasn't my initial duty, I learned the meaning of the term "juggling with one's time."

Since then, I've been maintaining the archiving system and improving it, whenever I find free time to do so. See the history section for more info.

System architecture

Hardware

Our mailing list server runs on a PIII 800 MHz with 1.5 GB of RAM and lots of hard disk space, both SCSI and IDE (circa 96 GB), and an Intel 8257 EtherExpress Pro Fast Ethernet interface running at 100 Mb/s.

Operating system

Solaris 2.8 for Intel (Solaris 8 10/00 s28x_u2wos_11 INTEL).

Software

The system has three components: the Perl front-end scripts, hypermail, and smartlist.

The following figure shows the interaction between these different components:

[Figure: interaction between the different components]

The update-period script is needed because hypermail doesn't know how to handle MH mailboxes. The script converts a given set of messages from an MH folder into an mbox. In addition, so that the beginning of each new message can be detected unambiguously, we escape all lines except for the From envelope line with a ">" character, producing an ietf-type mbox. Hypermail has an option to deal with this kind of mbox. I know, as I'm the one who added it :-)
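The conversion step can be illustrated with a short Python sketch. This is not the update-period script itself (which is written in Perl); it only shows the general idea under some assumptions: an MH folder holds one message per numerically named file, only lines beginning with "From " are escaped (the usual mbox convention, which may differ from what the real script does), and the function names and the default envelope line are hypothetical.

```python
from pathlib import Path


def mh_message_files(folder):
    """Return the message files of an MH folder in numeric order.

    MH stores one message per file, named by its message number;
    other files (such as .mh_sequences) are ignored.
    """
    files = [p for p in Path(folder).iterdir() if p.name.isdigit()]
    return sorted(files, key=lambda p: int(p.name))


def escape_line(line):
    """Prefix '>' to a line that could be mistaken for an envelope line."""
    return ">" + line if line.startswith("From ") else line


def mh_to_mbox(folder, mbox_path,
               envelope="From archive@example.org Thu Jan  1 00:00:00 1970"):
    """Concatenate the messages of an MH folder into a single mbox file.

    The envelope line is a hypothetical placeholder; a real converter
    would derive the sender and date from each message.
    Returns the number of messages written.
    """
    count = 0
    with open(mbox_path, "w", encoding="utf-8", newline="\n") as out:
        for path in mh_message_files(folder):
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(envelope + "\n")      # the only unescaped "From " line
            for line in text.splitlines():
                out.write(escape_line(line) + "\n")
            out.write("\n")                 # blank line between messages
            count += 1
    return count
```

With this kind of escaping, a reader of the resulting mbox can treat every line that starts with "From " as the start of a new message.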

Experience

I think it has been a good idea to have a front-end script between the mailing list server and the mail-to-HTML conversion program. I was able to add fault tolerance and archive division features without having to hack the end systems intensively. Finally, if we change any of the end systems, it should allow me to set up the same kind of interface and behavior.
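As an illustration of the archive division feature, here is a minimal Python sketch of how a front-end script could assign messages to periods before handing them to the conversion program. The monthly granularity, the period label format, and the function names are assumptions made for the example; the real Perl scripts may divide archives differently.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def period_key(date_header):
    """Map an RFC 2822 Date header to a monthly period label such as '1999-12'.

    The month is taken from the date as written by the sender
    (no conversion to UTC), which is an assumption of this sketch.
    """
    when = parsedate_to_datetime(date_header)
    return f"{when.year:04d}-{when.month:02d}"


def split_into_periods(messages):
    """Group (date_header, message_id) pairs by period, keeping arrival order."""
    periods = defaultdict(list)
    for date_header, message_id in messages:
        periods[period_key(date_header)].append(message_id)
    return dict(periods)
```

Because each period is a separate, smaller archive, adding a new message only requires reprocessing the current period rather than the whole list history.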

The most delicate problem we have right now is to ensure the persistence of links. Hypermail converts each message into a file and people make links to these files. If my hypertext archives aren't rebuilt the same way each time, I'll break those links. Because of this, upgrades are always delicate, dangerous, and stressful. The best solution would be to move to a system where messages are referenced by their msgid and list-name, rather than by the filename that hypermail makes. This is the direction that we'll try to follow for the next version of the system.
Update: we tried that direction and the code is actually inside the public hypermail. However, people felt very uncomfortable with the 16-character URIs (complex to quote on the phone or in mail) and we went back to the status quo. Instead, we added a msgid-based mail search script that solves that problem. All messages sent to the list server include the archive URI quoted that way.
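The idea behind a msgid-based lookup can be sketched in a few lines of Python. This is only an illustration, not the actual search script: the class name, the in-memory dictionary, and the example URIs are all hypothetical.

```python
def normalize_msgid(msgid):
    """Strip whitespace and the surrounding angle brackets of a Message-ID."""
    return msgid.strip().strip("<>")


class MsgidIndex:
    """Map (list name, Message-ID) pairs to the current archive URI."""

    def __init__(self):
        self._uris = {}

    def add(self, list_name, msgid, uri):
        # Re-adding the same message after a rebuild simply updates its URI.
        self._uris[(list_name, normalize_msgid(msgid))] = uri

    def lookup(self, list_name, msgid):
        """Return the archive URI for a message, or None if it is unknown."""
        return self._uris.get((list_name, normalize_msgid(msgid)))


index = MsgidIndex()
index.add("www-html", "<abc@example.org>", "/www-html/1999Dec/0042.html")
print(index.lookup("www-html", "abc@example.org"))
```

The point of the design is that the key (list name plus msgid) never changes, so a link expressed as a lookup survives even if a rebuild gives the message a different filename.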

Working on and contributing to the open source hypermail system is a very big advantage: more people test it, detect and fix bugs, and add interesting new features than I could ever manage alone. Instead of adding MIME support to the patched hypermail I had, I used my time to test the open source hypermail and enhance it. One thing I have yet to find out is what will happen when the public hypermail includes new features that affect performance or backward compatibility... to be continued.
Update: following up on contributions has worked nicely. However, it means that before installing a new version of hypermail, we have to test it thoroughly. This is not so bad, because it helps make hypermail more robust and we can contribute our patches whenever we want. It's really useful to be on the developers' team.

Availability

Hypermail is available off the hypermail development center.

My development version of hypermail as well as my Perl front-end scripts are available off W3C's public CVS server. The scripts are not yet ready for public distribution and are quite specific to our server installation. Use them at your own risk. My current plan is to package these scripts and contribute them to the public hypermail base.
Update: this has not happened because of lack of time. If someone's interested in them, mail me.

As of December 1999, all my patches and changes have been committed to the public hypermail version.

History

Summer 2003: WAI enhancements added to hypermail and starting to be deployed on all the W3C mailing list archives.

3 January 2000: I find the time to complete the system description page, that was started in June 1997.

1 January 2000: Everything is still working :-)

22-24 December 1999: Upgrade of all the mailing list archives to the new hypermess (around 330 lists, over 6 hours processing time). The archive has a new interface, contains a search form, and uses (simple) CSS.

November 1999: Live test of the MIME-enabled archives on three W3C mailing lists.

September 1999: Kent Landfield and Daniel Stenberg encourage me to merge all my patches into the current development tree. The idea is to minimize the number of mutant hypermail versions and to pool our efforts. I get CVS write access to the hypermail base and do it. I manage to organize myself so that I'm able to work one week per month on hypermail and hypermess. Start worrying about making the conversion before the end of the year. Hypermail 1.x has Y2K problems.

August 1999: I start submitting bug fixes and patches to the hypermail mailing list.

Early Summer 1999: Find out there's an open-source hypermail development group. Their version already handles MIME. I review their code for two weeks and find it's pretty decent. I decide to scrap my hypermail version and use my time to test the public version and port my principal patches to it.

Early 1999: Request for adding MIME support to hypermail (1.x). Re-evaluation of Mhonarc, but I still manage to crash it, or it's too slow. I start to patch hypermail and go as far as combining it with the mess822 library. Named the system Hypermess, to symbolize this merger and also to describe how I'm finding time to work on it.

Rest of 1997-1998: I continue patching hypermail as needed or add new features to it whenever I find the time to do so, which isn't very often. I'm frustrated I can't find more time to improve the system.

June 1997: I write a page to describe the system. The page only has a paragraph saying that I'll add more info when I find more time. Mailing list maintainers can now customize the main index of their archives.

April 1997: Deployment of the new archiving system on most of W3C's mailing lists. Met Kevin Hughes (hypermail's author) at WWW6 in Santa Clara and offered him my patches. Kevin tells me that hypermail is no longer maintained.

February-March 1997: Development of a Perl script/hypermail (1.x) system that automatically divides an archive into periods (thus making it easier to browse and less time-consuming to archive new messages), with some safeguards so that it can recover automatically from crashes and quickly rebuild an archive from scratch. Found and fixed many hypermail bugs (mainly SIGSEGV ones).

January 1997: I become maintainer of the www-html mailing list. One of my tasks is to provide a hypertext archive for the mailing list. At that time, we were using Mhonarc, which crashed often, and recovering was expensive. Another problem is that the archive isn't divided into periods. The more messages there are in the archive, the more time it takes to add new ones or to browse them. I need something better that will be easier to maintain (zero maintenance) and easier to consult.

Acknowledgments

Thanks to Daniel Stenberg and Kent Landfield for encouraging me to merge my proprietary hypermail changes into the hypermail open source project and for inviting me to join the hypermail developers' group.

The following people have contributed either through feedback or beta-testing (hope everyone is cited):


Jose Kahan
Webmaster
Last update on: Thu, Sep 11, 2003