- From: Benjamin Franz <snowhare@netimages.com>
- Date: Tue, 26 Dec 1995 15:49:47 -0800 (PST)
- To: www-talk@w3.org
On Sun, 24 Dec 1995, Daniel W. Connolly wrote:

> Hypermail is great. Mhonarc is even better. But I've got a lot of
> ideas for improvements:
>
> Requirements:
>
> 0. Support MIME ala mhonarc.

Hmmm....Let me think about it.

> 1. Base the published URLs on the global message-ids, not on local
> sequence numbers. So in stead of:
>
> http://www.foo.com/archive/mlist/00345.html
>
> I want to see:
>
> http://www.foo.com/archive/mlist?message-id=234234223@bar.net

I already support this - it just isn't the main interface right now:

<URL:http://www.netimages.com/ni-cgi-bin/fetch?newsgroup=alt.devilbunnies&messageid=4bo3ol$3s2@xmission.xmission.com>

I have to fundamentally change the way I index the files to improve
the speed of message-id searches. Right now I search a flat database,
using some heuristics based on locality and on the idea that people
are usually looking for _recent_ messages. That makes the search fast
in the usual case.

[...]

> 2. Support format negotiation. Make the original message/rfc822 data
> available as well as the enhanced-with-links html format -- at the
> same address. This _should_ allow clients to treat the message as a
> message, i.e. reply to it, etc. by specifying:
>
> Accept: message/rfc822

Pretty easy to do. We just need agreement by browser authors to
request it. It actually saves me processing time, since I don't have
to mark up the message, just decompress it and throw it.

> 3. Keep the index pages to a reasonable size. Don't list 40000
> messages by default. The cover page should show the last 50 or so
> messages, plus a query form where folks can select articles...

I already chop it into small chunks. alt.devilbunnies volume
overwhelmed people pretty early. I break it down by day, plus a
search button. What is really needed is flexible specification of the
display.

> 4. Allow relational queries: by date, author, subject, message-id,
> keywords, or any combination. Essentially, treat the archive as a
> relational database table with fields message-id, from, date, subject,
> keywords, and body.

Got Subject and From already, with perl regex matching, AND/OR, and
month-level date restriction. The rest will be part of my full-body
text search rewrite. RSN, I hope.

> In fact, consider this table to consist of all the mail messages
> and news articles ever posted (past, present, and future). Any
> given archive has partial knowledge of the table. Let's call
> this global service the message-archive service. So rather than:
>
> http://www.foo.com/archive/www-html?message-id=234234223@bar.net
>
> I want to see:
>
> http://www.foo.com/message-archive?to=www-html@w3.org;message-id=234234223@bar.net

Hmmm...That is pretty much what I do now. I am going to have to
change my usage of '&' for separators, though, since SGML parsers
choke on it.

> Goals:
>
> 5. Generate HTML on the fly, not in batch. Cache the most recent pages
> of course (in memory?), but don't waste all that disk space.

Already do that. I decided that batching it was too restrictive for
the upgrade path. Worse, rebuilding even the indexes on a nightly
basis was loading the machine down badly.

> (support if-modified-since in the on-the-fly generator, by the way)

RSN.

> Update the index in real-time, as messages arrive, not in batch.

Hmmm...That requires adding better file locking and a spool-watching
program, but it is not really a problem otherwise, since I already do
the updates incrementally with a cronjob as often as you want. OTOH:
Is there any real reason to force actual real-time updating? Archives
are not meant to replace newsreaders.

> 6. Allow batch query results. Offer to return the raw message/rfc822
> data (optionally compressed) for, e.g. "all messages from july 7 to
> dec 1 with fred in the from field".

Hmmm..

> 7. Export a harvest gatherer interface, so that collections of mail
> archives can be combined into harvest broker search services where
> folks can do similar relational and full-text queries.

:( I haven't had much luck with Harvest combined with Linux.

> 8. Allow annotations (using PICS ratings???) for "yeah, that
> was a really good post!" or "hey: if you liked that, you
> should take a look at ..."

Hmmm...

> 9. Make it a long-running process exporting an ILU interface, rather
> than a fork-per-invocation CGI script. Provide a CGI-to-ILU hack for
> interoperability with pre-ILU web servers.

What he said. :) ILU?

You left out CD-ROM support. One of the other admins around here is
always pestering me to make my software more CD-ROM friendly by
separating the index tree from the article storage tree, so he can
move the articles off to CD-ROM.

-- 
Benjamin Franz, Usenet-Web author
<URL:http://www.netimages.com/~snowhare/utilities/usenet-web/>
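
For illustration only (this is not code from Usenet-Web): a minimal
Python sketch of two points from the exchange above, the ';'-separated
query string and the Accept-based format negotiation of item 2. The
function names parse_query, choose_format and render are hypothetical,
and the sketch assumes the handler is simply given the raw query
string, the Accept header and the stored message text. It ignores
q-values in Accept, which a real server would need to honor.

```python
import html
from urllib.parse import unquote


def parse_query(query):
    """Split a query string into a dict of fields.

    Accepts ';' as the separator and, for older links, '&' as well.
    """
    fields = {}
    for part in query.replace("&", ";").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[unquote(key)] = unquote(value)
    return fields


def choose_format(accept_header):
    """Return 'message/rfc822' if the client lists it, else 'text/html'.

    Simplification: parameters such as q-values are stripped and ignored.
    """
    for item in accept_header.split(","):
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type == "message/rfc822":
            return "message/rfc822"
    return "text/html"


def render(raw_message, accept_header):
    """Return (content_type, body) for one stored message."""
    content_type = choose_format(accept_header)
    if content_type == "message/rfc822":
        # No markup step needed: hand back the original message as-is.
        return content_type, raw_message
    # Stand-in for the real markup pass: escape and wrap in <pre>.
    return content_type, "<pre>" + html.escape(raw_message) + "</pre>"
```

With this sketch, parse_query("to=www-html@w3.org;message-id=234234223@bar.net")
yields the two fields "to" and "message-id", and a request carrying
"Accept: message/rfc822" gets the raw message back without any markup
work.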
Received on Tuesday, 26 December 1995 18:38:36 UTC