- From: Gerald Oskoboiny <gerald@w3.org>
- Date: Tue, 16 Aug 2016 20:25:11 -0700
- To: Dan Brickley <danbri@google.com>
- Cc: Marc Weber <marc@webhistory.org>, public-webhistory@w3.org, www-talk@w3.org, Tim Berners-Lee <timbl@w3.org>, Kevin Hughes <kev@kevcom.com>
Hi Dan, all,
* Dan Brickley <danbri@google.com> [2016-08-15 15:42+0100]
> > On Aug 15, 2016, at 06:25, Dan Brickley <danbri@google.com> wrote:
> >
> > Looking again at https://lists.w3.org/Archives/Public/www-talk/ ...
> > there are no posts archived at W3C for 1993-4. Those were interesting
> > and busy years.
> On 15 August 2016 at 15:00, Marc Weber <marc@webhistory.org> wrote:
> > Dear Dan,
> > Kevin Hughes has had the talk and www-html archives up at webhistory.org
> > since 1996; check out: http://1997.webhistory.org/www.lists/. He preserved
> > them from EIT. I’ve copied Kevin re copying to w3.org.
> 
> Thanks! The W3C archives page for www-talk already says
> "Acknowledgments: archives for 1991-1992 were generated from mboxes
> donated by Kevin Hughes and EIT." in the footer, so I thought maybe
> we'd exhausted Kevin's supply of www-talk archives.  But as you say,
> there they are... http://1997.webhistory.org/www.lists/
> 
> It would be nice to have a copy stashed on w3.org too...
We do have a copy of EIT's archives as well as ucalgary's and a
few others. Integrating them into our archives has proven
difficult because each copy has its own bugs and idiosyncrasies,
and we don't want to risk breaking links in our archives by
importing incomplete data and having to add messages again later.
Here are some rough notes on various versions of these archives
that I have found. I will try to continue investigating.
----------
[w3c] primary public www-talk archives at W3C:
1991: 30 messages
1992: 463 messages
1993: missing
1994: missing
1995: 2184 messages
1995JanFeb says it has 24 messages on the index page, but only has one,
badly garbled: https://lists.w3.org/Archives/Public/www-talk/1995JanFeb/
1995MarApr says it has 624 messages but is also garbled:
https://lists.w3.org/Archives/Public/www-talk/1995MarApr/
1995MayJun is partly broken but not completely:
https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0000.html
https://lists.w3.org/Archives/Public/www-talk/1995MayJun/
These have been garbled for quite some time (16+ years), so it's not
something we broke recently:
http://web.archive.org/web/20000816032107/http://lists.w3.org/Archives/Public/www-talk/1995JanFeb/
If we eventually fill in 1995 we need to be careful not to break
all the existing URIs, e.g.
https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0001.html
currently has a message from 18 June which would definitely change
if we tracked down all the messages from May 1-June 18 and simply
rebuilt the archive.
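
To illustrate the renumbering concern above, here is a minimal sketch. It
assumes messages are simply numbered in date order (0000.html, 0001.html,
...); the dates and the assign_uris helper are hypothetical and are not
taken from the real archive or its tooling.

```python
from datetime import date

def assign_uris(message_dates):
    """Number messages sequentially in date order: 0000.html, 0001.html, ..."""
    ordered = sorted(message_dates)
    return {d: f"{i:04d}.html" for i, d in enumerate(ordered)}

# Hypothetical dates: the archive as it stands, then with recovered messages.
current = [date(1995, 6, 17), date(1995, 6, 18), date(1995, 6, 20)]
recovered = [date(1995, 5, 1), date(1995, 5, 9)]

before = assign_uris(current)
after = assign_uris(current + recovered)

# The 18 June message keeps its content but not its URI after a plain rebuild.
print(before[date(1995, 6, 18)])  # 0001.html
print(after[date(1995, 6, 18)])   # 0003.html
```

Any import that preserves existing links would need to number recovered
messages without shifting the ones already published.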
----------
[eit] http://1997.webhistory.org/www.lists/
      (originally at http://www.eit.com/www.lists/ )
1991: 30 messages
1992: 472 messages
1993: 3035 messages
1994: 4657 messages
1995: 2365 messages
(note: the mbox for www-talk.1995q4 seems to be missing, but those messages
are available in the HTML version)
Why does 1992 have more messages than W3C's version? It seems to be due
to message bodies containing lines starting with 'From ', which get
treated as the start of a new message:
https://lists.w3.org/Archives/Public/www-talk/1992NovDec/0069.html
cf.
http://1997.webhistory.org/www.lists/www-talk.1992/0319.html
http://1997.webhistory.org/www.lists/www-talk.1992/0320.html
(opening the mbox with mutt correctly displays 465 messages)
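
A rough sketch of how the two counts can diverge, assuming the difference
comes from how separator lines are recognized. The regular expression and
both counting functions are illustrative assumptions, not what hypermail or
mutt actually implement; the sample messages are made up.

```python
import re

# Assumed shape of a real mbox separator: "From addr  Dow Mon dd hh:mm:ss yyyy"
SEPARATOR = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}\s*$"
)

def count_naive(lines):
    """Treat every line starting with 'From ' as a new message."""
    return sum(1 for line in lines if line.startswith("From "))

def count_strict(lines):
    """Count only separator-shaped lines at file start or after a blank line."""
    count = 0
    prev_blank = True
    for line in lines:
        if prev_blank and SEPARATOR.match(line):
            count += 1
        prev_blank = line.strip() == ""
    return count

SAMPLE = """\
From alice@example.org  Tue Nov  3 10:00:00 1992
From: alice@example.org
Subject: first

Some text.
From what I can tell this line is body text.

From bob@example.org  Wed Nov  4 11:00:00 1992
From: bob@example.org
Subject: second

Reply text.
""".splitlines()

print(count_naive(SAMPLE))   # 3
print(count_strict(SAMPLE))  # 2
```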
www-talk.1995q1 has 831 messages, Jan 1 - Apr 7
www-talk.1995q2 has 539 messages, Apr 1 - June 8
When I open 1995q2 in mutt, 91 of the messages appear to be empty due
to a header that says 'Content-Length: 0'. If I allow mutt to close
and write the mailbox, it deletes those message bodies. Deleting the
'Content-Length: 0' lines before opening it in mutt seems to work OK.
(This issue is present in other mboxes as well.)
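
The workaround of deleting those header lines could be scripted roughly as
follows. This is a sketch under assumptions: the function name is
hypothetical, the mbox is handled as a list of text lines, and header
sections are detected naively (a body line starting with 'From ' would be
mistaken for the start of a new header block).

```python
import re

ZERO_LENGTH = re.compile(r"^Content-Length:\s*0\s*$", re.IGNORECASE)

def strip_zero_content_length(lines):
    """Drop 'Content-Length: 0' header lines, leaving message bodies untouched."""
    out = []
    in_headers = False
    for line in lines:
        if line.startswith("From "):
            in_headers = True       # naive: assumes this is a real separator
        elif line.strip() == "":
            in_headers = False      # first blank line ends the header block
        if in_headers and ZERO_LENGTH.match(line):
            continue                # skip the misleading header
        out.append(line)
    return out

example = [
    "From www-talk@example.org  Fri Jun  2 17:54:14 1995",
    "Subject: test",
    "Content-Length: 0",
    "",
    "Body text that would otherwise be dropped.",
    "Content-Length: 0",
]
cleaned = strip_zero_content_length(example)
```

Running it on a copy of the mbox, rather than in place, keeps the original
available for comparison.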
1995q2 has a number of messages with garbled headers a la:
    From www-talk@www10.w3.org  Fri Jun  2 17:54:14 1995
    Return-Path: <www-talk@www10.w3.org>
    Received: from www19 (www19.w3.org) by eitech.eit.com (4.1/SMI-4.1)
        id AA01918; Fri, 2 Jun 95 17:54:14 PDT
    Date: Fri, 2 Jun 95 17:54:14 PDT
    From: www-talk@www10.w3.org
    Message-Id: <9506030054.AA01918@eitech.eit.com>
    Apparently-To: <hypermail-feed@eit.com>
These show up in the HTMLized copy as:
http://1997.webhistory.org/www.lists/www-talk.1995q2/0519.html
http://1997.webhistory.org/www.lists/www-talk.1995q2/author.html#463
These messages appear to be correct in W3C's archives:
https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0258.html
(and the underlying mbox format message is fine as well)
The issues noted above probably exist in the other mboxes as well;
1995q2 is just the one I happened to look at.
----------
[eds] Ed Summers' version, based on data from Arjun Ray and Jose Kahan
      https://lists.w3.org/Archives/Public/www-talk/2012NovDec/0010.html
1991: 30 messages
1992: 472 messages
1993: 3035 messages
1994: 4657 messages
1995: 1997 messages
These mboxes are identical to the [eit] version. (the md5 sums match)
Ed ran these through hypermail; results visible here:
http://web.archive.org/web/20131019013128/http://inkdroid.org/tmp/www-talk/
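
For reference, an md5 comparison like the one mentioned above can be
sketched as below. The helper names are hypothetical and any file paths
passed in would be placeholders; only hashlib from the standard library is
used.

```python
import hashlib

def md5_hex(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True when both files have identical md5 sums."""
    return md5_hex(path_a) == md5_hex(path_b)
```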
----------
[wth] www-talk-historical HTML archive at W3C (test build, not public)
1991: 30 messages
1992: 463 messages
1993: 2917 messages
1994: 2545 messages
1995: 2570 messages
At least partially garbled; has dirs named '28SepOct', '199NovDec',
'199SepOct'.
Not sure where this build came from; worth further investigation.
----------
[wth2] www-talk-historical-2 HTML archive at W3C (test build, not public)
1991: 30 messages
1992: 463 messages
1993: 2925 messages
1994: 4668 messages
1995: 1859 messages
Not sure about this one either. Looks very good overall. Seems to have
been generated based on the [eit] mboxes.
----------
[ksi] http://ksi.cpsc.ucalgary.ca/archives/WWW-TALK/
1991: 30 messages
1992: 465 messages
1993: 3082 messages
1994: 4902 messages
1995: (missing)
This version seems fairly good, for what's there.
1994q4 seems to include 1995 up until Jan 16.
----------
Of all these versions, [wth2] seems most promising.
Next steps: compare [wth2] with [ksi], see if [wth2] is missing anything
that's present in either [ksi] or [eit]; try to find out why the effort
to get somewhere with [wth2] was abandoned.
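
One possible way to do the comparison, sketched under the assumption that
both versions are available as mbox text and that Message-Id headers are
reliable enough to use as keys (messages with garbled or missing headers
would need separate handling). The function names and sample IDs are
hypothetical.

```python
import re

# Matches Message-Id header lines; may also catch quoted headers in bodies.
MESSAGE_ID = re.compile(r"^Message-Id:\s*(<[^>]+>)", re.IGNORECASE | re.MULTILINE)

def message_ids(mbox_text):
    """Collect the set of Message-Id values found in an mbox's text."""
    return set(MESSAGE_ID.findall(mbox_text))

def missing_from(candidate_text, reference_text):
    """Message-Ids present in the reference but absent from the candidate."""
    return message_ids(reference_text) - message_ids(candidate_text)

candidate = "Message-Id: <a1@example.org>\n\nbody\n"
reference = (
    "Message-Id: <a1@example.org>\n\nbody\n"
    "Message-ID: <b2@example.org>\n\nbody\n"
)
print(missing_from(candidate, reference))  # {'<b2@example.org>'}
```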
-- 
Gerald Oskoboiny     http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C)    http://www.w3.org/
tel:+1-604-906-1232             mailto:gerald@w3.org
Received on Wednesday, 17 August 2016 03:25:24 UTC