- From: Gerald Oskoboiny <gerald@w3.org>
- Date: Tue, 16 Aug 2016 20:25:11 -0700
- To: Dan Brickley <danbri@google.com>
- Cc: Marc Weber <marc@webhistory.org>, public-webhistory@w3.org, www-talk@w3.org, Tim Berners-Lee <timbl@w3.org>, Kevin Hughes <kev@kevcom.com>
Hi Dan, all,

* Dan Brickley <danbri@google.com> [2016-08-15 15:42+0100]
> > On Aug 15, 2016, at 06:25, Dan Brickley <danbri@google.com> wrote:
> >
> > Looking again at https://lists.w3.org/Archives/Public/www-talk/ ...
> > there are no posts archived at W3C for 1993-4. Those were interesting
> > and busy years.
> On 15 August 2016 at 15:00, Marc Weber <marc@webhistory.org> wrote:
> > Dear Dan,
> > Kevin Hughes has had the talk and www-html archives up at webhistory.org
> > since 1996; check out: http://1997.webhistory.org/www.lists/. He preserved
> > them from EIT. I’ve copied Kevin re copying to w3.org.
>
> Thanks! The W3C archives page for www-talk already says
> "Acknowledgments: archives for 1991-1992 were generated from mboxes
> donated by Kevin Hughes and EIT." in the footer, so I thought maybe
> we'd exhausted Kevin's supply of www-talk archives. But as you say,
> there they are... http://1997.webhistory.org/www.lists/
>
> It would be nice to have a copy stashed on w3.org too...

We do have a copy of EIT's archives as well as ucalgary's and a
few others.

Integrating them into our archives has proven difficult because
each copy has its own bugs and idiosyncrasies, and we don't want
to risk breaking links in our archives by importing incomplete
data and having to add messages again later.

Here are some rough notes on various versions of these archives
that I have found. I will try to continue investigating.
----------

[w3c] primary public www-talk archives at W3C:

    1991: 30 messages
    1992: 463 messages
    1993: missing
    1994: missing
    1995: 2184 messages

1995JanFeb says it has 24 messages on the index page, but only
has one, badly garbled:

    https://lists.w3.org/Archives/Public/www-talk/1995JanFeb/

1995MarApr says it has 624 messages but is also garbled:

    https://lists.w3.org/Archives/Public/www-talk/1995MarApr/

1995MayJun is partly broken but not completely:

    https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0000.html
    https://lists.w3.org/Archives/Public/www-talk/1995MayJun/

These have been garbled for quite some time (16+ years), so it's
not something we broke recently:

    http://web.archive.org/web/20000816032107/http://lists.w3.org/Archives/Public/www-talk/1995JanFeb/

If we eventually fill in 1995 we need to be careful not to break
all the existing URIs, e.g.

    https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0001.html

currently has a message from 18 June which would definitely
change if we tracked down all the messages from May 1-June 18 and
simply rebuilt the archive.

----------

[eit] http://1997.webhistory.org/www.lists/
(originally at http://www.eit.com/www.lists/ )

    1991: 30 messages
    1992: 472 messages
    1993: 3035 messages
    1994: 4657 messages
    1995: 2365 messages

(note: the mbox for www-talk.1995q4 seems to be missing, but
those messages are available in the HTML version)

Why does 1992 have more messages than W3C's version? Seems due to
messages containing lines starting with 'From ':

    https://lists.w3.org/Archives/Public/www-talk/1992NovDec/0069.html

cf.

    http://1997.webhistory.org/www.lists/www-talk.1992/0319.html
    http://1997.webhistory.org/www.lists/www-talk.1992/0320.html

(opening the mbox with mutt correctly displays 465 messages)

    www-talk.1995q1 has 831 messages, Jan 1 - Apr 7
    www-talk.1995q2 has 539 messages, Apr 1 - June 8

when I open 1995q2 in mutt, 91 of the messages appear to be empty
due to a header that says 'Content-Length: 0'.
If I allow mutt to close and write the mailbox, it deletes those
message bodies. Deleting the 'Content-Length: 0' lines before
opening it in mutt seems to work OK.
(this issue is present in other mboxes as well)

1995q2 has a number of messages with garbled headers a la:

    From www-talk@www10.w3.org Fri Jun 2 17:54:14 1995
    Return-Path: <www-talk@www10.w3.org>
    Received: from www19 (www19.w3.org) by eitech.eit.com (4.1/SMI-4.1)
        id AA01918; Fri, 2 Jun 95 17:54:14 PDT
    Date: Fri, 2 Jun 95 17:54:14 PDT
    From: www-talk@www10.w3.org
    Message-Id: <9506030054.AA01918@eitech.eit.com>
    Apparently-To: <hypermail-feed@eit.com>

These show up in the HTMLized copy as:

    http://1997.webhistory.org/www.lists/www-talk.1995q2/0519.html
    http://1997.webhistory.org/www.lists/www-talk.1995q2/author.html#463

These messages appear to be correct in W3C's archives:

    https://lists.w3.org/Archives/Public/www-talk/1995MayJun/0258.html

(and the underlying mbox format message is fine as well)

The issues noted above probably exist in the other mboxes as
well; 1995q2 is just the one I happened to look at.

----------

[eds] Ed Summers' version, based on data from Arjun Ray and Jose Kahan

    https://lists.w3.org/Archives/Public/www-talk/2012NovDec/0010.html

    1991: 30 messages
    1992: 472 messages
    1993: 3035 messages
    1994: 4657 messages
    1995: 1997 messages

These mboxes are identical to the [eit] version.
(the md5 sums match)

Ed ran these through hypermail; results visible here:

    http://web.archive.org/web/20131019013128/http://inkdroid.org/tmp/www-talk/

----------

[wth] www-talk-historical HTML archive at W3C
(test build, not public)

    1991: 30 messages
    1992: 463 messages
    1993: 2917 messages
    1994: 2545 messages
    1995: 2570 messages

at least partially garbled; has dirs named '28SepOct',
'199NovDec', '199SepOct'.

Not sure where this build came from; worth further investigation.
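
For the 'Content-Length: 0' problem noted in the [eit] section
above, the workaround of deleting those lines before opening an
mbox in mutt could be scripted roughly as follows. This is only a
sketch in Python, not the procedure actually used: the names
strip_zero_content_length and clean_mbox_file are hypothetical,
and it removes every line that reads exactly 'Content-Length: 0',
so a body line with that exact text would be dropped as well.

```python
import re
from typing import Tuple

# Matches a whole line that is exactly "Content-Length: 0" (any letter
# case, optional spaces or tabs), including its line ending.
_ZERO_CONTENT_LENGTH = re.compile(
    rb"^Content-Length:[ \t]*0[ \t]*\r?\n",
    re.IGNORECASE | re.MULTILINE,
)


def strip_zero_content_length(mbox_bytes: bytes) -> Tuple[bytes, int]:
    """Return (cleaned mbox bytes, number of lines removed).

    Assumption: the line is removed wherever it occurs, not only
    inside header blocks.
    """
    return _ZERO_CONTENT_LENGTH.subn(b"", mbox_bytes)


def clean_mbox_file(src_path: str, dst_path: str) -> int:
    """Write a cleaned copy of src_path to dst_path; return lines removed.

    The source file is only read, never modified.
    """
    with open(src_path, "rb") as src:
        cleaned, removed = strip_zero_content_length(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return removed
```

Something like clean_mbox_file("www-talk.1995q2",
"www-talk.1995q2.fixed") would leave the original mbox untouched,
so the cleaned copy can be checked in mutt before anything is
replaced. Non-zero Content-Length headers are left alone.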
----------

[wth2] www-talk-historical-2 HTML archive at W3C
(test build, not public)

    1991: 30 messages
    1992: 463 messages
    1993: 2925 messages
    1994: 4668 messages
    1995: 1859 messages

Not sure about this one either. Looks very good overall. Seems to
have been generated based on the [eit] mboxes.

----------

[ksi] http://ksi.cpsc.ucalgary.ca/archives/WWW-TALK/

    1991: 30 messages
    1992: 465 messages
    1993: 3082 messages
    1994: 4902 messages
    1995: (missing)

This version seems fairly good, for what's there.
1994q4 seems to include 1995 up until Jan 16.

----------

Of all these versions, [wth2] seems most promising.

Next steps: compare [wth2] with [ksi], see if [wth2] is missing
anything that's present in either [ksi] or [eit]; try to find out
why the effort to get somewhere with [wth2] was abandoned.

-- 
Gerald Oskoboiny http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C) http://www.w3.org/
tel:+1-604-906-1232 mailto:gerald@w3.org
Received on Wednesday, 17 August 2016 03:25:21 UTC