- From: Ed Summers <ehs@pobox.com>
- Date: Wed, 26 Dec 2012 22:18:40 -0500
- To: jose.kahan@w3.org
- Cc: public-webhistory@w3.org
Hi Jose,

You said you were using something like hypermail, so I ran the mboxes through v2.3.0 to see how things looked. There were 10 messages that didn't get processed since they lacked a Message-ID:

ed@peirce:~/Projects/pragweb$ cat www-talk_1991-1994/data/www-talk.199* | hypermail -i -d www-talk
Message-ID is missing, ignoring message with subject 'World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject 'Re: World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject '(no subject)'.
Message-ID is missing, ignoring message with subject 'WWW Security Recommendations'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'Re: Announcing Access Authorization Documentation'.
Message-ID is missing, ignoring message with subject 'Re: HTML icon set was: Additions to the CGI archive'.
Message-ID is missing, ignoring message with subject 'Re: minimal HTML'.
Message-ID is missing, ignoring message with subject 'Eolas releases WebRouser via the Internet'.

But 9,931 messages were successfully processed. I put the resulting HTML up temporarily at:

http://inkdroid.org/tmp/www-talk/

You can see from the index file that 6 of the messages had suspect Date headers, since they got processed with the year 1969. But these ought to be easily fixed.

So are you using hypermail at w3c?

//Ed

On Fri, Dec 21, 2012 at 12:46 PM, Jose Kahan <jose.kahan@w3.org> wrote:
> Hi Ed,
>
>> Great. I'll have to check with Arjun about where he got them from.
>>
>> > In addition, I have some that you're missing for the first three
>> > quarters of 1995, in the same mbox format. i'm sending them to
>> > you in a separate message, outside of this list.
>>
>> I didn't realize that there were missing emails for 1995 since the
>> www-talk archive page [1] displays messages for that time period. Do
>> you have a sense of what date ranges are not fully represented on the
>> www-talk archive page?
>
> I'd say 1991-1995, with 1995 being partially, but not complete.
> But I don't remember right now. Maybe I did finish all of 1995.
> I can find this out in january by comparing the message-id headers
> for all the 1995 messages we have.
>
> I found some info in my old inbox stating the context. Keep in
> mind that the last time I worked on this was in 2005, so I don't
> remember all the exact details :)
>
> At some point in time, while preparing the www-talk hypertext
> archive, I found out that there were other www-talk archives with
> messages that we were missing (I mention that the archive in question
> had 126 messages we were missing). I contact the maintainer of
> that other archive and asked him for his mboxes so that I could
> try to merge both archives. He then mentioned having the www-talk
> mboxes for 1991-1995, as well as the historical ones for www-html
> (which we have only partially put online) and some for www-vrml, if
> I recall well. My source worked then at the University of Calgary, CA.
> This is what he pointed out:
>
> [[
> I do have some archives in mbox format but they may not all be
> originals. My recollection is that Tom Gruber reverse engineered the
> html archives back into mbox format at some stage. I have converted
> from the mbox format to databases and know that I had to fix some
> glitches manually because the mbox format was ambiguous in regard to
> quoted mail -- it may be that the list server had a problem, or that
> Tom's conversion did. So don't put absolute trust in what I can give
> you.
> ]]
>
> As we have exactly the same archive mboxes files from 1991-1994,
> they both had the same origin. Thus the above warning still stands.
>
> Hope this clears up the origin question.
>
> I'm out of the office until early January.
>
> Best wishes for the end-of-year holidays,
>
> -jose
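
The two problems reported in this message (messages with no Message-ID, and Date headers that end up rendered as 1969) could also be checked before running hypermail. What follows is only a minimal sketch in Python, assuming the standard-library mailbox module can read these mboxes. The names audit_mbox and earliest_year are hypothetical and not part of hypermail. Treating unparsable or pre-1990 dates as "suspect" is a guess at why those messages were filed under 1969, not something confirmed here.

```python
import mailbox
from email.utils import parsedate_to_datetime


def audit_mbox(path, earliest_year=1990):
    """Return (missing_ids, suspect_dates) for the mbox file at `path`.

    missing_ids   -- subjects of messages that have no Message-ID header
    suspect_dates -- (subject, raw Date value) pairs whose Date header is
                     absent, unparsable, or earlier than `earliest_year`
                     (the cutoff is an assumption, not a hypermail rule)
    """
    missing_ids = []
    suspect_dates = []
    # create=False: fail loudly instead of creating an empty mbox file
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", "(no subject)"))
            if not msg.get("Message-ID"):
                missing_ids.append(subject)

            raw_date = msg.get("Date")
            year = None
            if raw_date is not None:
                raw_date = str(raw_date)
                try:
                    year = parsedate_to_datetime(raw_date).year
                except (TypeError, ValueError):
                    year = None  # header present but not a parsable date
            if year is None or year < earliest_year:
                suspect_dates.append((subject, raw_date))
    finally:
        box.close()
    return missing_ids, suspect_dates
```

Calling audit_mbox on each file matched by the glob in the command quoted earlier would give a list of subjects to compare against the hypermail warnings, and a list of Date values to fix by hand.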
Received on Thursday, 27 December 2012 03:19:08 UTC