Re: Re: Re: www-talk archives: 1992-1995?

From: Ed Summers <ehs@pobox.com>
Date: Wed, 26 Dec 2012 22:18:40 -0500
Message-ID: <CABzDd=7pQ4-VhbHKN-4PDschHuwvEoAu9k+bY0F2dyr4-vx2+w@mail.gmail.com>
To: jose.kahan@w3.org
Cc: public-webhistory@w3.org

Hi Jose,

You said you were using something like hypermail, so I ran the mboxes
through v2.3.0 to see how things looked. There were 10 messages that
didn't get processed since they lacked a Message-ID:

ed@peirce:~/Projects/pragweb$ cat www-talk_1991-1994/data/www-talk.199* | hypermail -i -d www-talk
Message-ID is missing, ignoring message with subject 'World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject 'Re:  World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject '(no subject)'.
Message-ID is missing, ignoring message with subject 'WWW Security Recommendations'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'Re: Announcing Access Authorization Documentation'.
Message-ID is missing, ignoring message with subject 'Re: HTML icon set was: Additions to the CGI archive'.
Message-ID is missing, ignoring message with subject 'Re: minimal HTML'.
Message-ID is missing, ignoring message with subject 'Eolas releases WebRouser via the Internet'.
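
For anyone who wants to find the affected messages before running the
conversion, here is a minimal sketch using Python's standard mailbox
module. It is not part of hypermail, and the function name
missing_message_ids is an assumption made for illustration. It lists the
position and subject of each message in an mbox file that has no
Message-ID header.

```python
import mailbox


def missing_message_ids(mbox_path):
    """Return (index, subject) for every message lacking a Message-ID.

    Hypothetical helper for illustration; not part of hypermail.
    """
    # create=False so that a mistyped path raises instead of creating a file.
    box = mailbox.mbox(mbox_path, create=False)
    missing = []
    try:
        for index, msg in enumerate(box):
            # str() guards against Header objects returned for 8-bit headers.
            message_id = str(msg.get("Message-ID") or "").strip()
            if not message_id:
                subject = str(msg.get("Subject") or "(no subject)")
                missing.append((index, subject))
    finally:
        box.close()
    return missing
```

A missing header could then be filled in by hand, or with a generated
value, before feeding the mbox to the converter.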

But 9,931 messages were successfully processed. I put the resulting
HTML up temporarily at:

    http://inkdroid.org/tmp/www-talk/

You can see from the index file that 6 of the messages had suspect
Date headers, since they got processed with the year 1969. But these
ought to be easily fixed.
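
A year of 1969 usually means a date that could not be parsed and fell
back to the Unix epoch, though that is an assumption about hypermail's
behaviour here. A rough sketch for finding such messages in the mbox
files follows. The function name suspect_dates and the default
1991-1995 year range are assumptions for illustration, not anything
hypermail provides.

```python
import mailbox
from email.utils import parsedate_to_datetime


def suspect_dates(mbox_path, first_year=1991, last_year=1995):
    """Return (index, raw Date header) for messages whose Date is missing,
    unparseable, or outside the expected year range.

    Hypothetical helper; the year range is an assumed archive span.
    """
    box = mailbox.mbox(mbox_path, create=False)
    suspects = []
    try:
        for index, msg in enumerate(box):
            raw = msg.get("Date")
            year = None
            if raw is not None:
                try:
                    year = parsedate_to_datetime(str(raw)).year
                except (TypeError, ValueError):
                    # Older Pythons raise TypeError, newer ones ValueError.
                    year = None
            if year is None or not first_year <= year <= last_year:
                suspects.append((index, None if raw is None else str(raw)))
    finally:
        box.close()
    return suspects
```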

So are you using hypermail at W3C?

//Ed

On Fri, Dec 21, 2012 at 12:46 PM, Jose Kahan <jose.kahan@w3.org> wrote:
> Hi Ed,
>
>> Great. I'll have to check with Arjun about where he got them from.
>>
>> > In addition, I have some that you're missing for the first three
>> > quarters of 1995, in the same mbox format. I'm sending them to
>> > you in a separate message, outside of this list.
>>
>> I didn't realize that there were missing emails for 1995 since the
>> www-talk archive page [1] displays messages for that time period. Do
>> you have a sense of what date ranges are not fully represented on the
>> www-talk archive page?
>
> I'd say 1991-1995, with 1995 being partial rather than complete.
> But I don't remember right now. Maybe I did finish all of 1995.
> I can find this out in January by comparing the Message-ID headers
> for all the 1995 messages we have.
>
> I found some info in my old inbox stating the context. Keep in
> mind that the last time I worked on this was in 2005, so I don't
> remember all the exact details :)
>
> At some point in time, while preparing the www-talk hypertext
> archive, I found out that there were other www-talk archives with
> messages that we were missing (I mention that the archive in question
> had 126 messages we were missing). I contacted the maintainer of
> that other archive and asked him for his mboxes so that I could
> try to merge both archives. He then mentioned having the www-talk
> mboxes for 1991-1995, as well as the historical ones for www-html
> (which we have only partially put online) and some for www-vrml, if
> I recall correctly. At the time, my source worked at the University
> of Calgary, CA.
> This is what he pointed out:
>
> [[
> I do have some archives in mbox format but they may not all be
> originals. My recollection is that Tom Gruber reverse engineered the
> html archives back into mbox format at some stage. I have converted
> from the mbox format to databases and know that I had to fix some
> glitches manually because the mbox format was ambiguous in regard to
> quoted mail -- it may be that the list server had a problem, or that
> Tom's conversion did. So don't put absolute trust in what I can give
> you.
> ]]
>
> Since we have exactly the same mbox archive files for 1991-1994,
> both sets must share the same origin. So the above warning still stands.
>
> Hope this clears up the origin question.
>
> I'm out of the office until early January.
>
> Best wishes for the end-of-year holidays,
>
> -jose
Received on Thursday, 27 December 2012 03:19:08 GMT
