Re: Re: Re: www-talk archives: 1992-1995?

Hi Jose,

You said you were using something like hypermail, so I ran the mboxes
through hypermail v2.3.0 to see how things looked. Ten messages were
not processed because they lacked a Message-ID header:

ed@peirce:~/Projects/pragweb$ cat www-talk_1991-1994/data/www-talk.199* | hypermail -i -d www-talk
Message-ID is missing, ignoring message with subject 'World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject 'Re:  World Wide Web and Viola'.
Message-ID is missing, ignoring message with subject '(no subject)'.
Message-ID is missing, ignoring message with subject 'WWW Security Recommendations'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'yet another tag'.
Message-ID is missing, ignoring message with subject 'Re: Announcing Access Authorization Documentation'.
Message-ID is missing, ignoring message with subject 'Re: HTML icon set was: Additions to the CGI archive'.
Message-ID is missing, ignoring message with subject 'Re: minimal HTML'.
Message-ID is missing, ignoring message with subject 'Eolas releases WebRouser via the Internet'.

But 9,931 messages were successfully processed. I put the resulting
HTML up temporarily at:

    http://inkdroid.org/tmp/www-talk/

You can see from the index file that 6 of the messages had suspect
Date headers, since they got processed with the year 1969. But these
ought to be easily fixed.
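
For what it's worth, here is a rough sketch of how the same two problems
could be spotted before running hypermail. It uses Python's standard
mailbox module; the function name audit_mbox and the 1991-1995 year window
are assumptions of mine for illustration, not anything hypermail does.

```python
import mailbox
from email.utils import parsedate_to_datetime


def audit_mbox(path, first_year=1991, last_year=1995):
    """Return (missing_id_subjects, suspect_date_subjects) for one mbox file.

    The year window is an assumption based on the period the archive
    is supposed to cover; adjust it as needed.
    """
    missing_id = []
    suspect_date = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", "(no subject)"))

            # Messages without a Message-ID are the ones hypermail ignored.
            if not msg.get("Message-ID"):
                missing_id.append(subject)

            # A Date header that is absent, unparseable, or outside the
            # expected window is flagged for manual review.
            raw_date = msg.get("Date")
            try:
                parsed = parsedate_to_datetime(str(raw_date)) if raw_date else None
            except (TypeError, ValueError):
                parsed = None
            if parsed is None or not (first_year <= parsed.year <= last_year):
                suspect_date.append(subject)
    finally:
        box.close()
    return missing_id, suspect_date
```

Calling audit_mbox on each www-talk.199* file in turn should give a list
of subjects to look at by hand.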

So are you using hypermail at W3C?

//Ed

On Fri, Dec 21, 2012 at 12:46 PM, Jose Kahan <jose.kahan@w3.org> wrote:
> Hi Ed,
>
>> Great. I'll have to check with Arjun about where he got them from.
>>
>> > In addition, I have some that you're missing for the first three
>> > quarters of 1995, in the same mbox format. I'm sending them to
>> > you in a separate message, outside of this list.
>>
>> I didn't realize that there were missing emails for 1995 since the
>> www-talk archive page [1] displays messages for that time period. Do
>> you have a sense of what date ranges are not fully represented on the
>> www-talk archive page?
>
> I'd say 1991-1995, with 1995 partially, but not completely, covered.
> But I don't remember right now. Maybe I did finish all of 1995.
> I can find this out in January by comparing the message-id headers
> for all the 1995 messages we have.
>
> I found some info in my old inbox stating the context. Keep in
> mind that the last time I worked on this was in 2005, so I don't
> remember all the exact details :)
>
> At some point in time, while preparing the www-talk hypertext
> archive, I found out that there were other www-talk archives with
> messages that we were missing (I mention that the archive in question
> had 126 messages we were missing). I contacted the maintainer of
> that other archive and asked him for his mboxes so that I could
> try to merge both archives. He then mentioned having the www-talk
> mboxes for 1991-1995, as well as the historical ones for www-html
> (which we have only partially put online) and some for www-vrml, if
> I recall well. My source worked then at the University of Calgary, CA.
> This is what he pointed out:
>
> [[
> I do have some archives in mbox format but they may not all be
> originals. My recollection is that Tom Gruber reverse engineered the
> html archives back into mbox format at some stage. I have converted
> from the mbox format to databases and know that I had to fix some
> glitches manually because the mbox format was ambiguous in regard to
> quoted mail -- it may be that the list server had a problem, or that
> Tom's conversion did. So don't put absolute trust in what I can give
> you.
> ]]
>
> As we have exactly the same mbox archive files from 1991-1994,
> they both had the same origin. Thus the above warning still stands.
>
> Hope this clears up the origin question.
>
> I'm out of the office until early January.
>
> Best wishes for the end-of-year holidays,
>
> -jose

Received on Thursday, 27 December 2012 03:19:08 UTC