Re: w3.org archival of #rdfig logs and chump from Edd Dumbill on 2003-01-03 (www-archive@w3.org from January 2003)

From: Edd Dumbill <edd@usefulinc.com>
Date: 03 Jan 2003 20:35:31 +0000
To: Gerald Oskoboiny <gerald@w3.org>
Cc: Dan Brickley <danbri@w3.org>, Dave Beckett <dave.beckett@bristol.ac.uk>, connolly@w3.org, www-archive@w3.org
Message-Id: <1041626130.13390.16.camel@pingu.bunker.heddley.com>

I'm fine with a crawler, you're right about the bandwidth requirements
-- especially as each day there are at most 4 files which change in the
chump hierarchy (front page, archived day, month, year.)

A simple Perl script could mirror the chump stuff quite happily.

-- Edd

On Fri, 2003-01-03 at 20:32, Gerald Oskoboiny wrote:
> * Dan Brickley <danbri@w3.org> [2003-01-03 15:20-0500]
> > 
> > Hi Edd, Dave,
> > 
> > We're looking into making sure #rdfig discussions have an archive at 
> > W3C. Probably the easiest thing to do is to agree a schedule for us to 
> > grab a tar.gz from each of your sites. In theory, the chump is completely 
> > derrivable; in practice, it'd be a pain to rebuild the site that way. 
> 
> Hi, pardon my butting in...
> 
> I just wanted to note that grabbing regular .tar.gz's would use
> much more bandwidth than some other more incremental approach.
> It would be much more efficient (but slightly more work) to
> http-crawl each site and just keep a copy of everything we see,
> if that's ok with you.
> 
> (I don't know how much more work it is; I think it's fairly easy
> to do stuff like this with wget, though I always need to rtfm)
-- 
Edd Dumbill ----| phone: +44 1904 427740 fax: +44 8709 909625 |-----
 | Managing Editor, XML.com, XMLhack.com  --  Chair, XML Europe 2003
 | I PGP sign my email; more info at http://heddley.com/edd/pgp.html

Received on Friday, 3 January 2003 15:38:14 UTC