W3C home > Mailing lists > Public > www-archive@w3.org > January 2003

Re: w3.org archival of #rdfig logs and chump

From: Dan Brickley <danbri@w3.org>
Date: Fri, 3 Jan 2003 18:18:18 -0500
To: Edd Dumbill <edd@usefulinc.com>
Cc: Gerald Oskoboiny <gerald@w3.org>, Dave Beckett <dave.beckett@bristol.ac.uk>, connolly@w3.org, www-archive@w3.org
Message-ID: <20030103231818.GA21780@tux.w3.org>

* Edd Dumbill <edd@usefulinc.com> [2003-01-03 20:35+0000]
> I'm fine with a crawler, you're right about the bandwidth requirements
> -- especially as each day there are at most 4 files which change in the
> chump hierarchy (front page, archived day, month, year.)
> 
> A simple Perl script could mirror the chump stuff quite happily.

Yep, Gerald's right. I guess I figured the laziest, simplest thing 
was just grabbing the single tar.gz, but probably best to get the 4 pages 
instead.

Dan 


> 
> -- Edd
> 
> On Fri, 2003-01-03 at 20:32, Gerald Oskoboiny wrote:
> > * Dan Brickley <danbri@w3.org> [2003-01-03 15:20-0500]
> > > 
> > > Hi Edd, Dave,
> > > 
> > > We're looking into making sure #rdfig discussions have an archive at 
> > > W3C. Probably the easiest thing to do is to agree a schedule for us to 
> > > grab a tar.gz from each of your sites. In theory, the chump is completely 
> > > derrivable; in practice, it'd be a pain to rebuild the site that way. 
> > 
> > Hi, pardon my butting in...
> > 
> > I just wanted to note that grabbing regular .tar.gz's would use
> > much more bandwidth than some other more incremental approach.
> > It would be much more efficient (but slightly more work) to
> > http-crawl each site and just keep a copy of everything we see,
> > if that's ok with you.
> > 
> > (I don't know how much more work it is; I think it's fairly easy
> > to do stuff like this with wget, though I always need to rtfm)
> -- 
> Edd Dumbill ----| phone: +44 1904 427740 fax: +44 8709 909625 |-----
>  | Managing Editor, XML.com, XMLhack.com  --  Chair, XML Europe 2003
>  | I PGP sign my email; more info at http://heddley.com/edd/pgp.html
Received on Friday, 3 January 2003 18:18:23 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 July 2008 08:08:50 GMT