- From: Dan Brickley <danbri@w3.org>
- Date: Fri, 3 Jan 2003 18:18:18 -0500
- To: Edd Dumbill <edd@usefulinc.com>
- Cc: Gerald Oskoboiny <gerald@w3.org>, Dave Beckett <dave.beckett@bristol.ac.uk>, connolly@w3.org, www-archive@w3.org
* Edd Dumbill <edd@usefulinc.com> [2003-01-03 20:35+0000]
> I'm fine with a crawler, you're right about the bandwidth requirements
> -- especially as each day there are at most 4 files which change in the
> chump hierarchy (front page, archived day, month, year.)
>
> A simple Perl script could mirror the chump stuff quite happily.

Yep, Gerald's right. I guess I figured the laziest, simplest thing was
just grabbing the single tar.gz, but it's probably best to get the 4
pages instead.

Dan

>
> -- Edd
>
> On Fri, 2003-01-03 at 20:32, Gerald Oskoboiny wrote:
> > * Dan Brickley <danbri@w3.org> [2003-01-03 15:20-0500]
> > >
> > > Hi Edd, Dave,
> > >
> > > We're looking into making sure #rdfig discussions have an archive at
> > > W3C. Probably the easiest thing to do is to agree a schedule for us to
> > > grab a tar.gz from each of your sites. In theory, the chump is completely
> > > derivable; in practice, it'd be a pain to rebuild the site that way.
> >
> > Hi, pardon my butting in...
> >
> > I just wanted to note that grabbing regular .tar.gz's would use
> > much more bandwidth than some other more incremental approach.
> > It would be much more efficient (but slightly more work) to
> > http-crawl each site and just keep a copy of everything we see,
> > if that's ok with you.
> >
> > (I don't know how much more work it is; I think it's fairly easy
> > to do stuff like this with wget, though I always need to rtfm)
> --
> Edd Dumbill ----| phone: +44 1904 427740 fax: +44 8709 909625 |-----
> | Managing Editor, XML.com, XMLhack.com -- Chair, XML Europe 2003
> | I PGP sign my email; more info at http://heddley.com/edd/pgp.html
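[Editorial note: for illustration only, a minimal sketch in the spirit of Edd's "simple Perl script" for incrementally fetching the handful of pages that change each day. The base URL, page names, and local directory are assumptions, not the real chump layout; LWP::Simple's mirror() issues a conditional GET, so an unchanged page costs only a 304 response rather than a full download.]

    #!/usr/bin/perl
    # Sketch: mirror the few chump pages that change each day.
    # Base URL, page names, and destination directory are assumptions.
    use strict;
    use warnings;
    use LWP::Simple;   # exports mirror(), is_success(), RC_NOT_MODIFIED

    my $base = 'http://rdfig.xmlhack.com/chump';   # assumed chump location
    my $dest = '/var/www/archive/rdfig-chump';     # assumed local mirror dir

    # Roughly the four files Edd mentions: front page, archived day,
    # month index, year index (names here are placeholders).
    my @pages = qw(index.html 2003-01-03.html 2003-01.html 2003.html);

    mkdir $dest unless -d $dest;
    for my $page (@pages) {
        # mirror() sends If-Modified-Since, so unchanged pages are cheap.
        my $rc = mirror("$base/$page", "$dest/$page");
        warn "failed to fetch $page: HTTP $rc\n"
            unless is_success($rc) || $rc == RC_NOT_MODIFIED;
    }

Run from cron on whatever schedule is agreed; the same conditional-GET behaviour is what wget's --mirror/--timestamping options would give for a full crawl.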
Received on Friday, 3 January 2003 18:18:23 UTC