I'm fine with a crawler, you're right about the bandwidth requirements
-- especially as each day there are at most 4 files which change in the
chump hierarchy (front page, archived day, month, year). A simple Perl
script could mirror the chump stuff quite happily.

-- Edd

On Fri, 2003-01-03 at 20:32, Gerald Oskoboiny wrote:
> * Dan Brickley <danbri@w3.org> [2003-01-03 15:20-0500]
> >
> > Hi Edd, Dave,
> >
> > We're looking into making sure #rdfig discussions have an archive at
> > W3C. Probably the easiest thing to do is to agree a schedule for us to
> > grab a tar.gz from each of your sites. In theory, the chump is completely
> > derivable; in practice, it'd be a pain to rebuild the site that way.
>
> Hi, pardon my butting in...
>
> I just wanted to note that grabbing regular .tar.gz's would use
> much more bandwidth than some other more incremental approach.
> It would be much more efficient (but slightly more work) to
> http-crawl each site and just keep a copy of everything we see,
> if that's ok with you.
>
> (I don't know how much more work it is; I think it's fairly easy
> to do stuff like this with wget, though I always need to rtfm)

-- 
Edd Dumbill ----| phone: +44 1904 427740 fax: +44 8709 909625 |-----
| Managing Editor, XML.com, XMLhack.com -- Chair, XML Europe 2003 |
| I PGP sign my email; more info at http://heddley.com/edd/pgp.html

Received on Friday, 3 January 2003 15:38:14 GMT
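
For illustration only, here is a minimal sketch of the incremental approach discussed in the thread: re-request each page with the validators saved from the previous crawl, and download the body only when the server reports a change. It is written in Python rather than the Perl or wget mentioned above, and it is not the script anyone in the thread actually used. The function names `conditional_headers` and `fetch_if_changed` are hypothetical, and the sketch assumes the server sends `ETag` or `Last-Modified` headers and honours conditional requests.

```python
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Build request headers from validators saved on a previous crawl."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url, cached, opener=urllib.request.urlopen):
    """Return (body, validators) if the page changed, else (None, cached).

    `cached` is a dict holding the validators from the last crawl (it may
    be empty on the first run). `opener` is injectable so the logic can be
    exercised without touching the network.
    """
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with opener(request) as response:
            validators = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), validators
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local copy as-is
            return None, cached
        raise
```

With at most four pages changing per day, a daily run over the known URLs would transfer the bodies of only those pages; every other request would come back as a short 304 response.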