Re: Dataset hosting

On Tuesday, November 5, 2013 at 3:38 PM, David Newton wrote:

> On Nov 5, 2013, at 10:32 AM, Marcos Caceres <w3c@marcosc.com> wrote:
>  
> >  
> > On November 5, 2013 at 3:29:33 PM, David Newton (david@davidnewton.ca) wrote:
> > >  
> > > +1 (if they’ll let us)
> > > Would we also be able to schedule an automated task on their server to regenerate it periodically?
> >  
> >  
> >  
> > I doubt it, as it’s fairly brute force what we are doing. However, we could take turns within the group. I’m happy to do the next batch at the end of Nov.  
> >  
> > Oh, another thing - we should cap the number of sites that we d/l to 100,000k. That way, we can do proper longitudinal studies of the data.  
>  
> I’m assuming you mean 100k, not 100,000k. :)

Argh… yep. Numbers are hard.
  
> That should be fairly easy to add to the script. Do we want the 100k top sites, which will produce fewer than 100k downloads because of errors, or the first 100k sites we’re able to successfully grab?

I guess the top 100K, just for stability.  
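To make the difference between the two options concrete, here is a minimal sketch. It assumes a ranked iterable of hostnames and a hypothetical `fetch_ok` callback standing in for the download step; neither name comes from the actual script.

```python
from itertools import islice
from typing import Callable, Iterable, List


def download_top_n(ranked_sites: Iterable[str], n: int,
                   fetch_ok: Callable[[str], bool]) -> List[str]:
    """Attempt only the top n sites; failed fetches shrink the result below n."""
    return [site for site in islice(ranked_sites, n) if fetch_ok(site)]


def download_first_n_ok(ranked_sites: Iterable[str], n: int,
                        fetch_ok: Callable[[str], bool]) -> List[str]:
    """Walk down the ranking until n sites have been fetched successfully."""
    fetched: List[str] = []
    for site in ranked_sites:
        if len(fetched) >= n:
            break
        if fetch_ok(site):  # hypothetical stand-in for the real download
            fetched.append(site)
    return fetched
```

With the first function the set of sites we attempt is fixed by rank, so it only changes from batch to batch when the ranking itself changes. With the second, the set also depends on which sites happened to fail that day.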

Also, we should probably set the User-Agent header to that of the iPhone.
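Something along these lines could work, as a rough sketch only. It assumes the script is (or could be) Python using the standard library's `urllib.request`; the `IPHONE_UA` string is an example iOS 7 Safari value and `build_request` is a hypothetical helper, so swap in whatever we agree on.

```python
import urllib.request

# Example iPhone (iOS 7 Safari) User-Agent string; an assumption, not an agreed value.
IPHONE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) "
    "AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 "
    "Mobile/11A465 Safari/9537.53"
)


def build_request(url: str) -> urllib.request.Request:
    """Return a request for url that identifies itself as an iPhone."""
    return urllib.request.Request(url, headers={"User-Agent": IPHONE_UA})
```

The request would then be passed to `urllib.request.urlopen(...)`, ideally with a timeout, so that sites doing UA sniffing serve us their mobile pages.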
  
--  
Marcos Caceres

Received on Tuesday, 5 November 2013 15:46:44 UTC