Re: Barewords in on* attributes, redux (also, find() and company)

On 12/14/11 4:52 PM, Boris Zbarsky wrote:
> Ok. It's just a simple spider that starts with the list at
> http://code.google.com/p/httparchive/source/browse/trunk/lists/All.txt
> and for each of those urls loads the url itself and then follows all
> same-host links from that page. So loads the front page of the site and
> all the same-host one-level-deep pages.

One more note.  The data I have so far comes from looking at just 1000
sites, not all 25000-some.  John is still working on the full set, now
that he has this set up on beefier hardware.
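
For anyone who wants to reproduce something like the crawl described in
the quoted text, here is a rough sketch in Python.  It is not the actual
spider.  The names same_host_links and crawl, and the fetch callback, are
made up for illustration, and treating "same-host" as an exact match on
the host:port part of the URL is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url, html):
    """Return absolute same-host links found in html, in document order."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc  # assumption: exact host[:port] match
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(page_url, href)).url
        if (urlparse(absolute).netloc == host
                and absolute != page_url
                and absolute not in seen):
            seen.add(absolute)
            links.append(absolute)
    return links


def crawl(seed_urls, fetch):
    """Load each seed URL, then every same-host page it links to.

    fetch is a hypothetical callable taking a URL and returning the
    page's HTML as a string; links on the second-level pages are not
    followed, so the crawl stays one level deep.
    """
    pages = {}
    for seed in seed_urls:
        html = fetch(seed)
        pages[seed] = html
        for link in same_host_links(seed, html):
            if link not in pages:
                pages[link] = fetch(link)
    return pages
```

Passing fetch in as a parameter keeps the traversal logic separate from
the network code, so it can be tried against canned HTML first.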

-Boris

Received on Friday, 16 December 2011 02:59:56 UTC