Re: Barewords in on* attributes, redux (also, find() and company) from Boris Zbarsky on 2011-12-16 (public-webapps@w3.org from October to December 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Thu, 15 Dec 2011 21:52:00 -0500
To: public-webapps@w3.org
Message-ID: <4EEAB250.3040108@mit.edu>

On 12/14/11 4:52 PM, Boris Zbarsky wrote:
> Ok. It's just a simple spider that starts with the list at
> http://code.google.com/p/httparchive/source/browse/trunk/lists/All.txt
> and for each of those urls loads the url itself and then follows all
> same-host links from that page. So loads the front page of the site and
> all the same-host one-level-deep pages.
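
As a rough illustration of the crawl described in the quoted text (this is not the actual spider; the names LinkCollector, same_host_links, pages_to_load and the caller-supplied fetch function are all made up for this sketch), picking out the same-host, one-level-deep pages might look something like:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url, html):
    """Return absolute http(s) URLs linked from html on page_url's host."""
    host = urlsplit(page_url).hostname
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        parts = urlsplit(absolute)
        if (parts.scheme in ("http", "https")
                and parts.hostname == host
                and absolute not in links):
            links.append(absolute)
    return links


def pages_to_load(start_url, fetch):
    """Front page plus its same-host links, one level deep.

    fetch is an assumed caller-supplied function mapping a URL to HTML text.
    """
    urls = [start_url]
    for link in same_host_links(start_url, fetch(start_url)):
        if link != start_url:
            urls.append(link)
    return urls
```

Here "same host" is taken to mean an exact hostname match, which is only one possible reading; the real spider may treat subdomains or redirects differently.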

One more note.  The data I have so far is from looking at just 1000
sites, not all 25000-some.  John's still working on the full list, now
that he has this set up on beefier hardware.

-Boris

Received on Friday, 16 December 2011 02:59:56 UTC