
Re: Barewords in on* attributes, redux (also, find() and company)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Thu, 15 Dec 2011 21:52:00 -0500
Message-ID: <4EEAB250.3040108@mit.edu>
To: public-webapps@w3.org

On 12/14/11 4:52 PM, Boris Zbarsky wrote:
> Ok. It's just a simple spider that starts with the list at
> http://code.google.com/p/httparchive/source/browse/trunk/lists/All.txt
> and for each of those urls loads the url itself and then follows all
> same-host links from that page. So loads the front page of the site and
> all the same-host one-level-deep pages.

One more note. The data I have so far comes from looking at just 1000
sites, not all 25000-some. John's still working on the full list, now
that he has this set up on beefier hardware.
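
For anyone who wants to picture the crawl described in the quoted text, here is a rough sketch in Python. It is not the actual spider: the names (LinkCollector, same_host_links, crawl_site, fetch) are made up for illustration, "same-host" is assumed to mean an exact match on the URL's host:port part, and only <a href> links are followed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url, html):
    """Return absolute same-host links in document order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    # Assumption: "same host" means the netloc matches exactly (case-insensitive).
    host = urlsplit(page_url).netloc.lower()
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlsplit(absolute).netloc.lower() != host:
            continue
        if absolute != page_url and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def fetch(url):
    """Default fetcher (assumption: plain GET with lenient UTF-8 decoding)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_site(front_url, fetcher=fetch):
    """Load the front page, then every same-host page it links to (one level deep)."""
    pages = {front_url: fetcher(front_url)}
    for link in same_host_links(front_url, pages[front_url]):
        try:
            pages[link] = fetcher(link)
        except OSError:
            # Skip pages that fail to load; urllib errors derive from OSError.
            continue
    return pages
```

Running crawl_site() once per entry in the starting list would give the front page plus the one-level-deep pages for each site. How the list file is parsed is left out here, since its format isn't described above.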

-Boris
Received on Friday, 16 December 2011 02:59:56 GMT
