- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Wed, 14 Dec 2011 03:15:12 -0500
- To: Simon Pieters <simonp@opera.com>
- CC: public-webapps@w3.org
On 12/14/11 3:01 AM, Simon Pieters wrote:
>> What I have so far as a result is a list of about 1.7 million
>> barewords used across several tens of thousands of pages.
>
> Do you have a more accurate figure for the number of pages?

"57,444 unique urls, all taken from the top 21,000 domains" is all the
information I have there so far.

>> If people are interested in the exact methodology, I can probably get
>> a description.
>
> I'm interested. It's hard to make conclusions from data without knowing
> what the data is, how it is biased, what false positives it might have,
> etc.

Yeah, understood. Working on getting that description.

-Boris
Received on Wednesday, 14 December 2011 08:15:52 UTC