Re: Barewords in on* attributes, redux (also, find() and company)

On 12/14/11 3:01 AM, Simon Pieters wrote:
>> What I have so far as a result is a list of about 1.7 million
>> barewords used across several tens of thousands of pages.
>
> Do you have a more accurate figure for the number of pages?

"57,444 unique urls, all taken from the top 21,000 domains" is all the 
information I have there so far.

>> If people are interested in the exact methodology, I can probably get
>> a description.
>
> I'm interested. It's hard to make conclusions from data without knowing
> what the data is, how it is biased, what false positives it might have,
> etc.

Yeah, understood. I'm working on getting that description.

-Boris

Received on Wednesday, 14 December 2011 08:15:52 UTC