Re: Barewords in on* attributes, redux (also, find() and company)

From: Simon Pieters <simonp@opera.com>
Date: Wed, 14 Dec 2011 09:48:13 +0100
To: "Boris Zbarsky" <bzbarsky@mit.edu>
Cc: public-webapps@w3.org
Message-ID: <op.v6g1enitidj3kv@simon-pieterss-macbook.local>

On Wed, 14 Dec 2011 09:15:12 +0100, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> On 12/14/11 3:01 AM, Simon Pieters wrote:
>>> What I have so far as a result is a list of about 1.7 million
>>> barewords used across several tens of thousands of pages.
>>
>> Do you have a more accurate figure for the number of pages?
>
> "57,444 unique urls, all taken from the top 21,000 domains" is all the  
> information I have there so far.

Thanks!

>>> If people are interested in the exact methodology, I can probably get
>>> a description.
>>
>> I'm interested. It's hard to make conclusions from data without knowing
>> what the data is, how it is biased, what false positives it might have,
>> etc.
>
> Yeah, understood.  Working on getting that description.
>
> -Boris

cheers
-- 
Simon Pieters
Opera Software
Received on Wednesday, 14 December 2011 08:48:47 GMT