Re: Barewords in on* attributes, redux (also, find() and company) from Simon Pieters on 2011-12-14 (public-webapps@w3.org from October to December 2011)

From: Simon Pieters <simonp@opera.com>
Date: Wed, 14 Dec 2011 09:01:57 +0100
To: public-webapps@w3.org, "Boris Zbarsky" <bzbarsky@mit.edu>
Message-ID: <op.v6gy9jqlidj3kv@simon-pieterss-macbook.local>

On Wed, 14 Dec 2011 08:36:44 +0100, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> John Jensen here at Mozilla has been doing some web crawling trying to  
> find what barewords are used in on* attributes.

Awesome!

> What I have so far as a result is a list of about 1.7 million barewords  
> used across several tens of thousands of pages.

Do you have a more accurate figure for the number of pages?

> If people are interested in the exact methodology, I can probably get a  
> description.

I'm interested. It's hard to draw conclusions from data without knowing  
what the data is, how it is biased, what false positives it might have,  
etc.

> I'm working on making sure that it's ok for me to post the data in its  
> entirety so you can all look as well.  Assuming it is (very likely),  
> where's a good place to stick a 7MB compressed file?
>
> In any case, for this particular data set there are no hits on "findAll"  
> or "matches" (good!), but there are two hits on "find" as a bareword in  
> an on* attribute.  Specifically:
>
> 1)  http://otc-pif.rbc.ru/pif_calculator/calculator.jsp has  
> onclick="find(document.getElementById(current + 'List').children,  
> searchString.value)"
>
> 2)  http://bookmark.people.com.cn/index.html has onclick="find()"
>
> These would both obviously get broken by the proposed find() API, unless  
> we actually do some sort of workaround for this problem...
>
> -Boris
>
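
For anyone following along, here is a rough sketch of why a bareword
find() in an on* attribute would change meaning. It assumes only that
inline handler code looks names up on the element before the global
object. makeHandler and the plain objects standing in for the element
and the document are hypothetical, not browser APIs, and the real
lookup has more steps (form owner, document) than this models.

```javascript
// Hypothetical helper: builds a function whose body sees `element` and `doc`
// ahead of the global scope, roughly like an inline on* handler.
// The Function constructor produces sloppy-mode code, so `with` is allowed.
function makeHandler(body) {
  return new Function(
    "element",
    "doc",
    "with (doc) { with (element) { return " + body + "; } }"
  );
}

// A page-defined global function, standing in for the ones on the two pages cited.
globalThis.find = function () {
  return "page-defined find()";
};

const doc = {};                  // stand-in for the document
const elementToday = {};         // element without a find() method
const elementWithProposedApi = { // stand-in for an element that gained find()
  find() {
    return "element.find()";
  },
};

// Same handler source, two different elements in scope.
const handler = makeHandler("find()");
const before = handler(elementToday, doc);
const after = handler(elementWithProposedApi, doc);

console.log(before); // page-defined find()
console.log(after);  // element.find()
```

The handler text is identical in both calls; only the presence of a
find property on the element changes which function the bareword
resolves to.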


-- 
Simon Pieters
Opera Software

Received on Wednesday, 14 December 2011 08:02:38 UTC