Re: Adding methods to Element.prototype WAS: [Selectors API 2] Is matchesSelector stable enough to unprefix in implementations?

On Tue, Nov 22, 2011 at 1:04 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> Again, some decent data on what pages actually do in on* handlers would be
> really good.  I have no idea how to get it.  :(

Can't browsers add instrumentation for this?  You have users who have
opted in to sending anonymized data.  So for each user, on a small
percentage of pages, intercept all bare-name property accesses in on*
handlers.  Record the property name and which object in the scope
chain it wound up resolving to.  Send the info back to the mothership.
There will be some perf impact, but it should be no big deal if you
only do it a small percentage of the time for each user.  Of course,
it might require a bunch of work to actually code this kind of thing
-- that I'm not in a position to judge.
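
To make the idea concrete, here is a rough model of the bookkeeping in Python. It is not browser code: the scope chain is represented as an ordered list of hypothetical (label, property names) pairs, innermost first, and the function names are invented for illustration. It only shows what "record the name and which object it resolved to" might look like as data.

```python
from collections import Counter


def resolve_bare_name(name, scope_chain):
    """Return the label of the first scope object that defines `name`.

    `scope_chain` is an ordered list of (label, property_names) pairs,
    innermost scope first. Returns None if nothing defines the name.
    """
    for label, property_names in scope_chain:
        if name in property_names:
            return label
    return None


def record_accesses(names, scope_chain, tally=None):
    """Count (name, resolved-scope) pairs for a batch of bare-name accesses."""
    tally = Counter() if tally is None else tally
    for name in names:
        tally[(name, resolve_bare_name(name, scope_chain))] += 1
    return tally


# Hypothetical handler scope: the element shadows the global object.
example_chain = [
    ("element", {"id", "matches"}),
    ("document", {"title"}),
    ("window", {"alert", "matches"}),
]
example_tally = record_accesses(["matches", "alert", "foo"], example_chain)
```

In this toy chain a bare `matches` resolves to the element rather than to a page-defined global of the same name, which is exactly the kind of collision the tally is meant to surface.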

Moving forward, this kind of info-gathering will be really essential
for us to figure out how we can change stuff.  Right now we have to be
super-conservative when making changes because we have no idea in
advance what impact they'll have.  This is not a good thing for the
web platform, IMO.

(Aside: If we're just looking at some binary question like whether a
specific name like "matches" is doable, you should be able to do this
even without user opt-in, with no privacy breach.  Just send back
noise with probability (n - 1)/n, and the real value with probability
1/n, for n fairly large (say 100,000).  Then average all the values
together, subtract (n - 1)/n times the mean of the distribution you
picked the noise values from, multiply by n, and given enough reports
you get something very close to the true average, by the law of large
numbers.

E.g., if the data is a bit, send a random bit 99.999% of the time and
the real value 0.001% of the time.  Average all the values, subtract
0.499995, multiply by 100,000, and you have roughly the true average
(error bars easily calculable).  But the bit sent back by any given
user would yield negligible information about that user to either the
browser vendor or an eavesdropper, because it's almost surely noise.
The same approach would work for any value, provided you can come up
with a plausible distribution for the noise -- which is almost
certainly not the case for string values, say.
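
The estimator works because the expected report is (n - 1)/n times the noise mean plus 1/n times the true mean, so subtracting the first term and multiplying by n leaves the true mean. A minimal Python sketch of the bit case follows, assuming fair coin flips as the noise; the function names and the simulated user population are hypothetical, and the error bar is a simple upper bound rather than an exact figure.

```python
import math
import random


def randomized_report(true_bit, n, rng):
    """Send the real bit with probability 1/n, otherwise a fair random bit."""
    if rng.random() < 1.0 / n:
        return true_bit
    return rng.getrandbits(1)


def estimate_true_mean(reports, n, noise_mean=0.5):
    """Undo the mixing: subtract the expected noise share, then scale by n."""
    observed_mean = sum(reports) / len(reports)
    return (observed_mean - (n - 1) / n * noise_mean) * n


def standard_error_bound(num_reports, n):
    """Upper bound on the estimate's standard error.

    Each report is a single bit, so its variance is at most 1/4; scaling
    the sample mean by n scales the standard error by n as well.
    """
    return n * 0.5 / math.sqrt(num_reports)


def simulate(true_rate, n, num_users, seed=0):
    """Draw hypothetical users, randomize their bits, and estimate the rate."""
    rng = random.Random(seed)
    reports = []
    for _ in range(num_users):
        true_bit = 1 if rng.random() < true_rate else 0
        reports.append(randomized_report(true_bit, n, rng))
    return estimate_true_mean(reports, n)
```

For example, `simulate(0.3, 10, 200000)` uses a deliberately small n so the estimate settles with a modest sample. The bound grows linearly in n: `standard_error_bound(10**9, 100000)` is about 1.58, so under this model a larger n buys more privacy at the cost of needing far more reports for tight error bars.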

This would all have to be reviewed by security teams, but it should be
doable in principle.  The advantage is your sample would actually be
representative, which could be important in some cases.)

Received on Wednesday, 23 November 2011 15:04:50 UTC