RE: Understanding Terms and Services

So what we're all saying here is that analytics and aggregates are complex and context-specific. But companies compile, use and share them for many reasons. Sure, one is selling data to data aggregators. Another is tracking flu epidemics through search terms. A third is establishing traffic patterns to decide where stoplights should go. Are you saying that, even before we know what valuable information can be gleaned from the data, each of those specific processes has to be listed in a privacy policy that's updated every time a new process begins? Testing and experience tell us no one will ever read such a policy, except maybe Karl here ;-)

Can it be enough to say that, if data is aggregated to a level where individuals cannot be identified or re-identified by reverse engineering (whatever that means for a given data set), then we need not list all those reasons? And if, despite anonymisation and aggregation at whatever level, users are still uncomfortable being part of the data set, they should be able to say so in some easy-to-use way, be it a browser setting, a permission set with that company, or a command in their personal data ecosystem.
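One common way to make "aggregated to a level where individuals cannot be identified" concrete is a minimum cell size: any count below a threshold is simply not published. The sketch below is hypothetical throughout. The `publishable_counts` function, the record layout with a `user_id` field, and the threshold of 10 are all assumptions. It also honours the opt-out by dropping those users before anything is counted. Small-cell suppression is only one safeguard. On its own it does not prevent re-identification by combining several published tables.

```python
from collections import Counter

# Hypothetical minimum cell size; the right value depends on the data set.
K_MIN = 10


def publishable_counts(records, attribute, opted_out, k_min=K_MIN):
    """Count records per value of `attribute`, skipping users who opted out
    and withholding any cell with fewer than k_min people in it."""
    counts = Counter(
        record[attribute]
        for record in records
        if record["user_id"] not in opted_out
    )
    return {value: n for value, n in counts.items() if n >= k_min}


# Made-up example data: 12 visitors from London, 3 from Leeds.
records = ([{"user_id": i, "city": "London"} for i in range(12)]
           + [{"user_id": 100 + i, "city": "Leeds"} for i in range(3)])
print(publishable_counts(records, "city", opted_out={0}))  # {'London': 11}
```

Leeds is withheld because only 3 people fall into that cell, and user 0's opt-out is respected before counting.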

-----Original Message-----
From: David Singer [mailto:singer@apple.com] 
Sent: 09 March 2012 18:48
To: Dan Brickley
Cc: Karl Dubost; Chappelle, Kasey, VF-Group; public-privacy (W3C mailing list)
Subject: Re: Understanding Terms and Services


On Mar 9, 2012, at 8:36, Dan Brickley wrote:

> On 9 Mar 2012, at 17:27, Karl Dubost <karld@opera.com> wrote:
>> On 9 March 2012 at 10:52, Chappelle, Kasey, VF-Group wrote:
>>> there's no real privacy impact on you personally
>> 
>> Define this :)
>> How do you know that?
> 
> The term 'aggregate' is too vague: 'oh, we aggregated by country', or even 'by city', isn't universally reassuring.

It's too vague in two important respects:

1) Aggregation at what granularity? As you say, zip-code-level data, combined with a few other facts, is pretty revealing.

2) More critically, 'aggregation' is ambiguous between (at least) the following two practices:
2.a) We keep a separate counter for each aggregate category (male/female, age range, geographic area, and so on) and record only how many visitors fall into each. "We had 25,000 men, 2,000 people from San Francisco, and 8,000 over-60s visit us" does not let anyone find out whether there was even a single over-60 man from San Francisco among them (except statistically).
2.b) We keep one record per visitor that links the attributes ("an over-60 man from San Francisco", and so on); we can then derive the counts in 2.a from those records.
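The difference between 2.a and 2.b can be sketched in a few lines of Python. In this hypothetical example, the `Visitor` type, the functions `separate_counters` and `count_matching`, and the visitor data are all made up for illustration. Two different groups of visitors produce identical 2.a counters, so nothing stored under 2.a can tell whether an over-60 San Francisco man visited. Linked 2.b records answer that question directly, and the same `separate_counters` function can still derive the 2.a counts from them.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Visitor:
    sex: str
    age_range: str
    city: str


def separate_counters(visitors):
    """Practice 2.a: one independent counter per attribute.
    Only these counts are kept; the link between attributes is discarded."""
    counters = {f.name: Counter() for f in fields(Visitor)}
    for visitor in visitors:
        for name, counter in counters.items():
            counter[getattr(visitor, name)] += 1
    return counters


def count_matching(records, **attrs):
    """Practice 2.b: linked per-visitor records answer joint queries
    such as 'over-60 men from San Francisco'."""
    return sum(
        all(getattr(record, name) == value for name, value in attrs.items())
        for record in records
    )


# Two made-up groups with identical per-attribute totals.
a = [Visitor("M", "60+", "San Francisco"), Visitor("F", "18-29", "Boston")]
b = [Visitor("M", "18-29", "San Francisco"), Visitor("F", "60+", "Boston")]
print(separate_counters(a) == separate_counters(b))  # True
print(count_matching(a, sex="M", age_range="60+", city="San Francisco"))  # 1
print(count_matching(b, sex="M", age_range="60+", city="San Francisco"))  # 0
```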

2.b is what you need for product prediction ("people who bought A also often bought B and C"), because the linkage is critical. But these are now per-user records and, as such, amenable to de-anonymization. For *me* these are anonymized records, not aggregated ones (I would reserve 'aggregated' for 2.a), but I think the ambiguity exists.
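A minimal sketch of the two properties of 2.b records discussed above, assuming per-user data keyed by a user id. All function names and data here are hypothetical. `also_bought` computes "people who bought A also bought..." counts, which depend on the per-user linkage. `singled_out` lists users whose combination of quasi-identifiers (here zip code, age range and sex) is unique in the data set. Those are the records most easily de-anonymized.

```python
from collections import Counter


def also_bought(baskets, item):
    """Count items bought by the same users who bought `item`.
    Needs per-user baskets (2.b); per-item totals (2.a) cannot say
    which items were bought together."""
    together = Counter()
    for basket in baskets.values():
        if item in basket:
            together.update(basket - {item})
    return together


def singled_out(profiles, quasi_identifiers):
    """Return ids of users whose quasi-identifier values are shared
    with no one else in the data set."""
    def key(profile):
        return tuple(profile[q] for q in quasi_identifiers)

    sizes = Counter(key(p) for p in profiles.values())
    return sorted(uid for uid, p in profiles.items() if sizes[key(p)] == 1)


# Made-up example data.
baskets = {"u1": {"A", "B", "C"}, "u2": {"A", "B"}, "u3": {"B", "D"}}
profiles = {
    "u1": {"zip": "94110", "age_range": "60+", "sex": "M"},
    "u2": {"zip": "94110", "age_range": "30-39", "sex": "F"},
    "u3": {"zip": "94110", "age_range": "30-39", "sex": "F"},
}
print(also_bought(baskets, "A").most_common(1))  # [('B', 2)]
print(singled_out(profiles, ["zip", "age_range", "sex"]))  # ['u1']
```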


David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 9 March 2012 19:35:20 UTC