Re: cross-site tracking and what it means from David Wainberg on 2012-01-23 (public-tracking@w3.org from January 2012)

From: David Wainberg <dwainberg@appnexus.com>
Date: Mon, 23 Jan 2012 11:25:24 -0500
To: Jonathan Robert Mayer <jmayer@stanford.edu>
CC: Kevin Smith <kevsmith@adobe.com>, David Singer <singer@apple.com>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-ID: <4F1D89F4.6010600@appnexus.com>

On 1/20/12 9:18 PM, Jonathan Robert Mayer wrote:
> On Jan 20, 2012, at 5:23 PM, David Wainberg <dwainberg@appnexus.com> wrote:
>
>> I disagree. I think it requires only defining "cross-site" and "cross-site tracking". Once data collected across sites is combined, it becomes "cross-site". This makes it very simple. I understand there will be some interest in guidelines around the adequate segregation of the data.
> You've left out the definition of "site," which in your meaning subsumes the "party" and "first party" components of our current approach.
>
> I think the most productive way forward is to stop having this conversation in the abstract. We've seen how the current analytical framework operates; I'd ask "cross-site tracking" proponents to prepare detailed analysis of a few use cases for Brussels.
Yes, it's true that "site" will need a definition, but my point is that 
it's easier to define than "party".

Unfortunately, I am unable to attend the Brussels meeting, but I'll try 
to put something together on the list beforehand.
>>> Second, as Rigo and David note, the approach relies far too extensively on siloing.  There are myriad effective ways of linking user records that do not share an identifier.  (See all the research my lab and others have done on re-identification and how third parties can identify a user.)  While I'm not overly comfortable with the extent to which the outsourcing exception relies on siloing, at least outsourced services have, in general, greater market incentives to 1) silo anyways, 2) not game silos, and 3) get security right.  Moreover, if an outsourced service does goof on its privacy or security, it may not only lose clients, but it may also face litigation from former clients.
>> We cannot solve this whole problem with DNT. Bad actors will do bad things, regardless of DNT. But one thing we can do with DNT is to create incentives for minimizing data collected and retained. Again, this is a reason to focus more on data than on usage.
> I don't follow. My very criticism was that siloing - especially usage-based siloing - isn't enough.
I thought you were in favor of siloing as a solution to the 
outsourcing/analytics issues, provided there were sufficient controls in 
place. My point was that we cannot fully solve the re-identification or 
compilation problem. Since most of what DNT will be asking parties to do 
is not testable externally, we have to rely on companies to be good. The 
bad ones will do bad things, regardless of what we put in the spec.
>>> Third, it does not go far enough in addressing consumer privacy risks.  In our proposed non-normative discussion of first vs. third parties, Tom and I identified three motivations for the distinction: user awareness and control of information sharing, market incentives for privacy and security, and collection of data across unrelated websites.  The "cross-site tracking" approach only somewhat mitigates the third concern and does nothing to address the first two.
>> Can you explain this further? Why does "cross-site tracking" not solve these problems?
> Even assuming data could be perfectly or near-perfectly siloed per-first party - and it can't - per-first party data can be quite revealing (and identifiable). For example: Your reading history on a newspaper site or your queries on a health community site. (My lab has seen identifying information leakage from websites in both of these categories, by the way.)
I thought we had wide agreement that data about "first party behavior", 
i.e. behavior confined to a single site, is not of concern here as long 
as its collection/use is siloed to prevent use on other sites or 
compilation with data from other sites. But also, this goes to the 
particulars of the type of data that is collected. There's a big 
difference between "This user read an article about cancer" and "This 
user has an interest in health and medicine."

Received on Monday, 23 January 2012 16:25:54 UTC