For Brussels: Fwd: cross-site tracking and what it means

Hello all,

A couple of quick points.  The face-to-face working group meeting will  
be in Brussels next week.  John Simpson and I have been meeting with  
folks on the phone about what to expect.  We sent the Community Group  
document to the WG, and the American Civil Liberties Union (ACLU)  
signed on to the document as well (yesterday).

I won't be in Brussels, so unless EFF Technology Projects Director  
Peter Eckersley manages to get there, John Simpson will be presenting  
the Community Group work.

I want to thank everyone who has commented, and especially those who  
are attending in Brussels!

On a substantive note, you all know there's a lot of talk around  
"first v third parties" and "why don't we call this 'cross-site'  
tracking?"

This recent email to the public WG list does a good job IMHO of  
unpacking many of the issues.

So for those following along, start at the bottom of the email and  
read up --

--David Wainberg explains what he means by "cross-site tracking."

--Then David Singer of Apple (insightfully, IMHO) says this doesn't  
seem to solve the consumer's problem.

--Then Kevin Smith of Adobe comments on David Singer.

--Then Jonathan Mayer sums up well.

The big point IMHO is that *you have to define "site" for "cross-site  
tracking" to have meaning*.  And that seems to turn back into the  
party / first v third party issues.

Lee



Begin forwarded message:

> Resent-From: public-tracking@w3.org
> From: Jonathan Mayer <jmayer@stanford.edu>
> Date: January 20, 2012 12:34:22 PM PST
> To: Kevin Smith <kevsmith@adobe.com>
> Cc: David Singer <singer@apple.com>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
> Subject: Re: cross-site tracking and what it means
>
> This is the clearest articulation I've seen of what "cross-site  
> tracking" might mean.  Thanks, Kevin.
>
> I would offer three criticisms of the approach.
>
> First, it does nothing to simplify definitions: it requires defining  
> what qualifies for a silo (= party), and it requires defining which  
> silo is applicable in a given context (= first party vs. third  
> party).  In fact, it is trivial to recast the proposal into our  
> current analytical approach: an exception for all data that is  
> siloed per-first party.
>
> Second, as Rigo and David note, the approach relies far too  
> extensively on siloing.  There are myriad effective ways of linking  
> user records that do not share an identifier.  (See all the research  
> my lab and others have done on re-identification and how third  
> parties can identify a user.)  While I'm not overly comfortable with  
> the extent to which the outsourcing exception relies on siloing, at  
> least outsourced services have, in general, greater market  
> incentives to 1) silo anyways, 2) not game silos, and 3) get  
> security right.  Moreover, if an outsourced service does goof on its  
> privacy or security, it may not only lose clients, but it may also  
> face litigation from former clients.
>
> Third, it does not go far enough in addressing consumer privacy  
> risks.  In our proposed non-normative discussion of first vs. third  
> parties, Tom and I identified three motivations for the distinction:  
> user awareness and control of information sharing, market incentives  
> for privacy and security, and collection of data across unrelated  
> websites.  The "cross-site tracking" approach only somewhat  
> mitigates the third concern and does nothing to address the first two.
>
> I am also unsure of the use cases that would justify this approach.   
> Is the notion that ad networks would do per-first party behavioral  
> advertising?  If so, that would seem a step backwards from the  
> current industry self-regulation.
>
> Jonathan
>
> On Jan 19, 2012, at 12:15 AM, Kevin Smith wrote:
>
>> That's not exactly what I was suggesting.  I look forward to next  
>> week when we can explore these options in person with a  
>> whiteboard.  Hopefully we can make a lot of progress.
>>
>> What I am proposing is that if a user has DNT turned on when  
>> visiting a given website both 1st and 3rd parties are allowed to  
>> record a visitor's usage on that site as long as it is only  
>> connected, stored, used (etc etc) with that website.  So, the 3rd  
>> party would know that you visited a 1st party, but would not know  
>> that you had ever visited another 1st party site.  It is not simply  
>> another tag on the data, they must actually store the data under  
>> separate visitor ids so that they cannot tell you are the same  
>> visitor -- ie they CANNOT stitch your profile together.
>>
>> Example
>> * A person visits Site A and Site B with DNT turned ON.
>> * Both Site A and Site B call out to Example3rdParty.com.
>> * When the person visits Site A, Example3rdParty.com assigns them a  
>> visitorID of 101.  All profile data that is collected on Site A for  
>> this visitor is attached to visitorID 101.
>> * When the person visits site B, Example3rdParty.com assigns them a  
>> visitorID of 102 and all data it collects on that site is only  
>> associated with visitorID 102.
>> * Example3rdParty.com does not know that visitorID 102 is the same  
>> person as visitorID 101 (at least not on server-side) and so cannot  
>> aggregate the data at a later time.
>>
>> This is essentially how 1st Party Outsourcing behaves under our  
>> current definitions.  To address your 3 specified concerns:
>>
>>> My problems are
>>> *  this is a usage restriction which is easily (accidentally or  
>>> deliberately) dropped. The correlation and aggregation could  
>>> happen at any time.
>>
>> This is a valid concern, but I do not think it's exacerbated by  
>> this approach.  If data is correctly siloed, it should not be  
>> possible to aggregate it correctly by accident.  And I think all  
>> approaches are susceptible to deceptive behavior.
>>
>>> *  I believe that 3rd parties remembering which 1st parties I  
>>> chose to visit is, prima facie, cross-site, and should be  
>>> excluded, not permitted.
>>
>> This does allow a 3rd party to know that you visited A 1st party,  
>> but not multiple 1st parties.  And since they can only use that  
>> data ON that 1st party site, it does not seem like Cross Site  
>> tracking to me.  Again, see 1st Party Outsourcing.
>>
>>> *  this is very close to a previous idea, that DNT didn't control  
>>> tracking at all, just the presentation of behavioral advertising;  
>>> the same database was being built, just the symptoms hidden from  
>>> the users.
>>
>> I don't think this is accurate.  Collection, storage and usage  
>> would be regulated.  The database would not be the same.  It may  
>> have similar raw data, but it would be missing all of the  
>> aggregated, correlated data.
>>
>> Hopefully that makes sense.
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Singer [mailto:singer@apple.com]
>> Sent: Wednesday, January 18, 2012 6:01 PM
>> To: public-tracking@w3.org (public-tracking@w3.org)
>> Subject: cross-site tracking and what it means
>>
>> David, Kevin, thanks
>>
>> I read through this and some other background material.
>>
>> I share the unease about the difficulty of defining 1st and 3rd  
>> parties, and would love to find a way to eliminate that distinction  
>> and apply uniform rules.  But, if I understand it correctly, what  
>> you and Kevin are saying is not, I think, satisfactory.  But I may  
>> mis-understand.  Let me work through it, in case I am off base.
>>
>> As I understand it, you're saying that
>> * the sites I visit can remember anything about the nature and  
>> content of the visits I make to them (currently described as 1st  
>> party)
>> * the sites that those sites 'pull in' (3rd parties, in our current  
>> terms) can remember
>> + NOT ONLY the fact that I pulled content from them, and that it  
>> was me
>> + BUT ALSO that it was because of visits to various other ("1st  
>> party") sites ('he visited cnn.com and we showed him a book ad;  
>> bbc.com and we showed a soap ad')
>>
>> As far as I can tell, you seem to propose that the 3rd parties can  
>> collect all the same data as today, with the sole exception that  
>> the records have an extra tag on them -- whether they were  
>> collected under DNT or not -- and that the records collected under  
>> DNT have to be segregated and not correlated with the others.
>>
>> My problems are
>> *  this is a usage restriction which is easily (accidentally or  
>> deliberately) dropped. The correlation and aggregation could happen  
>> at any time.
>> *  I believe that 3rd parties remembering which 1st parties I chose  
>> to visit is, prima facie, cross-site, and should be excluded, not  
>> permitted.
>> *  this is very close to a previous idea, that DNT didn't control  
>> tracking at all, just the presentation of behavioral advertising;  
>> the same database was being built, just the symptoms hidden from  
>> the users.
>>
>> Now, I may have misunderstood.  But if I haven't, this doesn't  
>> address my concern as a consumer: I do not want organizations I did  
>> not choose to interact with, and whose very identity is usually  
>> hidden from me, building databases about me. That's tracking.  I  
>> don't think this meets "treat me as someone about whom you know  
>> nothing and remember nothing".
>>
>> If we were to say that *every* site, under DNT must not remember  
>> anything about my interaction with any other site than itself (and  
>> that rules out 3rd parties keeping records that identify the 1st  
>> party, as well), that *might* get closer.  Now the advertising site  
>> can do frequency capping (it remembers what ads it previously  
>> showed me) but not behavioral tracking (it does not remember I  
>> visited CNN, BBC and Amazon, and does not remember what I read or  
>> bought on those sites).  But this needs a lot of working through,  
>> and I am not hopeful it actually comes out simpler than the 1st/3rd  
>> distinction.
>>
>> On Jan 17, 2012, at 8:22 , David Wainberg wrote:
>>
>>> Kevin circulated some great materials and discussion on this back  
>>> in December: http://lists.w3.org/Archives/Public/public-tracking/2011Dec/0051.html
>>> and http://lists.w3.org/Archives/Public/public-tracking/2011Dec/0127.html.
>>>
>>> But I'm happy to take a stab at explaining how I see it.
>>>
>>> In defining 1st vs 3rd, and saying DNT doesn't, for the most part,  
>>> apply to 1st parties, are we saying that 1st parties have an  
>>> exception to engage in [cross-site] tracking, or are we saying 1st  
>>> party data collection, by definition, is not [cross-site]  
>>> tracking? There seems to be, if not consensus, at least widespread  
>>> agreement that the concern of this standard (the "Do Not" of DNT)  
>>> is something along the lines of the collection and accumulation of  
>>> data about internet users' web browsing history across (unrelated  
>>> | unaffiliated | non-commonly branded | ??)  websites. I don't  
>>> think we mean that 1st parties are free to engage in [cross-site]  
>>> tracking, but rather that once it's cross-site, it's no longer 1st  
>>> party. There may be parties who have consent to track across sites  
>>> by virtue of their 1st party relationship with the user, but is  
>>> there such a thing as 1st party cross-site tracking? Let's assume  
>>> we can achieve a definition of cross-site tracking, do you imagine  
>>> 1st and 3rd parties would be treated differently under the  
>>> standard? I don't imagine so, though 1st parties will have  
>>> different opportunities for acquiring users' consent.
>>>
>>> One might then think that the 1st/3rd party distinction and  
>>> "cross-site" are equivalent. But I would argue they're not, for at  
>>> least the following reasons. First, defining cross-site tracking  
>>> is closer to the problem we're trying to solve, and that's  
>>> generally a good  
>>> thing. By tailoring our definitions to the actual problems we are  
>>> trying to solve, we reduce the risk of being overinclusive,  
>>> creating ambiguity, or creating unintended consequences.
>>>
>>> Additionally, although we will still need to define cross-site  
>>> tracking, I think that's an easier problem to solve and will be  
>>> easier for all parties to implement. Parties can be lots of  
>>> things. It's impossible to account for all the different  
>>> relationships between parties and users, and parties and parties,  
>>> and so on. Cross-site tracking data is a much more constrained  
>>> set, so will be that much easier to put a definition around.
>>>
>>> By taking the cross-site approach, DNT becomes as simple as:
>>>
>>> 1. Cross-site tracking = X
>>> 2. If DNT == 1, X may not be done, except:
>>>  a. with consent; or
>>>  b. for these purposes: [...]
>>>
>>> Some of the benefits:
>>> - Relies simply on a clear definition of the data collection and  
>>> use practices DNT is concerned with, rather than a multi-step  
>>> process of determining party status and then covered collection  
>>> and use.
>>> - Removes the step of determining 1st vs 3rd party status in any  
>>> given circumstance, and then possibly having separate compliance  
>>> paths for each.
>>> - Saves us from defining 1st vs 3rd parties, and thus eliminates  
>>> having to deal with edge cases like widgets and URL shorteners.
>>> - Solves the 3rd party as agent problem: if it's not cross-site,  
>>> it's not covered.
>>>
>>>
>>>
>>>> [snip]
>>>>
>>
>> David Singer
>> Multimedia and Software Standards, Apple Inc.
>>
>>
>>
>
>
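
For those who find code easier to follow, here is a minimal sketch of Kevin Smith's Site A / Site B example from the thread above. It assumes the browser holds a separate ID per first-party site (for example, a per-site cookie) and that the third party only ever sees the ID for the current site. The names `ThirdParty`, `browser_ids`, and `visit`, and the `.example` hostnames, are made up for illustration and do not come from the thread or any real product.

```python
class ThirdParty:
    """Hypothetical third party that keeps one profile silo per first-party site."""

    def __init__(self):
        self._next_id = 101  # start at 101 to mirror the example in the thread
        self.profiles = {}   # visitor ID -> {"site": str, "events": list}

    def issue_id(self, site):
        """Create a fresh visitor ID that is only valid on `site`."""
        visitor_id = self._next_id
        self._next_id += 1
        self.profiles[visitor_id] = {"site": site, "events": []}
        return visitor_id

    def record(self, site, visitor_id, event):
        """Store an event, refusing IDs presented outside their own silo."""
        profile = self.profiles[visitor_id]
        if profile["site"] != site:
            raise ValueError("visitor ID used outside its silo")
        profile["events"].append(event)


def visit(third_party, browser_ids, site, event):
    """Simulate a DNT-on page view of `site` that calls out to the third party.

    `browser_ids` stands in for client-side state (assumed: one ID per site),
    so the server never receives an identifier shared across sites.
    """
    if site not in browser_ids:
        browser_ids[site] = third_party.issue_id(site)
    third_party.record(site, browser_ids[site], event)
    return browser_ids[site]


third_party = ThirdParty()
browser_ids = {}
id_a = visit(third_party, browser_ids, "site-a.example", "read article")
id_b = visit(third_party, browser_ids, "site-b.example", "viewed product")
id_a_again = visit(third_party, browser_ids, "site-a.example", "read another article")
print(id_a, id_b, id_a_again)  # 101 102 101
```

Nothing in the server-side `profiles` table connects 101 and 102. The sketch only removes the shared identifier, though, so it does not answer Jonathan Mayer's point that records can be linked in ways that do not rely on one.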

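David Wainberg's two-step formulation in the thread above can also be written as a predicate. This is only a sketch under assumptions: the thread leaves both X and the list of permitted purposes open, so `permitted_purposes` is a caller-supplied placeholder, the purpose labels in the usage lines are invented, and treating any DNT value other than "1" as unrestricted is an assumption rather than something the thread states.

```python
def may_do_x(dnt, has_consent, purpose, permitted_purposes):
    """Return True if activity X (cross-site tracking, once defined) may proceed.

    dnt: the user's DNT preference as a string, e.g. "1" (assumed encoding).
    permitted_purposes: placeholder for the "[...]" list, which the thread leaves open.
    """
    if dnt != "1":
        # Assumption: the rule only restricts X when DNT == 1.
        return True
    # Exceptions (a) consent and (b) a permitted purpose.
    return has_consent or purpose in permitted_purposes


# Invented purpose labels, purely for illustration.
example_purposes = frozenset({"hypothetical-purpose"})

blocked = may_do_x("1", False, "profiling", example_purposes)
with_consent = may_do_x("1", True, "profiling", example_purposes)
exempt = may_do_x("1", False, "hypothetical-purpose", example_purposes)
print(blocked, with_consent, exempt)  # False True True
```

The predicate itself is trivial. All of the difficulty sits in deciding when an activity counts as X, which is where the question of what a "site" is comes back in.
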
Received on Saturday, 21 January 2012 17:06:14 UTC