RE: cross-site tracking and what it means from Kevin Smith on 2012-01-22 (public-tracking@w3.org from January 2012)

From: Kevin Smith <kevsmith@adobe.com>
Date: Sun, 22 Jan 2012 12:24:44 -0800
To: Jonathan Mayer <jmayer@stanford.edu>
CC: David Singer <singer@apple.com>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-ID: <6E120BECD1FFF142BC26B61F4D994CF3064C8B5F70@nambx07.corp.adobe.com>
Good debate.  Answers inline.

-----Original Message-----
From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, January 20, 2012 1:34 PM
To: Kevin Smith
Cc: David Singer; public-tracking@w3.org (public-tracking@w3.org)
Subject: Re: cross-site tracking and what it means

This is the clearest articulation I've seen of what "cross-site tracking" might mean.  Thanks, Kevin.

I would offer three criticisms of the approach.

> First, it does nothing to simplify definitions: it requires defining what qualifies for a silo (= party), and it requires defining which silo is applicable in a given context (= first party vs. third party).  In fact, it is trivial to recast the proposal into our current analytical approach: an exception for all data that is siloed per-first party.

You are right that it does not get rid of all 'party' discussion, but I think it gets rid of most of it, including the most difficult parts.  You do have to define the silo, which is our current 1st party discussion around branding, affiliation, etc.  However, since both 1st and 3rd parties share the same siloing rules, you do not have to define anything about 3rd parties.  Examples of discussions we no longer need to have would include (but definitely not be limited to):
* when does a 3rd party become a 1st party (and vice versa)
* widget discussion
* should there be different syntax/responses for 1st and 3rd party responses
* exemption discussions get easier (no 1st party exemption, no 1st party outsourcing exemption etc) - all exemptions are truly examples of when "cross tracking" is permissible, not simply tracking - like those mentioned above
* does a 1st party need to talk to its 3rd parties during the request
* if you want a full list, just look down the issues list.  I did it once, and it seemed like over 1/3rd of the issues go away, or are extremely simplified (to be fair, I am sure some new ones will be introduced as well, but I think far fewer than are removed)


>Second, as Rigo and David note, the approach relies far too extensively on siloing.  There are myriad effective ways of linking user records that do not share an identifier.  (See all the research my lab and others have done on re-identification and how third parties can identify a user.)  While I'm not overly comfortable with the extent to which the outsourcing exception relies on siloing, at least outsourced services have, in general, greater market incentives to 1) silo anyways, 2) not game silos, and 3) get security right.  Moreover, if an outsourced service does goof on its privacy or security, it may not only lose clients, but it may also face litigation from former clients.

This seems like a completely moot point to me.  Siloed data means that you cannot easily "join" the data using standard queries or scripts.  I am sure it may be possible to do some sort of extensive fingerprinting to approximate aggregation.  However, I don't really understand the incentive for a company to claim they are DNT compliant, go to the trouble of separating their data, structuring it into silos, and removing the connecting ids, only to re-merge the data using more obscure, less reliable methods.  In addition, I don't see how this is any less secure than the party method.  A bad actor could just as easily claim to be DNT compliant and then store the data anyway.  Or using the cross tracking method, they could claim to be DNT compliant and then NOT actually silo the data - that would be a lot easier.  The point is, we don't get to see their database.  We will always be taking their word for it.  The only way to prove they are violating DNT (using any of the proposed methods) is in their output - catch them actually targeting you with data they should not have - and even that is difficult because there are lots of ways to target that are not actually intrusive - it would have to be pretty specific.
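
To make "cannot easily join" concrete, here is a minimal sketch of per-site siloing, not a description of any real product. The class name `SiloedStore`, its methods, and the `site-a.example` / `site-b.example` hostnames are all hypothetical, and the sketch assumes the 3rd party issues a fresh random visitor id for each 1st party site. It only shows that no shared identifier exists across silos; it says nothing about fingerprinting or other re-identification methods.

```python
import secrets
from collections import defaultdict


class SiloedStore:
    """Hypothetical 3rd-party store that keeps one silo per 1st-party site."""

    def __init__(self):
        # site -> {visitor_id: [events]}; nothing is keyed across sites.
        self.silos = defaultdict(dict)

    def new_visitor(self, site):
        # A fresh random id per site, so ids never repeat across silos
        # except by negligible chance.
        visitor_id = secrets.token_hex(8)
        self.silos[site][visitor_id] = []
        return visitor_id

    def record(self, site, visitor_id, event):
        # Raises KeyError if the id was not issued for this site.
        self.silos[site][visitor_id].append(event)

    def shared_ids(self, site_a, site_b):
        # The "standard query" join key: ids present in both silos.
        return set(self.silos[site_a]) & set(self.silos[site_b])


store = SiloedStore()
id_a = store.new_visitor("site-a.example")  # same person, visiting Site A
id_b = store.new_visitor("site-b.example")  # same person, visiting Site B
store.record("site-a.example", id_a, "viewed /news")
store.record("site-b.example", id_b, "viewed /shop")
common = store.shared_ids("site-a.example", "site-b.example")
```

Here `common` is the set of ids a simple join could use to link the two silos, and with separately issued ids there is nothing in it to join on.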

>Third, it does not go far enough in addressing consumer privacy risks.  In our proposed non-normative discussion of first vs. third parties, Tom and I identified three motivations for the distinction: user awareness and control of information sharing, market incentives for privacy and security, and collection of data across unrelated websites.  The "cross-site tracking" approach only somewhat mitigates the third concern and does nothing to address the first two.

I agree with David here.  I am not entirely sure what you mean.  The two approaches have nearly identical results as far as the consumer is concerned, so how would the party approach address these concerns while the cross tracking approach falls short?

>I am also unsure of the use cases that would justify this approach.  Is the notion that ad networks would do per-first party behavioral advertising?  If so, that would seem a step backwards from the current industry self-regulation.

The use case is simplicity and implementability (if that's not a word, it should be).  The two approaches meet the same objectives.  I was fully on board with the party approach until it became so muddy and complex.  It seemed like nearly all our discussions focused on the differences between 1st and 3rd party and very few of them were actually about DNT anymore (at least not directly).  I think this approach is more straightforward to discuss, define, and most importantly, to implement because at no point do you have to care what party you are.



On Jan 19, 2012, at 12:15 AM, Kevin Smith wrote:

> That's not exactly what I was suggesting.  I look forward to next week when we can explore these options in person with a whiteboard.  Hopefully we can make a lot of progress.
>
> What I am proposing is that if a user has DNT turned on when visiting a given website both 1st and 3rd parties are allowed to record a visitor's usage on that site as long as it is only connected, stored, used (etc etc) with that website.  So, the 3rd party would know that you visited a 1st party, but would not know that you had ever visited another 1st party site.  It is not simply another tag on the data, they must actually store the data under separate visitor ids so that they cannot tell you are the same visitor -- ie they CANNOT stitch your profile together.
>
> Example
> * A person visits Site A and Site B with DNT turned ON.
> * Both Site A and Site B call out to Example3rdParty.com.
> * When the person visits Site A, Example3rdParty.com assigns them a visitorID of 101.  All profile data that is collected on Site A for this visitor is attached to visitorID 101.
> * When the person visits site B, Example3rdParty.com assigns them a visitorID of 102 and all data it collects on that site is only associated with visitorID 102.
> * Example3rdParty.com does not know that visitorID 102 is the same person as visitorID 101 (at least not on server-side) and so cannot aggregate the data at a later time.
>
> This is essentially how 1st Party Outsourcing behaves under our current definitions.  To address your 3 specified concerns:
>
>> My problems are
>> *  this is a usage restriction which is easily (accidentally or deliberately) dropped. The correlation and aggregation could happen at any time.
>
> This is a valid concern, but I do not think it's exacerbated by this approach.  If data is correctly siloed, it should not be possible to aggregate it correctly by accident.  And I think all approaches are susceptible to deceptive behavior.
>
>> *  I believe that 3rd parties remembering which 1st parties I chose to visit is, prima facie, cross-site, and should be excluded, not permitted.
>
> This does allow a 3rd party to know that you visited A 1st party, but not multiple 1st parties.  And since they can only use that data ON that 1st party site, it does not seem like Cross Site tracking to me.  Again, see 1st Party Outsourcing.
>
>> *  this is very close to a previous idea, that DNT didn't control tracking at all, just the presentation of behavioral advertising; the same database was being built, just the symptoms hidden from the users.
>
> I don't think this is accurate.  Collection, storage and usage would be regulated.  The database would not be the same.  It may have similar raw data, but it would be missing all of the aggregated, correlated data.
>
> Hopefully that makes sense.
>
>
>
>
>
> -----Original Message-----
> From: David Singer [mailto:singer@apple.com]
> Sent: Wednesday, January 18, 2012 6:01 PM
> To: public-tracking@w3.org (public-tracking@w3.org)
> Subject: cross-site tracking and what it means
>
> David, Kevin, thanks
>
> I read through this and some other background material.
>
> I share the unease about the difficulty of defining 1st and 3rd parties, and would love to find a way to eliminate that distinction and apply uniform rules.  But, if I understand it correctly, what you and Kevin are saying is not, I think, satisfactory.  But I may misunderstand.  Let me work through it, in case I am off base.
>
> As I understand it, you're saying that
> * the sites I visit can remember anything about the nature and content of the visits I make to them (currently described as 1st party)
> * the sites that those sites 'pull in' (3rd parties, in our current terms) can remember
>   + NOT ONLY the fact that I pulled content from them, and that it was me
>   + BUT ALSO that it was because of visits to various other ("1st party") sites ('he visited cnn.com and we showed him a book ad; bbc.com and we showed a soap ad')
>
> As far as I can tell, you seem to propose that the 3rd parties can collect all the same data as today, with the sole exception that the records have an extra tag on them -- whether they were collected under DNT or not -- and that the records collected under DNT have to be segregated and not correlated with the others.
>
> My problems are
> *  this is a usage restriction which is easily (accidentally or deliberately) dropped. The correlation and aggregation could happen at any time.
> *  I believe that 3rd parties remembering which 1st parties I chose to visit is, prima facie, cross-site, and should be excluded, not permitted.
> *  this is very close to a previous idea, that DNT didn't control tracking at all, just the presentation of behavioral advertising; the same database was being built, just the symptoms hidden from the users.
>
> Now, I may have misunderstood.  But if I haven't, this doesn't address my concern as a consumer: I do not want organizations I did not choose to interact with, and whose very identity is usually hidden from me, building databases about me. That's tracking.  I don't think this meets "treat me as someone about whom you know nothing and remember nothing".
>
> If we were to say that *every* site, under DNT must not remember anything about my interaction with any other site than itself (and that rules out 3rd parties keeping records that identify the 1st party, as well), that *might* get closer.  Now the advertising site can do frequency capping (it remembers what ads it previously showed me) but not behavioral tracking (it does not remember I visited CNN, BBC and Amazon, and does not remember what I read or bought on those sites).  But this needs a lot of working through, and I am not hopeful it actually comes out simpler than the 1st/3rd distinction.
>
> On Jan 17, 2012, at 8:22 , David Wainberg wrote:
>
>> Kevin circulated some great materials and discussion on this back in December: http://lists.w3.org/Archives/Public/public-tracking/2011Dec/0051.html and http://lists.w3.org/Archives/Public/public-tracking/2011Dec/0127.html.
>>
>> But I'm happy to take a stab at explaining how I see it.
>>
>> In defining 1st vs 3rd, and saying DNT doesn't, for the most part, apply to 1st parties, are we saying that 1st parties have an exception to engage in [cross-site] tracking, or are we saying 1st party data collection, by definition, is not [cross-site] tracking? There seems to be, if not consensus, at least widespread agreement that the concern of this standard (the "Do Not" of DNT) is something along the lines of the collection and accumulation of data about internet users' web browsing history across (unrelated | unaffiliated | non-commonly branded | ??)  websites. I don't think we mean that 1st parties are free to engage in [cross-site] tracking, but rather that once it's cross-site, it's no longer 1st party. There may be parties who have consent to track across sites by virtue of their 1st party relationship with the user, but is there such a thing as 1st party cross-site tracking? Let's assume we can achieve a definition of cross-site tracking, do you imagine 1st and 3rd parties would be treated differently under the standard? I don't imagine so, though 1st parties will have different opportunities for acquiring users' consent.
>>
>> One might then think that the 1st/3rd party distinction and "cross-site" are equivalent. But I would argue they're not, for at least the following. First, defining cross-site tracking is closer to the problem we're trying to solve, and that's generally a good thing. By tailoring our definitions to the actual problems we are trying to solve, we reduce the risk of being overinclusive, creating ambiguity, or creating unintended consequences.
>>
>> Additionally, although we will still need to define cross-site tracking, I think that's an easier problem to solve and will be easier for all parties to implement. Parties can be lots of things. It's impossible to account for all the different relationships between parties and users, and parties and parties, and so on. Cross-site tracking data is a much more constrained set, so will be that much easier to put a definition around.
>>
>> By taking the cross-site approach, DNT becomes as simple as:
>>
>> 1. Cross-site tracking = X
>> 2. If DNT == 1, X may not be done, except:
>>   a. with consent; or
>>   b. for these purposes: [...]
>>
>> Some of the benefits:
>> - Relies simply on a clear definition of the data collection and use practices DNT is concerned with, rather than a multi-step process of determining party status and then covered collection and use.
>> - Removes the step of determining 1st vs 3rd party status in any given circumstance, and then possibly having separate compliance paths for each.
>> - Saves us from defining 1st vs 3rd parties, and thus eliminates having to deal with edge cases like widgets and URL shorteners.
>> - Solves the 3rd party as agent problem: if it's not cross-site, it's not covered.
>>
>>
>>
>> On 1/13/12 5:41 PM, David Singer wrote:
>>> In reading a separate thread, I realized that there is a potential issue here over DNT:0.
>>>
>>> A little while back we discussed whether the UA should send a DNT header to the first party.  A number of us argued that it should, even if the first party is exempt: because the first party may care that its third parties are being asked not to track - it might ask for payment in consequence, for example.
>>>
>>> This argument relies on the assumption that DNT is a single 'big switch', either on or off, but the discussion around DNT:0 reveals that people think it may be OK for the UA to send DNT:1 to some sites, and DNT:0 to others.
>>>
>>> So what, then, does the first party get?  DNT:1 if any third party is getting DNT:1, else DNT:0 if all are getting DNT:0?  An average of the DNT values :-) DNT:0.7 ??!
>>>
>>> Am I, as a UA, allowed to mix non-DNT requests into the mix?
>>>
>>>
>>> David Singer
>>> Multimedia and Software Standards, Apple Inc.
>>>
>>>
>
> David Singer
> Multimedia and Software Standards, Apple Inc.
>
>
>
Received on Sunday, 22 January 2012 20:25:23 UTC