
Issue-5 - Cross Tracking Definition Proposal

From: Kevin Smith <kevsmith@adobe.com>
Date: Mon, 6 Feb 2012 13:29:06 -0800
To: "public-tracking@w3.org" <public-tracking@w3.org>
Message-ID: <6E120BECD1FFF142BC26B61F4D994CF3064CA07AA8@nambx07.corp.adobe.com>
Do Not Track = Do Not Cross Track

Cross Tracking occurs when information collected about a user on one site is used on a different site or combined with data collected about that user from a different site.  Three concerns (I know there are more) that privacy-sensitive internet users may have with cross-site tracking are:

1. Cross Site Targeting - When they do something on one site and it clearly affects their experience on a different site (for example, they see ads for an item they looked at on an unrelated site)

2. Cross Site Profiling - Concern that their internet activity and browsing patterns are getting stitched together into profiles that can be used, among other things, to target them

3. Unknown Data Collection - Some may be uncomfortable with the idea that sites and companies which they have never heard of, and with which they have never intentionally interacted, may be compiling data about them

Pure Cross Tracking Model
With DNT:1 enabled, 1st parties would not be able to share any of the data they collect with 3rd parties nor would they be able to acquire data about the user from 3rd parties.  Ideally, 3rd parties would not be able to collect any data since under the strictest definition of cross tracking, any data acquired by a 3rd party must have been collected on a site other than its own.  Hopefully this sounds familiar and like a good idea, because this is pretty close to what we have defined in the specs thus far.

Advantages
Conceptually this works well, as it alleviates all 3 concerns stated above.  If only 1st parties track, and they do not share the data they collect with other entities, then you cannot possibly be targeted on a site based on what you did on a different site.  And if "unknown" 3rd parties have no data at all, then they cannot be building profiles about you.

Disadvantages

         Complicated to define - Making the rules so different for 1st and 3rd parties does 2 things immediately.  1) It doubles our work because we now have to define everything twice.  2) It places a lot of importance on who is a 1st party, who is a 3rd party, how they interact, how/whether one can become the other, what they have to do with data collected as one vs the other, etc.  It's a lot of work.

         Complicated to implement - This just happens naturally.  The harder a spec is to understand and the more complex its rules are, the harder it will be to implement, which slows adoption and increases implementation errors, especially for the rare company that may be a 1st party at some times and a 3rd party at others.


Site Siloed Model - What Happens in Vegas Stays in Vegas - Nothing that happens on this site can affect your experience on any other site nor can any other site know what you did on this site
This is very similar to the above model, with a few simplifying tweaks.  1st parties play by the same rules as above.  When DNT:1 is enabled, they cannot share data they collect about the user with any other party, nor can they request data from another party about the user.  They only have access to information acquired directly on their own sites.  The simplifying difference in this model is that 3rd parties play by the same rules as 1st parties.  This means that any data a 3rd party collects about a user on a particular site cannot be used, combined, or stored with any data collected from a different site, nor can it be used on a different site.  On any particular site, they only have access to information acquired on that particular site.  This is not quite as strong as the above model because it allows some level of 3rd party tracking, but it has some key advantages that may make it worth it.
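To make the siloed rule concrete, here is a rough Python sketch of the check a party would apply before using a piece of data.  The names (Record, may_use) are made up for illustration and are not part of any spec text, and the sketch assumes that with DNT off no restriction from this proposal applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    collector: str     # party that collected the data (hypothetical field)
    collected_on: str  # site the user was visiting when it was collected


def may_use(record: Record, requester: str, current_site: str, dnt_enabled: bool) -> bool:
    """Siloed rule: under DNT:1, data may only be used by the party that
    collected it, and only on the site where it was collected."""
    if not dnt_enabled:
        # Assumption: this sketch only models the DNT:1 restrictions.
        return True
    return record.collector == requester and record.collected_on == current_site


seen_on_pub1 = Record(collector="adexchange1.com", collected_on="publisher1.com")
first_party = Record(collector="publisher1.com", collected_on="publisher1.com")
```

Note that the same function covers 1st and 3rd parties, which is the point of the model: there is no branch on party type.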

Example:
A user turns on DNT:1 and visits the sites publisher1.com and unrelated publisher2.com.  Both sites use adexchange1.com to serve ads.

         Visitor hits publisher1.com for the 1st time

o   Publisher1 collects some 1st party data about the page view and makes a request to adexchange1.com to serve ads

o   Adexchange1.com has never seen this visitor on publisher1.com before (since it was their 1st visit), and assigns a random internal id of 1001

  Adexchange1.com collects and stores data about the ad request and connects some of it to visitor id 1001

         Visitor hits publisher2.com for the 1st time

o   Publisher2 collects some 1st party data about the page view and makes a request to adexchange1.com to serve ads

o   Adexchange1.com has never seen this visitor on publisher2.com before (they have seen the visitor on publisher1.com, but they do not know it's the same visitor), and assigns a random internal id of 1002

  Adexchange1.com collects and stores data about the ad request and connects some of it to visitor id 1002

         Publisher1.com is not allowed to share any data about this visitor with publisher2.com and vice versa

         Adexchange1.com does not know that visitor 1001 is the same as visitor 1002 and is not allowed to merge the data from those 2 visitors in any way (nor would they want to, since they think it's 2 different visitors).  We may want to strengthen this with a requirement that PII is not collected when DNT:1 is on, so that adexchange1.com cannot later decide to stop being DNT compliant and figure out that visitor 1001 is the same as visitor 1002

         It is impossible for the user's visit to publisher1.com to affect their visit to publisher2.com in any way and vice versa

         No cross profiles are created and no cross targeting or tracking occurs
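The walkthrough above can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the class and method names are hypothetical, the dict named browser_cookies stands in for whatever per-site identifier the browser sends back, and a sequential counter starting at 1001 stands in for the random internal id so the result mirrors the example.

```python
import itertools
from collections import defaultdict


class SiloedAdExchange:
    """Hypothetical 3rd party that keeps one visitor ID per (site, browser)."""

    def __init__(self):
        # Stand-in for random ID generation; sequential only to mirror the example.
        self._ids = itertools.count(1001)
        # (site, visitor_id) -> events collected on that site only
        self.log = defaultdict(list)

    def serve_ad(self, site, visitor_id=None, event="ad_request"):
        # No ID for this site means the visitor is treated as brand new here,
        # even if the same browser was already seen on another site.
        if visitor_id is None:
            visitor_id = next(self._ids)
        self.log[(site, visitor_id)].append(event)
        return visitor_id

    def profile(self, site, visitor_id):
        # Data is only ever looked up within the site where it was collected.
        return list(self.log.get((site, visitor_id), []))


exchange = SiloedAdExchange()
browser_cookies = {}  # site -> ID the exchange issued while embedded on that site
for site in ("publisher1.com", "publisher2.com", "publisher1.com"):
    browser_cookies[site] = exchange.serve_ad(site, browser_cookies.get(site))
```

After the three page views, the exchange holds two unrelated profiles (1001 on publisher1.com with two ad requests, 1002 on publisher2.com with one) and has no key that links them.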

Advantages

         It still addresses the 1st 2 concerns expressed at the top.  No cross-site targeting, and no cross-site profiling.

         Easier to define - Since both parties behave the same, this largely removes the emphasis we place on parties.  We do still have to define the boundaries of a 1st party, but we no longer have to define two systems (one for each party type) nor worry about all the intricate details surrounding 3rd parties and how to tell them apart from 1st parties

         Easier to implement -

o   First the obvious - a standard that is easier to understand is usually easier to implement and you are less likely to get things wrong.  This is roughly the same for 1st parties as the above model.

o   It's hard to say exactly what changes a company will have to make to be compliant without knowing the details of their system.  However, conceptually, all a 3rd party would have to do is change their cookies to separate visitorIDs by site.  They do not need to change their backend, because it would automatically treat one visitor who hits 2 sites the same way it now treats 2 separate visitors hitting different sites.

         Easy to understand - I am pretty sure I could explain to my mom in 30 seconds what happens when she clicks the DNT checkbox.  When you visit a site, only that site will ever know what you did or looked at on the site.  What happens in Vegas...
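As a rough idea of what "separate visitorIDs by site" could look like on the cookie side, here is a Python sketch using the standard library's http.cookies module.  The cookie naming scheme (a "vid_" prefix plus a short hash of the embedding site) and the function names are assumptions for illustration only, and the sketch assumes the 3rd party already knows which site it is embedded on (for example from the referrer or a parameter supplied by the publisher).

```python
import hashlib
from http.cookies import SimpleCookie


def cookie_name_for(site: str) -> str:
    # Hypothetical scheme: one cookie name per embedding site.
    digest = hashlib.sha256(site.encode("utf-8")).hexdigest()[:12]
    return f"vid_{digest}"


def read_visitor_id(cookie_header: str, site: str):
    """Return the visitor ID stored for this site, or None if there is none."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(cookie_name_for(site))
    return morsel.value if morsel is not None else None


def cookie_pair(site: str, visitor_id: str) -> str:
    """Build the name=value pair the 3rd party would set for this site."""
    jar = SimpleCookie()
    name = cookie_name_for(site)
    jar[name] = visitor_id
    return jar[name].OutputString()


# The browser sends both cookies back, but the exchange only reads the one
# that belongs to the site it is currently embedded on.
header = "; ".join([
    cookie_pair("publisher1.com", "1001"),
    cookie_pair("publisher2.com", "1002"),
])
```

The backend keyed on visitor ID does not change; it simply sees 1001 and 1002 as two visitors.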

Disadvantages

         As already stated, this does not eliminate the 3rd concern stated at the top.  3rd parties with whom you did not intend to interact may have data about you.  However, it does mitigate this concern somewhat, in that those 3rd parties can only use that data while you are on the site on which they obtained it.  This means they cannot compile your browsing patterns.  They can only target you based on your activities on that site (just like the 1st party).  This restriction will remove all incentive for many 3rd parties to track at all.  DSPs, data exchanges, etc. will either simply not track anything when they see DNT:1, or they will need to change their business model to provide some level of value on a 1st-party-only basis, after which they will essentially act like 1st party outsourcing.

Conclusion - A Cost Efficient System
I do not suggest that this is the perfect solution.  I know there are privacy concerns that this does not address.  I do suggest however, that this model is a good starting point.  It makes smart compromises that might get us 80% of the way there for 20% of the work.  I would recommend that we use this model as a baseline to do a cost/benefit analysis of other models.  For example, if we decided to do the 1st model outlined above, the benefit would be that 3rd parties do not get to track when DNT is on, and the cost would be that we have to do all of the work outlined below.


Here is a quick list of the workload that goes away if we take this simplified approach.  I am sure there is more, but this is from a really quick scan:

Exceptions that would be unnecessary under the siloed approach

         1st Party Exception

         1st Party Outsourcing  (3rd party as a 1st party)

         This model would clearly have similar exceptions for operational uses, but would not need many of the other exceptions, since all remaining exceptions would be true exceptions where cross tracking is actually permissible (rate limiting etc).

         Others??

Issues would be irrelevant or at least simplified

         Issue 9: Understand all the different first- and third party cases

         Issue 17: Data use by a 1st party - everyone treated the same

         Issue 19: Data collection / Data use (3rd party) - would be the same as issue 17

         Issue 26: Providing data to 3rd party widgets - does that imply consent?

         Issue 41: Consistent way to discuss tracking with users - simplified - maybe

         Issue 49: 3rd party as 1st party - irrelevant

         Issue 50: Are DNT headers sent to first parties - everyone treated the same

         Issue 51: Should 1st party have any response to DNT signal  - everyone treated the same

         Issue 60: Will a recipient know if it itself is a 1st or 3rd party? <http://www.w3.org/2011/tracking-protection/track/issues/60> - irrelevant

         Issue 73: In order for analytics or other contracting to count as first-party: by contract, by technical silo, both silo and contract <http://www.w3.org/2011/tracking-protection/track/issues/73> - irrelevant

         Issue 77: How does a website determine if it is a first or third party and should this be included in the protocol? <http://www.w3.org/2011/tracking-protection/track/issues/77> - irrelevant

         Issue 89: Does DNT mean at a high level: (a) no customization, users are seen for the first time, every time. (b) DNT is about data moving between sites. <http://www.w3.org/2011/tracking-protection/track/issues/89> - b

         Issue 91: Might want prohibitions on first parties re-selling data to get around the intent of DNT <http://www.w3.org/2011/tracking-protection/track/issues/91> - covered

         Issue 107: Exact format of the response header? - Tom's response header does not need the 1st bit (which party are you)

Sections of the Compliance doc that become unnecessary or simplified


         3.4 - 1st and 3rd Parties

o   3.4.1

o   3.4.2

o   3.4.2

         3.9 - Meaningful Interaction

         4.1.2 - Compliance by a 3rd Party

         TPE 5.5.1 - Tracking Response Header Field - gets simplified



-kevin



Kevin Smith
Engineering Manager
Adobe

385.221.1288 (tel)
kevsmith@adobe.com

550 E Timpanogos Cir
Orem, UT, 84097
www.adobe.com


Received on Monday, 6 February 2012 22:02:11 UTC

This archive was generated by hypermail 2.3.1 : Friday, 3 November 2017 21:44:44 UTC