Re: [Issue-5] [Action-77] Defining Tunnel-Vision 'Do Not (Cross-Site) Track' from Lauren Gelman on 2012-02-02 (public-tracking@w3.org from February 2012)

From: Lauren Gelman <gelman@blurryedge.com>
Date: Thu, 2 Feb 2012 13:54:26 -0800
To: David Singer <singer@apple.com>
Cc: Bryan Sullivan <blsaws@gmail.com>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <B313780A-898F-45AE-BF71-738D1DB64396@blurryedge.com>
What does "must be held separately" mean for companies where all the data is in the cloud?  Again, I am trying to figure out how to advise a company to implement this.  
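
For illustration only: one reading of "held separately" that fits the reply quoted below ("the two databases must be kept distinct") is logical rather than physical separation inside a single cloud store. The following is a minimal Python sketch under that assumption. The class and method names (PartitionedStore, record, history) and the partition labels are hypothetical, not drawn from any draft text, and the in-memory dictionaries stand in for whatever storage a company actually uses.

```python
from collections import defaultdict


class PartitionedStore:
    """One shared store with two logical partitions keyed by DNT state.

    Hypothetical sketch: records written while DNT is on (1) are never
    read together with, or merged into, records written otherwise.
    """

    def __init__(self):
        # partition label -> user id -> list of recorded events
        self._partitions = {
            "dnt_on": defaultdict(list),
            "dnt_off": defaultdict(list),
        }

    @staticmethod
    def _partition_for(dnt_header):
        # Assumption: only an explicit "1" counts as DNT on.
        return "dnt_on" if dnt_header == "1" else "dnt_off"

    def record(self, user_id, dnt_header, event):
        # Write the event into the partition matching the request's DNT state.
        self._partitions[self._partition_for(dnt_header)][user_id].append(event)

    def history(self, user_id, dnt_header):
        # Read only from the partition matching the current DNT state;
        # there is deliberately no method that joins the two partitions.
        partition = self._partitions[self._partition_for(dnt_header)]
        return list(partition.get(user_id, []))


store = PartitionedStore()
store.record("u1", "1", "served ad A")
store.record("u1", None, "arrived via another site")
print(store.history("u1", "1"))
print(store.history("u1", None))
```

Whether separation of this kind would satisfy the rule is exactly the open question; the sketch only shows that "separate" need not mean separate hardware.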

On Feb 2, 2012, at 12:47 AM, David Singer wrote:

> 
> On Feb 2, 2012, at 5:00 , Bryan Sullivan wrote:
> 
>> "Records derived when DNT is on (1), MUST be held separately from other
>> data derived when DNT is not on (1).": By "MUST be held separately", you
>> are not intending to imply any particular technical or physical approach
>> to the "separation" are you?
> 
> No, just that the two databases must be kept distinct (you, before you turned on DNT, and you, after you did).
> 
>> 
>> As noted in other threads, "research/market-analytics" and "product
>> improvement" are important exceptions that depend upon short-term
>> retention of data (typically not PII though) prior to analysis/aggregation.
> 
> This is about separating records that are 'polluted' with full-on tracking, and records that are 'tunnel-vision';  they mustn't be mixed.
> 
>> 
>> Otherwise your definition looks very close.
>> 
>> Thanks,
>> Bryan
>> 
>> On 1/29/12 8:15 AM, "David Singer" <singer@apple.com> wrote:
>> 
>>> This is a revision of my previous email, and a response to Action-77,
>>> which is one of 6 (?) actions related to Issue-5.  Please ask questions
>>> as needed to clarify, and I will write a composite revised definition, so
>>> we can close Action-77, and (once that's been done for the other
>>> formulations) Issue-5.
>>> 
>>> This is an alternative to restricting tracking via a 1st/3rd party
>>> distinction. I want to emphasize, I am doing this to explore and learn,
>>> not to 'promote' any particular direction.  I hope people find it helpful.
>>> 
>>> (All these definitions etc. rely on being able to define "site" or
>>> "party", by the way.  I don't see how to escape that, as many have
>>> pointed out, since it's within a 'party' that information flows, and so
>>> on.)
>>> 
>>> 
>>> RULE
>>> 
>>> Informally, we allow sites only to record what they do and learn
>>> *directly* about the interaction between themselves and the user.
>>> 
>>> The formal rule is this:
>>> 
>>> When DNT is on (1):
>>> Data records that both identify, or could identify, a single USER, and
>>> also identify, or could identify, a single SITE (that is part of a Party),
>>> * MUST identify or be capable of identifying no other Party, or site that
>>> is part of any other Party;
>>> * MUST be derived only from transactions directly between the identified
>>> Party and the user, possibly combined with publicly available data;
>>> * MUST be available/accessible only to/by the identified Party;
>>> * MUST NOT contain user-specific non-public information derived or
>>> passed, directly or indirectly, from any other Party.
>>> 
>>> If the data is held by another party on behalf of the identified party,
>>> that holding party MUST have no rights to use the data.
>>> 
>>> Records derived when DNT is on (1), MUST be held separately from other
>>> data derived when DNT is not on (1).
>>> 
>>> EXCEPTIONS
>>> 
>>> not needed:
>>> 
>>> Outsourcing exception: not needed, it's part of the rule in the first
>>> place.
>>> 1st-party exception: not needed: all sites/parties are allowed to
>>> remember the user's interactions with them.
>>> Unidentifiable data exception: not needed, as the definition here only
>>> concerns user-identifiable data in the first place (which can probably be
>>> true for all rule sets)
>>> Operational exceptions:
>>> frequency capping, story-boarding: not needed; the ad site is permitted
>>> to remember what IT served YOU, just not a lot of why (which 1st party
>>> you were on, etc.)
>>> financial logging: separate un-identified records can be kept on the
>>> number of impressions on a 1st-party site (why is this not true for all
>>> proposals?)
>>> 3rd party auditing: again, is it necessary to keep a record that
>>> identifies a specific user?
>>> 
>>> potentially needed:
>>> 
>>> Operational exceptions:
>>> security/fraud: an exception may be needed here, especially if
>>> cross-site fraud is to be detected
>>> research/market-analytics: we don't have a current formulation, and the
>>> title is broad enough to allow almost anything, so I can't tell
>>> product improvement: this is an issue, again with a serious risk of
>>> slippery slope
>>> debugging: yes, an exception may be needed for debugging
>>> Legal exception: tracking to the extent required by law
>>> 
>>> Comments on TUNNEL-VISION
>>> 
>>> If a user runs sometimes with DNT:0 and sometimes DNT:1, they will end up
>>> with two records at sites, one with a lot of other-site data, and the
>>> second record with tunnel-vision.  Correlation by the site would enable
>>> merging these; this is the weakest aspect of this strawman, IMHO.  Under
>>> the alternative 'cross-site' formulation, I think each site would keep
>>> N+1 records (1 for when DNT is off, and N for the number of 1st party
>>> sites 'seen' by this 3rd party for this user).
>>> 
>>> Frequency capping and storyboarding by advertisers are now permitted; you
>>> ARE allowed to remember what ad you showed this (anonymous) user, since
>>> that was *your* transaction.  You're limited in remembering only
>>> site-generic 'why' -- you cannot remember 'they visited Sears and so I
>>> showed a dishwasher advert'.
>>> 
>>> If the user starts interacting with *you*, you can remember that also; we
>>> don't need language to make this an exception, or 'promotion' from 3rd to
>>> 1st party.
>>> 
>>> Redirection services can remember basically only that the user was active
>>> on the web, since everything else they know (the original URL, the
>>> re-direct) either identifies or could be used to identify another site.
>>> 
>>> The attraction of this rule is that many fewer exceptions are needed.
>>> The downside of this formulation is that it relies on sites not to
>>> re-correlate the records, though there is still a lot of data that cannot
>>> be recorded.
>>> 
>>> David Singer
>>> Multimedia and Software Standards, Apple Inc.
>>> 
>>> 
>>> 
>> 
>> 
> 
> David Singer
> Multimedia and Software Standards, Apple Inc.
> 
> 

Lauren Gelman
BlurryEdge Strategies
415-627-8512
gelman@blurryedge.com
http://blurryedge.com
Received on Thursday, 2 February 2012 21:54:56 UTC