Re: URLS/scoring

Justin, aggregated scoring currently happens in parallel with R-Y-G and is not part of the proposal. In Santa Clara, Shane made it clear that all users, regardless of DNT, will be subject to aggregated scoring. Only an opt-out cookie MAY prevent this collection, use, and sharing.

Rob

Justin Brookman <jbrookman@cdt.org> wrote:

>To be clear, I do not believe that the term "aggregate scoring" appears
>either in the original DAA proposal or the amendments that Jack sent
>around yesterday.  As I currently think I understand the proposal, when
>DNT:1 is turned on, a third party may not use/retain the specific
>url/domain for OBA (or other non-permitted purposes), but they may
>use/retain any derived information about the url.
>
>So an ad network may not retain/use the fact that I visited
>zappos.com/32145 for OBA (or other non-permitted purposes) but they may
>retain/use/sell/do anything with a characterization of my unique ID as
>"interested in shopping," "interested in shoes," or "interested in the
>Nike Pro Attack in blue and green."  The unique ID could be a cookie,
>an email address, a name, or anything else.
>
>Justin Brookman
>Director, Consumer Privacy
>Center for Democracy & Technology
>tel 202.407.8812
>justin@cdt.org
>http://www.cdt.org
>@JustinBrookman
>@CenDemTech
>
>On Jul 10, 2013, at 11:15 AM, "Mike O'Neill"
><michael.oneill@baycloud.com> wrote:
>
>> [Keep ID, Remove URL = Aggregate Scoring] is a null,
>> because the individual is still profiled and their web activity can
>continue to be appended to the profile.
>>  
>> [Remove ID, Keep URL] is a null,
>> because a) PII might be in URLs, and
>>         b) in reality the ID has been replaced with an
>equivalent, though different, ID’, so web activity can continue to be
>appended.
>>  
>> From: Shane Wiley [mailto:wileys@yahoo-inc.com] 
>> Sent: 10 July 2013 15:42
>> To: Mike O'Neill
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Mike,
>>  
>> I support verifiability but am challenged with technical mechanisms
>to allow this without breaking corporate confidentiality concerns. 
>This is why I call it out as an area for future development to help
>build solutions to this unique problem.
>>  
>> I’ve tried breaking the proposal down to the simplest form I can
>think of.  Let me know if this makes it more clear:
>>  
>> -----
>> If Tracking = ID + URLs, then Not Tracking = ID <> URL
>>  
>> Keep ID, Remove URL = Aggregate Scoring
>> Remove ID, Keep URL = De-Identification
>>  
>> Remove ID, Remove URL = De-Identification + De-Linking  (now out of
>scope of DNT)
>> -----
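A minimal sketch of the "Keep ID, Remove URL" case described above, assuming a hypothetical lookup table from domain to interest category (the table, the example domains, and the function name are illustrative and not part of any proposal). It shows that a per-ID interest profile survives even though no URL or domain is retained:

```python
from collections import Counter, defaultdict

# Hypothetical mapping from visited domain to interest category (an assumption).
CATEGORY_BY_DOMAIN = {
    "shoes.example": "interested in shoes",
    "news.example": "interested in news",
}


def aggregate_score(events):
    """Turn (uid, domain) events into per-UID category counts.

    The domain is used only to look up a category and is not kept in the
    result, so the output holds the ID and derived scores but no URLs.
    """
    scores = defaultdict(Counter)
    for uid, domain in events:
        category = CATEGORY_BY_DOMAIN.get(domain)
        if category is not None:
            scores[uid][category] += 1
    return {uid: dict(counter) for uid, counter in scores.items()}


events = [
    ("123456", "shoes.example"),
    ("123456", "shoes.example"),
    ("123456", "news.example"),
]
profile = aggregate_score(events)
```

The returned dictionary is still keyed by the unique ID, which is the property the replies in this thread are debating.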
>>  
>> - Shane
>>  
>> From: Mike O'Neill [mailto:michael.oneill@baycloud.com] 
>> Sent: Wednesday, July 10, 2013 3:10 PM
>> To: Shane Wiley
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Shane,
>>  
>> I have not missed key points, and know the DAA proposals mean
>continued profiling; I just think that needs to be made clear. Perhaps
>you could give an example where applying a hash to a UID would be
>useful.
>>  
>> There is not much difference between the retention of a profile based
>on algorithmically examining a web history and the actual web history
>itself. Both can be a basis for discrimination.
>>  
>> My point about verifiability is that without it, with only
>administrative and operational controls, there will inevitably be
>demands for intrusive regulation, which will not be good for industry.
>Verifiability is in fact quite easy to ensure if tracking is
>constrained to cookies or even localStorage, and that is all the more
>reason to rule out tracking by other means such as fingerprinting.
>>  
>> Mike
>>  
>>  
>> From: Shane Wiley [mailto:wileys@yahoo-inc.com] 
>> Sent: 10 July 2013 14:36
>> To: Mike O'Neill
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Mike,
>>  
>> Perhaps you’ve not been on the calls as I believe you’ve missed a few
>of the key points of this discussion.  I won’t be able to provide a
>full recount via email but I’ll try to hit the high points for you:
>>  
>> 1.      It’s understood obfuscation comes with some risk and will
>need to be bundled with operational and administrative controls to
>reach a reasonable confidence that data will not be reverse engineered. 
>For example, data in the yellow state is not shared publicly and/or
>with parties that you don’t feel could protect the security of its
>composition.  While we’ve agreed on transparency in this area – no one
>has requested external verifiability to date which I believe would be
>somewhat impossible as a starting point.  Perhaps something to work on
>as a future goal (I believe the EFF would also be interested in
>innovating techniques in this area – is that fair Lee?).
>> 2.      Aggregate scoring will result in a profile.  The proposal
>does not attempt to remove this concept but instead to ensure the
>result doesn’t include a user’s historical cross-site activity.  This
>should not be confused with de-identification and instead is simply
>another method to meet the goal of “not tracking”.
>>  
>> - Shane
>>  
>> From: Mike O'Neill [mailto:michael.oneill@baycloud.com] 
>> Sent: Wednesday, July 10, 2013 2:02 PM
>> To: Shane Wiley
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Shane,
>>  
>> As an example of why this “obfuscation” is pointless let it be a
>simple substitution cypher so my UID (which happens to be “123456”) is
>turned into “987654”. If I visit a website containing a reference to
>adco.com that server recognises me because the UID contains “123456”
>and builds up a profile about me. They apply the transform to the UID
>and always get the unique value “987654”, which is stored in the
>profiling dataset. When I visit other websites that also contain
>references to adco.com the same process is repeated and my web activity
>is appended to the dataset, again using “987654” as a key.
>>  
>> It makes no difference how complex the UID transformation is, as
>long as it is 1-to-1.
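A minimal sketch of the point above, assuming SHA-256 as a stand-in for whatever deterministic transform an operator might apply (the function names and example URLs are hypothetical). Because the same UID always maps to the same output, events keep accumulating under one key:

```python
import hashlib
from collections import defaultdict


def obfuscate(uid: str) -> str:
    """Deterministic transform of a UID; SHA-256 is an assumed example."""
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()


def build_profiles(events):
    """Group visited URLs by the obfuscated UID rather than the raw UID."""
    profiles = defaultdict(list)
    for uid, url in events:
        profiles[obfuscate(uid)].append(url)
    return dict(profiles)


events = [
    ("123456", "https://news.example/article"),
    ("123456", "https://shop.example/shoes"),
    ("654321", "https://news.example/sports"),
]
profiles = build_profiles(events)
```

The raw UID never appears in `profiles`, yet both visits by the first user land in the same record.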
>>  
>> Under the “DAA proposal” rules there is absolutely no diminution of
>adco’s ability to profile me.
>>  
>> If another party gets hold of the dataset they can also see my
>profile, though not my original UID. If further records are shared they
>can be connected  to me by this other party because they have the same
>“987654” UID. They may not be able to connect records containing
>“123456” to me (unless they can crack the cypher or are given the key)
>but what would be the point? If they have access to those data records
>they can already profile me anyway.
>>  
>> If activity data in the dataset, collected with my consent, contains
>other PII about me, such as my name, post code, website history etc. 
>they should obfuscate that, perhaps using one way hash functions or
>aggregated scoring algorithms. Since these datasets are a valuable
>corporate asset you would expect them to be doing that anyway, but in
>any case that is legally required in the EU.
>>  
>> As the Snowden revelations have highlighted “operational and
>administrative controls” need to be closely monitored. In the case of
>security services this can be (has to be) through impeccable judicial
>process under democratic oversight. This would not be appropriate for
>commercial companies in a competitive environment, so transparent
>technical procedures are necessary.
>>  
>> The “yellow” state should be recognisable to users and others through
>inspection of user agent data or web logs.
>>  
>> Mike
>>  
>>  
>> From: Shane Wiley [mailto:wileys@yahoo-inc.com] 
>> Sent: 10 July 2013 12:14
>> To: Mike O'Neill
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Mike,
>>  
>> I respectfully disagree.  Obfuscating the ID breaks the association
>with the actual user/device.  That said, I agree this has the risk of
>being reversed so a blend of technical, operational, and administrative
>controls must be brought to bear to keep this from occurring.
>>  
>> De-identification doesn’t allow for profiling in a manner that could
>affect a user’s experience (no way to get back to the user). 
>>  
>> Do Not Track can be achieved by breaking the link between a unique ID
>and cross-site activity (URLs) – and this could result in a profile of
>the user’s interest resulting from aggregate scoring – but this would
>not allow a user’s historical activity to be retrieved.
>>  
>> - Shane
>>  
>> From: Mike O'Neill [mailto:michael.oneill@baycloud.com] 
>> Sent: Wednesday, July 10, 2013 11:55 AM
>> To: Shane Wiley
>> Cc: public-tracking@w3.org
>> Subject: RE: issue-199
>>  
>> Hi Shane,
>>  
>> How can it be possible to remove the association between a device and
>a UID other than by deleting it or ensuring it is deleted by the UA after
>a short duration? If the UID is there (and present in every transport
>level request if it is in a cookie) it uniquely points to the device
>where it is stored or derived. This identity is available to the
>receiving server as well as any actor with similar access to the data
>stream or the same document origin.
>>  
>> If you transform the UID in retained data by setting it to another
>UID (say by using a hash function), this does not break the association
>because there is a 1-to-1 mapping. There is no practical point in doing
>it.
>>  
>> De-identified data can only be classed as such if there is no
>linkage. The “yellow” state can be imagined as an intermediate stage
>before de-identification but is only relevant for permitted uses (such
>as the detection of unique visitors for analytics or frequency
>capping), and there is no need for it to exist for more than a few
>hours.
>>  
>> If we end up defining de-identified as including the ability to link
>individuals to a profile it would be a travesty, and people will see
>through it. The arms race has already started with an explosion of
>blunt cookie and script blockers. If there is not a sensible response
>to people’s real privacy concerns the usefulness of the web (and
>consequently the profitability of many business models) will be
>severely diminished.
>>  
>> Mike
>>  
>>  
>> From: Shane Wiley [mailto:wileys@yahoo-inc.com] 
>> Sent: 09 July 2013 19:30
>> To: Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org
>> Cc: public-tracking@w3.org; jeff@democraticmedia.org
>> Subject: RE: issue-199
>>  
>> Mike,
>>  
>> Deidentification is about removing the association between a unique
>ID (any source:  cookie, digital fingerprint, etc.) and the
>actual/specific user/device.  In this context:
>>  
>> Red:  actual user/device
>> Yellow:  not actual user/device but events are linkable (and only
>usable for analytics/reporting)
>> Green:  not actual user/device and events are not linkable (outside
>the scope of DNT)
>>  
>> - Shane
>>  
>> From: Mike O'Neill [mailto:michael.oneill@baycloud.com] 
>> Sent: Sunday, June 30, 2013 3:01 PM
>> To: 'achapell'; npdoty@w3.org; tlr@w3.org
>> Cc: public-tracking@w3.org; jeff@democraticmedia.org
>> Subject: RE: issue-199
>>  
>> Alan,
>>  
>> Persistent identifiers and their duration should be discussed as part
>of the red/yellow/green permitted use debate. Browser fingerprinting
>identifiers are qualitatively different from those stored in cookies or
>localStorage because they are effectively infinite in duration, so I
>thought it best to extend the defs. to make that clear.
>>  
>>  
>> Mike
>>  
>>  
>> From: achapell [mailto:achapell@chapellassociates.com] 
>> Sent: 30 June 2013 22:39
>> To: michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org
>> Cc: public-tracking@w3.org; jeff@democraticmedia.org
>> Subject: RE: issue-199
>>  
>> Do we want to specify technologies here?  
>>  
>>  
>> Cheers,
>> 
>> Alan Chapell
>> 917 318 8440
>> 
>> 
>> 
>> -------- Original message --------
>> From: Mike O'Neill <michael.oneill@baycloud.com> 
>> Date: 06/30/2013 3:33 PM (GMT-05:00) 
>> To: Nicholas Doty <npdoty@w3.org>,tlr@w3.org 
>> Cc: public-tracking@w3.org,jeff@democraticmedia.org 
>> Subject: issue-199
>> 
>> Nick, Thomas
>> 
>> Dr Dix’s letter reminded me that we need to have some reference to
>browser fingerprinting being ruled out when DNT is set. I have amended
>the definitions accordingly.
>> 
>> Do you want me to modify the wiki?
>> 
>>  
>> 
>> A persistent identifier is an arbitrary value held in, or derived
>from other data in, the user agent whose purpose is to identify the
>user agent in subsequent transactions to a particular web domain. It
>may be encoded for example as the name or value attribute of an HTTP
>cookie, as an item in localStorage or recorded in some way in the
>cache.
>> 
>> The duration of a persistent identifier is the maximum period of time
>it will be retained in the user agent. This could be implemented for
>example using the Expires or Max-Age attributes of an HTTP cookie so
>that it is automatically deleted by the user agent after the specified
>time period is exceeded.
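A minimal sketch of the duration definition above, using Python's standard `http.cookies` module to build a Set-Cookie value with a Max-Age attribute. The cookie name, value, and the one-hour lifetime are assumptions chosen for illustration:

```python
from http.cookies import SimpleCookie


def make_id_cookie(name: str, value: str, max_age_seconds: int) -> str:
    """Build a Set-Cookie header value whose lifetime is bounded by Max-Age."""
    cookie = SimpleCookie()
    cookie[name] = value
    # The user agent is expected to delete the cookie after this many seconds.
    cookie[name]["max-age"] = max_age_seconds
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


# Hypothetical identifier limited to one hour.
header_value = make_id_cookie("uid", "123456", 3600)
```

Because the lifetime is carried in the header itself, it can be checked by inspecting the cookie in the user agent.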
>> 
>> Browser fingerprinting is a method of tracking based on creating a
>persistent identifier from other information either inherent in the
>content request or already stored in the user agent. Such an identifier
>may not need itself to be stored in the user-agent as it can be
>calculated again in subsequent transactions. It follows from this that
>its duration is effectively unlimited.
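A minimal sketch of why such an identifier has no natural expiry, assuming an illustrative set of request attributes (the field list and example values are hypothetical, and real fingerprinting may combine many more signals). Nothing is written to the user agent; the same value is simply recomputed on each request:

```python
import hashlib

# Assumed, illustrative request attributes used as fingerprint inputs.
FINGERPRINT_FIELDS = ("ip", "user-agent", "accept-language", "accept-encoding")


def fingerprint(request: dict) -> str:
    """Derive an identifier from request attributes alone."""
    material = "\n".join(
        f"{field}={request.get(field, '')}" for field in FINGERPRINT_FIELDS
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


first_visit = {
    "ip": "192.0.2.10",
    "user-agent": "ExampleBrowser/1.0",
    "accept-language": "en-GB",
    "accept-encoding": "gzip",
}
later_visit = dict(first_visit)  # same attributes seen again later
same_identifier = fingerprint(first_visit) == fingerprint(later_visit)
```

Clearing cookies or localStorage does not change the result, since the inputs come from the request rather than from stored state.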
>> 
>> Justification.
>> 
>> With the duration definition, restrictions on permitted uses could
>then be made that limit the duration of persistent identifiers. Because
>browser fingerprinting cannot be given a finite duration this tracking
>method should not be used when DNT is set even if it is for a permitted
>use. In reality browser fingerprinting solely based on examining
>initial content requests is usually not an effective tracking method
>because the combination of IP addresses and other headers is not
>sufficiently user specific, but we should rule out at least the more
>complex form when DNT is set.
>> 
>> Mike

Received on Wednesday, 10 July 2013 15:34:32 UTC