RE: issue-199

Mike,

I support verifiability, but I struggle to see technical mechanisms that would allow it without compromising corporate confidentiality.  This is why I call it out as an area for future development, to help build solutions to this unique problem.

I’ve tried breaking the proposal down to the simplest form I can think of.  Let me know if this makes it clearer:

-----
If Tracking = ID + URLs, then Not Tracking = ID <> URL

Keep ID, Remove URL = Aggregate Scoring
Remove ID, Keep URL = De-Identification

Remove ID, Remove URL = De-Identification + De-Linking  (now out of scope of DNT)
-----
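
The breakdown above can be sketched in code. This is only an illustration under stated assumptions: the event log, the interest categories and all function names are hypothetical, and "Remove ID" is read here as dropping the ID outright, although the thread also discusses obfuscated IDs that keep events linkable.

```python
from collections import Counter

# Hypothetical raw log ("Tracking = ID + URLs"): (user ID, URL, interest category)
events = [
    ("u1", "https://news.example/sports", "sports"),
    ("u1", "https://shop.example/shoes", "shopping"),
    ("u2", "https://news.example/politics", "news"),
    ("u1", "https://news.example/football", "sports"),
]

def aggregate_scores(events):
    """Keep ID, remove URL: per-ID interest counts, no browsing history."""
    scores = {}
    for uid, _url, category in events:
        scores.setdefault(uid, Counter())[category] += 1
    return scores

def de_identify(events):
    """Remove ID, keep URL: the URLs survive but are not tied to any ID."""
    return [url for _uid, url, _category in events]

def de_identify_and_de_link(events):
    """Remove ID, remove URL: only overall category totals remain."""
    return Counter(category for _uid, _url, category in events)

scores = aggregate_scores(events)
urls_only = de_identify(events)
totals = de_identify_and_de_link(events)
```

Note that aggregate_scores still yields a per-ID profile, which is the point debated further down the thread.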

- Shane

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 3:10 PM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

Shane,

I have not missed the key points, and I know the DAA proposals mean continued profiling; I just think that needs to be made clear. Perhaps you could give an example where applying a hash to a UID would be useful.

There is not much difference between the retention of a profile based on algorithmically examining a web history and the actual web history itself. Both can be a basis for discrimination.

My point about verifiability is that without it, with only administrative and operational controls, there will inevitably be demands for intrusive regulation, which will not be good for industry. Verifiability is in fact quite easy to ensure if tracking is constrained to cookies or even localStorage, and that is all the more reason to rule out tracking by other means such as fingerprinting.

Mike


From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 10 July 2013 14:36
To: Mike O'Neill
Cc: public-tracking@w3.org
Subject: RE: issue-199

Mike,

Perhaps you’ve not been on the calls as I believe you’ve missed a few of the key points of this discussion.  I won’t be able to provide a full recount via email but I’ll try to hit the high points for you:


1. It’s understood obfuscation comes with some risk and will need to be bundled with operational and administrative controls to reach a reasonable confidence that data will not be reverse engineered.  For example, data in the yellow state is not shared publicly and/or with parties that you don’t feel could protect the security of its composition.  While we’ve agreed on transparency in this area, no one has requested external verifiability to date, which I believe would be somewhat impossible as a starting point.  Perhaps something to work on as a future goal (I believe the EFF would also be interested in innovating techniques in this area – is that fair, Lee?).

2. Aggregate scoring will result in a profile.  The proposal does not attempt to remove this concept but instead to ensure the result doesn’t include a user’s historical cross-site activity.  This should not be confused with de-identification and instead is simply another method to meet the goal of “not tracking”.

- Shane

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 2:02 PM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

Shane,

As an example of why this “obfuscation” is pointless, suppose it is a simple substitution cypher, so my UID (which happens to be “123456”) is turned into “987654”. If I visit a website containing a reference to adco.com, that server recognises me because the UID contains “123456” and builds up a profile about me. They apply the transform to the UID and always get the unique value “987654”, which is stored in the profiling dataset. When I visit other websites that also contain references to adco.com, the same process is repeated and my web activity is appended to the dataset, again using “987654” as a key.

It makes no difference how complex the UID transformation is, as long as it is 1-to-1.
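
A minimal sketch of this argument, assuming SHA-256 stands in for the transform (the thread does not name a specific function) and using hypothetical UIDs and site names:

```python
import hashlib

def obfuscate(uid: str) -> str:
    """Deterministic one-way transform of a UID (SHA-256, hex-encoded)."""
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()

# Hypothetical requests seen by adco.com: (raw UID, site visited)
requests = [
    ("123456", "site-a.example"),
    ("555555", "site-a.example"),
    ("123456", "site-b.example"),
    ("123456", "site-c.example"),
]

profiles = {}
for uid, site in requests:
    key = obfuscate(uid)  # the same input always yields the same key
    profiles.setdefault(key, []).append(site)

# The raw UID never appears in the dataset, yet one key still collects
# every site visited by the same browser.
history = profiles[obfuscate("123456")]
```

However strong the transform, the profile keyed by its output accumulates exactly as it would under the raw UID.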

Under the “DAA proposal” rules there is absolutely no diminution of adco’s ability to profile me.

If another party gets hold of the dataset they can also see my profile, though not my original UID. If further records are shared they can be connected to me by this other party because they have the same “987654” UID. They may not be able to connect records containing “123456” to me (unless they can crack the cypher or are given the key) but what would be the point? If they have access to those data records they can already profile me anyway.

If activity data in the dataset, collected with my consent, contains other PII about me, such as my name, post code, website history etc., they should obfuscate that, perhaps using one-way hash functions or aggregated scoring algorithms. Since these datasets are a valuable corporate asset you would expect them to be doing that anyway, but in any case that is legally required in the EU.

As the Snowden revelations have highlighted, “operational and administrative controls” need to be closely monitored. In the case of security services this can be (has to be) through impeccable judicial process under democratic oversight. This would not be appropriate for commercial companies in a competitive environment, so transparent technical procedures are necessary.

The “yellow” state should be recognisable to users and others through inspection of user agent data or web logs.

Mike


From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 10 July 2013 12:14
To: Mike O'Neill
Cc: public-tracking@w3.org
Subject: RE: issue-199

Mike,

I respectfully disagree.  Obfuscating the ID breaks the association with the actual user/device.  That said, I agree this has the risk of being reversed so a blend of technical, operational, and administrative controls must be brought to bear to keep this from occurring.

De-identification doesn’t allow for profiling in a manner that could affect a user’s experience (no way to get back to the user).

Do Not Track can be achieved by breaking the link between a unique ID and cross-site activity (URLs) – and this could result in a profile of the user’s interest resulting from aggregate scoring – but this would not allow a user’s historical activity to be retrieved.

- Shane

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 11:55 AM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

Hi Shane,

How can it be possible to remove the association between a device and a UID other than by deleting it or ensuring it is deleted by the UA after a short duration? If the UID is there (and present in every transport level request if it is in a cookie) it uniquely points to the device where it is stored or derived. This identity is available to the receiving server as well as any actor with similar access to the data stream or the same document origin.

If you transform the UID in retained data by setting it to another UID (say by using a hash function), this does not break the association because there is a 1-to-1 mapping. There is no practical point in doing it.

De-identified data can only be classed as such if there is no linkage. The “yellow” state can be imagined as an intermediate stage before de-identification but is only relevant for permitted uses (such as the detection of unique visitors for analytics or frequency capping), and there is no need for it to exist for more than a few hours.

If we end up defining de-identified as including the ability to link individuals to a profile it would be a travesty, and people will see through it. The arms race has already started with an explosion of blunt cookie and script blockers. If there is not a sensible response to people’s real privacy concerns the usefulness of the web (and consequently the profitability of many business models) will be severely diminished.

Mike


From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 09 July 2013 19:30
To: Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199

Mike,

Deidentification is about removing the association between a unique ID (any source:  cookie, digital fingerprint, etc.) and the actual/specific user/device.  In this context:

Red:  actual user/device
Yellow:  not actual user/device but events are linkable (and only usable for analytics/reporting)
Green:  not actual user/device and events are not linkable (outside the scope of DNT)

- Shane

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Sunday, June 30, 2013 3:01 PM
To: 'achapell'; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199

Alan,

Persistent identifiers and their duration should be discussed as part of the red/yellow/green permitted use debate. Browser fingerprinting identifiers are qualitatively different from those stored in cookies or localStorage because they are effectively infinite in duration, so I thought it best to extend the defs. to make that clear.


Mike


From: achapell [mailto:achapell@chapellassociates.com]
Sent: 30 June 2013 22:39
To: michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199

Do we want to specify technologies here?


Cheers,

Alan Chapell
917 318 8440



-------- Original message --------
From: Mike O'Neill <michael.oneill@baycloud.com>
Date: 06/30/2013 3:33 PM (GMT-05:00)
To: Nicholas Doty <npdoty@w3.org>, tlr@w3.org
Cc: public-tracking@w3.org, jeff@democraticmedia.org
Subject: issue-199

Nick, Thomas

Dr Dix’s letter reminded me that we need to have some reference to browser fingerprinting being ruled out when DNT is set. I have amended the definitions accordingly.

Do you want me to modify the wiki?



A persistent identifier is an arbitrary value held in, or derived from other data in, the user agent whose purpose is to identify the user agent in subsequent transactions to a particular web domain. It may be encoded for example as the name or value attribute of an HTTP cookie, as an item in localStorage or recorded in some way in the cache.

The duration of a persistent identifier is the maximum period of time it will be retained in the user agent. This could be implemented for example using the Expires or Max-Age attributes of an HTTP cookie so that it is automatically deleted by the user agent after the specified time period is exceeded.
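
As an illustration of the duration definition, the following sketch builds a Set-Cookie header value whose Max-Age attribute bounds how long the identifier is retained. The cookie name "uid", the helper name and the six-hour limit are assumptions chosen for the example, not part of the proposed text.

```python
from http.cookies import SimpleCookie

def build_set_cookie(uid: str, max_age_seconds: int) -> str:
    """Return a Set-Cookie header value for an identifier with a bounded duration."""
    cookie = SimpleCookie()
    cookie["uid"] = uid                          # hypothetical cookie name
    cookie["uid"]["max-age"] = max_age_seconds   # user agent deletes it after this
    cookie["uid"]["path"] = "/"
    return cookie["uid"].OutputString()

# Hypothetical limit of six hours
header_value = build_set_cookie("123456", 6 * 60 * 60)
```

Because the limit travels in the response header, anyone inspecting the traffic or the user agent's cookie store can check it.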

Browser fingerprinting is a method of tracking based on creating a persistent identifier from other information either inherent in the content request or already stored in the user agent. Such an identifier may not need itself to be stored in the user-agent as it can be calculated again in subsequent transactions. It follows from this that its duration is effectively unlimited.
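
A deliberately simple sketch of why such an identifier has no natural expiry: it is recomputed from the request itself, so nothing is stored in the user agent that could be deleted. The choice of inputs (client IP plus request headers), the hashing scheme and the function name are assumptions for illustration; more complex fingerprinting would draw on further information.

```python
import hashlib

def fingerprint(headers: dict, client_ip: str) -> str:
    """Derive an identifier from data present in every request; nothing is stored."""
    parts = [client_ip] + [f"{name.lower()}={headers[name]}" for name in sorted(headers)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]

# Two separate visits from the same hypothetical browser
visit_1 = fingerprint(
    {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-GB"}, "203.0.113.7"
)
visit_2 = fingerprint(
    {"Accept-Language": "en-GB", "User-Agent": "ExampleBrowser/1.0"}, "203.0.113.7"
)

# A different browser behind the same address
other = fingerprint(
    {"User-Agent": "OtherBrowser/2.0", "Accept-Language": "en-GB"}, "203.0.113.7"
)
```

The two visits produce the same value without any cookie being set, so there is no attribute through which a duration could be imposed.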

Justification.

With the duration definition, restrictions on permitted uses could then be made that limit the duration of persistent identifiers. Because browser fingerprinting cannot be given a finite duration, this tracking method should not be used when DNT is set even if it is for a permitted use. In reality browser fingerprinting solely based on examining initial content requests is usually not an effective tracking method because the combination of IP addresses and other headers is not sufficiently user-specific, but we should rule out at least the more complex form when DNT is set.

Mike

Received on Wednesday, 10 July 2013 14:45:01 UTC