- From: Edward W. Felten <felten@CS.Princeton.EDU>
- Date: Wed, 10 Jul 2013 21:02:51 -0400
- To: Shane Wiley <wileys@yahoo-inc.com>
- Cc: Jonathan Mayer <jmayer@stanford.edu>, "public-tracking@w3.org" <public-tracking@w3.org>
- Message-ID: <CANZBoGh5m-3wTPtgNK1G5OPdPxZNVO__-teYJM27a=7Cqy3Ehg@mail.gmail.com>
In this scenario, isn't cookie 1234 still a unique ID connected to a specific user? The cookie ID is still in the specific user's browser and will be sent with subsequent requests.

On Wed, Jul 10, 2013 at 7:55 PM, Shane Wiley <wileys@yahoo-inc.com> wrote:

> Fair point Jonathan – and something I had expected we’d be able to
> provide more clarity around in non-normative text. The center point
> **text** is the definition of Tracking. As long as the resulting
> transformation to the ID or the URL was something that could not be reverse
> engineered back to the original ID and/or URL, then I would defend this as
> the information no longer resulting in tracking.
>
> For example, if a collected activity for cookie ID 1234 was obfuscated to
> a single letter, then we’d have 26 possible buckets with no way of linking
> a single aggregated result to an actual URL.
>
> Cookie ID 1234,
> http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane
>
> -becomes-
>
> Cookie ID 1234, “c”, 1
>
> Similarly…
>
> Cookie ID 1234,
> http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley
>
> -becomes-
>
> Cookie ID 1234, “c”, 2
>
> While difficult to predefine in technical terms, as long as the resulting
> “aggregate” doesn’t allow for reverse engineering back to the actual event,
> then tracking is not occurring.
>
> ROT13 doesn’t work (character rotation of 13 places) as this can be
> reverse engineered directly and wouldn’t be able to be contained through
> administrative and operational controls. That’s why we’ve recommended
> something more significant such as keyed/secret hash where the key is
> further contained from access outside of automated routines – aka, humans –
> as a more reasonable option (but there could be others that meet the same
> goal).
>
> - Shane
>
> *From:* Jonathan Mayer [mailto:jmayer@stanford.edu]
> *Sent:* Wednesday, July 10, 2013 11:55 PM
> *To:* Shane Wiley
> *Cc:* Lauren Gelman; Peter Swire; Justin Brookman; Rob van Eijk; Mike
> O'Neill; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> Shane,
>
> Could you please identify the **text** that limits these exceptions from
> "tracking"? Once a URL is altered to something other than a plaintext URL
> (e.g. applying ROT13), why is it still "tracking"?
>
> Thanks,
> Jonathan
>
> On Wednesday, July 10, 2013 at 3:34 PM, Shane Wiley wrote:
>
> Lauren,
>
> I’m not following your “translation from English to Spanish” example as
> for the Aggregate Scoring approach would be more akin to summarizing
> English into basic sounds – of which could be attributed to any number of
> words but in of themselves does not reveal the actual word the sound
> belongs to.
>
> - Shane
>
> *From:* Lauren Gelman [mailto:gelman@blurryedge.com]
> *Sent:* Wednesday, July 10, 2013 7:47 PM
> *To:* Peter Swire
> *Cc:* Jonathan Mayer; Shane Wiley; Justin Brookman; Rob van Eijk; Mike
> O'Neill; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> The change proposed to limit the definition of tracking to URLs is
> extraordinary.
>
> Business works this way anyway-- URLS are translated into segments and
> people are characterized using those. Segments and profiles are augmented
> and targeted to.
> Not lists of URLs
>
> I thought it was crazy a year ago when the compromise was made for DNT:1
> to permit collecting of information, in order to accommodate (IMHO broad)
> permitted uses. If collection is permitted in order to allow the business
> to translate the URL into a segment, the exception has indeed, finally,
> swallowed the rule.
>
> Allowing aggregate scoring is just like translating English URLs to
> Spanish and then saying the Spanish ones are out of scope. It ignores the
> fact that if you collect multiple data points about a unique identifier,
> you can eventually determine its personal characteristics. There's no
> reason that is limited to URLS, but applies equally to any translated
> characteristics.
>
> Lauren Gelman
> @laurengelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Jul 10, 2013, at 11:14 AM, Peter Swire wrote:
>
> Please correct me if I'm wrong.
>
> My understanding is that "aggregate scoring" is not "tracking."
>
> It therefore does not qualify either as "de-identified" or "de-linked."
> It is outside the scope of DNT under the DAA proposal.
>
> Peter
>
> Prof. Peter P. Swire
> C. William O'Neill Professor of Law
> Ohio State University
> 240.994.4142
> www.peterswire.net
>
> Beginning August 2013:
> Nancy J. and Lawrence P. Huang Professor
> Law and Ethics Program
> Scheller College of Business
> Georgia Institute of Technology
>
> *From:* Jonathan Mayer <jmayer@stanford.edu>
> *Date:* Wednesday, July 10, 2013 12:40 PM
> *To:* Shane Wiley <wileys@yahoo-inc.com>
> *Cc:* Justin Brookman <jbrookman@cdt.org>, Rob van Eijk <rob@blaeu.com>,
> Mike O'Neill <michael.oneill@baycloud.com>, "public-tracking@w3.org"
> <public-tracking@w3.org>
> *Subject:* Re: URLS/scoring
> *Resent-From:* <public-tracking@w3.org>
> *Resent-Date:* Wednesday, July 10, 2013 12:40 PM
>
> Shane,
>
> Could you please explain where "Aggregate Scoring" would land in the DAA
> proposal? Is it "de-identified" data? "Unlinked" data?
>
> Thanks,
> Jonathan
>
> On Wednesday, July 10, 2013 at 9:11 AM, Shane Wiley wrote:
>
> Justin,
>
> It was my hope to add this as non-normative text as Aggregate Scoring is
> one example of “not tracking” and we’ve been focused on normative text at
> this point so that’s why it’s not included.
>
> - Shane
>
> *From:* Justin Brookman [mailto:jbrookman@cdt.org]
> *Sent:* Wednesday, July 10, 2013 4:40 PM
> *To:* Rob van Eijk
> *Cc:* Mike O'Neill; Shane Wiley; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> I had heard the idea floated in Sunnyvale (and before) but it was only
> presented as a possibility --- in any event, scoring certainly ran counter
> to the previous requirements in the compliance standard. Mike Zaneis's
> comments last week were the first time I thought I understood that the
> trade associations were proposing that OBA/retargeting be allowed when DNT
> is turned on.
> And in any event, prior discussions are not really relevant
> --- I'm just trying to figure out concretely what is on the table as far as
> the DAA proposed DNT standard.
>
> Jack's proposed revision of the definition of tracking helped me (I think)
> to understand what is being offered, but I was just trying to flesh it out.
> People keep referencing "scoring," but that term is neither defined nor
> used in any of the proposals.
>
> On Jul 10, 2013, at 11:33 AM, Rob van Eijk <rob@blaeu.com> wrote:
>
> Justin, currently aggregated scoring happens parallel from R-Y-G, and is
> not part of the proposal. In Santa Clara Shane made it clear that all
> users, regardless of DNT will be subject to aggregated scoring. Only an
> opt-out cookie MAY prevent this collection, use and sharing.
>
> Rob
>
> Justin Brookman <jbrookman@cdt.org> wrote:
>
> To be clear, I do not believe that the term "aggregate scoring" appears
> either in the original DAA proposal or the amendments that Jack sent around
> yesterday. As I currently think I understand the proposal, when DNT:1 is
> turned on, a third party may not use/retain the specific url/domain for OBA
> (or other non-permitted purposes), but they may use/retain any derived
> information about the url.
>
> So an ad network may not retain/use the fact that I visited
> zappos.com/32145 for OBA (or other non-permitted purposes) but they may
> retain/use/sell/do anything with a characterization of my unique ID as
> "interested in shopping," "interested in shoes," or "interested in the Nike
> Pro Attack in blue and green." The unique ID could be a cookie, an email
> address, a name, or anything else.
>
> Justin Brookman
> Director, Consumer Privacy
> Center for Democracy & Technology
> tel 202.407.8812
> justin@cdt.org
> http://www.cdt.org
> @JustinBrookman
> @CenDemTech
>
> On Jul 10, 2013, at 11:15 AM, "Mike O'Neill" <michael.oneill@baycloud.com>
> wrote:
>
> [Keep ID, Remove URL = Aggregate Scoring] is a null
>
> Because the individual is still profiled and their web activity can
> continue to be appended to the profile
>
> [Remove ID, Keep URL] is a null
>
> Because a) PII might be in URLs.
>
> b) In reality ID has been replaced with an equivalent,
> though different, ID’ so web activity can continue to be appended.
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 15:42
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> I support verifiability but am challenged with technical mechanisms to
> allow this without breaking corporate confidentiality concerns. This is
> why I call it out as an area for future development to help build solutions
> to this unique problem.
>
> I’ve tried breaking the proposal down to the simplest form I can think
> of.
> Let me know if this makes it more clear:
>
> -----
>
> If Tracking = ID + URLs, then Not Tracking = ID <> URL
>
> Keep ID, Remove URL = Aggregate Scoring
>
> Remove ID, Keep URL = De-Identification
>
> Remove ID, Remove URL = De-Identification + De-Linking (now out of scope
> of DNT)
>
> -----
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 3:10 PM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Shane,
>
> I have not missed key points, and know the DAA proposals mean continued
> profiling, just think that needs to be made clear. Perhaps you could give
> an example where applying a hash to a UID would be useful.
>
> There is not much difference between the retention of a profile based on
> algorithmically examining a web history and the actual web history itself.
> Both can be a basis for discrimination.
>
> My point about verifiability is that without it, with only administrative
> and operational controls, there will inevitably be demands for intrusive
> regulation, which will not be good for industry. Verifiability is in fact
> quite easy to ensure if tracking is constrained to cookies or even
> localStorage, and that is all the more reason to rule out tracking by other
> means such as fingerprinting.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 14:36
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> Perhaps you’ve not been on the calls as I believe you’ve missed a few of
> the key points of this discussion. I won’t be able to provide a full
> recount via email but I’ll try to hit the high points for you:
>
> 1. It’s understood obfuscation comes with some risk and will need to
> be bundled with operational and administrative controls to reach a
> reasonable confidence that data will not be reverse engineered. For example,
> data in the yellow state is not shared publicly and/or with parties where
> you don’t feel could protect the security of its composition. While
> we’ve agreed on transparency in this area – no one has requested external
> verifiability to date which I believe would be somewhat impossible as a
> starting point. Perhaps something to work on as a future goal (I believe
> the EFF would also be interested in innovating techniques in this area – is
> that fair Lee?).
>
> 2. Aggregate scoring will result in a profile. The proposal does
> not attempt to remove this concept but instead to ensure the result doesn’t
> include a user’s historical cross-site activity.
> This should not be
> confused with de-identification and instead is simply another method to
> meet the goal of “not tracking”.
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 2:02 PM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Shane,
>
> As an example of why this “obfuscation” is pointless let it be a simple
> substitution cypher so my UID (which happens to be “123456”) is turned
> into “987654”. If I visit a website containing a reference to adco.com that
> server recognises me because the UID contains “123456” and builds up a
> profile about me. They apply the transform to the UID and always get the
> unique value “987654”, which is stored in the profiling dataset. When I
> visit other websites that also contain references to adco.com the same
> process is repeated and my web activity is appended to the dataset, again
> using “987654” as a key.
>
> It makes no difference how complex the UID transformation is, as long as
> it is 1to1.
>
> Under the “DAA proposal” rules there is absolutely no diminution of adco’s
> ability to profile me.
>
> If another party gets hold of the dataset they can also see my profile,
> though not my original UID. If further records are shared they can be
> connected to me by this other party because they have the same “987654”
> UID. They may not be able to connect records containing “123456” to me
> (unless they can crack the cypher or are given the key) but what would be
> the point? If they have access to those data records they can already
> profile me anyway.
>
> If activity data in the dataset, collected with my consent, contains other
> PII about me, such as my name, post code, website history etc.
> they should
> obfuscate that, perhaps using one way hash functions or aggregated scoring
> algorithms. Since these datasets are a valuable corporate asset you would
> expect them to be doing that anyway, but in any case that is legally
> required in the EU.
>
> As the Snowden revelations have highlighted “operational and
> administrative controls” need to be closely monitored. In the case of
> security services this can be (has to be) through impeccable judicial
> process under democratic oversight. This would not be appropriate for
> commercial companies in a competitive environment, so transparent technical
> procedures are necessary.
>
> The “yellow” state should be recognisable to users and others through
> inspection of user agent data or web logs.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 12:14
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> I respectfully disagree. Obfuscating the ID breaks the association with
> the actual user/device. That said, I agree this has the risk of being
> reversed so a blend of technical, operational, and administrative controls
> must be brought to bear to keep this from occurring.
>
> De-identification doesn’t allow for profiling in a manner that could
> affect a user’s experience (no way to get back to the user).
>
> Do Not Track can be achieved by breaking the link between a unique ID and
> cross-site activity (URLs) – and this could result in a profile of the
> user’s interest resulting from aggregate scoring – but this would not allow
> a user’s historical activity to be retrieved.
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 11:55 AM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Hi Shane,
>
> How can it be possible to remove the association between a device and a
> UID other than deleting it or ensuring it is deleted by the UA after a
> short duration? If the UID is there (and present in every transport level
> request if it is in a cookie) it uniquely points to the device where it is
> stored or derived. This identity is available to the receiving server as
> well as any actor with similar access to the data stream or the same
> document origin.
>
> If you transform the UID in retained data by setting it to another UID
> (say by using a hash function), this does not break the association because
> there is a 1to1 mapping. There is no practical point in doing it.
>
> De-identified data can only be classed as such if there is no linkage. The
> “yellow” state can be imagined as an intermediate stage before
> de-identification but is only relevant for permitted uses (such as the
> detection of unique visitors for analytics or frequency capping), and there
> is no need for it to exist for more than a few hours.
>
> If we end up defining de-identified as including the ability to link
> individuals to a profile it would be a travesty, and people will see
> through it. The arms race has already started with an explosion of blunt
> cookie and script blockers.
> If there is not a sensible response to people’s
> real privacy concerns the usefulness of the web (and consequently the
> profitability of many business models) will be severely diminished.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 09 July 2013 19:30
> *To:* Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Mike,
>
> Deidentification is about removing the association between a unique ID
> (any source: cookie, digital fingerprint, etc.) and the actual/specific
> user/device. In this context:
>
> Red: actual user/device
>
> Yellow: not actual user/device but events are linkable (and only usable
> for analytics/reporting)
>
> Green: not actual user/device and events are not linkable (outside the
> scope of DNT)
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Sunday, June 30, 2013 3:01 PM
> *To:* 'achapell'; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Alan,
>
> Persistent identifiers and their duration should be discussed as part of
> the red/yellow/green permitted use debate. Browser fingerprinting
> identifiers are qualitatively different from those stored in cookies or
> localStorage because they are effectively infinite in duration, so I
> thought it best to extend the defs. to make that clear.
>
> Mike
>
> *From:* achapell [mailto:achapell@chapellassociates.com]
> *Sent:* 30 June 2013 22:39
> *To:* michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Do we want to specify technologies here?
>
> Cheers,
>
> Alan Chapell
> 917 318 8440
>
> -------- Original message --------
> From: Mike O'Neill <michael.oneill@baycloud.com>
> Date: 06/30/2013 3:33 PM (GMT-05:00)
> To: Nicholas Doty <npdoty@w3.org>,tlr@w3.org
> Cc: public-tracking@w3.org,jeff@democraticmedia.org
> Subject: issue-199
>
> Nick, Thomas
>
> Dr Dix’s letter reminded me that we need to have some reference to browser
> fingerprinting being ruled out when DNT is set. I have amended the
> definitions accordingly.
>
> Do you want me to modify the wiki?
>
> A *persistent identifier* is an arbitrary value held in, or derived from
> other data in, the user agent whose purpose is to identify the user agent
> in subsequent transactions to a particular web domain. It may be encoded
> for example as the name or value attribute of an HTTP cookie, as an item in
> localStorage or recorded in some way in the cache.
>
> The *duration* of a persistent identifier is the maximum period of time
> it will be retained in the user agent. This could be implemented for
> example using the Expires or Max-Age attributes of an HTTP cookie so that
> it is automatically deleted by the user agent after the specified time
> period is exceeded.
>
> *Browser fingerprinting* is a method of tracking based on creating a
> persistent identifier from other information either inherent in the content
> request or already stored in the user agent. Such an identifier may not
> need itself to be stored in the user-agent as it can be calculated again in
> subsequent transactions. It follows from this that its duration is
> effectively unlimited.
>
> *Justification.*
>
> *With the duration definition, restrictions on permitted uses could then
> be made that limit the duration of persistent identifiers. Because browser
> fingerprinting cannot be given a finite duration this tracking method
> should not be used when DNT is set even if it is for a permitted use. In
> reality browser fingerprinting solely based on examining initial content
> requests is usually not an effective tracking method because the
> combination of IP addresses and other headers are not sufficiently user
> specific, but we should rule out at least the more complex form when DNT is
> set.*
>
> Mike

--
Edward W. Felten
Professor of Computer Science and Public Affairs
Director, Center for Information Technology Policy
Princeton University
609-258-5906 http://www.cs.princeton.edu/~felten
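
For readers following the thread, the "cookie ID 1234" example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anyone's actual implementation: the thread does not say how the single letter is chosen, so `bucket_for` (a hypothetical helper) simply takes the first letter of the host name after "www.", which happens to reproduce the two "c" results above. `keyed_id` and the key passed to it are likewise hypothetical stand-ins for the "keyed/secret hash" mentioned in the thread.

```python
import codecs
import hashlib
import hmac
from collections import defaultdict
from urllib.parse import urlsplit


def bucket_for(url: str) -> str:
    """Reduce a URL to one letter (assumed rule: first letter of the host, minus 'www.')."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host[:1].lower()


def aggregate(events):
    """Turn (cookie_id, url) events into per-cookie letter counts; the URLs are discarded."""
    scores = defaultdict(lambda: defaultdict(int))
    for cookie_id, url in events:
        scores[cookie_id][bucket_for(url)] += 1
    return {cid: dict(buckets) for cid, buckets in scores.items()}


def keyed_id(cookie_id: str, key: bytes) -> str:
    """HMAC-SHA256 of an ID: hard to reverse without the key, yet the same input
    always yields the same output, so records stay linkable to one another."""
    return hmac.new(key, cookie_id.encode(), hashlib.sha256).hexdigest()


events = [
    ("1234", "http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane"),
    ("1234", "http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley"),
]
scores = aggregate(events)

# The URLs are gone, but the counts are still stored under cookie ID 1234,
# which the browser keeps sending with subsequent requests.
profile_on_next_request = scores.get("1234")

# ROT13 is its own inverse, so applying it twice recovers the original URL.
original_url = events[0][1]
round_trip = codecs.encode(codecs.encode(original_url, "rot_13"), "rot_13")
```

In this sketch the aggregated record can no longer be mapped back to a URL, which is the property claimed for aggregate scoring, while the record itself remains retrievable by the unchanged cookie ID, which is the point the question at the top of this message raises.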
Received on Thursday, 11 July 2013 01:03:40 UTC