- From: Edward W. Felten <felten@CS.Princeton.EDU>
- Date: Thu, 11 Jul 2013 10:11:07 -0400
- To: Justin Brookman <jbrookman@cdt.org>
- Cc: Shane Wiley <wileys@yahoo-inc.com>, "public-tracking@w3.org WG" <public-tracking@w3.org>
- Message-ID: <CANZBoGgbuVoJ3VTG4UvawBxyhbw3Vpit8yYvvmDHL1y7XwCp8A@mail.gmail.com>
Justin,

If I'm understanding you correctly, your reading of the DAA text is that the retargeting example I gave would be allowed or not, based on whether the initial information I viewed was available from more than one URL? If so, doesn't that create a loophole, in which a company can make accesses to a page be outside the scope of DNT, simply by offering the same content from a second URL?

On Thu, Jul 11, 2013 at 9:44 AM, Justin Brookman <jbrookman@cdt.org> wrote:
> Well, this is dependent upon whether the group decides to allow first > parties to use data in a third-party context (via a service provider or > otherwise). Alan Chapell and John Simpson have offered language that would > prevent that, and Yianni has offered text saying that the content must be > branded to avoid confusion. Moreover, the current DAA proposal reads "If a > first party receives a DNT:1 signal, the first party MAY engage in its > collection and use of information WITHIN THE FIRST PARTY CONTEXT." (caps > language added by DAA amendment). So I think Shane's reading is textually > debatable. > > On the other hand, I think retargeting would still be allowed in most > cases (as in Ed's example), since a company could use the fact that I > looked at a particular pair of shoes so long as those shoes are not > exclusive to one .url. If the shoes are available at multiple .urls, > collection/usage/selling/retention/publication of the fact that a user > looked at shoes is allowed since that's not tracking. > > Justin Brookman > Director, Consumer Privacy > Center for Democracy & Technology > tel 202.407.8812 > justin@cdt.org > http://www.cdt.org > @JustinBrookman > @CenDemTech > > On Jul 11, 2013, at 8:03 AM, Shane Wiley <wileys@yahoo-inc.com> wrote: > > Jules, > > Retargeting is a function of a Service Provider – not expressly cross-site > activity as in behavioral advertising – in this context.
So the data is > collected and used only on behalf of the 1st party. > > - Shane > > *From:* Jules Polonetsky [mailto:julespol@futureofprivacy.org] > *Sent:* Thursday, July 11, 2013 12:54 PM > *To:* Paul Ohm > *Cc:* Shane Wiley; Jonathan Mayer; public-tracking@w3.org > *Subject:* Re: URLS/scoring > > So retargeting is restricted here, since it often indicates a visit to one > URL or a limited # of designated URLs? > (Apologies if I missed that amidst the flurry) > > Jules Polonetsky > Facebook.com/FutureofPrivacy > @JulesPolonetsky > > On Jul 11, 2013, at 5:48 AM, Paul Ohm <paul.ohm@colorado.edu> wrote: > > Would it be considered tracking if a particular cookie was scored high in > the category, "visited one of these two particular URLs," because it could > not possibly be reverse engineered to a single URL? > > And if "visited one of these two particular URLs" is somehow considered > tracking, what in any of the various draft spec texts leads us to that > conclusion? > > And if "visited one of these two particular URLs" is considered tracking, > what about "visited one of these ten particular URLs"? Or "visited one of > these 100 particular URLs"? > > In other words, is there a k-anonymity floor operating here? If so, what > is k? > > On 7/10/2013 5:55 PM, Shane Wiley wrote: > > Fair point, Jonathan – and something I had expected we’d be able to provide > more clarity around in non-normative text. The center point **text** is > the definition of Tracking.
As long as the resulting transformation to the > ID or the URL was something that could not be reverse engineered back to > the original ID and/or URL, then I would defend this as the information no > longer resulting in tracking. > > For example, if a collected activity for cookie ID 1234 was obfuscated to > a single letter, then we’d have 26 possible buckets with no way of linking > a single aggregated result to an actual URL. > > Cookie ID 1234, > http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane > -becomes- > Cookie ID 1234, “c”, 1 > > Similarly… > > Cookie ID 1234, > http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley > -becomes- > Cookie ID 1234, “c”, 2 > > While difficult to predefine in technical terms, as long as the resulting > “aggregate” doesn’t allow for reverse engineering back to the actual event, > then tracking is not occurring. > > ROT13 doesn’t work (character rotation of 13 places), as this can be > reverse engineered directly and wouldn’t be able to be contained through > administrative and operational controls. That’s why we’ve recommended > something more significant such as a keyed/secret hash where the key is > further contained from access outside of automated routines – aka, humans – > as a more reasonable option (but there could be others that meet the same > goal). > > - Shane > > *From:* Jonathan Mayer [mailto:jmayer@stanford.edu] > *Sent:* Wednesday, July 10, 2013 11:55 PM > *To:* Shane Wiley > *Cc:* Lauren Gelman; Peter Swire; Justin Brookman; Rob van Eijk; Mike > O'Neill; public-tracking@w3.org > *Subject:* Re: URLS/scoring > > Shane, > > Could you please identify the **text** that limits these exceptions from > "tracking"? Once a URL is altered to something other than a plaintext URL > (e.g.
applying ROT13), why is it still "tracking"? > > Thanks, > Jonathan > > On Wednesday, July 10, 2013 at 3:34 PM, Shane Wiley wrote: > > Lauren, > > I’m not following your “translation from English to Spanish” example, as > the Aggregate Scoring approach would be more akin to summarizing > English into basic sounds – which could be attributed to any number of > words but in and of themselves do not reveal the actual word the sound > belongs to. > > - Shane > > *From:* Lauren Gelman [mailto:gelman@blurryedge.com] > *Sent:* Wednesday, July 10, 2013 7:47 PM > *To:* Peter Swire > *Cc:* Jonathan Mayer; Shane Wiley; Justin Brookman; Rob van Eijk; Mike > O'Neill; public-tracking@w3.org > *Subject:* Re: URLS/scoring > > The change proposed to limit the definition of tracking to URLs is > extraordinary. > > Business works this way anyway-- URLs are translated into segments and > people are characterized using those. Segments and profiles are augmented > and targeted to. Not lists of URLs. > > I thought it was crazy a year ago when the compromise was made for DNT:1 > to permit collecting of information, in order to accommodate (IMHO broad) > permitted uses. If collection is permitted in order to allow the business > to translate the URL into a segment, the exception has indeed, finally, > swallowed the rule. > > Allowing aggregate scoring is just like translating English URLs to > Spanish and then saying the Spanish ones are out of scope. It ignores the > fact that if you collect multiple data points about a unique identifier, > you can eventually determine its personal characteristics.
There's no > reason that is limited to URLs; it applies equally to any translated > characteristics. > > Lauren Gelman > @laurengelman > BlurryEdge Strategies > 415-627-8512 > > On Jul 10, 2013, at 11:14 AM, Peter Swire wrote: > > Please correct me if I'm wrong. > > My understanding is that "aggregate scoring" is not "tracking." > > It therefore does not qualify either as "de-identified" or "de-linked." > It is outside the scope of DNT under the DAA proposal. > > Peter > > Prof. Peter P. Swire > C. William O'Neill Professor of Law > Ohio State University > 240.994.4142 > www.peterswire.net > > Beginning August 2013: > Nancy J. and Lawrence P. Huang Professor > Law and Ethics Program > Scheller College of Business > Georgia Institute of Technology > > *From: *Jonathan Mayer <jmayer@stanford.edu> > *Date: *Wednesday, July 10, 2013 12:40 PM > *To: *Shane Wiley <wileys@yahoo-inc.com> > *Cc: *Justin Brookman <jbrookman@cdt.org>, Rob van Eijk <rob@blaeu.com>, > Mike O'Neill <michael.oneill@baycloud.com>, "public-tracking@w3.org" <public-tracking@w3.org> > *Subject: *Re: URLS/scoring > *Resent-From: *<public-tracking@w3.org> > *Resent-Date: *Wednesday, July 10, 2013 12:40 PM > > Shane, > > Could you please explain where "Aggregate Scoring" would land in the DAA > proposal? Is it "de-identified" data? "Unlinked" data? > > Thanks, > Jonathan > > On Wednesday, July 10, 2013 at 9:11 AM, Shane Wiley wrote: > > Justin, > > It was my hope to add this as non-normative text, as Aggregate Scoring is > one example of “not tracking”, and we’ve been focused on normative text at > this point; that’s why it’s not included. > > - Shane > > *From:* Justin Brookman [mailto:jbrookman@cdt.org] > *Sent:* Wednesday, July 10, 2013 4:40 PM > *To:* Rob van Eijk > *Cc:* Mike O'Neill; Shane Wiley; public-tracking@w3.org > *Subject:* Re: URLS/scoring > > I had heard the idea floated in Sunnyvale (and before) but it was only > presented as a possibility --- in any event, scoring certainly ran counter > to the previous requirements in the compliance standard. Mike Zaneis's > comments last week were the first time I thought I understood that the > trade associations were proposing that OBA/retargeting be allowed when DNT > is turned on. And in any event, prior discussions are not really relevant > --- I'm just trying to figure out concretely what is on the table as far as > the DAA proposed DNT standard. > > Jack's proposed revision of the definition of tracking helped me (I think) > to understand what is being offered, but I was just trying to flesh it out. > People keep referencing "scoring," but that term is neither defined nor > used in any of the proposals. > > On Jul 10, 2013, at 11:33 AM, Rob van Eijk <rob@blaeu.com> wrote: > > Justin, currently aggregated scoring happens in parallel to R-Y-G, and is > not part of the proposal. In Santa Clara Shane made it clear that all > users, regardless of DNT, will be subject to aggregated scoring. Only an > opt-out cookie MAY prevent this collection, use and sharing.
> > Rob > Justin Brookman <jbrookman@cdt.org> wrote: > To be clear, I do not believe that the term "aggregate scoring" appears > either in the original DAA proposal or the amendments that Jack sent around > yesterday. As I currently think I understand the proposal, when DNT:1 is > turned on, a third party may not use/retain the specific url/domain for OBA > (or other non-permitted purposes), but they may use/retain any derived > information about the url. > > So an ad network may not retain/use the fact that I visited > zappos.com/32145 for OBA (or other non-permitted purposes), but they may > retain/use/sell/do anything with a characterization of my unique ID as > "interested in shopping," "interested in shoes," or "interested in the Nike > Pro Attack in blue and green." The unique ID could be a cookie, an email > address, a name, or anything else. > > Justin Brookman > Director, Consumer Privacy > Center for Democracy & Technology > tel 202.407.8812 > justin@cdt.org > http://www.cdt.org > @JustinBrookman > @CenDemTech > > On Jul 10, 2013, at 11:15 AM, "Mike O'Neill" <michael.oneill@baycloud.com> > wrote: > > [Keep ID, Remove URL = Aggregate Scoring] is a null > > Because the individual is still profiled and their web activity can > continue to be appended to the profile. > > [Remove ID, Keep URL] is a null > > Because a) PII might be in URLs. > > b) In reality the ID has been replaced with an equivalent, > though different, ID’, so web activity can continue to be appended. > > *From:* Shane Wiley [mailto:wileys@yahoo-inc.com] > *Sent:* 10 July 2013 15:42 > *To:* Mike O'Neill > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Mike, > > I support verifiability but am challenged with technical mechanisms to > allow this without breaking corporate confidentiality concerns.
This is > why I call it out as an area for future development to help build solutions > to this unique problem. > > I’ve tried breaking the proposal down to the simplest form I can think > of. Let me know if this makes it clearer: > > ----- > > If Tracking = ID + URLs, then Not Tracking = ID <> URL > > Keep ID, Remove URL = Aggregate Scoring > > Remove ID, Keep URL = De-Identification > > Remove ID, Remove URL = De-Identification + De-Linking (now out of scope of > DNT) > > ----- > > - Shane > > *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com] > *Sent:* Wednesday, July 10, 2013 3:10 PM > *To:* Shane Wiley > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Shane, > > I have not missed key points, and I know the DAA proposals mean continued > profiling; I just think that needs to be made clear. Perhaps you could give > an example where applying a hash to a UID would be useful. > > There is not much difference between the retention of a profile based on > algorithmically examining a web history and the actual web history itself. > Both can be a basis for discrimination. > > My point about verifiability is that without it, with only administrative > and operational controls, there will inevitably be demands for intrusive > regulation, which will not be good for industry.
Verifiability is in fact > quite easy to ensure if tracking is constrained to cookies or even > localStorage, and that is all the more reason to rule out tracking by other > means such as fingerprinting. > > Mike > > *From:* Shane Wiley [mailto:wileys@yahoo-inc.com] > *Sent:* 10 July 2013 14:36 > *To:* Mike O'Neill > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Mike, > > Perhaps you’ve not been on the calls, as I believe you’ve missed a few of > the key points of this discussion. I won’t be able to provide a full > account via email, but I’ll try to hit the high points for you: > > 1. It’s understood obfuscation comes with some risk and will need to > be bundled with operational and administrative controls to reach a > reasonable confidence that data will not be reverse engineered. For example, > data in the yellow state is not shared publicly and/or with parties you > don’t feel could protect the security of its composition. While > we’ve agreed on transparency in this area – no one has requested external > verifiability to date, which I believe would be somewhat impossible as a > starting point. Perhaps something to work on as a future goal (I believe > the EFF would also be interested in innovating techniques in this area – is > that fair, Lee?). > > 2. Aggregate scoring will result in a profile. The proposal does > not attempt to remove this concept but instead to ensure the result doesn’t > include a user’s historical cross-site activity.
This should not be > confused with de-identification and instead is simply another method to > meet the goal of “not tracking”. > > - Shane > > *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com] > *Sent:* Wednesday, July 10, 2013 2:02 PM > *To:* Shane Wiley > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Shane, > > As an example of why this “obfuscation” is pointless, let it be a simple > substitution cypher, so my UID (which happens to be “123456”) is turned > into “987654”. If I visit a website containing a reference to adco.com, that > server recognises me because the UID contains “123456” and builds up a > profile about me. They apply the transform to the UID and always get the > unique value “987654”, which is stored in the profiling dataset. When I > visit other websites that also contain references to adco.com, the same > process is repeated and my web activity is appended to the dataset, again > using “987654” as a key. > > It makes no difference how complex the UID transformation is, as long as > it is 1-to-1. > > Under the “DAA proposal” rules, there is absolutely no diminution of adco’s > ability to profile me. > > If another party gets hold of the dataset, they can also see my profile, > though not my original UID. If further records are shared, they can be > connected to me by this other party because they have the same “987654” > UID. They may not be able to connect records containing “123456” to me > (unless they can crack the cypher or are given the key), but what would be > the point? If they have access to those data records, they can already > profile me anyway. > > If activity data in the dataset, collected with my consent, contains other > PII about me, such as my name, post code, website history etc.
they should > obfuscate that, perhaps using one-way hash functions or aggregated scoring > algorithms. Since these datasets are a valuable corporate asset, you would > expect them to be doing that anyway, but in any case that is legally > required in the EU. > > As the Snowden revelations have highlighted, “operational and > administrative controls” need to be closely monitored. In the case of > security services this can be (has to be) through impeccable judicial > process under democratic oversight. This would not be appropriate for > commercial companies in a competitive environment, so transparent technical > procedures are necessary. > > The “yellow” state should be recognisable to users and others through > inspection of user agent data or web logs. > > Mike > > *From:* Shane Wiley [mailto:wileys@yahoo-inc.com] > *Sent:* 10 July 2013 12:14 > *To:* Mike O'Neill > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Mike, > > I respectfully disagree. Obfuscating the ID breaks the association with > the actual user/device. That said, I agree this has the risk of being > reversed, so a blend of technical, operational, and administrative controls > must be brought to bear to keep this from occurring. > > De-identification doesn’t allow for profiling in a manner that could > affect a user’s experience (no way to get back to the user). > > Do Not Track can be achieved by breaking the link between a unique ID and cross-site activity (URLs) – and this could result in a profile of the > user’s interest resulting from aggregate scoring – but this would not allow > a user’s historical activity to be retrieved. > > - Shane > > *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com] > *Sent:* Wednesday, July 10, 2013 11:55 AM > *To:* Shane Wiley > *Cc:* public-tracking@w3.org > *Subject:* RE: issue-199 > > Hi Shane, > > How can it be possible to remove the association between a device and a > UID other than deleting it or ensuring it is deleted by the UA after a > short duration? If the UID is there (and present in every transport-level > request if it is in a cookie), it uniquely points to the device where it is > stored or derived. This identity is available to the receiving server as > well as any actor with similar access to the data stream or the same > document origin. > > If you transform the UID in retained data by setting it to another UID > (say by using a hash function), this does not break the association because > there is a 1-to-1 mapping. There is no practical point in doing it. > > De-identified data can only be classed as such if there is no linkage. The > “yellow” state can be imagined as an intermediate stage before > de-identification but is only relevant for permitted uses (such as the > detection of unique visitors for analytics or frequency capping), and there > is no need for it to exist for more than a few hours. > > If we end up defining de-identified as including the ability to link > individuals to a profile, it would be a travesty, and people will see > through it. The arms race has already started with an explosion of blunt > cookie and script blockers.
If there is not a sensible response to people’s > real privacy concerns, the usefulness of the web (and consequently the > profitability of many business models) will be severely diminished. > > Mike > > *From:* Shane Wiley [mailto:wileys@yahoo-inc.com] > *Sent:* 09 July 2013 19:30 > *To:* Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org > *Cc:* public-tracking@w3.org; jeff@democraticmedia.org > *Subject:* RE: issue-199 > > Mike, > > De-identification is about removing the association between a unique ID > (any source: cookie, digital fingerprint, etc.) and the actual/specific > user/device. In this context: > > Red: actual user/device > > Yellow: not actual user/device but events are linkable (and only usable > for analytics/reporting) > > Green: not actual user/device and events are not linkable (outside the > scope of DNT) > > - Shane > > *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com] > *Sent:* Sunday, June 30, 2013 3:01 PM > *To:* 'achapell'; npdoty@w3.org; tlr@w3.org > *Cc:* public-tracking@w3.org; jeff@democraticmedia.org > *Subject:* RE: issue-199 > > Alan, > > Persistent identifiers and their duration should be discussed as part of > the red/yellow/green permitted use debate. Browser fingerprinting > identifiers are qualitatively different from those stored in cookies or > localStorage because they are effectively infinite in duration, so I > thought it best to extend the defs.
to make that clear. > > Mike > > *From:* achapell [mailto:achapell@chapellassociates.com] > *Sent:* 30 June 2013 22:39 > *To:* michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org > *Cc:* public-tracking@w3.org; jeff@democraticmedia.org > *Subject:* RE: issue-199 > > Do we want to specify technologies here? > > Cheers, > > Alan Chapell > 917 318 8440 > > -------- Original message -------- > From: Mike O'Neill <michael.oneill@baycloud.com> > Date: 06/30/2013 3:33 PM (GMT-05:00) > To: Nicholas Doty <npdoty@w3.org>, tlr@w3.org > Cc: public-tracking@w3.org, jeff@democraticmedia.org > Subject: issue-199 > > Nick, Thomas > > Dr Dix’s letter reminded me that we need to have some reference to browser > fingerprinting being ruled out when DNT is set. I have amended the > definitions accordingly. > > Do you want me to modify the wiki? > > A *persistent identifier* is an arbitrary value held in, or derived from > other data in, the user agent whose purpose is to identify the user agent > in subsequent transactions to a particular web domain. It may be encoded, > for example, as the name or value attribute of an HTTP cookie, as an item in > localStorage, or recorded in some way in the cache. > > The *duration* of a persistent identifier is the maximum period of time > it will be retained in the user agent. This could be implemented, for > example, using the Expires or Max-Age attributes of an HTTP cookie so that > it is automatically deleted by the user agent after the specified time > period is exceeded. > > *Browser fingerprinting* is a method of tracking based on creating a > persistent identifier from other information either inherent in the content > request or already stored in the user agent.
Such an identifier may not > itself need to be stored in the user agent, as it can be calculated again in > subsequent transactions. It follows from this that its duration is > effectively unlimited. > > *Justification.* > > *With the duration definition, restrictions on permitted uses could then > be made that limit the duration of persistent identifiers. Because browser > fingerprinting cannot be given a finite duration, this tracking method > should not be used when DNT is set, even if it is for a permitted use. In > reality, browser fingerprinting solely based on examining initial content > requests is usually not an effective tracking method because the > combination of IP addresses and other headers is not sufficiently user > specific, but we should rule out at least the more complex form when DNT is > set.* > > Mike

--
Edward W. Felten
Professor of Computer Science and Public Affairs
Director, Center for Information Technology Policy
Princeton University
609-258-5906
http://www.cs.princeton.edu/~felten
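The disagreement between Shane Wiley and Mike O'Neill over obfuscating the unique ID can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not anything the DAA proposal or any participant specified. `substitute_digits` is a hypothetical cipher chosen only because it reproduces Mike's “123456” to “987654” example. `keyed_hash` uses HMAC-SHA256 as one possible form of the “keyed/secret hash” Shane mentions. `build_profile`, the key, and the sample events are invented for illustration. ROT13 is included because it is its own inverse, which is Shane's stated reason for rejecting it.

```python
import codecs
import hashlib
import hmac
from typing import Callable, Dict, Iterable, List, Tuple


def substitute_digits(uid: str) -> str:
    """Hypothetical digit-substitution cipher: d -> (10 - d) mod 10.

    Chosen only because it maps "123456" to "987654" as in Mike's example.
    """
    return "".join(str((10 - int(ch)) % 10) for ch in uid)


def rot13(text: str) -> str:
    """ROT13 on letters (digits are unchanged); applying it twice restores the input."""
    return codecs.encode(text, "rot_13")


def keyed_hash(uid: str, key: bytes) -> str:
    """HMAC-SHA256 of the UID: not invertible without the key, but deterministic."""
    return hmac.new(key, uid.encode("utf-8"), hashlib.sha256).hexdigest()


def build_profile(
    events: Iterable[Tuple[str, str]], transform: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group (uid, url) events under transform(uid), as a profiling dataset would."""
    profile: Dict[str, List[str]] = {}
    for uid, url in events:
        profile.setdefault(transform(uid), []).append(url)
    return profile


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    events = [
        ("123456", "http://news.example/story"),
        ("123456", "http://shop.example/shoes"),
        ("654321", "http://news.example/story"),
    ]
    # Whatever the 1-to-1 transform, both events for "123456" share one key.
    for name, transform in [
        ("substitution", substitute_digits),
        ("keyed hash", lambda uid: keyed_hash(uid, key)),
    ]:
        profile = build_profile(events, transform)
        print(name, {k[:12]: v for k, v in profile.items()})
```

Under either transform, the two events for UID “123456” end up under a single key, which is the 1-to-1 linkability Mike describes. What the keyed hash changes is that someone without the key cannot recover the original UID from the stored value, which is the property Shane relies on together with operational and administrative controls.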
Received on Thursday, 11 July 2013 14:11:56 UTC
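Shane's aggregate-scoring example and Paul Ohm's “what is k?” question can likewise be sketched in Python. The bucketing rule here (first letter of the hostname, ignoring a leading “www.”) is an assumption: Shane says only that activity is obfuscated to one of 26 letters, and this rule happens to reproduce his two “c” rows. `smallest_bucket` reports the fewest distinct URLs sharing any one bucket, which is just one possible reading of Paul's floor. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def bucket_for(url: str) -> str:
    """Hypothetical rule: first letter of the hostname, ignoring a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host[:1].lower()


def aggregate_scores(events: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, int]]:
    """Replace each URL with its bucket and emit a running count per (cookie ID, bucket).

    The cookie ID is kept ("Keep ID, Remove URL"); only the URL is discarded.
    """
    counts: Counter = Counter()
    rows: List[Tuple[int, str, int]] = []
    for cookie_id, url in events:
        bucket = bucket_for(url)
        counts[(cookie_id, bucket)] += 1
        rows.append((cookie_id, bucket, counts[(cookie_id, bucket)]))
    return rows


def smallest_bucket(urls: Iterable[str]) -> int:
    """Fewest distinct URLs mapped to any one bucket; `urls` must be non-empty."""
    members: DefaultDict[str, Set[str]] = defaultdict(set)
    for url in urls:
        members[bucket_for(url)].add(url)
    return min(len(group) for group in members.values())


if __name__ == "__main__":
    events = [
        (1234, "http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane"),
        (1234, "http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley"),
    ]
    for row in aggregate_scores(events):
        print(row)  # (1234, 'c', 1) then (1234, 'c', 2)
    urls = [url for _, url in events] + ["http://www.zappos.com/32145"]
    print(smallest_bucket(urls))  # 1: the "z" bucket holds a single URL
```

Each row still carries cookie ID 1234, so rows keep accumulating per ID; that is the property Mike objects to under “Keep ID, Remove URL”, while Shane's argument rests on the URL not being recoverable from the bucket. In this sample, a bucket holding a single URL (the “z” bucket) points straight back to that URL, which is the situation Paul's k question is probing.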