Re: URLS/scoring

In this scenario, isn't cookie 1234 still a unique ID connected to a
specific user?   The cookie ID is still in the specific user's browser and
will be sent with subsequent requests.
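
To make the concern concrete, here is a minimal Python sketch of the scoring scheme in the quoted message below. The bucketing rule (first letter of the host name after "www.") and the names bucket_for and score are assumptions made for illustration; the quoted example only shows the resulting rows, not how the letter is chosen.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def bucket_for(url: str) -> str:
    """Hypothetical obfuscation: reduce a URL to one of 26 letter buckets."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host[:1].lower()


# scores[cookie_id][bucket] -> running count of events in that bucket
scores = defaultdict(lambda: defaultdict(int))


def score(cookie_id: str, url: str) -> tuple:
    """Discard the URL, keep only (cookie ID, bucket, count)."""
    bucket = bucket_for(url)
    scores[cookie_id][bucket] += 1
    return (cookie_id, bucket, scores[cookie_id][bucket])


first = score(
    "1234",
    "http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane",
)
second = score(
    "1234",
    "http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley",
)
```

The URLs are gone from the stored rows, but every row is still keyed by "1234", the same value the browser sends with each later request.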



On Wed, Jul 10, 2013 at 7:55 PM, Shane Wiley <wileys@yahoo-inc.com> wrote:

> Fair point Jonathan – and something I had expected we’d be able to
> provide more clarity around in non-normative text.  The center point
> text is the definition of Tracking.  As long as the resulting
> transformation to the ID or the URL was something that could not be reverse
> engineered back to the original ID and/or URL, then I would defend this as
> the information no longer resulting in tracking.
>
> For example, if a collected activity for cookie ID 1234 was obfuscated to
> a single letter, then we’d have 26 possible buckets with no way of linking
> a single aggregated result to an actual URL.
>
> Cookie ID 1234,
> http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane
>
> -becomes-
>
> Cookie ID 1234, “c”, 1
>
> Similarly…
>
> Cookie ID 1234,
> http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley
>
> -becomes-
>
> Cookie ID 1234, “c”, 2
>
> While difficult to predefine in technical terms, as long as the resulting
> “aggregate” doesn’t allow for reverse engineering back to the actual event,
> then tracking is not occurring.
>
> ROT13 doesn’t work (character rotation of 13 places) as this can be
> reverse engineered directly and wouldn’t be able to be contained through
> administrative and operational controls.  That’s why we’ve recommended
> something more significant such as keyed/secret hash where the key is
> further contained from access outside of automated routines – aka, humans –
> as a more reasonable option (but there could be others that meet the same
> goal).
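
The contrast drawn above between ROT13 and a keyed hash can be sketched as follows. This is only an illustration: HMAC-SHA256 is one possible reading of "keyed/secret hash", and the key value and function names are hypothetical.

```python
import codecs
import hashlib
import hmac


def rot13(value: str) -> str:
    """ROT13: anyone can undo it, no secret is involved."""
    return codecs.encode(value, "rot_13")


def keyed_hash(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256 assumed here); not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


url = "http://www.carmaker.com/2013/trucks/sportedition.html"

# Applying ROT13 twice returns the original URL.
round_trip = rot13(rot13(url))

# Hypothetical secret, imagined as held only by automated routines.
key = b"hypothetical-secret-key"
digest_a = keyed_hash(url, key)
digest_b = keyed_hash(url, key)
digest_other_key = keyed_hash(url, b"a-different-key")
```

Note that the keyed hash is deterministic: the same input under the same key always yields the same output, so records that share an input still share a value after hashing.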
>
> - Shane
>
> *From:* Jonathan Mayer [mailto:jmayer@stanford.edu]
> *Sent:* Wednesday, July 10, 2013 11:55 PM
> *To:* Shane Wiley
> *Cc:* Lauren Gelman; Peter Swire; Justin Brookman; Rob van Eijk; Mike
> O'Neill; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> Shane,
>
> Could you please identify the text that limits these exceptions from
> "tracking"?  Once a URL is altered to something other than a plaintext URL
> (e.g. applying ROT13), why is it still "tracking"?
>
> Thanks,
> Jonathan
>
> On Wednesday, July 10, 2013 at 3:34 PM, Shane Wiley wrote:
>
> Lauren,
>
> I’m not following your “translation from English to Spanish” example, as
> the Aggregate Scoring approach would be more akin to summarizing
> English into basic sounds – which could be attributed to any number of
> words but in and of themselves do not reveal the actual word the sound
> belongs to.
>
> - Shane
>
> *From:* Lauren Gelman [mailto:gelman@blurryedge.com]
> *Sent:* Wednesday, July 10, 2013 7:47 PM
> *To:* Peter Swire
> *Cc:* Jonathan Mayer; Shane Wiley; Justin Brookman; Rob van Eijk; Mike
> O'Neill; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> The change proposed to limit the definition of tracking to URLs is
> extraordinary.
>
> Business works this way anyway: URLs are translated into segments and
> people are characterized using those. Segments and profiles are augmented
> and targeted, not lists of URLs.
>
> I thought it was crazy a year ago when the compromise was made for DNT:1
> to permit collecting of information, in order to accommodate (IMHO broad)
> permitted uses.  If collection is permitted in order to allow the business
> to translate the URL into a segment, the exception has indeed, finally,
> swallowed the rule.
>
> Allowing aggregate scoring is just like translating English URLs to
> Spanish and then saying the Spanish ones are out of scope.  It ignores the
> fact that if you collect multiple data points about a unique identifier,
> you can eventually determine its personal characteristics.  There's no
> reason this is limited to URLs; it applies equally to any translated
> characteristics.
>
> Lauren Gelman
> @laurengelman
> BlurryEdge Strategies
> 415-627-8512
>
> On Jul 10, 2013, at 11:14 AM, Peter Swire wrote:
>
> Please correct me if I'm wrong.
>
> My understanding is that "aggregate scoring" is not "tracking."
>
> It therefore does not qualify either as "de-identified" or "de-linked."
> It is outside the scope of DNT under the DAA proposal.
>
> Peter
>
> Prof. Peter P. Swire
> C. William O'Neill Professor of Law
> Ohio State University
> 240.994.4142
> www.peterswire.net
>
> Beginning August 2013:
> Nancy J. and Lawrence P. Huang Professor
> Law and Ethics Program
> Scheller College of Business
> Georgia Institute of Technology
>
> *From: *Jonathan Mayer <jmayer@stanford.edu>
> *Date: *Wednesday, July 10, 2013 12:40 PM
> *To: *Shane Wiley <wileys@yahoo-inc.com>
> *Cc: *Justin Brookman <jbrookman@cdt.org>, Rob van Eijk <rob@blaeu.com>,
> Mike O'Neill <michael.oneill@baycloud.com>, "public-tracking@w3.org"
> <public-tracking@w3.org>
> *Subject: *Re: URLS/scoring
> *Resent-From: *<public-tracking@w3.org>
> *Resent-Date: *Wednesday, July 10, 2013 12:40 PM
>
> Shane,
>
> Could you please explain where "Aggregate Scoring" would land in the DAA
> proposal?  Is it "de-identified" data?  "Unlinked" data?
>
> Thanks,
> Jonathan
>
> On Wednesday, July 10, 2013 at 9:11 AM, Shane Wiley wrote:
>
> Justin,
>
> It was my hope to add this as non-normative text, as Aggregate Scoring is
> one example of “not tracking”.  We’ve been focused on normative text at
> this point, so that’s why it’s not included.
>
> - Shane
>
> *From:* Justin Brookman [mailto:jbrookman@cdt.org]
> *Sent:* Wednesday, July 10, 2013 4:40 PM
> *To:* Rob van Eijk
> *Cc:* Mike O'Neill; Shane Wiley; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> I had heard the idea floated in Sunnyvale (and before) but it was only
> presented as a possibility --- in any event, scoring certainly ran counter
> to the previous requirements in the compliance standard.  Mike Zaneis's
> comments last week were the first time I thought I understood that the
> trade associations were proposing that OBA/retargeting be allowed when DNT
> is turned on.  And in any event, prior discussions are not really relevant
> --- I'm just trying to figure out concretely what is on the table as far as
> the DAA proposed DNT standard.
>
> Jack's proposed revision of the definition of tracking helped me (I think)
> to understand what is being offered, but I was just trying to flesh it out.
> People keep referencing "scoring," but that term is neither defined nor
> used in any of the proposals.
>
> On Jul 10, 2013, at 11:33 AM, Rob van Eijk <rob@blaeu.com> wrote:
>
> Justin, currently aggregated scoring happens in parallel with R-Y-G, and is
> not part of the proposal. In Santa Clara Shane made it clear that all
> users, regardless of DNT, will be subject to aggregated scoring. Only an
> opt-out cookie MAY prevent this collection, use and sharing.
>
> Rob
>
> Justin Brookman <jbrookman@cdt.org> wrote:
>
> To be clear, I do not believe that the term "aggregate scoring" appears
> either in the original DAA proposal or the amendments that Jack sent around
> yesterday.  As I currently think I understand the proposal, when DNT:1 is
> turned on, a third party may not use/retain the specific url/domain for OBA
> (or other non-permitted purposes), but they may use/retain any derived
> information about the url.
>
> So an ad network may not retain/use the fact that I visited
> zappos.com/32145 for OBA (or other non-permitted purposes) but they may
> retain/use/sell/do anything with a characterization of my unique ID as
> "interested in shopping," "interested in shoes," or "interested in the Nike
> Pro Attack in blue and green."  The unique ID could be a cookie, an email
> address, a name, or anything else.
>
> Justin Brookman
> Director, Consumer Privacy
> Center for Democracy & Technology
> tel 202.407.8812
> justin@cdt.org
> http://www.cdt.org
> @JustinBrookman
> @CenDemTech
>
> On Jul 10, 2013, at 11:15 AM, "Mike O'Neill" <michael.oneill@baycloud.com>
> wrote:
>
> [Keep ID, Remove URL = Aggregate Scoring] is a null
>
> Because the individual is still profiled and their web activity can
> continue to be appended to the profile.
>
> [Remove ID, Keep URL] is a null
>
> Because a) PII might be in URLs.
>
>         b) In reality ID has been replaced with an equivalent,
> though different, ID’ so web activity can continue to be appended.
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 15:42
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> I support verifiability but am challenged with technical mechanisms to
> allow this without breaking corporate confidentiality concerns.  This is
> why I call it out as an area for future development to help build solutions
> to this unique problem.
>
> I’ve tried breaking the proposal down to the simplest form I can think
> of.  Let me know if this makes it more clear:
>
> -----
>
> If Tracking = ID + URLs, then Not Tracking = ID <> URL
>
> Keep ID, Remove URL = Aggregate Scoring
>
> Remove ID, Keep URL = De-Identification
>
> Remove ID, Remove URL = De-Identification + De-Linking  (now out of scope
> of DNT)
>
> -----
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 3:10 PM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Shane,
>
> I have not missed key points, and know the DAA proposals mean continued
> profiling, just think that needs to be made clear. Perhaps you could give
> an example where applying a hash to a UID would be useful.
>
> There is not much difference between the retention of a profile based on
> algorithmically examining a web history and the actual web history itself.
> Both can be a basis for discrimination.
>
> My point about verifiability is that without it, with only administrative
> and operational controls, there will inevitably be demands for intrusive
> regulation, which will not be good for industry. Verifiability is in fact
> quite easy to ensure if tracking is constrained to cookies or even
> localStorage, and that is all the more reason to rule out tracking by other
> means such as fingerprinting.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 14:36
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> Perhaps you’ve not been on the calls as I believe you’ve missed a few of
> the key points of this discussion.  I won’t be able to provide a full
> recount via email but I’ll try to hit the high points for you:
>
> 1.  It’s understood obfuscation comes with some risk and will need to
>     be bundled with operational and administrative controls to reach a
>     reasonable confidence that data will not be reverse engineered.  For
>     example, data in the yellow state is not shared publicly and/or with
>     parties that you don’t feel could protect the security of its
>     composition.  While we’ve agreed on transparency in this area – no one
>     has requested external verifiability to date, which I believe would be
>     somewhat impossible as a starting point.  Perhaps something to work on
>     as a future goal (I believe the EFF would also be interested in
>     innovating techniques in this area – is that fair Lee?).
>
> 2.  Aggregate scoring will result in a profile.  The proposal does
>     not attempt to remove this concept but instead to ensure the result
>     doesn’t include a user’s historical cross-site activity.  This should
>     not be confused with de-identification and instead is simply another
>     method to meet the goal of “not tracking”.
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 2:02 PM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Shane,
>
> As an example of why this “obfuscation” is pointless let it be a simple
> substitution cypher, so my UID (which happens to be “123456”) is turned
> into “987654”. If I visit a website containing a reference to adco.com that
> server recognises me because the UID contains “123456” and builds up a
> profile about me. They apply the transform to the UID and always get the
> unique value “987654”, which is stored in the profiling dataset. When I
> visit other websites that also contain references to adco.com the same
> process is repeated and my web activity is appended to the dataset, again
> using “987654” as a key.
>
> It makes no difference how complex the UID transformation is, as long as
> it is 1to1.
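
The substitution example above can be sketched in a few lines of Python. The particular cipher (digit d becomes (10 - d) mod 10, which happens to map "123456" to "987654"), the site names, and the function names are assumptions for illustration; the message does not say which substitution is used.

```python
from collections import defaultdict


def transform_uid(uid: str) -> str:
    """Hypothetical 1-to-1 substitution on a digits-only UID."""
    return "".join(str((10 - int(ch)) % 10) for ch in uid)


# Profiling dataset keyed by the transformed UID only.
profiles = defaultdict(list)


def record_visit(uid: str, site: str) -> str:
    """Append activity under the transformed UID; the raw UID is not stored."""
    key = transform_uid(uid)
    profiles[key].append(site)
    return key


# Two visits from the same browser to (made-up) sites that embed adco.com.
key_first = record_visit("123456", "news.example")
key_second = record_visit("123456", "shoes.example")
```

Both visits land under the same transformed key, so the profile keeps growing even though the raw UID never appears in the dataset.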
>
> Under the “DAA proposal” rules there is absolutely no diminution of adco’s
> ability to profile me.
>
> If another party gets hold of the dataset they can also see my profile,
> though not my original UID. If further records are shared they can be
> connected to me by this other party because they have the same “987654”
> UID. They may not be able to connect records containing “123456” to me
> (unless they can crack the cypher or are given the key) but what would be
> the point? If they have access to those data records they can already
> profile me anyway.
>
> If activity data in the dataset, collected with my consent, contains other
> PII about me, such as my name, post code, website history etc., they should
> obfuscate that, perhaps using one way hash functions or aggregated scoring
> algorithms. Since these datasets are a valuable corporate asset you would
> expect them to be doing that anyway, but in any case that is legally
> required in the EU.
>
> As the Snowden revelations have highlighted, “operational and
> administrative controls” need to be closely monitored. In the case of
> security services this can be (has to be) through impeccable judicial
> process under democratic oversight. This would not be appropriate for
> commercial companies in a competitive environment, so transparent technical
> procedures are necessary.
>
> The “yellow” state should be recognisable to users and others through
> inspection of user agent data or web logs.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 10 July 2013 12:14
> *To:* Mike O'Neill
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Mike,
>
> I respectfully disagree.  Obfuscating the ID breaks the association with
> the actual user/device.  That said, I agree this has the risk of being
> reversed so a blend of technical, operational, and administrative controls
> must be brought to bear to keep this from occurring.
>
> De-identification doesn’t allow for profiling in a manner that could
> affect a user’s experience (no way to get back to the user).
>
> Do Not Track can be achieved by breaking the link between a unique ID and
> cross-site activity (URLs) – and this could result in a profile of the
> user’s interest resulting from aggregate scoring – but this would not allow
> a user’s historical activity to be retrieved.
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Wednesday, July 10, 2013 11:55 AM
> *To:* Shane Wiley
> *Cc:* public-tracking@w3.org
> *Subject:* RE: issue-199
>
> Hi Shane,
>
> How can it be possible to remove the association between a device and a
> UID other than deleting it or ensuring it is deleted by the UA after a
> short duration? If the UID is there (and present in every transport level
> request if it is in a cookie) it uniquely points to the device where it is
> stored or derived. This identity is available to the receiving server as
> well as any actor with similar access to the data stream or the same
> document origin.
>
> If you transform the UID in retained data by setting it to another UID
> (say by using a hash function), this does not break the association because
> there is a 1to1 mapping. There is no practical point in doing it.
>
> De-identified data can only be classed as such if there is no linkage. The
> “yellow” state can be imagined as an intermediate stage before
> de-identification but is only relevant for permitted uses (such as the
> detection of unique visitors for analytics or frequency capping), and there
> is no need for it to exist for more than a few hours.
>
> If we end up defining de-identified as including the ability to link
> individuals to a profile it would be a travesty, and people will see
> through it. The arms race has already started with an explosion of blunt
> cookie and script blockers. If there is not a sensible response to people’s
> real privacy concerns the usefulness of the web (and consequently the
> profitability of many business models) will be severely diminished.
>
> Mike
>
> *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
> *Sent:* 09 July 2013 19:30
> *To:* Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Mike,
>
> Deidentification is about removing the association between a unique ID
> (any source:  cookie, digital fingerprint, etc.) and the actual/specific
> user/device.  In this context:
>
> Red:  actual user/device
>
> Yellow:  not actual user/device but events are linkable (and only usable
> for analytics/reporting)
>
> Green:  not actual user/device and events are not linkable (outside the
> scope of DNT)
>
> - Shane
>
> *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
> *Sent:* Sunday, June 30, 2013 3:01 PM
> *To:* 'achapell'; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Alan,
>
> Persistent identifiers and their duration should be discussed as part of
> the red/yellow/green permitted use debate. Browser fingerprinting
> identifiers are qualitatively different from those stored in cookies or
> localStorage because they are effectively infinite in duration, so I
> thought it best to extend the defs. to make that clear.
>
> Mike
>
> *From:* achapell [mailto:achapell@chapellassociates.com]
> *Sent:* 30 June 2013 22:39
> *To:* michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org
> *Cc:* public-tracking@w3.org; jeff@democraticmedia.org
> *Subject:* RE: issue-199
>
> Do we want to specify technologies here?
>
> Cheers,
>
> Alan Chapell
> 917 318 8440
>
> -------- Original message --------
> From: Mike O'Neill <michael.oneill@baycloud.com>
> Date: 06/30/2013 3:33 PM (GMT-05:00)
> To: Nicholas Doty <npdoty@w3.org>,tlr@w3.org
> Cc: public-tracking@w3.org,jeff@democraticmedia.org
> Subject: issue-199
>
> Nick, Thomas
>
> Dr Dix’s letter reminded me that we need to have some reference to browser
> fingerprinting being ruled out when DNT is set. I have amended the
> definitions accordingly.
>
> Do you want me to modify the wiki?
>
> A *persistent identifier* is an arbitrary value held in, or derived from
> other data in, the user agent whose purpose is to identify the user agent
> in subsequent transactions to a particular web domain. It may be encoded
> for example as the name or value attribute of an HTTP cookie, as an item in
> localStorage or recorded in some way in the cache.
>
> The *duration* of a persistent identifier is the maximum period of time
> it will be retained in the user agent. This could be implemented for
> example using the Expires or Max-Age attributes of an HTTP cookie so that
> it is automatically deleted by the user agent after the specified time
> period is exceeded.
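
As a small illustration of the duration definition above, the following Python sketch builds a Set-Cookie value with a Max-Age attribute using the standard library's http.cookies module. The cookie name, value, and six-hour duration are hypothetical choices for the example, not anything taken from the proposal.

```python
from http.cookies import SimpleCookie


def identifier_cookie(name: str, value: str, duration_seconds: int) -> str:
    """Build a Set-Cookie value whose Max-Age bounds the identifier's duration."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = duration_seconds  # user agent deletes it afterwards
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


# Hypothetical identifier limited to six hours.
header_value = identifier_cookie("uid", "1234", 6 * 60 * 60)
```

A server would send this string in a Set-Cookie response header; a conforming user agent then discards the identifier once the Max-Age period has elapsed.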
>
> *Browser fingerprinting* is a method of tracking based on creating a
> persistent identifier from other information either inherent in the content
> request or already stored in the user agent. Such an identifier may not
> need itself to be stored in the user-agent as it can be calculated again in
> subsequent transactions. It follows from this that its duration is
> effectively unlimited.
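
A deliberately simplistic Python sketch of the definition above: the identifier is recomputed from request headers on every transaction, so nothing needs to be stored in the user agent and there is nothing for it to expire. The choice of headers, the use of SHA-256, and the sample header values are assumptions for illustration; real fingerprinting draws on many more signals.

```python
import hashlib

# Hypothetical set of request headers used as fingerprint inputs.
FIELDS = ("User-Agent", "Accept-Language", "Accept-Encoding")


def fingerprint(headers: dict) -> str:
    """Derive an identifier from request headers; nothing is stored client-side."""
    material = "\n".join(f"{name}:{headers.get(name, '')}" for name in FIELDS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Made-up header values for two requests from the same browser.
request_one = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept-Language": "en-GB,en;q=0.8",
    "Accept-Encoding": "gzip, deflate",
}
request_two = dict(request_one)

id_one = fingerprint(request_one)
id_two = fingerprint(request_two)
id_changed = fingerprint({**request_one, "Accept-Language": "nl-NL"})
```

The two requests yield the same identifier without any cookie being set, which is why no Expires or Max-Age value can bound its duration.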
>
> *Justification.*
>
> *With the duration definition, restrictions on permitted uses could then
> be made that limit the duration of persistent identifiers. Because browser
> fingerprinting cannot be given a finite duration, this tracking method
> should not be used when DNT is set even if it is for a permitted use. In
> reality browser fingerprinting solely based on examining initial content
> requests is usually not an effective tracking method because the
> combination of IP addresses and other headers is not sufficiently user
> specific, but we should rule out at least the more complex form when DNT is
> set.*
>
> Mike
>



-- 
Edward W. Felten
Professor of Computer Science and Public Affairs
Director, Center for Information Technology Policy
Princeton University
609-258-5906           http://www.cs.princeton.edu/~felten

Received on Thursday, 11 July 2013 01:03:40 UTC