tracking data (was Re: [TCS] comments on 17 Feb 2015 editors draft)

Hi Roy, Mike,

>>  2.11  Tracking
>> 
>>   Tracking is the collection of data regarding a particular user's activity
>>   across multiple distinct contexts and the retention, use, or sharing of
>>   data derived from that activity outside the context in which it occurred.
>>   A context is a set of resources that are controlled by the same party or
>>   jointly controlled by a set of parties.
>> 
>>   Tracking data is any data that could be combined with other data to engage
>>   in tracking a user across different contexts.

From Roy:

> Wait, that's new.  Tracking data is already implicitly defined by the
> first paragraph, above, to be the data collected when tracking.  I am pretty
> sure that is how we use it, as well, so we don't need another definition.
> The above definition changes it to mean the data used to enable tracking,
> which isn't at all like we are using it.
> 
> In any case, the definition is incorrect because any data
> "could be combined with other" (tracking) data to make more tracking data,
> which implies that all data is tracking data.

I don't think a regular reading of this definition would imply that any data is tracking data. The number 42, when combined with data about Nick Doty's browsing activity on many sites, does not allow one to engage in tracking me across different contexts; it's the data on my browsing activity that's used to engage in tracking. I believe no permanently de-identified data would qualify.

Similarly, our definition of "permanently de-identified" includes a clause about identifying a user "alone or in combination with other retained or available information"; I don't believe that's interpreted to mean that permanently de-identified data cannot exist.

> If we need an explicit definition, it could be something like
> 
>  "Tracking data is any data collected or derived as a result of tracking
>   that would not have been known without tracking."

When we discussed variations for issue-203 from September onward, the point of using "tracking data" was both to limit the scope of compliance requirements and to note that sharing data from one context (which is not itself tracking) isn't compliant with a user's preference where that sharing enables tracking by someone else. That is, in some cases we want to limit the sharing of data that enables tracking even if the collection of any particular datum isn't tracking.

From Mike:

> I do not agree with Roy here about this being redundant. This definition is important because it is used in de-identification and clarifying examples. It is not simply "data collected when tracking" because it is referring to the specific data used for linking, as was discussed when we talked about Gateways, i.e. raw UIDs associated with other data e.g. URLs is tracking data.
> 
> It could be better expressed. How about:
> 
>> Tracking data is any data that enables a user agent to be recognised across different contexts when combined with other data collected in those contexts. Examples include cookie UIDs or source IP addresses when collected together with targeted URLs.

I think that's Mike's concern with Roy's text: that if "tracking data" is only data that comes from tracking, then there would not be any limits on sharing personally identifiable data that only refers to browsing activity in a single context.

One resolution would be to limit the sharing of any data from a network interaction that isn't permanently de-identified (rather than the sharing of "tracking data"). That limit would be added to the third-party compliance section and would be similar to the sentence we have in first-party compliance about sharing data that a party would be prohibited from collecting. In that case, we could delete the above definition of "tracking data" and instead refer to "data not permanently de-identified" (in the server compliance sections) or just "data" (in the de-identification definition).

Thoughts?

Thanks,
Nick

Received on Wednesday, 25 March 2015 01:55:34 UTC