- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 23 May 2012 13:39:44 -0700
- To: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
This message is to complete ACTION-166, related to ISSUE-16.

  "I have a large seashell collection which I keep scattered on the beaches all over the world... maybe you've seen it." -- Steven Wright [1]

As I've mentioned a few too many times, I disagree with the current text in the compliance document that defines collection as

  http://www.w3.org/TR/2012/WD-tracking-compliance-20120313/#crus
  • A party "collects" data if the data comes within its control.

The regulatory bodies use the term "data collection" extensively without actually defining it. As near as I can tell, they rely on the commonly established usage of the term for statistical surveys, something which the government is very familiar with (census) and frequently regulates.

  http://stats.oecd.org/glossary/detail.asp?ID=534
  "Data collection is the process of gathering data."

and "gathering" is itself defined as collection (circular) or as the act of assembling a group of things together in one place.

I think we all should understand that collection implies gathering together and at least some form of retention. The above joke by Steven Wright depends on the audience knowing that.

We can collect seashells by taking them off the beach, not by merely walking by them. We can collect photos of seashells by taking each one's picture and retaining that picture, not by snapping the shot and then deleting it from memory. We can collect on behalf of others without personal retention (i.e., the response from the commissioner in DC about sharing being part of the definition).

As a technical matter, assigning a pseudo-ID to a user agent via a cookie that is derived from a random source, perhaps combined with codes for algorithmic validation, and merely receiving that cookie in later requests, is not by itself data collection.
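
To make the technical case concrete, here is a minimal sketch, assuming an HMAC-SHA256 tag stands in for the "codes for algorithmic validation". The names SERVER_KEY, issue_pseudo_id, and is_valid_pseudo_id are hypothetical and not taken from any specification or deployed system. The server can mint a random pseudo-ID and later check that it issued a given value, without writing the value or any request data to storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret (an assumption for this sketch). A real
# deployment would load a persistent key instead of generating one at startup.
SERVER_KEY = secrets.token_bytes(32)


def issue_pseudo_id(key: bytes = SERVER_KEY) -> str:
    """Mint a cookie value: a random nonce plus a truncated HMAC tag."""
    nonce = secrets.token_hex(16)  # derived only from a random source
    tag = hmac.new(key, nonce.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"{nonce}.{tag}"


def is_valid_pseudo_id(value: str, key: bytes = SERVER_KEY) -> bool:
    """Check a received cookie value algorithmically.

    Nothing is looked up or stored: validity follows from the key alone,
    so the value need not outlive the handling of the current request.
    """
    nonce, sep, tag = value.partition(".")
    if not sep:
        return False
    expected = hmac.new(key, nonce.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    # Compare as bytes so non-ASCII input is rejected rather than raising.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
```

In this sketch the server keeps no table of issued identifiers and no log keyed by them; whether a real system stays on the right side of the line depends on what else it retains alongside the request.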
Data collection would be retaining the cookie value along with the request data or gathering the data from multiple requests over time (browser activity) in a way that can be traced back to that cookie.

This distinction is important because there are many uses of cookies that are not for the purpose of tracking, even though the value is unique per user agent. Some of those uses are more important than DNT compliance. There is nothing we can put in the compliance document that will cause those uses to disappear. What we can do, however, is lay out constraints on retention of the cookie value at the server (e.g., not beyond the scope of handling a single request nor shared with any other party) and on data collection associated via the identifier (e.g., only retain data that is not personally identifiable, silo data by first-party, do not allow the identifier or its derivatives to tie any sort of activity trail across multiple sites, etc.).

I know that this does not satisfy the request by EFF that unique identifiers be banned for the sake of easier discovery of bad actors that claim to comply with DNT but then track anyway. Sorry. I do not consider verification by non-regulators to be a requirement for DNT to succeed, and the point is moot given the exception for fraud control.

The compliance document should define

  "Data collection" (for the purpose of DNT) is the process of assembling data from or about one or more network interactions and retaining/sharing that data beyond the scope of responding to the current request or in a form that remains linkable to a specific user, user agent, or device.

Cheers,

Roy T. Fielding <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems <http://adobe.com/enterprise>

[1] http://www.funnyordie.com/videos/a024670721/steven-wright-standup-from-standupfan
(warning: [1] is a massively tracked resource that might get you in trouble at work if your job description doesn't include watching funnyordie as "research")
Received on Wednesday, 23 May 2012 20:40:10 UTC