ISSUE-16, ACTION-166: define (data) collection

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 23 May 2012 13:39:44 -0700
Message-Id: <B824FBBE-003F-4F19-8892-826FF4715543@gbiv.com>
To: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
This message is to complete ACTION-166, related to ISSUE-16.

   "I have a large seashell collection which I keep scattered
    on the beaches all over the world... maybe you've seen it."
    -- Steven Wright [1]

As I've mentioned a few too many times, I disagree with the current
text in the compliance document that defines collection as


     A party "collects" data if the data comes within its control.

The regulatory bodies use the term "data collection" extensively
without actually defining it.  As near as I can tell, they rely on
the commonly established usage of the term for statistical surveys,
something which the government is very familiar with (census) and
frequently regulates.


   "Data collection is the process of gathering data."

and "gathering" is itself defined as collection (circular) or
as the act of assembling a group of things together in one place.

I think we all should understand that collection implies gathering
together and at least some form of retention.  The above joke by
Steven Wright depends on the audience knowing that.  We can collect
seashells by taking them off the beach, not by merely walking by them.
We can collect photos of seashells by taking each one's picture
and retaining that picture, not by snapping the shot and then
deleting it from memory.  We can collect on behalf of others without
personal retention (i.e., the response from the commissioner in DC
about sharing being part of the definition).

As a technical matter, assigning a pseudo-ID to a user agent via a
cookie that is derived from a random source, perhaps combined with
codes for algorithmic validation, and merely receiving that cookie
in later requests, is not by itself data collection.
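
To make that concrete, here is a minimal sketch (in Python) of a
pseudo-ID cookie built from a random source plus a validation code,
where later requests are checked without anything being stored or
looked up.  The names SERVER_KEY, issue_pseudo_id, and is_valid are
hypothetical, and the choice of a truncated HMAC as the validation
code is my assumption for illustration, not something any
particular server is known to do.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret (assumption: a real deployment would
# load this from configuration rather than generate it at startup).
SERVER_KEY = secrets.token_bytes(32)


def _tag(random_id: str) -> str:
    """Validation code for an ID: a truncated HMAC-SHA256 over the ID."""
    digest = hmac.new(SERVER_KEY, random_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def issue_pseudo_id() -> str:
    """Build a cookie value from a random ID plus its validation code."""
    random_id = secrets.token_hex(16)  # derived only from a random source
    return f"{random_id}.{_tag(random_id)}"


def is_valid(cookie_value: str) -> bool:
    """Check the validation code algorithmically; no storage, no lookup."""
    random_id, sep, tag = cookie_value.partition(".")
    if not sep:
        return False
    return hmac.compare_digest(tag.encode(), _tag(random_id).encode())
```

Nothing in this sketch writes the cookie value anywhere: the server
can tell a value it issued from a forged one, but has gathered no
data about the user agent by doing so.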

Data collection would be retaining the cookie value along with the
request data or gathering the data from multiple requests over time
(browser activity) in a way that can be traced back to that cookie.

This distinction is important because there are many uses of
cookies that are not for the purpose of tracking, even though
the value is unique per user agent.  Some of those uses are
more important than DNT compliance.  There is nothing we can
put in the compliance document that will cause those uses to stop.

What we can do, however, is lay out constraints on retention
of the cookie value at the server (e.g., not beyond the scope
of handling a single request nor shared with any other party)
and on data collection associated via the identifier (e.g.,
only retain data that is not personally identifiable, silo data
by first-party, do not allow the identifier or its derivatives
to tie any sort of activity trail across multiple sites, etc.).
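
As a rough sketch of what such constraints might look like in code
(Python again), the following keeps records in a separate silo per
first party and drops every field that is not on an allowlist, so
the cookie value itself is never retained.  ALLOWED_FIELDS,
SiloedStore, and the field names are all hypothetical; which fields
actually count as not personally identifiable is exactly the sort
of thing the compliance document would have to decide.

```python
from collections import defaultdict

# Hypothetical allowlist (assumption: these fields are treated as not
# personally identifiable; the real list is a policy decision).
ALLOWED_FIELDS = {"timestamp", "status", "bytes_sent"}


class SiloedStore:
    """Retains only allowlisted fields, kept separately per first party."""

    def __init__(self) -> None:
        self._silos = defaultdict(list)  # first party -> list of records

    def record(self, first_party: str, request_data: dict) -> dict:
        # Drop everything not on the allowlist, including any cookie or
        # identifier, so nothing retained links back to a user agent.
        kept = {k: v for k, v in request_data.items() if k in ALLOWED_FIELDS}
        self._silos[first_party].append(kept)
        return kept

    def records_for(self, first_party: str) -> list:
        # Only one silo is ever returned; there is no cross-site query.
        return list(self._silos.get(first_party, []))
```

The store deliberately offers no way to query across first parties,
which is one simple way to keep an identifier from tying an
activity trail across multiple sites.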

I know that this does not satisfy the request by EFF that
unique identifiers be banned for the sake of easier discovery
of bad actors that claim to comply with DNT but then track anyway.
Sorry.  I do not consider verification by non-regulators to be
a requirement for DNT to succeed, and the point is moot given
the exception for fraud control.

The compliance document should define

   "Data collection" (for the purpose of DNT) is the process of
    assembling data from or about one or more network interactions
    and retaining/sharing that data beyond the scope of responding
    to the current request or in a form that remains linkable to a
    specific user, user agent, or device.


Roy T. Fielding                     <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>

[1] http://www.funnyordie.com/videos/a024670721/steven-wright-standup-from-standupfan

  (warning: [1] is a massively tracked resource that
   might get you in trouble at work if your job description
   doesn't include watching funnyordie as "research")
Received on Wednesday, 23 May 2012 20:40:10 UTC
