
Re: ISSUE-16, ACTION-166: define (data) collection

From: Sean Harvey <sharvey@google.com>
Date: Wed, 23 May 2012 16:57:27 -0400
Message-ID: <CAFy-vucvvk8oTZK2HbnsVJnpbfcjMDTjv37YTnNFxBzY0boH2Q@mail.gmail.com>
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Thanks, Roy. I really appreciate your putting this together. Prior to this we
had been working under an arbitrary division of nomenclature that was
idiosyncratic to this working group and its documents, with "collection"
meaning any contact with the web server (unavoidable in all cases) and
"retention" being the term more in line with your definition of "collection".

I'm curious to understand what you view as the real-world implications of
this. Why, in your view, is it important that we define collection in this
more traditional fashion? What problems could crop up if we kept the
current definitions & nomenclature? Is this about misinterpretation by
regulatory bodies? Are there other potential issues?

sean




On Wed, May 23, 2012 at 4:39 PM, Roy T. Fielding <fielding@gbiv.com> wrote:

> This message is to complete ACTION-166, related to ISSUE-16.
>
>   "I have a large seashell collection which I keep scattered
>    on the beaches all over the world... maybe you've seen it."
>    -- Steven Wright [1]
>
> As I've mentioned a few too many times, I disagree with the current
> text in the compliance document that defines collection as
>
>    http://www.w3.org/TR/2012/WD-tracking-compliance-20120313/#crus
>
>     A party "collects" data if the data comes within its control.
>
> The regulatory bodies use the term "data collection" extensively
> without actually defining it.  As near as I can tell, they rely on
> the commonly established usage of the term for statistical surveys,
> something which the government is very familiar with (census) and
> frequently regulates.
>
>   http://stats.oecd.org/glossary/detail.asp?ID=534
>
>   "Data collection is the process of gathering data."
>
> and "gathering" is itself defined as collection (circular) or
> as the act of assembling a group of things together in one
> place.
>
> I think we all should understand that collection implies gathering
> together and at least some form of retention.  The above joke by
> Steven Wright depends on the audience knowing that.  We can collect
> seashells by taking them off the beach, not by merely walking by them.
> We can collect photos of seashells by taking each one's picture
> and retaining that picture, not by snapping the shot and then
> deleting it from memory.  We can collect on behalf of others without
> personal retention (cf. the response from the commissioner in DC
> about sharing being part of the definition).
>
> As a technical matter, assigning a pseudo-ID to a user agent via a
> cookie that is derived from a random source, perhaps combined with
> codes for algorithmic validation, and merely receiving that cookie
> in later requests, is not by itself data collection.
>
> Data collection would be retaining the cookie value along with the
> request data or gathering the data from multiple requests over time
> (browser activity) in a way that can be traced back to that cookie.
>
> This distinction is important because there are many uses of
> cookies that are not for the purpose of tracking, even though
> the value is unique per user agent.  Some of those uses are
> more important than DNT compliance.  There is nothing we can
> put in the compliance document that will cause those uses to
> disappear.
>
> What we can do, however, is lay out constraints on retention
> of the cookie value at the server (e.g., not beyond the scope
> of handling a single request nor shared with any other party)
> and on data collection associated with the identifier (e.g.,
> only retain data that is not personally identifiable, silo data
> by first party, do not allow the identifier or its derivatives
> to tie any sort of activity trail across multiple sites, etc.).
>
> I know that this does not satisfy the request by EFF that
> unique identifiers be banned for the sake of easier discovery
> of bad actors that claim to comply with DNT but then track anyway.
> Sorry.  I do not consider verification by non-regulators to be
> a requirement for DNT to succeed, and the point is moot given
> the exception for fraud control.
>
> The compliance document should define
>
>   "Data collection" (for the purpose of DNT) is the process of
>    assembling data from or about one or more network interactions
>    and retaining/sharing that data beyond the scope of responding
>    to the current request or in a form that remains linkable to a
>    specific user, user agent, or device.
>
>
> Cheers,
>
> Roy T. Fielding                     <http://roy.gbiv.com/>
> Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
>
>
> [1]
> http://www.funnyordie.com/videos/a024670721/steven-wright-standup-from-standupfan
>
>  (warning: [1] is a massively tracked resource that
>   might get you in trouble at work if your job description
>   doesn't include watching funnyordie as "research")
>
>
>
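
The distinction Roy draws in the quoted message, between merely assigning
and receiving a pseudo-ID cookie and retaining that value alongside request
data, can be sketched in a few lines. This is only an illustration under
stated assumptions, not anything proposed in the thread: the names
SECRET_KEY, issue_pseudo_id, is_valid, NonCollectingServer, and
CollectingServer are hypothetical, and an HMAC tag stands in for the "codes
for algorithmic validation" he mentions.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side key, used only to validate cookies we issued.
SECRET_KEY = secrets.token_bytes(32)


def _tag(raw: str) -> str:
    """Short HMAC tag over the random part of the cookie."""
    return hmac.new(SECRET_KEY, raw.encode(), hashlib.sha256).hexdigest()[:16]


def issue_pseudo_id() -> str:
    """Pseudo-ID derived from a random source, plus a validation code."""
    raw = secrets.token_hex(16)
    return f"{raw}.{_tag(raw)}"


def is_valid(cookie: str) -> bool:
    """Check the validation code without looking anything up in storage."""
    raw, sep, tag = cookie.partition(".")
    if not sep:
        return False
    return hmac.compare_digest(tag.encode(), _tag(raw).encode())


class NonCollectingServer:
    """Assigns and receives the cookie; keeps nothing past the request."""

    def handle(self, cookie, path):
        if cookie is None or not is_valid(cookie):
            cookie = issue_pseudo_id()
        return cookie  # value for the response; nothing is stored


class CollectingServer:
    """Retains the cookie value together with request data over time."""

    def __init__(self):
        self.log = {}  # cookie value -> list of requested paths

    def handle(self, cookie, path):
        if cookie is None or not is_valid(cookie):
            cookie = issue_pseudo_id()
        # This retention step is what would count as data collection
        # under the definition proposed in the quoted message.
        self.log.setdefault(cookie, []).append(path)
        return cookie
```

Both servers set and read the same unique cookie; only the second one
assembles an activity trail that remains linkable to it, which is the line
the proposed definition draws.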


-- 
Sean Harvey
Business Product Manager
Google, Inc.
212-381-5330
sharvey@google.com
Received on Wednesday, 23 May 2012 20:57:57 UTC
