RE: Issue-5, trying to find a middle ground

Hi David,

I think this is a good compromise, and the association of visited URL and
user/device is the important thing. Maybe have "derived from a given
request" rather than "of a", but that is editorial. I agree retention is
more crucial than collection, though we should have some non-normative text
somewhere about suspicious collection procedures, e.g. fingerprinting or
cookie IDs, being ruled out (because they could be perceived as evidence of
intent to retain).

Mike


-----Original Message-----
From: David Singer [mailto:singer@apple.com] 
Sent: 11 October 2013 23:44
To: public-tracking@w3.org (public-tracking@w3.org)
Subject: Issue-5, trying to find a middle ground

Looking at the change proposals at
http://www.w3.org/wiki/Privacy/TPWG/Change_Proposal_Tracking_Definition, I
tried to find the key ideas and points in them.  Since there is at least one
significant point of difference, I have worked this as a formal definition,
but of course we could say

"In rough terms, tracking is ." "The precise definition is effectively the
effect of the rules that this document defines for parties that conform to
this recommendation."


Key Ideas

1)     Roy's definition (1) correctly uses a word other than site; we don't
mind data flowing within the context of a single controller.
2)     I think Roy's definition (1) is saying that it's the connecting of
the user with one or more ('multiple') contexts other than the context that
received the transaction that's a problem. This is roughly what I previously
described as 'tunnel vision': connecting the user with any other context
than the recipient. This might be OK, and it certainly solves a
long-standing problem with my (3) - 'normal' logs get pulled into the
dragnet in (3), which is unpleasant, and maybe unworkable.
3)     My old definition (3) intends a network interaction to be the HTTP
request/response; the current draft has it as a 'page', and servers don't
know about pages, and anyway, what is supposed to happen if the DNT signal
is inconsistent on the various requests for the parts of a page?  We need a
decent definition of 'retention/retain' and of 'network interaction':
'network interaction' or 'network transaction' is an HTTP request and its
response, and 'retain/retention' is holding data after the 'network
transaction' is complete.
4)     Rob's definition doesn't have the carve-out for responding to the
transaction; however, he seems to be arguing in the non-normative text that
even for that the site should not gather extra data about the user. I am not
sure this is tenable; converting an IP address into a location and hence
into a time of day would be gathering extra data, and we may want to allow
that (we probably do).
5)     Roy's definition of context in (1) seems to be what we should define
as a 'party' (see separate issue and discussion).
6)     We don't want data shared around even in-transaction, so we need a
definition of sharing that simply says that the data crosses contexts.
 
[My previous tunnel-vision was described in
http://lists.w3.org/Archives/Public/public-tracking/2012Jan/0227.html.]

Existing definitions:

The following definition has a serious problem, which makes the one after it
unmanageable:
 
"A network interaction is the set of HTTP requests and responses, or any
other sequence of logically related network traffic caused by a user visit
to a single web page or similar single action. Page re-loads, navigation,
and refreshing of content cause a new network interaction to commence."
 
A server has no idea what a page is; it gets requests for resources and
responds.  In particular it has no idea when a page load is complete, so the
end of a 'network interaction' is indeterminate, and with it the scope of:
"A party retains data if data remains within a party's control beyond the
scope of the current network interaction."
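
To illustrate the point about servers seeing requests rather than pages, here
is a minimal Python sketch. It assumes the preference arrives as a "DNT: 1"
request header; the paths, the helper name dnt_enabled, and the list of
requests are hypothetical and only stand in for the resources of one page.

```python
def dnt_enabled(headers: dict) -> bool:
    """Return True if this single request carries DNT: 1.

    Header names are compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


# Three requests a browser might issue for what the user sees as one page.
# The server receives them independently and cannot tell they belong together.
requests_for_one_page = [
    {"path": "/index.html", "headers": {"DNT": "1"}},
    {"path": "/style.css", "headers": {}},
    {"path": "/ad.js", "headers": {"dnt": "0"}},
]

# The decision can only be made per request, so it may differ within a "page".
decisions = {r["path"]: dnt_enabled(r["headers"]) for r in requests_for_one_page}
```

Evaluating each request on its own sidesteps the question of what a page is,
which is the motivation for defining the unit as a single request and its
response.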


Suggestion:

If I read it right, we have a choice here, between (in colloquial terms)
a)     stop remembering data about me
b)     stop remembering where I have been on the web (apart from you)
Since the first has obvious practical issues, let's see if we can settle on
the second (which has obvious 'slightly tracking' issues).
 
New/improved definitions:
 
'Network Transaction': an HTTP request and its matching response, or the
equivalent in another protocol;
'Context': a set of resources that share the same data controller and a
common branding [but this probably should become the definition of 'party']
'Retain/retention': data is retained if it is held after a Network
Transaction is complete
'Share': data is shared if it is passed by one Context to another Context
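
As a rough illustration only, the four definitions above might be modelled in
Python as follows. The class and function names are assumptions made for this
sketch, not part of any draft, and 'complete' is taken to mean that the
response has been sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A set of resources sharing one data controller and a common branding."""
    controller: str
    branding: str


@dataclass(frozen=True)
class NetworkTransaction:
    """One HTTP request and its matching response."""
    recipient: Context   # the Context that received the request
    complete: bool       # assumed: True once the response has been sent


def is_retained(held_for: NetworkTransaction) -> bool:
    """Data that is still held counts as retained once its transaction is complete."""
    return held_for.complete


def is_shared(sender: Context, receiver: Context) -> bool:
    """Data is shared when it passes from one Context to a different Context."""
    return sender != receiver
```

Two Context values with the same controller and branding compare equal, so
data moving between them is not treated as shared in this sketch.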
 
[I don't think we need, for this definition:
'Collect': data is collected if it was not present in the transaction but is
subsequently retrieved and associated with it
'Use':  if you have not retained the data, you can't use it, and we need to
allow the data to be used to service the transaction in which it occurs]

So, here we go:
 
Tracking is the Retention or Sharing of data, of a given request, that can
associate (A) a Context other than the context that received the request,
with (B) a particular user, user agent, or device (maybe 'the user, user
agent, or device that made the request'?).
 
 
Note that this does allow some retention of personal data. Under this rule,
a site can keep ordinary log files that include such normal fields as IP
address, user-agent, and so on. Frequency capping is also possible; as long
as you remember only data that associates the user with the ads you served,
you are fine. You cannot associate the user with other sites, however,
notably first party sites.
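
The examples above can be sketched as a small Python check. This is a literal
reading of the proposed definition under stated assumptions: the Record shape,
the field names, and the example.* host names are all hypothetical, and a
Context is reduced to a plain string for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Record:
    """Data derived from one request (hypothetical shape)."""
    received_by: str                 # Context that received the request
    identifies: Optional[str]        # user, user agent, or device; None if none
    contexts_named: FrozenSet[str]   # Contexts the data can be associated with
    retained: bool = False           # held after the transaction completed
    shared: bool = False             # passed to another Context


def is_tracking(r: Record) -> bool:
    """Apply the proposed definition to a single record."""
    if not (r.retained or r.shared):
        return False                 # used only to service the transaction
    if r.identifies is None:
        return False                 # (B) fails: no particular user or device
    # (A): the data names at least one Context other than the recipient.
    return any(c != r.received_by for c in r.contexts_named)


# Ordinary log line or frequency cap: only the receiving Context is named.
log_entry = Record("ads.example", "203.0.113.7",
                   frozenset({"ads.example"}), retained=True)

# Ad server also remembers which first-party site the user was visiting.
cross_site = Record("ads.example", "203.0.113.7",
                    frozenset({"ads.example", "news.example"}), retained=True)
```

Under this reading is_tracking(log_entry) is False and
is_tracking(cross_site) is True, which matches the log-file and first-party
examples given above.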

As I say, we could bracket this with "in rough terms."

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Saturday, 12 October 2013 11:42:18 UTC