Re: ISSUE-16, ACTION-166: define (data) collection

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 23 May 2012 18:42:11 -0700
Cc: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Message-Id: <92D99138-96B0-4232-9D60-CDDFCE03B3B7@gbiv.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
On May 23, 2012, at 3:40 PM, Bjoern Hoehrmann wrote:

> * Roy T. Fielding wrote:
>> I think we all should understand that collection implies gathering
>> together and at least some form of retention.  The above joke by
>> Steven Wright depends on the audience knowing that.  We can collect
>> seashells by taking them off the beach, not by merely walking by them.
>> We can collect photos of seashells by taking each one's picture
>> and retaining that picture, not by snapping the shot and then
>> deleting it from memory.
> 
> I do agree there is an element of "retention" in "collection", but your
> interpretation seems to imply that you can do certain things with data
> even though you have never collected that data, and I think some people
> would find that contradictory.

Well, they'd be wrong.  Just consider how much data is
processed in HTTP header fields without anyone even
bothering to log it.  Use does not imply collection.

> The joke depends on the idea that all the seashells on the beaches on
> the planet have come under the control of Steven Wright at some point
> who then put them roughly where they are today. That is a surprising
> idea if you usually assume that no human being could or would do that.

I didn't mean to imply that he has to move the shells
in order to collect them -- people have art collections that
are physically located in museums around the world.  The joke
is a little more subtle than that.  Perhaps we shouldn't
be mixing nouns with adjectives.

> If you are at some beach and pick up a seashell and then throw it at a
> specific location, are you collecting seashells in that place?

It depends on what time scale we are talking about and how
long that place (or that shell) remains under your control.

> What if you throw them across a border you cannot cross?

If the border is a black hole, no.  If someone on the other
side has asked you to toss them into their control, then yes.

> What if you throw the shells into a bucket filled with hydrochloric acid?

No.

> What if you are a magician, ask people to give you seashells, put them
> somewhere, and if people look at where you apparently put them, they
> are not there, so, did you actually collect the shells?

Magicians don't need exemptions.

> Your analogies suffer from a number of problems, if you walk past the
> seashells they do not actually come under your control.

Yes, they do come under my control (and within my control,
as phrased by the current document).  I grew up on a beach,
so I am quite familiar with the concept.

> And photos of them, well, you are presenting a white box example. When
> a stranger follows you around taking photos of you, you might worry
> that they are collecting photos of you, and would still do so if you
> confront them and they say, oh, the camera deletes all the pictures
> from memory.

Yes, you might worry that they are collecting them, which is why
you confront them and delete the pictures from memory -- to assuage
that data collection worry.  It doesn't imply that the person is
performing data collection on you -- they might very well be
taking non-identifiable pictures of a bug that landed on your
backside.  I'd still worry, but for other reasons.

> When you visit a web page and a script on it determines the resolution
> of your screen or your timezone or whatever, and sends that information
> to some server, I would say someone or something is collecting that in-
> formation,

Yes, the script is collecting it from the scripting environment
and then passing the data to another outside that environment.

> even if it does not last long on the server.

If the server only uses it in responding to the request in
which it appears, then the server has not collected the data.
It has certainly used it.  The owner of the web page is still
responsible for the data collection by the script, and whatever
happens as a result of sharing the data, but that's independent
of how we describe what happens once the data reaches the server.
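
As a rough illustration of that distinction (a sketch only, not part of
any proposed definition), consider a server that receives a screen width
with a request.  The names Server, pick_layout, screen_width, and
retained are all hypothetical, and the in-memory list merely stands in
for any log or database.

```python
from dataclasses import dataclass, field
from typing import List


def pick_layout(width: int) -> str:
    # Adapt the response to the reported width (threshold is arbitrary).
    return "narrow-layout" if width < 800 else "wide-layout"


@dataclass
class Server:
    # Hypothetical store standing in for any log, database, or profile.
    retained: List[int] = field(default_factory=list)

    def respond_use_only(self, request: dict) -> str:
        # Use: the value shapes this one response and is then forgotten.
        width = int(request.get("screen_width", 1024))
        return pick_layout(width)

    def respond_and_retain(self, request: dict) -> str:
        # Collection: same response, but the value outlives the request.
        width = int(request.get("screen_width", 1024))
        self.retained.append(width)
        return pick_layout(width)
```

Both methods return the same response; only the second leaves anything
behind on the server once the request has been answered.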

> They gather it in one place, on that server. I would think if some web
> service says it does not collect information on users' screen
> resolution, but a script quite obviously obtains such information from
> the browser and sends it back to the service, people would feel misled.

Some people might not understand that collection, in that context,
would imply the server will save the user's screen resolution for
later use.  There is no additional privacy implication to using
data that has already been provided by the user.
There is a privacy concern in obtaining data from the user's
device that the user has not already agreed to share, so it is the
script's data collection that matters; how that might be
successfully communicated to a user is far more complicated
than just "data collection".

I can collect flowers for myself.
I can collect flowers and then give them to someone else.
If I just pick flowers and throw them away while walking down
the street, I have not collected them.

> (Consider the same point for information that is not usually used to
> adapt content, like which web pages you have recently visited or which
> fonts you have installed; would it be wrong to accuse someone of
> "collecting" this information if their web pages obtain this and
> also transmit it back to the server, if there is no particular reason
> to do so for content adaption purposes?)

Umm, both of those are used to adapt content.  If that information is
sent out of the private context of the browser and to the server,
then it has been collected.  If it is merely used on the client to
select from various alternatives, it has not been collected, though
care must be taken not to expose the information accidentally in
later requests.

Likewise, a cookie set by a server and then received by that same
server is not data collection -- it's just use.  Correlating
browser activity over time is data collection, whether or not
it is made easier by use of a unique identifier.  Retention of
activity across multiple websites by virtue of a shared
identifier is also data collection (of the shared activity).
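
The cookie case might be sketched the same way, under the same caveats:
this is only an illustration, and Site, history, and the handler names
are hypothetical.

```python
from collections import defaultdict
from typing import Optional


class Site:
    def __init__(self) -> None:
        # Hypothetical retained record: cookie identifier -> paths visited.
        self.history = defaultdict(list)

    def handle_use_only(self, cookie_id: Optional[str], path: str) -> str:
        # Use: the cookie the server set earlier is read back only to
        # shape this response; nothing about the visit is kept.
        greeting = "welcome back" if cookie_id else "hello"
        return f"{greeting}: {path}"

    def handle_and_correlate(self, cookie_id: Optional[str], path: str) -> str:
        # Collection: activity is retained and correlated over time,
        # keyed by the identifier.
        if cookie_id:
            self.history[cookie_id].append(path)
        return self.handle_use_only(cookie_id, path)
```

The identifier is the same in both handlers; what differs is whether a
record of activity accumulates behind it.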

I think all of that is covered by my definition.  Maybe you could
respond to that instead of just responding to the analogy.

....Roy
Received on Thursday, 24 May 2012 01:42:37 UTC
