Re: [Bug 167] explanation of identified, identifiable, and linked from Rigo Wenning on 2003-08-20 (public-p3p-spec@w3.org from August 2003)

From: Rigo Wenning <rigo@w3.org>
Date: Wed, 20 Aug 2003 16:47:27 +0200
To: Ari Schwartz <ari@cdt.org>
Cc: public-p3p-spec@w3.org
Message-ID: <20030820144727.GE1270@rigo.w3.org>

Ari, 

it is much better than the first draft. I like the introduction. There
are still some mix-ups in the text, so see my comments inline.

On Mon, Aug 18, 2003 at 01:38:06PM -0400, Ari Schwartz wrote:
> "Identified" Data
> 
> The most common term in the specification is "identified data" and 
> focuses on how the information can be or is being used.

The phrase above is the kind of confusion that we wanted to avoid.
"identified" is a property of data, not a 'use' or what people intend to
do with it. The term 'use' belongs to the <purpose> section of the
specification where we don't use the term identified. So I suggest we
take the lesson from the UA TF and alter the phrase to:

The most common term in the specification is "identified data" and
focuses on whether a service knows your identity. 
> 
> "Identified data" is information that reasonably can be used by the 
> data collector to identify an individual.  

Information that reasonably can be used by the data collector
to identify an individual is "identifiable data". Identified means that
the identification has already happened. This is especially important if
you remember the worries about somebody coming up and wanting access to
all the info on him in the logs of a large portal site. We decided to say
that if information is well-organized and they know you, they should
give access. But we did not want to express access to some plain
log info where the pseudonyms (e.g. IP addresses) were not resolved.

Suggestion: 

"Identified data" is information in a record or profile and already tied
to an individual. 

> Admittedly, this is a  somewhat subjective standard.  

> For example, a data collector storing 
> Internet Protocol (IP) addresses  (which can be created dynamically 
> or could be static and therefore tied to a particular computer used 
> by a single individual) should consider the IP address "identified 
> data" only when an attempt is made to tie the exact addresses to past 
> records or work with others to identify the specific individual or 
> computer over a long period of time.  

Ari, the attempt can happen anytime. If I use the identifiable data via
additional processing to identify an individual, that is the purpose of
identification. The collection happened earlier and access has nothing
to do with it. So I'm a bit unhappy with your example.

Suggestion (depending on my definition above):

For example, a data collector storing Internet Protocol (IP) addresses
(which can be created dynamically or could be static and therefore tied
to a particular computer used by a single individual) should consider
the IP address "identified data" only when this data is added to the
record or profile of a specific individual. 

> In the more common case, where 
> data collectors use IP addressing information in the aggregate or 
> make no attempt to tie the IP address to a specified individual or 
> computer over a long period of time, IP addresses are not considered 
> identified even though it is possible for someone (eg, law 
> enforcement agents with proper subpoena powers) to identify the 
> individual based on the stored data.

The rest is ok, good work

Rigo

Received on Thursday, 21 August 2003 16:25:50 UTC