[Bug 167] explanation of identified, identifiable, and linked from Ari Schwartz on 2003-06-20 (public-p3p-spec@w3.org from June 2003)

From: Ari Schwartz <ari@cdt.org>
Date: Fri, 20 Jun 2003 12:09:27 -0400
To: public-p3p-spec@w3.org
Message-Id: <a05200f0abb18e189aa43@[10.0.1.7]>
Here is my draft text for addressing the confusion around identity 
terms in the spec.



>http://www.w3.org/Bugs/Public/show_bug.cgi?id=167



Identity Definitions in the P3P Specification

In privacy regulations, guidelines, and papers about privacy, a variety
of terms are used to describe data that identifies an individual to
varying degrees.  Some common terms, such as "personally identifiable
information (PII)," are often left undefined or are the subject of
heated debate.  In different documents, "identity" can be tied to:

1) how the information can be or is being used,
2) how the information is stored, or
3) the type of information.

The P3P Specification Working Group tried to capture all three of 
these ideas so that different implementers and users can make 
decisions based on the importance they place on these various 
definitions of identity. (1)

Identity Through Usage ("identified" data)

The most common term in the specification is "identified data," which
focuses on how the information can be or is being used.

"Identified data" is information that reasonably can be used by the
data collector to identify an individual.  Admittedly, this is a
somewhat subjective standard.  For example, Internet Protocol (IP)
addresses can be assigned dynamically, or can be static and therefore
tied to a particular computer used by a single individual.  A data
collector storing IP addresses should consider them "identified data"
only when it attempts to tie the exact addresses to past records, or
works with others to identify the specific individual or computer over
a long period of time.  In the more common case, where data collectors
use IP address information only in the aggregate, or make no attempt
to tie the IP address to a specific individual or computer over a long
period of time, IP addresses are not considered identified, even
though it is possible for someone (e.g., law enforcement agents with
proper subpoena powers) to identify the individual based on the stored
data.


Identity Through Storage ("non-identifiable" and "linked" data)

The Working Group also felt that data collectors should be able to
acknowledge when they make specific attempts to anonymize stored data
that would otherwise be identifiable.

The term "non-identifiable" data refers to how the information is
stored.  For example, a data collector collecting and storing IP
addresses but not using them should NOT call this data
"non-identifiable," even in the common case where it has no plans
to identify an actual individual or computer.  However, if a Web site
collects IP addresses but actively deletes all but the last four
digits of this information, so that it can still determine short-term
use while ensuring that a particular individual or computer cannot be
consistently identified, then the data collector can and should call
this information "non-identifiable."  Also, "non-identifiable" can be
used in cases where no information is being collected at all.  Since
most Web servers are designed to keep Web logs for maintenance, this
would most likely mean that the data collector has made specific
efforts to ensure the anonymity of users.
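
The IP address example can be illustrated with a short sketch.  It
takes one literal reading of "deletes all but the last four digits":
strip the separators and keep only the final four digit characters.
The function name keep_last_four_digits is hypothetical, and whether
this much truncation is enough to prevent consistent identification
is a judgment for the data collector, not something the code settles.

```python
def keep_last_four_digits(ip_address: str) -> str:
    """Keep only the last four digit characters of a dotted IP address.

    Assumption: this is one literal reading of the draft text, not a
    procedure defined by the P3P specification.
    """
    # Drop the dots (or any other separators) and keep digits only.
    digits = [ch for ch in ip_address if ch.isdigit()]
    # Discard everything except the final four digits.
    return "".join(digits[-4:])


# The full address is never stored; only the truncated value is kept.
stored_value = keep_last_four_digits("192.168.10.57")
```

Many different addresses map to the same stored value, which is the
point of the truncation.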

Under the above definitions, a lot of information could be 
"identifiable" (not specifically made anonymous), but not 
"identified" (reasonably able to be tied to an individual or 
computer).
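
One way to see that the two terms are independent is to treat them as
separate questions about the same data practice.  The sketch below is
only an illustration of the definitions above; the class DataPractice
and its two flags are hypothetical names, not P3P vocabulary, and the
code does not check for contradictory combinations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DataPractice:
    # Hypothetical flags describing what the data collector does.
    anonymized_in_storage: bool  # specific steps taken to anonymize stored data
    tied_to_individual: bool     # data reasonably can be tied to a person or computer


def describe(practice: DataPractice) -> Tuple[str, str]:
    """Return the storage term and the usage term for a practice."""
    storage_term = (
        "non-identifiable" if practice.anonymized_in_storage else "identifiable"
    )
    usage_term = "identified" if practice.tied_to_individual else "not identified"
    return storage_term, usage_term


# Raw Web logs kept for maintenance and used only in the aggregate.
raw_logs = DataPractice(anonymized_in_storage=False, tied_to_individual=False)
terms = describe(raw_logs)
```

For the raw log case, the result pairs "identifiable" with "not
identified," which is the combination the paragraph above describes.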

Similarly, the term "linked" refers to how information is being
stored in connection with a cookie.  All data in a cookie or linked to
a particular user must be disclosed in the cookie's policy.  Using the
terminology above, if the data collector collects "identifiable"
information about the user, it is generally "linked" data.

Identity Through Information Type

The Working Group felt that different user agent implementations
could be created to focus on different concerns around data type.
Therefore, the Working Group enabled the creation of a robust data
schema including broad categories of information that may be
considered sensitive by certain user groups.  The Working Group hopes
that a diverse set of user agents will be created to allow users to
make identity decisions based on specific collections and types of
collection if they desire to do so.  For example, a user agent
could allow users to opt to be prompted when a medical or financial
identifier is being collected, independent of how that information is
being used.
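
A user agent preference of this kind might be sketched as follows.
This is a minimal illustration, not a description of any existing
user agent: the function should_prompt and the category labels are
assumptions chosen for the example, and a real implementation would
read the categories from a site's policy rather than from a list.

```python
# Illustrative labels for the data types the user has marked sensitive.
SENSITIVE_CATEGORIES = frozenset({"health", "financial"})


def should_prompt(collected_categories, sensitive=SENSITIVE_CATEGORIES):
    """Return True if any collected category is one the user flagged.

    The decision looks only at the type of data collected; how the
    data is used is deliberately not an input.
    """
    return not sensitive.isdisjoint(collected_categories)


# A site that declares it collects navigation and health data.
prompt_needed = should_prompt(["navigation", "health"])
```

Because the stated use of the data is not an argument, the prompt
fires on data type alone, as in the example above.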

(1)   More information on the debate and the definitions can be found 
in Lorrie Faith Cranor's book Web Privacy with P3P, O'Reilly, 2002.



-- 
------------------------------------
Ari Schwartz
Associate Director
Center for Democracy and Technology
1634 I Street NW, Suite 1100
Washington, DC 20006
202 637 9800
fax 202 637 0968
ari@cdt.org
http://www.cdt.org
------------------------------------
Received on Friday, 20 June 2003 12:16:33 UTC