- From: Ari Schwartz <ari@cdt.org>
- Date: Fri, 20 Jun 2003 12:09:27 -0400
- To: public-p3p-spec@w3.org
Here is my draft text for addressing the confusion around identity terms in the spec.

>http://www.w3.org/Bugs/Public/show_bug.cgi?id=167

Identity Definitions in the P3P Specification

In privacy regulations, guidelines, and papers about privacy, a variety of terms are used to describe data that identifies an individual to varying degrees. Some common terms, such as "personally identifiable information (PII)," are often left undefined or are the cause of heated debate. In different documents, "identity" can be tied to: 1) how the information can be or is being used, 2) how the information is stored, or 3) the type of information.

The P3P Specification Working Group tried to capture all three of these ideas so that different implementers and users can make decisions based on the importance they place on these various definitions of identity. (1)

Identity Through Usage ("identified" data)

The most common term in the specification is "identified data," which focuses on how the information can be or is being used. "Identified data" is information that reasonably can be used by the data collector to identify an individual. Admittedly, this is a somewhat subjective standard.

For example, a data collector storing Internet Protocol (IP) addresses (which can be created dynamically or could be static and therefore tied to a particular computer used by a single individual) should consider the IP address "identified data" only when an attempt is made to tie the exact addresses to past records, or to work with others to identify the specific individual or computer over a long period of time. In the more common case, where data collectors use IP addressing information in the aggregate or make no attempt to tie the IP address to a specific individual or computer over a long period of time, IP addresses are not considered identified, even though it is possible for someone (e.g., law enforcement agents with proper subpoena powers) to identify the individual based on the stored data.
Identity Through Storage ("non-identifiable" and "linked" data)

The working group also felt that data collectors should be able to acknowledge when they make specific attempts to anonymize data that would otherwise be identifiable in storage. The term "non-identifiable" data refers to how the information is stored.

For example, a data collector collecting and storing IP addresses but not using them should NOT call this data "non-identifiable," even in the common case where they have no plans to identify an actual individual or computer. However, if a Web site collects IP addresses but actively deletes all but the last four digits of this information, in order to determine short-term use while ensuring that a particular individual or computer cannot be consistently identified, then the data collector can and should call this information "non-identifiable."

Also, "non-identifiable" can be used in cases where no information is being collected at all. Since most Web servers are designed to keep Web logs for maintenance, this would most likely mean that the data collector has taken specific efforts to ensure the anonymity of users.

Under the above definitions, a lot of information could be "identifiable" (not specifically made anonymous) but not "identified" (reasonably able to be tied to an individual or computer).

Similarly, the term "linked" refers to how information is being stored in connection with a cookie. All data in a cookie or linked to a particular user must be disclosed in the cookie's policy. Using the terminology above, if the data collector collects "identifiable" information about the user, it is generally "linked" data.

Identity Through Information Type

The Working Group felt that different user agent implementations could be created to focus on different concerns around data type. Therefore, the working group enabled the creation of a robust data schema including broad categories of information that may be considered sensitive by certain user groups.
The Working Group hopes that a diverse set of user agents will be created to give users the ability to make identity decisions based on specific collections and types of collection, if they desire to do so. For example, a user agent could allow users to opt to be prompted when a medical or financial identifier is being collected, independent of how that information is being used.

(1) More information on the debate and the definitions can be found in Lorrie Faith Cranor's book Web Privacy with P3P, O'Reilly, 2002.

--
------------------------------------
Ari Schwartz
Associate Director
Center for Democracy and Technology
1634 I Street NW, Suite 1100
Washington, DC 20006
202 637 9800
fax 202 637 0968
ari@cdt.org
http://www.cdt.org
------------------------------------
Received on Friday, 20 June 2003 12:16:33 UTC