- From: Ari Schwartz <ari@cdt.org>
- Date: Wed, 6 Aug 2003 12:22:35 -0400
- To: <public-p3p-spec@w3.org>
Updated Draft based on comments from the WG:

----

Identity Definitions in the P3P Specification

In privacy regulations, guidelines, and papers about privacy, a variety of terms are used to describe data that identifies an individual to varying degrees. The European Union Directive defines "an identifiable person" as "one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity." The Directive also states that in determining whether a person is identifiable "account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person; whereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable."

In other policy documents, terms such as "personally identifiable information (PII)" are often left undefined or are the cause of heated debate. In different documents, "identity" can be tied to: 1) how the information can be or is being used, 2) how the information is stored, or 3) the type of information.

The P3P Specification Working Group tried to capture all three of these ideas so that different implementers and users can make decisions based on the importance they place on these various definitions of identity. (1)

Identity Through Usage ("identified" data)

The most common term in the specification is "identified data," which focuses on how the information can be or is being used. "Identified data" is information that reasonably can be used by the data collector to identify an individual. Admittedly, this is a somewhat subjective standard.
For example, a data collector storing Internet Protocol (IP) addresses (which can be created dynamically or could be static and therefore tied to a particular computer used by a single individual) should consider the IP address "identified data" only when an attempt is made to tie the exact addresses to past records or to work with others to identify the specific individual or computer over a long period of time. In the more common case, where data collectors use IP addressing information in the aggregate or make no attempt to tie the IP address to a specific individual or computer over a long period of time, IP addresses are not considered identified, even though it is possible for someone (e.g., law enforcement agents with proper subpoena powers) to identify the individual based on the stored data.

In the P3P Specification, the term "identifiable" is used in much the same way as it is used in the EU Directive. Thus, in the P3P context, any data that can be used reasonably by a data controller or any other person to identify an individual is considered to be identifiable data. The P3P Specification uses the term "identified" to describe the subset of this data that can be reasonably used by a data collector *without assistance from other parties* to identify an individual.

Identity Through Storage ("non-identifiable" and "linked" data)

The Working Group also felt that data collectors should be able to acknowledge when they make specific attempts to anonymize, in storage, data that would otherwise be identifiable. The term "non-identifiable" data refers to how the information is stored. For example, a data collector collecting and storing IP addresses but not using them should NOT call this data "non-identifiable," even in the common case where they have no plans to identify an actual individual or computer.
However, if a Web site collects IP addresses but actively deletes all but the last four digits of this information, in order to determine short-term use while ensuring that a particular individual or computer cannot be consistently identified, then the data collector can and should call this information "non-identifiable."

The term "non-identifiable" can also be used in cases where no information is being collected at all. Since most Web servers are designed to keep Web logs for maintenance, this would most likely mean that the data collector has made specific efforts to ensure the anonymity of users.

Under the above definitions, a lot of information could be "identifiable" (not specifically made anonymous) but not "identified" (reasonably able to be tied to an individual or computer).

Similarly, the term "linked" refers to how information is being stored in connection with a cookie. All data in a cookie or linked to a particular user must be disclosed in the cookie's policy. Using the terminology above, if the data collector collects "identifiable" information about the user, it is generally "linked" data. For example, if the data collector stores a login name in a file associated with a persistent cookie and the login name is linked to personal data, the cookie is clearly "linked." In a less clear-cut example, if the data collector ties the cookie to a specific order ID in a flat file and that order ID is tied to personal information in a related file, the cookie would be linked to all of the relational data unless specific precautions have been taken to ensure that a data operator with access to the relational data cannot access the flat cookie data and vice versa.

Identity Through Information Type

The Working Group decided against an identified or identifiable label for particular types of data.
However, user agent implementers have the option of assigning these or other labels themselves and building user interfaces that allow users to make decisions about Web sites on the basis of how they collect and use certain types of data.

The Working Group felt that different user agent implementations could be created to focus on different concerns around data type. Therefore, the Working Group enabled the creation of a robust data schema that includes broad categories of information that may be considered sensitive by certain user groups. The Working Group hopes that a diverse set of user agents will be created to give users the ability to make identity decisions based on specific collections and types of data, if they desire to do so. For example, a user agent could allow users to opt to be prompted when a medical or financial identifier is being collected, independent of how that information is being used.

(1) More information on the debate and the definitions can be found in Lorrie Faith Cranor's book Web Privacy with P3P, O'Reilly, 2002.

--
------------------------------------
Ari Schwartz
Associate Director
Center for Democracy and Technology
1634 I Street NW, Suite 1100
Washington, DC 20006
202 637 9800
fax 202 637 0968
ari@cdt.org
http://www.cdt.org
------------------------------------
Received on Wednesday, 6 August 2003 12:21:56 UTC