- From: Lorrie Cranor <lorrie@research.att.com>
- Date: Wed, 8 May 2002 09:26:12 -0400
- To: <www-p3p-policy@w3.org>
Someone asked me:

"Is there any meaningful difference between the <nonident/> ACCESS element and the <NON-IDENTIFIABLE/> element? In other words, if you have <nonident/> in the ACCESS element, must you also have <NON-IDENTIFIABLE/> in each of your statements? The only reason that it's not quite clear to me is that the definition of <NON-IDENTIFIABLE/> has a couple of paragraphs describing the strict criteria that must be met, but <nonident/> has but a sentence."

Since this question may be of general interest, I am posting my response:

The <nonident/> (web site does not collect identified data) ACCESS element and the <NON-IDENTIFIABLE/> (web site does not collect identifiable data) element are very different beasts. <nonident/> is for use by sites that don't provide access because they don't have any data that they can use to readily identify an individual. Such sites may or may not qualify to also use <NON-IDENTIFIABLE/>, which requires that neither the site nor a third party be able to identify the individual. A site that has no forms but logs IP addresses can probably use <nonident/> but would not be able to use <NON-IDENTIFIABLE/> without cleansing its logs. Also, when NON-IDENTIFIABLE is used there must be an explanation in the human-readable policy.

Here are some excerpts from my forthcoming book on P3P that discuss this further:

Identified Data

In privacy regulations, guidelines, and papers about privacy, a variety of terms are used to describe information that identifies an individual to varying degrees. The terms personal information, personally identifiable information (PII), and customer identifiable information are used frequently in the US with slightly varying definitions. The term customer proprietary network information (CPNI) is formally defined in US telecommunications regulations. The terms identified, identifying, and identifiable are often used as well.
The various definitions of these terms differ in whether they include otherwise non-identifiable data if it is stored in certain ways, used in certain ways, or combined with certain other data.

The P3P specification generally uses the term "identified data" to refer to data that a data collector can use to reasonably identify an individual. Thus, this definition applies to information such as a full name that can identify an individual on its own, as well as to data that identifies an individual when used in combination. However, the coverage of this term is limited to combinations of data performed by a single data collector using a reasonable amount of effort. Thus, if a data collector has data that could be used to identify someone if it were combined with other data obtainable only from another source, the data will generally not be considered identified (if the other source is a public directory, however, the data probably would be considered identified). Of course, this definition does leave some gray areas that are open to interpretation.

Note that the NON-IDENTIFIABLE element (discussed later in this chapter) uses the term "identifiable" with a very broad definition that includes any data that could be used to identify an individual.

Non-identifiable

The NON-IDENTIFIABLE element is an optional sub-element of a STATEMENT that may be included when there is no identifiable data collected, or when any data collected has been anonymized. When using this element, be very careful that your web site meets the non-identifiable criteria ("there is no reasonable way for the entity or a third party to attach the collected data to the identity of a natural person") and make sure you include a human-readable explanation of how this is achieved at the discuri.
Unless your site has no form submissions and keeps no web logs, or takes pains to make sure the web logs are sanitized so that they do not contain IP addresses or other potentially identifying information, you probably don't qualify to use this element. If you anonymize data, you must make sure to remove the original data from all your logs, backup tapes, etc. And any technique you use to anonymize data must be non-reversible, for example, removing the last seven bits of an IP address and replacing them with zeros. A one-way cryptographic hash function would not be considered non-reversible if the set of possible data values is small enough that all possible hashed values can be generated and compared with the value that someone is attempting to reverse.

Here is an example statement that contains a NON-IDENTIFIABLE element:

  <STATEMENT>
    <NON-IDENTIFIABLE/>
    <PURPOSE><admin/></PURPOSE>
    <RECIPIENT><ours/></RECIPIENT>
    <RETENTION><stated-purpose/></RETENTION>
    <DATA-GROUP>
      <DATA ref="#dynamic.clickstream.uri"/>
      <DATA ref="#dynamic.clickstream.timestamp"/>
    </DATA-GROUP>
  </STATEMENT>

This statement indicates that the requested URL and the time of the request are the only collected data. For this to be accurate, the web site would need to make sure that other information, such as the IP address associated with each request, is not recorded in log files. The site must also include an explanation in its human-readable privacy policy of how it ensures that the data it collects is not identifiable.
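
To make the two anonymization points above concrete, here is a minimal Python sketch. It assumes IPv4 addresses and uses SHA-256 as a stand-in for "a one-way cryptographic hash function"; the function names mask_ipv4 and reverse_hash_by_enumeration, and the 192.0.2.x documentation addresses, are made up for illustration and do not come from the P3P specification.

```python
import hashlib
import ipaddress
from typing import Optional


def mask_ipv4(addr: str, bits: int = 7) -> str:
    """Replace the last `bits` bits of an IPv4 address with zeros.

    The discarded bits are gone for good, so the original address
    cannot be recovered from the masked value alone.
    """
    value = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF >> bits) << bits  # e.g. 0xFFFFFF80 for bits=7
    return str(ipaddress.IPv4Address(value & mask))


def reverse_hash_by_enumeration(target_hex: str,
                                prefix: str = "192.0.2.") -> Optional[str]:
    """Try to undo an unsalted SHA-256 hash of an address.

    Assumption for illustration: the attacker knows the address lies in
    one /24 network, so only 256 candidates need to be hashed and compared.
    """
    for host in range(256):
        candidate = f"{prefix}{host}"
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


if __name__ == "__main__":
    # 201 is 0b11001001; zeroing the low seven bits leaves 0b10000000 = 128.
    print(mask_ipv4("192.0.2.201"))  # 192.0.2.128

    # The hash looks opaque, but the small input space makes it reversible.
    logged = hashlib.sha256(b"192.0.2.201").hexdigest()
    print(reverse_hash_by_enumeration(logged))  # 192.0.2.201
```

The masked address maps 128 possible originals to one stored value, while the hashed address is recovered after at most 256 guesses. The same enumeration works, just more slowly, over any input space small enough to walk through, which is why hashing alone does not make logged values non-identifiable.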
Received on Wednesday, 8 May 2002 09:33:27 UTC