difference between nonident and NON-IDENTIFIABLE from Lorrie Cranor on 2002-05-08 (www-p3p-policy@w3.org from May 2002)

From: Lorrie Cranor <lorrie@research.att.com>
Date: Wed, 8 May 2002 09:26:12 -0400
To: <www-p3p-policy@w3.org>
Message-ID: <004a01c1f693$ec8d1960$9816cf87@barbaloot>

Someone asked me:

"Is there any meaningful difference between the <nonident/> ACCESS element
and the <NON-IDENTIFIABLE/> element? In other words, if you have <nonident/>
in the ACCESS element, must you also have <NON-IDENTIFIABLE/> in each of
your statements? The only reason that it's not quite clear to me is that the
definition of <NON-IDENTIFIABLE/> has a couple of paragraphs describing the
strict criteria that must be met, but <nonident/> has but a sentence."

Since this question may be of general interest, I am posting my response:

The <nonident/> (web site does not collect identified data) ACCESS
element and <NON-IDENTIFIABLE/> (web site does not collect identifiable
data) element are very different beasts. <nonident/> is for use by
sites that don't provide access because they don't have any data that
they can use to readily identify an individual. Such sites may or may
not qualify to also use <NON-IDENTIFIABLE/>, which requires that
neither the site nor a third party be able to identify the individual. A
site that has no forms but logs IP addresses can probably use <nonident/>
but would not be able to use <NON-IDENTIFIABLE/> without
cleansing its logs. Also, when NON-IDENTIFIABLE is used there
must be an explanation in the human-readable policy.

Here are some excerpts from my forthcoming book on P3P
that discuss this further:


Identified Data

In privacy regulations, guidelines and papers about privacy, a variety of
terms are used to describe information that identifies an individual to
varying degrees. The terms personal information, personally identifiable
information (PII), and customer identifiable information are used frequently
in the US with slightly varying definitions. The term customer proprietary
network information (CPNI) is formally defined in US telecommunications
regulations. The terms identified, identifying, and identifiable are often
used as well. The various definitions of these terms differ in whether they
include otherwise non-identifiable data if it is stored in certain ways,
used in certain ways, or combined with certain other data.

The P3P specification generally uses the term "identified data" to refer to
data that a data collector can use to reasonably identify an individual.
Thus, this definition applies to information such as a full name that can
identify an individual on its own, as well as to data that identifies an
individual when used in combination. However, the coverage of this term is
limited to combinations of data performed by a single data collector using a
reasonable amount of effort. Thus, if a data collector has data that could
be used to identify someone if it were combined with other data obtainable
only from another source, the data will generally not be considered
identified (if the other source is a public directory, however, the data
probably would be considered identified). Of course, this definition does
leave some gray areas which are open to interpretation.

Note that the NON-IDENTIFIABLE element (discussed later in this chapter)
uses the term "identifiable" with a very broad definition that includes any
data that could be used to identify an individual.



Non-identifiable

The NON-IDENTIFIABLE element is an optional sub-element of a STATEMENT that
may be included when there is no identifiable data collected, or when any
data collected has been anonymized. When using this element, be very careful
that your web site meets the non-identifiable criteria "there is no
reasonable way for the entity or a third party to attach the collected data
to the identity of a natural person" and make sure you include a
human-readable explanation of how this is achieved at the discuri (the URI
of your human-readable policy, given in the discuri attribute of the POLICY
element). Unless
your site has no form submissions and keeps no web logs, or takes pains to
make sure the web logs are sanitized so that they do not contain IP
addresses or other potentially identifying information, you probably don't
qualify to use this element. If you anonymize data, you must make sure to
remove the original data from all your logs, backup tapes, etc. And any
technique you use to anonymize data must be non-reversible, for example,
removing the last seven bits of an IP address and replacing them with zeros.
A one-way cryptographic hash function would not be considered
non-reversible if the set of possible data values is small enough that all
possible hashed values can be generated and compared with the value that
someone is attempting to reverse.
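
The two techniques mentioned above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names are
hypothetical, the addresses are IPv4, and the hash is an unsalted SHA-256 over
the dotted-quad string. It shows zeroing the last seven bits of an address, and
why hashing alone is reversible when the set of possible inputs is small (here,
assumed to be the 256 hosts of a single /24 network).

```python
import hashlib
import ipaddress
from typing import Optional


def zero_low_bits(ip: str, bits: int = 7) -> str:
    """Return the IPv4 address with its lowest `bits` bits replaced by zeros."""
    value = int(ipaddress.IPv4Address(ip))
    mask = ~((1 << bits) - 1) & 0xFFFFFFFF  # e.g. 0xFFFFFF80 for bits=7
    return str(ipaddress.IPv4Address(value & mask))


def reverse_hash_by_enumeration(digest: str,
                                prefix: str = "192.0.2.") -> Optional[str]:
    """Recover an address from its unsalted SHA-256 hex digest by hashing
    every candidate in a small input space (assumed: one /24 network)."""
    for host in range(256):
        candidate = f"{prefix}{host}"
        if hashlib.sha256(candidate.encode()).hexdigest() == digest:
            return candidate
    return None  # the original value was outside the assumed input space
```

With these definitions, zero_low_bits("192.0.2.200") yields "192.0.2.128", and
the original address cannot be recovered from that value alone. By contrast,
reverse_hash_by_enumeration() recovers any hashed address in the assumed /24
after at most 256 attempts.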

Here is an example statement that contains a NON-IDENTIFIABLE element:

<STATEMENT>
  <NON-IDENTIFIABLE/>
  <PURPOSE><admin/></PURPOSE>
  <RECIPIENT><ours/></RECIPIENT>
  <RETENTION><stated-purpose/></RETENTION>
  <DATA-GROUP>
   <DATA ref="#dynamic.clickstream.uri"/>
   <DATA ref="#dynamic.clickstream.timestamp"/>
  </DATA-GROUP>
</STATEMENT>

This statement indicates that the requested URL and the time of the request
are the only collected data. For this to be accurate, the web site would
need to make sure that other information, such as the IP address associated
with each request, is not recorded in log files. The site must also include
an explanation in its human-readable privacy policy of how it ensures that
the data it collects is not identifiable.
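
One way a site might keep only the requested URL and the time of the request
is to filter its logs before they are stored. The following is a minimal
sketch, assuming the server writes NCSA Common Log Format lines; the function
name and output layout are hypothetical, and a real site would also need to
check whether the URLs themselves (for example, query strings) carry
identifying data.

```python
import re
from typing import Optional

# Assumed input format (Common Log Format):
#   host ident authuser [timestamp] "METHOD uri protocol" status bytes
CLF = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<uri>\S+)[^"]*"'
)


def sanitize_line(line: str) -> Optional[str]:
    """Keep only the timestamp and requested URI from one log line.

    The client address, user fields, status, and size are discarded.
    Lines that do not parse are dropped rather than stored verbatim.
    """
    match = CLF.match(line)
    if match is None:
        return None
    return f'[{match.group("time")}] {match.group("uri")}'
```

For example, the line
'192.0.2.77 - - [08/May/2002:09:26:12 -0400] "GET /index.html HTTP/1.0" 200 2326'
is reduced to '[08/May/2002:09:26:12 -0400] /index.html'.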

Received on Wednesday, 8 May 2002 09:33:27 UTC