privacy document from Robert Thibadeau on 2000-08-23 (www-p3p-public-comments@w3.org from August 2000)

From: Robert Thibadeau <rht@cs.cmu.edu>
Date: Tue, 22 Aug 2000 21:06:39 -0400
To: rr@cs.cmu.edu, shamos@cs.cmu.edu, jle@cs.cmu.edu, Chip Gierhart <Chip_Gierhart@Phoenix.com>
CC: john_bourgein@Phoenix.com
Message-ID: <39A3239F.84A08314@cs.cmu.edu>
Here is my brief critique on Privacy and www.w3.org/p3p.


A Serious Critique of P3P : Privacy on the Web

Robert Thibadeau, Ph.D.
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 

The World Wide Web (W3) Consortium, which brought us the Hypertext
Transfer Protocol, HTTP, that allows Browsers to talk to Web Servers,
and the Hypertext Markup Language, HTML, that lets Browsers show what
they hear from the Web Servers, has recently brought a lot more.  The
eXtensible Markup Language, XML, provides a framework for content
communication between Browsers and Web Servers.  It is natural that
Privacy is something that Browsers and Web Servers need to talk about
with each other.

The mechanism for this is called the Platform for Privacy Preferences,
or P3P.  It is defined authoritatively by the W3 at
http://www.w3.org/p3p.  This is a serious and excellent effort by the W3
to provide the mechanisms for ensuring privacy on the Internet.  Huge
amounts of information are being collected by many thousands of web
sites today.  While an effective technology called SSL (Secure Sockets
Layer) exists for protecting the privacy of the monetary transaction
between a Browser and a Web Server, there is no protection once the
information is on the Server and in the hands of the company or
organization that lured you to it.

Because P3P is such an outstanding work, it deserves the most serious of
critiques.  It is essential to know what it does, and what it does not
do.  For a period of time, P3P will be a work in progress.  There is
opportunity to hone the edge on this knife so beautifully made.  

First, an in-depth overview of P3P is in order.  Then, my critique will
cover all facets of the platform, from assumptions to implementation. 
The argument is that P3P is dangerously myopic and needs to be enhanced. 

Four areas of critical need are (a) an IETF (Internet Engineering Task
Force) definition that doesn't require the web (specifically, the HTTP
protocol), (b) a means for transitivity and universality of the
protection on information, (c) a means to establish a negotiated
contract that goes beyond W3's APPEL (A P3P Preference Exchange
Language), and (d) a means in the law for policing the contracts
obtained.  It is irrelevant, in this paper, whether the W3 technical
committee or the U.S. Congress should address problems with P3P.  As
many people as possible should deeply understand the Internet privacy
debate.  This is every individual's and every organization's privacy
that is being negotiated through this debate.

P3P works as a series of HTTP communications initiated by a Browser
request to a Web Server.  In this communication, the Browser says
nothing about privacy.  Then the Web Server responds to the Browser with
whatever the Browser asked for, plus a reference to a privacy policy
reference page.   

The Browser, or the person operating it, can now determine what to do
with the Web Server's response based on another HTTP request, this time
for the privacy policy reference page.  The Browser reads the PolicyRef
page and decides what to do.  The PolicyRef page is written in XML and
has many very definite things it can say.  Privacy policy reference
pages are very special and can be used to determine whether the Browser
should ever come back to that Web Server again and whether information
from a Form should be sent to that Web Server.

In P3P, the Browser, at the very beginning, exposes itself to a minimum
of two invasions of privacy.  The first is the first request to a web
server page.  The second is to the PolicyRef page.   In theory the
second such request is supposed to provide a "safe zone".  A safe zone
is simply a voluntary agreement by the web server not to record anything
from the browser that is making the request. Furthermore, if a browser
wants to be safe about the first request, it can issue a "HEAD" request
that simply returns the Message Header from the site that contains the
policy reference.  This HEAD request-response is supposed to be made
safe. 

HTTP defines a communication from a Browser to a Server and from a
Server to a Browser.  Each of these communications has three parts.
The Browser-to-Server communication, in essence, asks the Web Server to
do something, explains how it wants it done, and then supplies the data
to do it with.  The second part is called the HTTP Message Header
information.  Similarly, when a Web Server talks back to the Browser,
its communication has three parts, in essence telling the Browser
whether it did what was asked, explaining how it is doing it, and
supplying the data that does it.  Again, the second part is the HTTP
Message Header information.  This is where the first P3P information,
the PolicyRef page location, is given.  Notice there is no P3P
information in the communication from the Browser to the Server.  P3P is
a completely one-sided service.  The Server tells the Browser its
privacy policies by giving the Browser the PolicyRef page to go and
fetch.  The Browser can choose to do this, but it remains on its own.
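
The three-part structure described above can be sketched in a few lines
of Python.  This is only an illustration, not the specification's
wording: the header name "P3P", the policyref="..." syntax, and the
example.com URLs are assumptions made for the sketch.

```python
def split_http_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def policy_ref_location(headers):
    """Return the PolicyRef URL from the (assumed) P3P header, or None."""
    value = headers.get("p3p", "")
    prefix = 'policyref="'
    start = value.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = value.find('"', start)
    return value[start:end] if end != -1 else None


# Browser-to-Server: what to do, how to do it, (empty) data.
# Note that nothing about privacy is said here.
REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

# Server-to-Browser: status, Message Header, data.  The header name and
# syntax below are hypothetical stand-ins for the policy reference.
RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    'P3P: policyref="http://www.example.com/p3p/policyref.xml"\r\n'
    "\r\n"
    "<html>...</html>"
)

status, headers, body = split_http_message(RESPONSE)
print(status)
print(policy_ref_location(headers))
```

Running split_http_message on REQUEST shows the one-sidedness: the
Browser's Message Header carries no privacy information at all.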

The browser fetches the PolicyRef page to decide what to do.  Here the
P3P information is in the content of the page itself, and it is encoded
in an elaborate XML language as well as, possibly, an HTML presentation
for the benefit of a human being who wants to read the privacy policy.  
The browser (or a program in the browser called the "user agent")
decides unilaterally whether to accept the privacy policy presented to
it.  

This policy can say many things.  It can isolate things like name and
address and stipulate that they will be used one way, perhaps solely to
authorize payment, while other things like email address might be used
for marketing follow up.   The categories of information that the web
site may deal with are specified in the following list:

1.	<physical/>  Physical Contact Information
2.	<online/> Online Contact Information
3.	<uniqueid/> Unique Identifiers
4.	<purchase/> Purchase Information
5.	<financial/> Financial Information
6.	<computer/> Computer Information
7.	<navigation/> Navigation and Click-stream Data
8.	<interactive/> Interactive Data
9.	<demographic/> Demographic and Socioeconomic Data
10.	<content/> Content
11.	<state/> State Management Mechanisms
12.	<political/> Political Information
13.	<health/> Health Information
14.	<preference/> Preference Data
15.	<other/> Other

The policy will state the purpose of the information that is being
obtained.  Here is the list of purposes (with exact quotes from the
specification):

1.	<current/>   Completion and Support of Current Activity: Information
may be used by the service provider to complete the activity for which
it was provided, such as the provision of information, communications,
or interactive services -- for example to return the results from a Web
search, to forward email, or place an order. 
2.	<admin/>  Web Site and System Administration: Information may be used
for the technical support of the Web site and its computer system. This
would include processing computer account information, and information
used in the course of securing and maintaining the site. 
3.	<develop/> Research and Development: Information may be used to
enhance, evaluate, or otherwise review the site, service, product, or
market. This does not include personal information used to tailor or
modify the content to the specific individual nor information used to
evaluate, target, profile or contact the individual. 
4.	<customization/> Affirmative Customization: Information may be used
to tailor or modify the content or design of the site only to
specifications affirmatively selected by the particular individual
during a single visit or multiple visits to the site. For example, a
financial site that lets users select several stocks whose current
prices are displayed whenever the user visits. 
5.	<tailoring/> One-time Tailoring: Information may be used to tailor or
modify content or design of the site not affirmatively selected by the
particular individual where the information is used only for a single
visit to the site and not used for any kind of future customization. For
example, an online store that suggests other items a visitor may wish to
purchase based on the items he has already placed in his shopping
basket. 
6.	<pseudonym/>  Pseudononymous Profiling: Information may be used to
create or build a record of a particular individual or computer that is
tied to a pseudononymous identifier, without tying
personally-identifiable information (such as name, address, phone
number, email address, or IP address) to the record. This profile will
be used to determine the habits, interests, or other characteristics of
individuals, but it will not be used to attempt to identify specific
individuals. 
7.	<profiling/>  Individual Profiling: Information may be used to create
or build a record on the particular individual or computer for the
purpose of compiling habits or personally identifiable information of
that individual or computer. For example, an online store that suggests
items a visitor may wish to purchase based on items he has purchased
during previous visits to the web site. 
8.	<contact/> Contacting Visitors for Marketing of Services or
Products: Information may be used to contact the individual for the
promotion of a product or service. This includes notifying visitors
about updates to the Web site. 
9.	<other-purpose> string </other-purpose> Other Uses: Information may
be used in other ways not captured by the above definitions. (A human
readable explanation should be provided in these instances).
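
To make the unilateral decision concrete, here is a minimal sketch of a
user agent reading a policy and deciding whether to accept it.  Only the
empty tags such as <current/> and <physical/> come from the lists above;
the wrapper elements (policy, statement, purpose, categories) and the
sample policy are hypothetical and are not the real P3P schema.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the two lists above.
PURPOSES = {
    "current", "admin", "develop", "customization", "tailoring",
    "pseudonym", "profiling", "contact", "other-purpose",
}
CATEGORIES = {
    "physical", "online", "uniqueid", "purchase", "financial",
    "computer", "navigation", "interactive", "demographic", "content",
    "state", "political", "health", "preference", "other",
}

# Hypothetical policy; the wrapper element names are assumptions.
POLICY_XML = """
<policy>
  <statement>
    <purpose><current/></purpose>
    <categories><physical/><financial/></categories>
  </statement>
  <statement>
    <purpose><contact/><profiling/></purpose>
    <categories><online/></categories>
  </statement>
</policy>
"""


def declared_uses(policy_xml):
    """Return the set of (category, purpose) pairs the policy declares."""
    root = ET.fromstring(policy_xml.strip())
    pairs = set()
    for statement in root.iter("statement"):
        tags = [element.tag for element in statement.iter()]
        purposes = [tag for tag in tags if tag in PURPOSES]
        categories = [tag for tag in tags if tag in CATEGORIES]
        pairs.update((c, p) for c in categories for p in purposes)
    return pairs


def accept(policy_xml, refused_purposes):
    """Unilateral decision: reject if any declared purpose is refused."""
    declared = declared_uses(policy_xml)
    return all(purpose not in refused_purposes for _, purpose in declared)


print(sorted(declared_uses(POLICY_XML)))
print(accept(POLICY_XML, refused_purposes={"contact"}))
```

The decision here is all-or-nothing: the user agent can take the policy
or leave it, but has no way to answer back.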

P3P clearly provides a way to stipulate the purpose to which the user's
information disclosure is put.  This is highly commendable.  The
particular choice of purposes, however, is perhaps not so good.

As one example of the potential problem in action, let us take the case
of giving your name, credit card, and address information for an order.
Basically, the site that wants you to feel safe can say that this
information will be used for its current purpose as explained on the
page you saw.  Yes.  So, for example, if I print in very fine print at
the bottom of the page that my current purpose is to give your credit
card number to the first thief I can find, I have fulfilled my
obligation.  If you happen to read the fine print, you know what is
going to happen to your credit card.  A lawyer might argue otherwise,
but the fact is that the only thing in writing from the web server is
that the purpose is stipulated to be written on the page, and the page
says that the purpose of taking the credit card information is to hand
it to a thief (as well as, probably, to make a payment).  The "TrustMe"
symbol is displayed prominently, as I have made a correct use of P3P.

It might be better to have very concrete, in addition to very abstract,
purposes, and let people know these concrete ones are possible.  So, for
example, in addition to the <current/> tag that just caused some
heartburn, we might have:
1.	<payment/> The purpose is to obtain payment for the order.
2.	<delivery/> The purpose is to deliver the order to the address.
3.	<web_search/> The purpose is to perform the current web search.
And so forth.  We have hundreds of "HTTP types" (the typing of data
legal in HTTP data messages); it would seem we could have hundreds of
very specific purposes for information.

If you don't think companies will try to use ploys to get you to trust
them, read the IBM privacy policy on the IBM P3P Editor site:

"This Overall Privacy Statement verifies that IBM is a member of the
TRUSTe  program and is in compliance with TRUSTe privacy principles.
This statement discloses the privacy practices for the IBM Web
(ibm.com).  TRUSTe is an  independent, non-profit initiative whose
mission is to build users' trust and confidence in the Internet by
promoting the principles of disclosure and informed consent. Because
this site wants to demonstrate its commitment to your privacy,  it has
agreed to disclose its information practices and have its privacy
practices reviewed and audited for compliance by TRUSTe. When you visit
a Web site displaying the TRUSTe mark, you can expect to be notified of: 
        What information is gathered/tracked 
        How the information is used 
        Who information is shared with 

Questions regarding this statement should be directed to
askibm@vnet.ibm.com or TRUSTe for clarification. 

We know that you are concerned about your privacy; so is IBM. If you
provide IBM with information about yourself, such as name, postal
address, e-mail address, or other personal data, we may add it to our
records. From time to time you may receive information about our
products, services, activities, or contacts for other business purposes,
unless you request otherwise by selecting the appropriate button on the
data collection page.

IBM is a global organization with legal entities operating components of
our Web site worldwide. Because of the global scope of our Web, we may
transfer your  personal information to countries of the world which
provide various levels of legal protection.  Please realize that when
you give us personal information, IBM will handle it in the manner we
describe here. To learn more, you can read about IBM's general Internet
privacy practices. Our privacy practices are designed to provide a high
level of protection for your personal data, all over the world. 

This Web site is maintained by the International Business Machines
Corporation.

You can reach us by telephone by calling +1-416-383-9224; within North
America you can reach us at 1 800-426-7777. You can also send us a
message at askibm@vnet.ibm.com.

Please use the Back button on your browser to return to the page where
you were."

Yes, they said they were going to disclose your personal information to
countries of the world that provide "various levels of legal
protection." (Did you get that far?)  But don't worry, "when you give us
personal information, IBM will handle it in the manner we describe
here."  Note that there is nothing in P3P that automatically alerts you
that your personal information will escape the laws of the United
States (and how about Europe?).

Before thinking that P3P is just not worth anything, it needs to be
recognized that the writers of the 1.0 working draft specification are
openly soliciting comments, have disclosed this specification, and have
created a specification that covers all the bases that need to be
covered in a basic privacy specification.  Not only do they allow
different kinds of information to be used in different ways, they understand
that a purpose or intent is actually a simple thing to state and
evaluate.  They also provide explicit tags for many other contingencies
such as tags that tell who the ultimate recipients of the data will be,
tags that even tell the user what penalty the web site is willing to pay
for misusing the data!  These are all very good things and they are laid
out in a fashion that makes machine interpretation possible, and, in
fact, reasonable.

The writers also explicitly say that P3P 1.0 lacks the following
desirable characteristics:
	"a mechanism to allow sites to offer a choice of P3P policies to
visitors 
	a mechanism to allow visitors (through their user agents) to
explicitly agree to a P3P policy 
	mechanisms to allow for non-repudiation of agreements between visitors
and web sites 
	a mechanism to allow user agents to transfer user data to services"

In effect, P3P 1.0 lacks the ability to negotiate with the web server on
a contract and to make a contract with the web server that could be
legally binding.  All of this is fundamentally because in 1.0 the web
server simply provides an ultimatum to the browser.  The browser can
leave the country if it doesn't want to live there, but it can't talk
back.

P3P 1.0 is likely therefore to create some unpleasant behavior for
users.  The user is simply warned that this web site is going to use his
information for marketing purposes and will report the data to a third
party.  But, let's say this is his stockbroker.  What does he do then? 
Call the Webmaster?  (Get serious.)  A better mechanism would be for his
Browser to start negotiating with the web server to tell it what he is
willing to do.  The server can then decide whether it wants his
business.  
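
As a thought experiment, the kind of negotiation suggested above might
look like the following sketch.  Nothing here exists in P3P 1.0: the
negotiate function and its three arguments are hypothetical, and the
purposes are simply tag names borrowed from the list of purposes.

```python
def negotiate(server_wants, server_requires, user_allows):
    """Return the agreed set of purposes, or None if there is no deal.

    server_wants    -- every purpose the server would like to use
    server_requires -- the minimum the server needs to do business
    user_allows     -- the purposes the user is willing to permit
    """
    # The Browser talks back: only purposes both sides accept survive.
    agreed = set(server_wants) & set(user_allows)
    # The server then decides whether it still wants the business.
    if set(server_requires) <= agreed:
        return agreed
    return None


# The stockbroker would like to market to the user, but does not need to.
deal = negotiate(
    server_wants={"current", "admin", "contact"},
    server_requires={"current", "admin"},
    user_allows={"current", "admin", "develop"},
)
print(deal)

# A site that insists on marketing gets no deal from this user.
no_deal = negotiate(
    server_wants={"current", "contact"},
    server_requires={"current", "contact"},
    user_allows={"current"},
)
print(no_deal)
```

The point of the sketch is only that both parties contribute terms; a
real mechanism would also have to record the outcome as a contract.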

The P3P group clearly understands that such negotiation is going to be
important in future versions of the specification.  In fact, there is an
affiliated effort called APPEL (A P3P Preference Exchange Language, see
the P3P site) that proposes a rule-based reasoning system for privacy
meant to go hand-in-hand with P3P.  However, there is still no
mechanism in P3P or APPEL for the Browser to talk back to the Server
about privacy, so this rule system has only limited utility at present.

It seems pretty obvious that P3P needs a means to establish a negotiated
contract that goes beyond W3's APPEL,  but it also needs a means in the
law for policing the contracts obtained.  Chances are that the P3P group
is pretty skittish about suggesting that the law get involved in any of
this stuff.  However, as an old mentor of mine once said, "The trick is
to understand where the technology ends and the law begins."  The
mechanism of non-repudiation provides a "signed" contract between the
user and the server.   The agreement cannot be repudiated as not having
happened.  However, there is almost too much information in this
contract.  Ideally all you want to know is that the Web Server has used
the information in a way it promised not to.  You should not have to
disclose all the users that visited the sex site and gave up their
information.  It would seem to me that along with non-repudiation you
want to have anonymous users.  This cannot happen in a non-mediated
transaction as has been proposed for the direct Browser to Server P3P
interactions.  Exactly how to create this scenario around data that
contains names and addresses is going to be a technically interesting
challenge.

Another criticism of P3P is that it is a web-only solution to privacy
when we know that the Internet involves much more than simply the web. 
For example, email moves by other protocols (mainly SMTP), not HTTP. 
There is no way in P3P to say that the mail you send to a company
contains information that should only be used in ways that you
restrict.  Most lawyers would not want to tell you how little weight
their notices asking you to treat an email as confidential actually
carry.  They have to show that you agreed to treat the email as
confidential.  It is easy to argue that you "accidentally read" the mail
before seeing the "confidentiality statement."  Without negotiated
agreement, P3P is completely ill-suited to mail routing.  However, a
strong case can be made, I think, that P3P describes precise and
essential building blocks for a solution to privacy in mail and many
other Internet protocols.  P3P should not be thrown away; it should be
built upon.

The last of the serious criticisms is that P3P fails to provide a means
for transitivity and universality of the protection of information. 
This is actually several things. 

The transitivity problem is how to protect your privacy after the
information is handed to somebody else.  If a violation of privacy is
generally a misuse of information about you or information that you
provide (e.g., a trade secret), then there must be a way in the privacy
protocol to indicate that a privacy directive is essentially
non-negotiable, or negotiable only back to the original owner, and this
needs to be passed on to the next possessor of the information. 
Accomplishing this would be fairly simple unless the information changes
and becomes derivative information.  If something is learned and a
conclusion is drawn, do the privacy directives on the information that
caused the learning also bind the conclusion?  This is a big problem,
not a small one.  The solution is to create directives on derivative
information.  Essentially the directive says that the purpose of the Web
Server is to record information so that conclusions or derivative
observations can be obtained and that this information now becomes the
property of the owner of the Web Server.  Conversely, a User may say to
the Web server that it can use the information to clear a credit card or
to give the user a registration account, but that any derivative
information must be restricted to just this.  A Web Server taking this
information could not pass it on or provide it as the basis for
developing new knowledge.  These two cases, actually, are handled in
P3P.  What is not handled is the case where the requirement is that the
information can be passed along but that the privacy conditions must be
preserved by the new owner.  There is no mechanism in P3P for preserving
the integrity of the use to which the information can be put.  A
particularly useful case of this might be one where your personal
information can be passed along but only non-identifiable summaries may
be used for marketing purposes.
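
One way to picture the missing mechanism is a record that carries its
privacy directive with it wherever it goes.  The sketch below is purely
illustrative and is not part of P3P: the Directive and Record types, the
transfer, use, and summarize functions, and the example.com holders are
all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directive:
    allowed_purposes: frozenset  # what the information may be used for
    may_transfer: bool           # may it be handed to somebody else?


@dataclass(frozen=True)
class Record:
    owner: str            # the original owner of the information
    holder: str           # whoever possesses it now
    data: dict
    directive: Directive


def transfer(record, new_holder):
    """Pass a record on; the directive and owner travel with it unchanged."""
    if not record.directive.may_transfer:
        raise PermissionError("directive forbids passing this record on")
    return Record(record.owner, new_holder, record.data, record.directive)


def use(record, purpose):
    """Release the data only for a purpose the original owner allowed."""
    if purpose not in record.directive.allowed_purposes:
        raise PermissionError(f"purpose {purpose!r} not allowed by owner")
    return record.data


def summarize(records, field):
    """Derivative information: counts per value, with no identities."""
    counts = {}
    for record in records:
        key = record.data[field]
        counts[key] = counts.get(key, 0) + 1
    return counts


directive = Directive(frozenset({"current"}), may_transfer=True)
original = Record("alice", "shop.example.com",
                  {"name": "Alice", "city": "Pittsburgh"}, directive)
passed_on = transfer(original, "partner.example.com")
print(passed_on.directive == original.directive)
print(summarize([original, passed_on], "city"))
```

The new holder inherits exactly the conditions the original owner set,
so a call such as use(passed_on, "contact") is refused, while a
non-identifiable summary can still be computed.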

The universality of the protection of information is yet another problem
with the current specification.  The way P3P is set up, the user has to
set up each Browser or "User Agent" that he uses.  If he works on ten
machines at the office and at home, he has to take care that all the
machines use the same privacy policies (or he might as well have no
privacy policies at all!).  It should be possible to have a location
on the Internet where you have your name-and-address information.  This
bundle of information can be invoked exclusively by you, and it has a
shell of a privacy policy surrounding it.  Thus, you can insert your
name-and-address in any web browser or in any email message and the
privacy policy is negotiated with the recipient.  Accomplishing a system
such as this does not strike me as much more complex than the existing
Domain Name Service that works well on the Internet.  P3P, in this view,
makes the conceptual error of thinking that privacy is intransitive and
that it is not necessary to describe information being sent by the user
with a unique ID (like the URI for the PolicyRef statement on the
server) for its privacy policy.   
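
A minimal sketch of such a location follows, assuming a simple
in-memory registry.  The ProfileRegistry class, its publish and resolve
methods, and the profile URI are hypothetical names invented for this
illustration; a real service would need authentication and a
distributed design, which are not shown.

```python
class ProfileRegistry:
    """Hypothetical store of personal bundles, each wrapped in a policy."""

    def __init__(self):
        self._profiles = {}

    def publish(self, uri, data, allowed_purposes):
        """Register a bundle under a unique ID together with its policy."""
        self._profiles[uri] = (dict(data), frozenset(allowed_purposes))

    def resolve(self, uri, requested_purposes):
        """Release the bundle only if every requested purpose is allowed."""
        data, allowed = self._profiles[uri]
        if not set(requested_purposes) <= allowed:
            return None
        return dict(data)


registry = ProfileRegistry()
registry.publish(
    "profile://example/alice",  # hypothetical unique ID for the bundle
    {"name": "Alice", "address": "Pittsburgh PA"},
    allowed_purposes={"current"},
)

# Any Browser or mail program on any machine can hand over the same ID,
# and the same privacy policy applies to whoever resolves it.
print(registry.resolve("profile://example/alice", {"current"}))
print(registry.resolve("profile://example/alice", {"current", "contact"}))
```

Because the policy lives with the bundle rather than in each Browser,
the user sets it up once instead of once per machine.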

I have no doubt that many members of the W3 P3P working group have
thought through many if not most of the concerns expressed in this
article.  However, these are not the people who are likely to talk about
such concerns since their main interest is in getting P3P accepted.  
Pointing out flaws like the ones above doesn't, on the surface, look
like help in getting P3P accepted.  But my argument is the
opposite.   It is probably better for a third party to speak out on
these and to invite more vigorous public discussion.  This is precisely
because P3P takes us in the right direction.  It deserves to be
supported and added to since it so clearly represents a good start.  
People in all aspects of the Internet socio-economic-political system
need to sit up and think this through for themselves.  Information is
power, and privacy management is the control of that power.
Received on Tuesday, 22 August 2000 21:07:31 UTC