privacy doc fix up from Robert Thibadeau on 2000-08-23 (www-p3p-public-comments@w3.org from August 2000)

From: Robert Thibadeau <rht@cs.cmu.edu>
Date: Wed, 23 Aug 2000 08:58:04 -0400
To: www-p3p-public-comments@w3.org
Message-ID: <39A3CA5C.763A97C0@cs.cmu.edu>
Stylistic fixup on the previous document:

A Critique of P3P : Privacy on the Web

Robert Thibadeau, Ph.D.
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 

The World Wide Web (W3) consortium brought us the Hypertext Transfer
Protocol, HTTP, which allows Browsers to talk to Web Servers.  It brought
us the Hypertext Markup Language, HTML, which lets Browsers show what they
hear from the Web Servers.  It has recently brought a lot more.  The
eXtensible Markup Language, XML, provides a framework for automated
content communication between Browsers and Web Servers.  XML is widely
used to merge data flowing through Web Servers, and anybody who has
encountered Windows 2000 has seen XML content in many files that need
automated content processing.  XML has now naturally set the stage for
automated privacy protection.

The privacy assurance proposal is called the Platform for Privacy
Preferences or P3P. This serious and excellent effort by the W3 is
defined authoritatively at http://www.w3.org/p3p.   Today, enormous
amounts of information are being collected by many thousands of web
sites.  While an effective technology, called SSL (Secure Sockets
Layer), exists for protecting the privacy of the transaction between a
Browser and a Web Server, there is no protection once the information is
on the Server and in the hands of the company or organization that
'lured' you to them.

Because P3P is an outstanding work, it deserves serious critique.  It is
essential to know what it does, and what it does not do.  For a period
of time, P3P will be a work in progress.  There is opportunity to hone
the edge on this knife so beautifully made.  

The present critique will cover most of the facets of the platform,
examining both the assumptions and the implementation.  It will be seen
that P3P is dangerously myopic and needs substantial enhancement.  The
five areas of critical need are:

(a)	more specificity in declaring the purpose behind taking information, 
(b)	a means to establish a negotiated contract that goes beyond W3's
APPEL (A P3P Preference Exchange Language), 
(c)	a means in the law for policing the contracts obtained, 
(d)	a means for transitivity and universality of the protection of
information, and 
(e)	an IETF (Internet Engineering Task Force) definition that doesn't
require the web (specifically, the HTTP protocol).

It is irrelevant, in this paper, whether the W3 technical committee or
the U.S. Congress should address problems with P3P.  As many people as
possible should deeply understand the Internet privacy debate.  It is
every individual's and every organization's privacy that is being
negotiated through this debate.

P3P works as a series of HTTP communications.   The first is a Browser
request to a Web Server for a file or an action.  In this communication,
the Browser says nothing about privacy to the Web Server.  However, the
Web Server responds to the Browser with whatever the Browser asked for,
plus a special reference to a Privacy Policy Reference page.   The
Browser, or the person operating it, can now decide what to do with the
Web Server's response based on the Privacy Policy Reference page, which
it obtains with a second HTTP request.  The Browser reads the PolicyRef
page and decides what to do.  This PolicyRef page is written in XML, and
it can say many very definite things.  A Privacy Policy Reference page
is very special and can be used to determine whether the Browser
should ever come back to that Web Server again, and whether information
from a form on a web page should be sent to that Web Server.
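The two-request exchange just described can be sketched as a small simulation. Everything in it is hypothetical: the `FakeServer` class, the paths, and the `policyref="..."` form of the P3P response header (modelled on the header used in P3P drafts) are assumptions for illustration, and no real network traffic is involved. The server keeps a log so we can see what it learns.

```python
import re


class FakeServer:
    """Hypothetical Web Server that records every request it receives."""

    def __init__(self, pages):
        self.pages = pages  # path -> (response headers, response body)
        self.log = []       # every path any Browser has asked for

    def get(self, path):
        self.log.append(path)
        return self.pages[path]


def policy_ref_from(headers):
    """Pull the PolicyRef address out of an (assumed) P3P response header."""
    match = re.search(r'policyref="([^"]+)"', headers.get("P3P", ""))
    return match.group(1) if match else None


def visit(server, path, accept_policy):
    # Request 1: fetch the page. The Browser says nothing about privacy.
    headers, body = server.get(path)
    ref = policy_ref_from(headers)
    if ref is None:
        return body  # the site offers no P3P information at all
    # Request 2: fetch the PolicyRef page, then decide unilaterally.
    _, policy = server.get(ref)
    return body if accept_policy(policy) else None


server = FakeServer({
    "/order.html": ({"P3P": 'policyref="/w3c/p3p.xml"'}, "<html>order form</html>"),
    "/w3c/p3p.xml": ({}, "<POLICY/>"),
})
page = visit(server, "/order.html", accept_policy=lambda policy: True)
print(server.log)  # ['/order.html', '/w3c/p3p.xml']
```

Whatever the user agent finally decides, the server's log already holds both requests.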

So in P3P, the Browser, at the very beginning, exposes itself to a
minimum of two invasions of privacy.  The first is the initial request
for a page on the Web Server.  The second is the request for the
PolicyRef page specified in the Web Server's first response.   In
theory, the second request is supposed to be handled in a "safe zone".
A safe zone is simply a voluntary agreement by the Web Server not to
record anything significant from the Browser that is making the
request.   Furthermore, if a Browser wants to be safe about the first
request, it can issue a "HEAD" request, which returns only the Message
Header containing the site's policy reference.  The HEAD request, too,
is supposed to be handled in a safe zone.  Because the Web depends on
client Browsers making first contact with servers, it is not clear how
to avoid this potential attack on privacy by Web Servers that choose not
to have these recommended safe zones.  However, this problem can be
addressed in a different setting, described at the end of this article.
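The HEAD option can be illustrated with Python's standard `urllib.request`. The URL is a placeholder, and the sketch only builds the request rather than sending it. Note that a HEAD request still reaches the server, along with the client's network address, so it narrows the exposure but does not remove it; whether the server treats it as a safe zone is entirely up to the server.

```python
import urllib.request


def head_request(url):
    """Build a HEAD request: the server should answer with headers only."""
    return urllib.request.Request(url, method="HEAD")


req = head_request("http://www.example.com/order.html")
print(req.get_method())  # HEAD
# Sending it would look like:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.headers.get("P3P"))
```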

HTTP defines a communication from a Browser to a Server and from a
Server to a Browser.  These communications each have three parts.  The
Browser-to-Server request in essence (1) asks the Web Server to do
something, (2) explains how it wants it done, and then (3) provides
additional data to do it with.   The explanation part is the "HTTP
Message Header" information.  When a Web Server talks back to the
Browser, it (1) tells the Browser whether it did what was asked, (2)
explains how it is doing it, and (3) provides the data that does it. 
Again, the second part is the "HTTP Message Header" information.  The
response Message Header carries the first piece of P3P information: the
web address of the PolicyRef page.  
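To make the three-part structure concrete, here is a sketch that splits a raw HTTP response into its status line, Message Header, and body, and then reads the PolicyRef address from the header. The response text and the exact `P3P: policyref="..."` header form are illustrative assumptions.

```python
import re

RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"                        # (1) did it do what was asked?
    "Content-Type: text/html\r\n"                # (2) Message Header: how
    'P3P: policyref="http://www.example.com/w3c/p3p.xml"\r\n'
    "\r\n"                                       # blank line ends the header
    "<html>...the requested page...</html>"      # (3) the data itself
)


def split_response(raw):
    """Split a raw HTTP/1.x response into (status line, header dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


status, headers, body = split_response(RAW_RESPONSE)
match = re.search(r'policyref="([^"]+)"', headers.get("p3p", ""))
print(status)          # HTTP/1.1 200 OK
print(match.group(1))  # http://www.example.com/w3c/p3p.xml
```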

Notice there is no P3P information in the communication from the Browser
to the Server.  P3P is a completely one-sided service.  The Server tells
the Browser its Privacy Policies, and the Browser is now on its own.  
It tells the Browser its Privacy Policies by giving the Browser the
PolicyRef page to go and fetch.  The Browser can choose to do this, but
it remains on its own.

The Browser fetches the PolicyRef page to decide what to do.  Here the
P3P information is in the content of the page itself, and it is encoded
in an elaborate XML language as well as, possibly, an HTML presentation
for the benefit of a human being who wants to read the privacy policy.  
The Browser, or a program in the Browser called the "User Agent",
decides unilaterally whether to accept the privacy policy presented to
it.  
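A user agent's unilateral decision might look like the following sketch. The XML is a simplified stand-in loosely modelled on P3P's `STATEMENT` and `PURPOSE` elements, not the exact schema, and the rule (reject the policy if any statement lists a purpose the user has not allowed) is an assumed user preference, not something P3P or APPEL prescribes.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical policy text.
POLICY = """
<POLICY>
  <STATEMENT>
    <PURPOSE><current/></PURPOSE>
    <DATA ref="#user.name"/>
  </STATEMENT>
  <STATEMENT>
    <PURPOSE><contact/></PURPOSE>
    <DATA ref="#user.email"/>
  </STATEMENT>
</POLICY>
"""


def declared_purposes(policy_xml):
    """Collect every purpose tag named in any STATEMENT of the policy."""
    root = ET.fromstring(policy_xml.strip())
    return {p.tag for stmt in root.iter("STATEMENT")
            for purpose in stmt.iter("PURPOSE") for p in purpose}


def user_agent_accepts(policy_xml, allowed_purposes):
    # Unilateral: the agent can only say yes or no; it cannot counter-offer.
    return declared_purposes(policy_xml) <= allowed_purposes


print(sorted(declared_purposes(POLICY)))                 # ['contact', 'current']
print(user_agent_accepts(POLICY, {"current", "admin"}))  # False
```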

This policy can say many things.  It can isolate things like name and
address and stipulate that they will be used one way, perhaps solely to
authorize payment, while other things like email address might be used
for marketing follow-up.   The categories of information that the web
site may deal with in different ways are given in the following
list:

1.	<physical/>  Physical Contact Information
2.	<online/> Online Contact Information
3.	<uniqueid/> Unique Identifiers
4.	<purchase/> Purchase Information
5.	<financial/> Financial Information
6.	<computer/> Computer Information
7.	<navigation/> Navigation and Click-stream Data
8.	<interactive/> Interactive Data
9.	<demographic/> Demographic and Socioeconomic Data
10.	<content/> Content
11.	<state/> State Management Mechanisms
12.	<political/> Political Information
13.	<health/> Health Information
14.	<preference/> Preference Data
15.	<other/> Other
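Because each category is just an empty XML tag, a user agent can check which categories a policy touches with a few lines of code. The sketch below assumes, for illustration, that category tags sit inside a `CATEGORIES` element under each `DATA` element; the data references such as `#user.card` are hypothetical, and the set of valid tags is the fifteen listed above.

```python
import xml.etree.ElementTree as ET

# The fifteen category tags listed above.
KNOWN_CATEGORIES = {
    "physical", "online", "uniqueid", "purchase", "financial",
    "computer", "navigation", "interactive", "demographic", "content",
    "state", "political", "health", "preference", "other",
}


def categories_by_data(policy_xml):
    """Map each DATA ref to its category tags (assumed element layout)."""
    root = ET.fromstring(policy_xml.strip())
    result = {}
    for data in root.iter("DATA"):
        tags = {c.tag for cats in data.iter("CATEGORIES") for c in cats}
        unknown = tags - KNOWN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown category tags: {sorted(unknown)}")
        result[data.get("ref")] = tags
    return result


policy = """
<POLICY>
  <DATA ref="#user.home-address"><CATEGORIES><physical/></CATEGORIES></DATA>
  <DATA ref="#user.email"><CATEGORIES><online/></CATEGORIES></DATA>
  <DATA ref="#user.card"><CATEGORIES><financial/><purchase/></CATEGORIES></DATA>
</POLICY>
"""
sensitive = {"financial", "health", "political"}
flagged = [ref for ref, tags in categories_by_data(policy).items() if tags & sensitive]
print(flagged)  # ['#user.card']
```

Since these tags are only "hints", a check like this tells the user what kind of data is involved, not what will be done with it.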

The above tags are not statements of the purpose for obtaining the
information.  They are simply referred to as "hints" about the purpose. 
Here is the list of purposes quoted exactly from the specification:

1.	"<current/>   Completion and Support of Current Activity: Information
may be used by the service provider to complete the activity for which
it was provided, such as the provision of information, communications,
or interactive services -- for example to return the results from a Web
search, to forward email, or place an order. 
2.	<admin/>  Web Site and System Administration: Information may be used
for the technical support of the Web site and its computer system. This
would include processing computer account information, and information
used in the course of securing and maintaining the site. 
3.	<develop/> Research and Development: Information may be used to
enhance, evaluate, or otherwise review the site, service, product, or
market. This does not include personal information used to tailor or
modify the content to the specific individual nor information used to
evaluate, target, profile or contact the individual. 
4.	<customization/> Affirmative Customization: Information may be used
to tailor or modify the content or design of the site only to
specifications affirmatively selected by the particular individual
during a single visit or multiple visits to the site. For example, a
financial site that lets users select several stocks whose current
prices are displayed whenever the user visits. 
5.	<tailoring/> One-time Tailoring: Information may be used to tailor or
modify content or design of the site not affirmatively selected by the
particular individual where the information is used only for a single
visit to the site and not used for any kind of future customization. For
example, an online store that suggests other items a visitor may wish to
purchase based on the items he has already placed in his shopping
basket. 
6.	<pseudonym/>  Pseudononymous Profiling: Information may be used to
create or build a record of a particular individual or computer that is
tied to a pseudononymous identifier, without tying
personally-identifiable information (such as name, address, phone
number, email address, or IP address) to the record. This profile will
be used to determine the habits, interests, or other characteristics of
individuals, but it will not be used to attempt to identify specific
individuals. 
7.	<profiling/>  Individual Profiling: Information may be used to create
or build a record on the particular individual or computer for the
purpose of compiling habits or personally identifiable information of
that individual or computer. For example, an online store that suggests
items a visitor may wish to purchase based on items he has purchased
during previous visits to the web site. 
8.	<contact/> Contacting Visitors for Marketing of Services or Products:
Information may be used to contact the individual for the promotion of a
product or service. This includes notifying visitors about updates to
the Web site. 
9.	<other-purpose> string </other-purpose> Other Uses: Information may
be used in other ways not captured by the above definitions. (A human
readable explanation should be provided in these instances)."

P3P clearly provides a way to stipulate the purpose to which the user's
disclosed information is put.  This is highly commendable.  Perhaps the
choice of particular purposes is not so good.

As one example of this in action, let us take the case of giving your
name, credit card, and address information for an order.  Basically the
site that wants you to feel safe can say that this information will be
used for its current purpose as explained on the page you saw.  Yes.
So, for example, if I print in very fine print at the bottom of the page
that my current purpose is to give your credit card number to the first
thief I can find, I have fulfilled my obligation.  I might even declare
that this information is of the type "purchase", but that is supposed to
be only a "hint" as to how it might be used.  If you happen to read the
fine print, you know what is going to happen to your credit card.  A
lawyer might argue otherwise, but the fact is that the only thing in
writing from the Web Server is that the purpose is stipulated to be
written on the page, and the page says, without ambiguity, that the
purpose of taking the credit card information is to hand it to a thief
(as well as, probably, to make a payment).  I might even create a
"TrustUS" symbol and put it at the top of my purchase page and on my
privacy policy page.
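The gap between what the machine reads and what the page says can be shown directly. In this hypothetical sketch the policy declares only `<current/>` and the `purchase` hint, while the real purpose lives in page text that the user agent never interprets; a rule that trusts `<current/>` therefore accepts the policy no matter what the fine print says. The policy text and the "trust the current activity" rule are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Human-readable fine print on the order page; the user agent never sees it.
PAGE_FINE_PRINT = "Our current purpose: give your card number to the first thief we find."

# Hypothetical machine-readable policy for the same page.
POLICY = """
<POLICY>
  <STATEMENT>
    <PURPOSE><current/></PURPOSE>
    <DATA ref="#user.card"><CATEGORIES><purchase/></CATEGORIES></DATA>
  </STATEMENT>
</POLICY>
"""

SAFE_PURPOSES = {"current"}  # an assumed "trust the current activity" preference


def naive_accept(policy_xml):
    root = ET.fromstring(policy_xml.strip())
    purposes = {p.tag for purpose in root.iter("PURPOSE") for p in purpose}
    # Only the tags are checked; what "current" means is defined by the page.
    return purposes <= SAFE_PURPOSES


print(naive_accept(POLICY))  # True, despite the fine print
```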

If you don't think companies will try to use ploys to get you to trust
them, read the IBM privacy policy on the IBM P3P Editor site:

"This Overall Privacy Statement verifies that IBM is a member of the
TRUSTe  program and is in compliance with TRUSTe privacy principles.
This statement discloses the privacy practices for the IBM Web
(ibm.com).  TRUSTe is an  independent, non-profit initiative whose
mission is to build users' trust and confidence in the Internet by
promoting the principles of disclosure and informed consent. Because
this site wants to demonstrate its commitment to your privacy,  it has
agreed to disclose its information practices and have its privacy
practices reviewed and audited for compliance by TRUSTe. When you visit
a Web site displaying the TRUSTe mark, you can expect to be notified of: 
        What information is gathered/tracked 
        How the information is used 
        Who information is shared with 

Questions regarding this statement should be directed to
askibm@vnet.ibm.com or TRUSTe for clarification. 

We know that you are concerned about your privacy; so is IBM. If you
provide IBM with information about yourself, such as name, postal
address, e-mail address, or other personal data, we may add it to our
records. From time to time you may receive information about our
products, services, activities, or contacts for other business purposes,
unless you request otherwise by selecting the appropriate button on the
data collection page.

IBM is a global organization with legal entities operating components of
our Web site worldwide. Because of the global scope of our Web, we may
transfer your  personal information to countries of the world which
provide various levels of legal protection.  Please realize that when
you give us personal information, IBM will handle it in the manner we
describe here. To learn more, you can read about IBM's general Internet
privacy practices. Our privacy practices are designed to provide a high
level of protection for your personal data, all over the world. 

This Web site is maintained by the International Business Machines
Corporation.

You can reach us by telephone by calling +1-416-383-9224; within North
America you can reach us at 1 800-426-7777. You can also send us a
message at askibm@vnet.ibm.com.

Please use the Back button on your Browser to return to the page where
you were. "

Yes, they said they were going to transfer your personal information to
countries of the world that provide "various levels of legal
protection." (Did you get that far?) But don't worry, "when you give us
personal information, IBM will handle it in the manner we describe
here."  Note that there is nothing in P3P that automatically tells you
that your personal information may pass beyond the reach of United
States law.

It might be better to have very concrete, in addition to very abstract,
purposes, and to let people know these concrete ones are possible.  So,
for example, in addition to the <current/> tag that just caused some
heartburn, we might have (these are made up and not in the specification):
1.	<payment/> The binding purpose is to obtain payment for the order.
2.	<delivery/> The binding purpose is to deliver the order to the
address.
3.	<web_search/> The binding purpose is to perform the current web
search.
4.	<export/> The binding purpose is to export the data to the authority
of another country.
And so forth.  We have hundreds of "HTTP types" (the MIME types that
label data in HTTP messages), so it would seem we could have hundreds of
very specific purposes.  For people who know about the science of human
intentionality, it makes sense to be able to list many specific
purposes.
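Here is a sketch of how made-up concrete binding purposes like these could be represented and checked. The tags are the hypothetical ones above, not part of the specification. A user agent would accept a statement only if every binding purpose is one the user has approved, so `<export/>` could be refused while `<payment/>` and `<delivery/>` are accepted.

```python
from enum import Enum


class BindingPurpose(Enum):
    """Hypothetical concrete purposes (not in the P3P specification)."""
    PAYMENT = "payment"        # obtain payment for the order
    DELIVERY = "delivery"      # deliver the order to the address
    WEB_SEARCH = "web_search"  # perform the current web search
    EXPORT = "export"          # export the data to another country's authority


def parse_purposes(tags):
    # Raises ValueError on any tag outside the agreed vocabulary,
    # so a vague or unknown purpose cannot slip through.
    return {BindingPurpose(tag) for tag in tags}


def acceptable(declared_tags, approved):
    return parse_purposes(declared_tags) <= approved


approved = {BindingPurpose.PAYMENT, BindingPurpose.DELIVERY}
print(acceptable(["payment", "delivery"], approved))  # True
print(acceptable(["payment", "export"], approved))    # False
```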

Before concluding that P3P is not worth anything, it needs to be
recognized that the writers of the 1.0 working draft specification are
openly soliciting comments, have disclosed this specification, and have
created a specification that covers all the bases that need to be
covered in a basic privacy specification.  

Not only do they allow different kinds of information to be used in
different ways, they understand that a purpose or intent is actually a
simple thing to state and evaluate.  They also provide explicit tags for
many other contingencies, such as tags that tell who the ultimate
recipients of the data will be, and tags that tell the user what penalty the web
site is willing to pay for misusing the data!  These are all very good
things to have for automated negotiation.  They are laid out in a
fashion that makes machine interpretation possible, and, in fact,
reasonable.

The writers also explicitly say that P3P 1.0 lacks the following
desirable characteristics:

	"a mechanism to allow sites to offer a choice of P3P policies to
visitors 
	a mechanism to allow visitors (through their user agents) to
explicitly agree to a P3P policy 
	mechanisms to allow for non-repudiation of agreements between visitors
and web sites 
	a mechanism to allow user agents to transfer user data to services"

In effect, P3P 1.0 lacks the ability to negotiate with the Web Server on
a contract, and to make a contract with the Web Server that could be
legally binding.  All of this is fundamentally because the Web Server
simply provides an ultimatum to the Browser.  Recalling the 1960s slogan
"love it or leave it," perhaps the Browser can leave the country if he
doesn't want to live there, but he can't talk back.

P3P 1.0 is likely, I think, to create some unpleasant behavior for
users.  The user is simply warned that this web site is going to use his
information for marketing purposes and will report the data to a third
party.  But, let's say this is his stockbroker.  What does he do then? 
Call the Chief Counsel on the telephone to negotiate a better deal? 
This unpleasant behavior may be as damaging to the P3P effort as
anything else.  It seems certain that the working group wanted to
introduce P3P in steps, but this might harm acceptance if the steps are
the wrong ones.

A sane mechanism would be for his Browser to start negotiating with the
Web Server to tell it what he is willing to do.  The server can then
decide whether it wants this person's business.  Yes, it is true that
the protection of privacy would now become a point of competitive
advantage for companies.   This willingness to protect privacy to gain
business has to be balanced against their desire to grab as much
information as they can get.
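A negotiation of this kind might look like the sketch below. The protocol is entirely hypothetical, since P3P 1.0 has no such exchange: the server lists the purposes it requires and those it would merely like, the Browser answers with the purposes its user will allow, and the server decides whether the resulting terms are enough to do business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    required: frozenset  # purposes the server will not do business without
    optional: frozenset  # purposes the server would like but can forgo


def negotiate(offer, user_allows):
    """Return the agreed purposes, or None if there is no deal."""
    if not offer.required <= user_allows:
        return None  # the server declines this person's business
    # Everything required, plus whichever optional uses the user allows.
    return offer.required | (offer.optional & user_allows)


broker = Offer(required=frozenset({"current", "admin"}),
               optional=frozenset({"contact", "profiling"}))
terms = negotiate(broker, frozenset({"current", "admin", "contact"}))
print(sorted(terms))                              # ['admin', 'contact', 'current']
print(negotiate(broker, frozenset({"current"})))  # None
```

Splitting the offer into required and optional parts is what lets privacy become a point of competition: a site can ask for more, but it learns that it must settle for less if it wants the customer.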

The P3P group clearly understands that such negotiation is going to be
important in future versions of the specification.  In fact, there is an
affiliated group called APPEL (A P3P Preference Exchange Language, see
the P3P site) that has proposed a rule-based reasoning system for
privacy that is meant to go hand-in-hand with P3P.  However, there is
still not a mechanism in P3P, or APPEL, for the Browser to talk back to
the Server about privacy, so this rule system has only limited utility
at present.

It seems pretty obvious that P3P needs a means to establish a negotiated
contract that goes beyond W3's APPEL.  But it also needs a means in
the law for policing the privacy contracts.

Chances are that the P3P group is pretty skittish about suggesting that
the law get involved in any of this stuff.  However, as an old mentor of
mine once said, "The trick is to understand when the technology ends and
the law begins."  

The mechanism of non-repudiation mentioned as a future task for P3P
provides a "signed" contract between the user and the server.   The
agreement cannot be repudiated as not having happened.  This makes the
contract a contract.  However, there is almost too much information in
this contract.  Ideally, all you want to establish is that the Web Server
has used the information illegally.  You should not have to disclose all
the users who visited the Sex site and gave up their credit card
information.  It would seem to me that along with non-repudiation, you
want to have anonymous users.  This cannot happen in a non-mediated
transaction of the kind proposed for direct Browser-to-Server P3P
interactions.  Exactly how to create this scenario around data that
contains names and addresses is going to be a technically interesting
challenge, but one idea will be worth considering in a moment.
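One way to picture an agreement record that does not name the user is sketched below, under loud assumptions: the user holds a secret key, derives a different pseudonym for each site with HMAC-SHA256, and both parties keep a SHA-256 digest of the agreed terms. A digest only makes the record tamper-evident; genuine non-repudiation would need public-key signatures, which this standard-library sketch deliberately leaves out. All names and values are hypothetical.

```python
import hashlib
import hmac
import json


def pseudonym(user_secret: bytes, site: str) -> str:
    """Stable per-site pseudonym; different sites cannot link the same user."""
    return hmac.new(user_secret, site.encode(), hashlib.sha256).hexdigest()[:16]


def agreement_receipt(user_secret: bytes, site: str, purposes) -> dict:
    record = {
        "user": pseudonym(user_secret, site),  # no name, address, or card number
        "site": site,
        "purposes": sorted(purposes),
    }
    # Canonical JSON so both parties compute the same digest for the same terms.
    canonical = json.dumps(record, sort_keys=True).encode()
    record["digest"] = hashlib.sha256(canonical).hexdigest()
    return record


secret = b"hypothetical user secret"
r1 = agreement_receipt(secret, "shop.example", {"current"})
r2 = agreement_receipt(secret, "news.example", {"current"})
print(r1["user"] != r2["user"])  # True: the two sites see unrelated identifiers
```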

The last of the serious criticisms is that P3P fails to provide a means
for transitivity and universality of the protection of information. 
This is actually several things. 

The transitivity problem is how to protect your privacy after the
information is handed to somebody else.  

If a violation of privacy is generally a misuse of information about you
or information that you provide (e.g., a trade secret, a confidential
comment to a webmaster), then there must be a way in the privacy
protocol to indicate that a privacy directive is essentially
non-negotiable, or negotiable only back to the original owner, and this
needs to be passed on to the next possessor of the information.  

Accomplishing this would be technically fairly simple unless the
information changes and becomes derivative information.  If something is
learned and a conclusion is drawn, do the privacy directives attached to
the information that led to the conclusion bind the conclusion too?  This
is a hard problem, not an easy one.  

One solution is to create directives on derivative information. 
Essentially the directive says that the purpose of the Web Server is to
record information so that conclusions or derivative observations can be
obtained.  This information now becomes the property of the owner of the
Web Server.  

Conversely, a User may say to the Web Server that it can use the
information to clear a credit card or to give the user a registration
account, but that any derivative information must be restricted to just
this.  A Web Server taking this information could not pass it on or
provide it as the basis for developing new knowledge.  

These two cases, actually, can potentially be handled with existing tags
in P3P.  What is not handled is the case where the information can be
passed along but the new owner must preserve the privacy conditions.
There is no mechanism in P3P for preserving the integrity of the use to
which the information can be put.  A particularly useful instance would
be one where your personal information can be passed along but only
non-identifiable summaries are used for marketing purposes.
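The missing mechanism, a privacy directive that travels with the data, could be sketched as follows. Everything here is hypothetical: each record carries its directive, a transfer succeeds only if the recipient agrees to honor that same directive, and marketing may derive only non-identifiable summaries, and only when every record's directive allows it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Directive:
    """Hypothetical privacy directive that travels with a piece of data."""
    purposes: frozenset            # uses the original owner agreed to
    may_transfer: bool             # may the data be passed along at all?
    summaries_for_marketing: bool  # marketing may use non-identifiable summaries only


@dataclass(frozen=True)
class Record:
    owner: str
    value: str
    holder: str
    directive: Directive


def transfer(record, recipient, recipient_accepts):
    """Hand a record to a new holder; the directive goes with it unchanged."""
    if not record.directive.may_transfer:
        raise PermissionError("the directive forbids passing this data on")
    if not recipient_accepts(record.directive):
        raise PermissionError(f"{recipient} will not honor the directive")
    return replace(record, holder=recipient)


def marketing_summary(records):
    """Derive a count with no identities, only if every directive allows it."""
    if not all(r.directive.summaries_for_marketing for r in records):
        raise PermissionError("some records forbid derived marketing summaries")
    return {"customers": len(records)}


terms = Directive(frozenset({"current"}), may_transfer=True,
                  summaries_for_marketing=True)
order = Record("alice", "1 Main St", "shop.example", terms)
passed_on = transfer(order, "partner.example", lambda d: "current" in d.purposes)
print(passed_on.holder, passed_on.directive == terms)  # partner.example True
print(marketing_summary([passed_on]))                   # {'customers': 1}
```

The hard case raised above, derivative information, is only touched here through the summary rule; deciding what counts as a "conclusion" is left open.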

The universality of the protection of information is yet another problem
with the current specification.  The way P3P is set up, the user has to
set up each Browser or "User Agent" that he uses.  If he works on ten
machines at the office and at home he has to take care to make sure all
the machines utilize the same privacy policies, or he might as well have
no privacy policies at all.  

It should be possible to have a location on the Internet where you keep
your name-and-address information.  Only you can invoke this
bundle of information, and it has a shell of a hard privacy policy
surrounding it.  Thus, you can insert your name-and-address in any web
Browser or in any email message and the privacy policy is negotiated
with the recipient.  Furthermore, because there is a "third party" proxy
for the bundle, it is readily possible to create anonymous
transactions.  This general technique of having proxy sites for you
could solve the problem mentioned at the beginning of the paper with
having to trust the Web Server on the first two of your hits to its
site.  

Accomplishing a system such as this does not strike me as much more
complex than the existing Domain Name Service that works well on the
Internet.  P3P, in this view, makes the conceptual error of thinking
that privacy is intransitive, and that it is not necessary to describe
information being sent by the user with a privacy policy.  Just as the
Web Server privacy policy has a unique web address (the PolicyRef page),
the user could have a privacy policy with a unique web address (his
PolicyRef) and the two could negotiate and transfer data in a uniform
and universal fashion.  The system could be engineered, I think, to even
handle the problems with having to remember a lot of passwords as well
as the privacy problem of providing a universal acceptance criterion for
the user's name and address.
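A user-side PolicyRef could work roughly like this hypothetical sketch. A resolver, playing a role similar to DNS, maps a handle to the user's name-and-address bundle and its policy. It releases the bundle only when the requesting site's declared purposes fall within the user's policy; otherwise the site gets an opaque token that only the proxy can map back to the user. The handle, address, and token format are made up for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Bundle:
    name_and_address: str
    allowed_purposes: frozenset  # the user's own "PolicyRef"


class PrivacyResolver:
    """Hypothetical third-party proxy holding users' information bundles."""

    def __init__(self):
        self._bundles = {}  # handle -> Bundle
        self._tokens = {}   # anonymous token -> handle, known only to the proxy

    def register(self, handle, bundle):
        self._bundles[handle] = bundle

    def resolve(self, handle, site_purposes):
        bundle = self._bundles[handle]
        if frozenset(site_purposes) <= bundle.allowed_purposes:
            return bundle.name_and_address
        # The site may still refer to this user through the proxy,
        # without ever learning who the user is.
        token = secrets.token_hex(8)
        self._tokens[token] = handle
        return token


resolver = PrivacyResolver()
resolver.register("user-42", Bundle("R. Example, 1 Main St", frozenset({"delivery"})))
print(resolver.resolve("user-42", {"delivery"}))             # R. Example, 1 Main St
print(resolver.resolve("user-42", {"delivery", "contact"}))  # a random 16-hex-digit token
```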

A final criticism of P3P is that it is a web-only solution to
privacy when we know that the Internet involves much more than simply
the web.  For example, email moves by other protocols (mainly SMTP), not
HTTP.  There is no way in P3P to say that the mail you send to a company
contains information that should only be used in ways that you
restrict.  Most lawyers would rather not tell you how little legal sting
there is in their notices asking you to treat an email as confidential.
They would have to show that you agreed to treat the email as
confidential.  It is easy to argue that you "accidentally read" the mail
before seeing the "confidentiality statement."  Without negotiated
agreement, P3P is completely ill-suited to mail routing.  However, a
strong case can be made, I think, that P3P describes precise and
essential building blocks for a solution to privacy in mail and many
other Internet protocols.   P3P should not be thrown away; it should be
built upon.  But perhaps one place that should support P3P would be a
configuration of non-HTTP servers that serve up information packages,
such as credit card purchases, names and addresses, and, perhaps, even
passwords.  This would be an effort for the IETF, I think, because it
would act like the Domain Name System in resolving information requests
for all other communications protocols on the Internet.
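For mail, the same building blocks could ride along with the message. The sketch below uses Python's standard `email.message.EmailMessage` to attach a P3P-style directive as a custom header. The `X-Privacy-Policy` header name, its syntax, the subject line, and the addresses are invented for illustration, and, as argued above, such a header carries no legal force unless the recipient has agreed to it.

```python
from email.message import EmailMessage


def message_with_directive(sender, recipient, body, purposes, policy_url):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Order question"
    # Hypothetical header: the uses the sender permits, and where the
    # sender's own PolicyRef lives.
    allowed = " ".join(sorted(purposes))
    msg["X-Privacy-Policy"] = f'purposes="{allowed}"; policyref="{policy_url}"'
    msg.set_content(body)
    return msg


msg = message_with_directive(
    "user@example.org", "support@example.com",
    "Please do not use my address for marketing.",
    {"current"}, "https://privacy.example.org/user-42/policy.xml",
)
print(msg["X-Privacy-Policy"])
```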

I have no doubt that many members of the W3 P3P working group have
thought through many if not most of the concerns expressed in this
article.  However, these are not the people who are likely to talk about
such concerns since their main interest is in getting P3P accepted.  
Pointing out flaws like the ones above doesn't, on the
surface, look like help in getting P3P accepted.  But my argument is the
opposite.   It is probably better for a third party to speak out on
these and to invite more vigorous public discussion.  This is precisely
because P3P takes us in the right direction.  It deserves to be
supported and added to.  P3P clearly represents a good start.   People
in all aspects of the Internet socio-economic-political system need to
sit up and think this through for themselves.  Privacy will have a
widespread and deep influence on the economic vitality of cyberspace. 
Information is power, and privacy management is the control, and thereby
the economic unleashing, of that power.
Received on Wednesday, 23 August 2000 08:59:58 UTC