- From: Robert Thibadeau <rht@cs.cmu.edu>
- Date: Tue, 22 Aug 2000 21:06:39 -0400
- To: rr@cs.cmu.edu, shamos@cs.cmu.edu, jle@cs.cmu.edu, Chip Gierhart <Chip_Gierhart@Phoenix.com>
- CC: john_bourgein@Phoenix.com
- Message-ID: <39A3239F.84A08314@cs.cmu.edu>
Here is my brief critique on Privacy and www.w3.org/p3p.

A Serious Critique of P3P: Privacy on the Web

Robert Thibadeau, Ph.D.
School of Computer Science
Carnegie Mellon University
Pittsburgh PA

The World Wide Web (W3) Consortium, which brought us the Hypertext Transfer Protocol, HTTP, that allows Browsers to talk to Web Servers, and the Hypertext Markup Language, HTML, that lets Browsers show what they hear from the Web Servers, has recently brought a lot more. The eXtensible Markup Language, XML, provides a framework for content communication between Browsers and Web Servers. It is natural that Privacy is something that the Browsers and Web Servers need to talk about to each other. The W3's framework for that conversation is called the Platform for Privacy Preferences, or P3P. It is defined authoritatively by the W3 at http://www.w3.org/p3p.

This is a serious and excellent effort by the W3 to provide the mechanisms for ensuring privacy on the Internet. Huge amounts of information are being collected by many thousands of web sites today. While an effective technology, called SSL (Secure Sockets Layer), exists for protecting the privacy of the monetary transaction between a Browser and a Web Server, there is no protection once the information is on the Server and in the hands of the company or organization that lured you to it.

Because P3P is such an outstanding work, it deserves the most serious of critiques. It is essential to know what it does, and what it does not do. For a period of time, P3P will be a work in progress. There is opportunity to hone the edge on this knife so beautifully made.

First, an in-depth overview of P3P is in order. Then, my critique will cover all facets of the platform, from assumptions to implementation. The argument is that P3P is dangerously myopic and needs to be enhanced.
Four areas of critical need are (a) an IETF (Internet Engineering Task Force) definition that doesn't require the web (specifically, the HTTP protocol), (b) a means for transitivity and universality of the protection of information, (c) a means to establish a negotiated contract that goes beyond W3's APPEL (A P3P Preference Exchange Language), and (d) a means in the law for policing the contracts obtained.

It is irrelevant, in this paper, whether the W3 technical committee or the U.S. Congress should address problems with P3P. As many people as possible should deeply understand the Internet privacy debate. This is every individual's and every organization's privacy that is being negotiated through this debate.

P3P works as a series of HTTP communications initiated by a Browser request to a Web Server. In this communication, the Browser says nothing about privacy. Then the Web Server responds to the Browser with whatever the Browser asked for, plus a reference to a privacy policy reference page. The Browser, or the person operating it, can now determine what to do with the Web Server's response based on another HTTP request, now for the privacy policy reference page. The Browser reads the PolicyRef page and decides what to do. The PolicyRef page is in the language of XML and has many very definite things it can say. Privacy policy reference pages are very special and can be used to determine whether the Browser should ever come back to that Web Server again and whether information from a Form should be sent to that Web Server.

In P3P, the Browser, at the very beginning, exposes itself to a minimum of two invasions of privacy. The first is the first request to a web server page. The second is the request for the PolicyRef page. In theory the second such request is supposed to provide a "safe zone". A safe zone is simply a voluntary agreement by the web server not to record anything from the browser that is making the request.
Furthermore, if a browser wants to be safe about the first request, it can issue a "HEAD" request that simply returns the Message Header from the site that contains the policy reference. This HEAD request-response is supposed to be made safe.

HTTP defines a communication from a Browser to a Server and from a Server to a Browser. These communications both have three parts. The Browser-to-Server communication consists, in essence, of asking the Web Server to do something, explaining how it wants it done, and supplying the data to do it with. The second part is called the HTTP Message Header information. Similarly, when a Web Server talks back to the Browser, the communication has three parts, in essence telling the Browser if it did what was asked, explaining how it is doing it, and supplying the data that does it. Again, the second part is the HTTP Message Header information. This is where the first P3P information, the PolicyRef page location, is given.

Notice there is no P3P information in the communication from the Browser to the Server. P3P is a completely one-sided service. The Server tells the Browser its Privacy Policies and the browser is now on its own. It tells the Browser its privacy policies by giving the browser the PolicyRef page to go and fetch. The Browser can choose to do this, but it remains on its own.

The browser fetches the PolicyRef page to decide what to do. Here the P3P information is in the content of the page itself, and it is encoded in an elaborate XML language as well as, possibly, an HTML presentation for the benefit of a human being who wants to read the privacy policy. The browser (or a program in the browser called the "user agent") decides unilaterally whether to accept the privacy policy presented to it.

This policy can say many things. It can isolate things like name and address and stipulate that they will be used one way, perhaps solely to authorize payment, while other things like email address might be used for marketing follow up.
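The exchange described above can be sketched from the Browser's side in a few lines of Python. This is only an illustration under stated assumptions: the response header name `P3P`, the `policyref="..."` syntax, the function name `find_policy_ref`, and the example URLs are all assumed for the sketch and are not quoted from the specification. No network traffic is generated; the headers are passed in as a plain dictionary, as they might come back from a HEAD request.

```python
import re
from urllib.parse import urljoin


def find_policy_ref(page_url, response_headers):
    """Return the absolute URL of the PolicyRef page, or None if none is advertised.

    Assumption: the server advertises the page in a header named "P3P"
    whose value contains policyref="<URI>". Both are illustrative.
    """
    for name, value in response_headers.items():
        if name.lower() != "p3p":
            continue
        match = re.search(r'policyref\s*=\s*"([^"]*)"', value, re.IGNORECASE)
        if match:
            # A relative reference is resolved against the page that was requested.
            return urljoin(page_url, match.group(1))
    return None


# Hypothetical headers, as a HEAD request to the first page might return them.
headers = {
    "Content-Type": "text/html",
    "P3P": 'policyref="/w3c/p3p.xml"',
}
policy_url = find_policy_ref("http://shop.example.com/order", headers)
if policy_url is None:
    print("No policy advertised; the Browser is on its own.")
else:
    print("Second request would fetch:", policy_url)
```

Note what the sketch makes plain: everything the Browser learns arrives in the Server's response. Nothing about privacy travels in the other direction.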
The categories of information that the web site may deal with are specified in the following list:

1. <physical/> Physical Contact Information
2. <online/> Online Contact Information
3. <uniqueid/> Unique Identifiers
4. <purchase/> Purchase Information
5. <financial/> Financial Information
6. <computer/> Computer Information
7. <navigation/> Navigation and Click-stream Data
8. <interactive/> Interactive Data
9. <demographic/> Demographic and Socioeconomic Data
10. <content/> Content
11. <state/> State Management Mechanisms
12. <political/> Political Information
13. <health/> Health Information
14. <preference/> Preference Data
15. <other/> Other

The policy will state the purpose of the information that is being obtained. Here is the list of purposes (with exact quotes from the specification):

1. <current/> Completion and Support of Current Activity: Information may be used by the service provider to complete the activity for which it was provided, such as the provision of information, communications, or interactive services -- for example to return the results from a Web search, to forward email, or place an order.

2. <admin/> Web Site and System Administration: Information may be used for the technical support of the Web site and its computer system. This would include processing computer account information, and information used in the course of securing and maintaining the site.

3. <develop/> Research and Development: Information may be used to enhance, evaluate, or otherwise review the site, service, product, or market. This does not include personal information used to tailor or modify the content to the specific individual nor information used to evaluate, target, profile or contact the individual.

4. <customization/> Affirmative Customization: Information may be used to tailor or modify the content or design of the site only to specifications affirmatively selected by the particular individual during a single visit or multiple visits to the site.
For example, a financial site that lets users select several stocks whose current prices are displayed whenever the user visits.

5. <tailoring/> One-time Tailoring: Information may be used to tailor or modify content or design of the site not affirmatively selected by the particular individual where the information is used only for a single visit to the site and not used for any kind of future customization. For example, an online store that suggests other items a visitor may wish to purchase based on the items he has already placed in his shopping basket.

6. <pseudonym/> Pseudononymous Profiling: Information may be used to create or build a record of a particular individual or computer that is tied to a pseudononymous identifier, without tying personally-identifiable information (such as name, address, phone number, email address, or IP address) to the record. This profile will be used to determine the habits, interests, or other characteristics of individuals, but it will not be used to attempt to identify specific individuals.

7. <profiling/> Individual Profiling: Information may be used to create or build a record on the particular individual or computer for the purpose of compiling habits or personally identifiable information of that individual or computer. For example, an online store that suggests items a visitor may wish to purchase based on items he has purchased during previous visits to the web site.

10. <contact/> Contacting Visitors for Marketing of Services or Products: Information may be used to contact the individual for the promotion of a product or service. This includes notifying visitors about updates to the Web site.

11. <other-purpose> string </other-purpose> Other Uses: Information may be used in other ways not captured by the above definitions. (A human readable explanation should be provided in these instances).

P3P clearly provides a way to stipulate the purpose to which the user's information disclosure is put. This is highly commendable.
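To make the user agent's unilateral decision concrete, here is a minimal sketch in Python of how one might compare a site's declared purposes against a user's preferences. The category and purpose names are the tag names listed above. Everything else is an assumption made for illustration: a real policy is XML, whereas this sketch uses plain dictionaries, and the names `USER_PREFERENCES`, `unacceptable_uses`, and `accept_policy` are hypothetical.

```python
# Hypothetical user preferences: for each data category, the purposes
# this user is willing to tolerate. Categories not listed allow nothing.
USER_PREFERENCES = {
    "physical": {"current"},
    "online": {"current", "admin"},
    "financial": {"current"},
}


def unacceptable_uses(policy_statements, preferences):
    """Return the (category, purpose) pairs the site declares but the user rejects."""
    violations = []
    for category, purposes in policy_statements.items():
        allowed = preferences.get(category, set())
        for purpose in sorted(purposes):
            if purpose not in allowed:
                violations.append((category, purpose))
    return violations


def accept_policy(policy_statements, preferences):
    """The user agent's unilateral decision: accept only if nothing is rejected."""
    return not unacceptable_uses(policy_statements, preferences)


# Hypothetical site policy: postal address for the order only, but
# online contact information also for marketing contact.
site_policy = {
    "physical": {"current"},
    "online": {"current", "contact"},
}
print(unacceptable_uses(site_policy, USER_PREFERENCES))
print(accept_policy(site_policy, USER_PREFERENCES))
```

The only outcomes available to the user agent in this sketch are to accept or to walk away. There is no step in which the violations are sent back to the Web Server.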
Perhaps the choice of particular purposes is not so good. As one example of the potential problem in action, let us take the case of giving your name, credit card, and address information for an order. Basically the site that wants you to feel safe can say that this information will be used for its current purpose as explained on the page you saw. Yes. So, for example, if I print in very fine print at the bottom of the page that my current purpose is to give your credit card number to the first thief I can find, I have fulfilled my obligation. If you happen to read the fine print, you know what is going to happen to your credit card. A lawyer might argue otherwise, but the fact is that the only thing in writing from the web server is that the purpose is stipulated to be written on the page and the page says that the purpose of taking the credit card information is to hand it to a thief (as well as, probably, to make a payment). The "TrustMe" symbol is displayed prominently as I make a correct use of P3P.

It might be better to have very concrete, in addition to very abstract, purposes, and let people know these concrete ones are possible. So, for example, in addition to the <current/> tag that just caused some heartburn, we might have:

1. <payment/> The purpose is to obtain payment for the order.
2. <delivery/> The purpose is to deliver the order to the address.
3. <web_search/> The purpose is to perform the current web search.

And so forth. We have hundreds of "HTTP types" (the typing of data legal in HTTP data messages); it would seem we could have hundreds of very specific purposes for information.

If you don't think companies will try to use ploys to get you to trust them, read the IBM privacy policy on the IBM P3P Editor site:

"This Overall Privacy Statement verifies that IBM is a member of the TRUSTe program and is in compliance with TRUSTe privacy principles. This statement discloses the privacy practices for the IBM Web (ibm.com).
TRUSTe is an independent, non-profit initiative whose mission is to build users' trust and confidence in the Internet by promoting the principles of disclosure and informed consent. Because this site wants to demonstrate its commitment to your privacy, it has agreed to disclose its information practices and have its privacy practices reviewed and audited for compliance by TRUSTe. When you visit a Web site displaying the TRUSTe mark, you can expect to be notified of:

  What information is gathered/tracked
  How the information is used
  Who information is shared with

Questions regarding this statement should be directed to askibm@vnet.ibm.com or TRUSTe for clarification.

We know that you are concerned about your privacy; so is IBM. If you provide IBM with information about yourself, such as name, postal address, e-mail address, or other personal data, we may add it to our records. From time to time you may receive information about our products, services, activities, or contacts for other business purposes, unless you request otherwise by selecting the appropriate button on the data collection page.

IBM is a global organization with legal entities operating components of our Web site worldwide. Because of the global scope of our Web, we may transfer your personal information to countries of the world which provide various levels of legal protection. Please realize that when you give us personal information, IBM will handle it in the manner we describe here. To learn more, you can read about IBM's general Internet privacy practices. Our privacy practices are designed to provide a high level of protection for your personal data, all over the world.

This Web site is maintained by the International Business Machines Corporation. You can reach us by telephone by calling +1-416-383-9224; within North America you can reach us at 1 800-426-7777. You can also send us a message at askibm@vnet.ibm.com. Please use the Back button on your browser to return to the page where you were.
"

Yes, they said they were going to disclose your personal information to countries of the world that provide "various levels of legal protection." (Did you get that far?) But don't worry, "when you give us personal information, IBM will handle it in the manner we describe here." Note, there is nothing in P3P that provides an automatic confirmation that your personal information will escape the laws of the United States (and how about Europe?).

Before thinking that P3P is just not worth anything, it needs to be recognized that the writers of the 1.0 working draft specification are openly soliciting comments, have disclosed this specification, and have created a specification that covers all the bases that need to be covered in a basic privacy specification. Not only do they allow different kinds of information to be used differently, they understand that a purpose or intent is actually a simple thing to state and evaluate. They also provide explicit tags for many other contingencies, such as tags that tell who the ultimate recipients of the data will be, and tags that even tell the user what penalty the web site is willing to pay for misusing the data! These are all very good things and they are laid out in a fashion that makes machine interpretation possible, and, in fact, reasonable.

The writers also explicitly say that P3P 1.0 lacks the following desirable characteristics:

  "a mechanism to allow sites to offer a choice of P3P policies to visitors
  a mechanism to allow visitors (through their user agents) to explicitly agree to a P3P policy
  mechanisms to allow for non-repudiation of agreements between visitors and web sites
  a mechanism to allow user agents to transfer user data to services"

In effect, P3P 1.0 lacks the ability to negotiate with the web server on a contract and to make a contract with the web server that could be legally binding. All of this is fundamentally because in 1.0 the web server simply provides an ultimatum to the browser.
The browser can leave the country if it doesn't want to live there, but it can't talk back.

P3P 1.0 is likely therefore to create some unpleasant behavior for users. The user is simply warned that this web site is going to use his information for marketing purposes and will report the data to a third party. But, let's say this is his stockbroker. What does he do then? Call the Webmaster? (Get serious.) A better mechanism would be for his Browser to start negotiating with the web server to tell it what he is willing to do. The server can then decide whether it wants his business.

The P3P group clearly understands that such negotiation is going to be important in future versions of the specification. In fact, there is an affiliated effort called APPEL (A P3P Preference Exchange Language, see the P3P site) that has proposed a rule-based reasoning system for privacy meant to go hand-in-hand with P3P. However, there is still not a mechanism in P3P or APPEL for the Browser to talk back to the Server about privacy, so this rule system has only limited utility at present.

It seems pretty obvious that P3P needs a means to establish a negotiated contract that goes beyond W3's APPEL, but it also needs a means in the law for policing the contracts obtained. Chances are that the P3P group is pretty skittish about suggesting that the law get involved in any of this stuff. However, as an old mentor of mine once said, "The trick is to understand where the technology ends and the law begins."

The mechanism of non-repudiation provides a "signed" contract between the user and the server. The agreement cannot be repudiated as not having happened. However, there is almost too much information in this contract. Ideally all you want to know is that the Web Server has used the information in a way it promised not to. You should not have to disclose all the users that visited the Sex site and gave up their information.
It would seem to me that along with non-repudiation you want to have anonymous users. This cannot happen in a non-mediated transaction as has been proposed for the direct Browser to Server P3P interactions. Exactly how to create this scenario around data that contains names and addresses is going to be a technically interesting challenge.

Another criticism of P3P is that it is a web-only solution to privacy when we know that the Internet involves much more than simply the web. For example, email moves by other protocols (mainly SMTP), not HTTP. There is no way in P3P to say that the mail you send to a company contains information that should only be used in ways that you restrict. Most lawyers would not want to tell you how little weight their notices asking you to treat an email as confidential actually carry. They have to show that you agreed to treat the email as confidential. It is easy to argue that you "accidentally read" the mail before seeing the "confidentiality statement."

Without negotiated agreement, P3P is completely ill-suited to mail routing. However, a strong case can be made, I think, that P3P describes precise and essential building blocks for a solution to privacy in mail and many other Internet protocols. P3P should not be thrown away; it should be built upon.

The last of the serious criticisms is that P3P fails to provide a means for transitivity and universality of the protection of information. This is actually several things.

The transitivity problem is how to protect your privacy after the information is handed to somebody else. If a violation of privacy is generally a misuse of information about you or information that you provide (e.g., a trade secret), then there must be a way in the privacy protocol to indicate that a privacy directive is essentially non-negotiable, or negotiable only back to the original owner, and this needs to be passed on to the next possessor of the information.
Accomplishing this would be fairly simple unless the information changes and becomes derivative information. If something is learned and a conclusion is drawn, does the information that caused the learning bind its privacy directives on the conclusion? This is a big problem, not a small one.

The solution is to create directives on derivative information. Essentially the directive says that the purpose of the Web Server is to record information so that conclusions or derivative observations can be obtained and that this information now becomes the property of the owner of the Web Server. Conversely, a User may say to the Web server that it can use the information to clear a credit card or to give the user a registration account, but that any derivative information must be restricted to just this. A Web Server taking this information could not pass it on or provide it as the basis for developing new knowledge.

These two cases, actually, are handled in P3P. What is not handled is the case where the requirement is that the information can be passed along but that the privacy conditions must be preserved by the new owner. There is no mechanism in P3P for preserving the integrity of the use to which the information can be put. A particularly useful case of this might be the case where your personal information can be passed along but only non-identifiable summaries may be used for marketing purposes.

The universality of the protection of information is yet another problem with the current specification. The way P3P is set up, the user has to set up each Browser or "User Agent" that he uses. If he works on ten machines at the office and at home he has to take care to make sure all the machines utilize the same privacy policies (or he might as well have no privacy policies at all!). It should be possible to have a location on the Internet where you have your name-and-address information.
This bundle of information can be invoked exclusively by you, and it has a shell of a privacy policy surrounding it. Thus, you can insert your name-and-address in any web browser or in any email message and the privacy policy is negotiated with the recipient. Accomplishing a system such as this does not strike me as much more complex than the existing Domain Name Service that works well on the Internet.

P3P, in this view, makes the conceptual error of thinking that privacy is intransitive and that it is not necessary to describe information being sent by the user with a unique ID (like the URI for the PolicyRef statement on the server) for its privacy policy.

I have no doubt that many members of the W3 P3P working group have thought through many if not most of the concerns expressed in this article. However, these are not the people who are likely to talk about such concerns since their main interest is in getting P3P accepted. Pointing out flaws like the ones above does not, on the surface, look like help in getting P3P accepted. But my argument is the opposite. It is probably better for a third party to speak out on these and to invite more vigorous public discussion. This is precisely because P3P takes us in the right direction. It deserves to be supported and added to since it so clearly represents a good start.

People in all aspects of the Internet socio-economic-political system need to sit up and think this through for themselves. Information is power, and privacy management is the control of that power.
Received on Tuesday, 22 August 2000 21:07:31 UTC