- From: Lorrie Cranor <lorrie@research.att.com>
- Date: Wed, 14 May 2003 22:20:12 -0400
- To: public-p3p@w3.org
We released the following report today:

"An Analysis of P3P Deployment on Commercial, Government, and Children’s Web Sites as of May 2003"

You can download it from:
http://www.research.att.com/projects/p3p/p3p-census-may03.pdf

Here is the executive summary:

The Platform for Privacy Preferences (P3P) provides a standard computer-readable format for privacy policies and a protocol that enables web browsers to read and process these policies automatically. We developed software to query a set of web sites for P3P policies, check the validity of each policy, and analyze the information practices it describes. We used this software to analyze 538 P3P-enabled web sites found by checking for P3P policies on 5,856 web sites on 6 May 2003. The sites we checked for P3P policies were taken from several lists of popular web sites, as well as from “crawling” indexes of shopping, news, children’s and government web sites.

We present the first major analysis of the data practices of P3P-enabled web sites. Our system used the P3P evaluation engine built into the AT&T Privacy Bird P3P user agent to analyze the P3P policies we discovered. We checked these policies against Privacy Bird’s standard “high,” “medium,” and “low” settings as well as against 62 other “rule sets” that we developed.

A comparison of our results with previous studies indicates that P3P adoption is increasing over time [1][14]. Adoption remains highest for the most popular web sites.

We found a large number of errors in the P3P policies of the sites we evaluated. About one third of the P3P-enabled sites had technical errors. In many cases these errors were due to use of syntax from a draft version of the P3P specification that is not permitted by the final P3P 1.0 Recommendation [6]. However, 7% of the P3P-enabled sites had critical errors that prevented their evaluation by our Privacy Bird evaluation engine.
We also found 74 sites that violated the P3P specification by posting P3P compact policies without their corresponding full P3P policies.

Our analysis of data collection at P3P-enabled web sites indicates that most sites collect computer information, click stream information, and demographic data. Almost as many sites also collect online contact information, physical contact information, interactive data, and unique identifiers. The majority of sites also collect preference information, purchase information, and state management information (cookies). However, fewer sites collect financial information (which excludes information such as credit card numbers used only as part of a purchase). The least collected information is content (email messages, bulletin board postings, etc.), government-issued identifiers, health information, political information, location information (for example GPS positioning data), and information not falling into any of the pre-defined categories.

Almost all web sites reported using data for completion and support of the activity for which data was provided, web site and system administration, and research and development. The majority of sites also reported using data for email and postal mail marketing, one-time tailoring of the site content, and pseudonymous profiling. Substantially fewer sites reported using data for telemarketing or profiling in which individuals are identified. Very few sites reported using data for historical preservation or other purposes that do not fall into these categories.

About half the web sites we studied indicated that they share personally identifiable data with parties other than agents who use data for the purpose for which it was provided. 46% of these sites indicate that they offer third-party choice (opt-in or opt-out to this sharing).

Most web sites reported providing some access provisions for individuals wishing to find out what data of theirs was in a web site’s records as well as some dispute resolution option for disputes related to their privacy policy. However, most sites reported that they did not have a data retention policy covering all of the data they collected at their site.

As debates continue about the need for further privacy legislation and the effectiveness of industry self-regulation in the privacy area, automated analysis of the data practices of P3P-enabled web sites can provide valuable information. Furthermore, as US government web sites begin posting P3P policies to comply with the privacy requirements of section 208 of the E-Government Act of 2002 [16], we can monitor compliance with these requirements.

We plan to repeat our experiments on a regular basis to allow for longitudinal analysis of P3P policies. In the future we may also expand the list of web sites we analyze, develop additional rule sets to facilitate more detailed analysis, and expand our analysis to include P3P compact policies.
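
For readers who want a concrete picture of the compact-policy problem mentioned in the summary, here is a minimal Python sketch of how such a check might look. It is only an illustration under stated assumptions and is not the software used for the report. It assumes the P3P HTTP response header takes the form policyref="...", CP="...", and that the caller has already determined by other means (for example by requesting the well-known location /w3c/p3p.xml) whether a policy reference file exists. The function names and the has_policy_reference_file parameter are hypothetical.

```python
import re


def parse_p3p_header(value):
    """Split a P3P header value into (policyref URI or None, list of CP tokens)."""
    policyref = None
    tokens = []
    # policyref="..." points at the policy reference file, if the site sends one.
    match = re.search(r'policyref\s*=\s*"([^"]*)"', value, re.IGNORECASE)
    if match:
        policyref = match.group(1)
    # CP="..." carries the compact policy as whitespace-separated tokens.
    match = re.search(r'(?:^|[\s,;])CP\s*=\s*"([^"]*)"', value, re.IGNORECASE)
    if match:
        tokens = match.group(1).split()
    return policyref, tokens


def compact_policy_without_full_policy(header_value, has_policy_reference_file):
    """Return True when a compact policy is sent but no full policy can be located.

    has_policy_reference_file is supplied by the caller (hypothetical input):
    True if a policy reference file was found outside the header.
    """
    policyref, tokens = parse_p3p_header(header_value)
    return bool(tokens) and policyref is None and not has_policy_reference_file
```

A crawler could call compact_policy_without_full_policy() on the P3P header value from each response and count the sites for which it returns True. The sketch does not validate the compact policy tokens themselves.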
Received on Wednesday, 14 May 2003 22:19:03 UTC