
Version 1.4 of Privacy Evaluator documentation

From: Rolf H. Nelson <rnelson@tux.w3.org>
Date: Sun, 1 Nov 1998 14:48:30 -0500
Message-Id: <199811011948.OAA14045@tux.w3.org>
To: www-privacy-evaluator@w3.org


                             Privacy Evaluator
       Rolf Nelson (W3C) <[2]rnelson@w3.org>
Status of this Document:

   This document may end up being submitted as a W3C NOTE.  It would
   then be a NOTE made available by W3C for discussion only.  This
   indicates no endorsement of its content, nor that W3C has, is, or
   will be allocating any resources to the issues addressed by the
   NOTE.  Please send comments to [3]www-privacy-evaluator@w3.org.
   This list is publicly archived at
   [4]http://lists.w3.org/Archives/Public/www-privacy-evaluator/.

   Some users are unaware that personal data that they send to Web
   sites is sometimes redistributed without their knowledge or explicit
   permission.  Negative consequences of this redistribution can range
   from the subsequent reception of unwanted junk mail to the risk
   of identity theft.  To inform the user of what it will do with the
   data it requests, a Web site can post a privacy disclosure.  Such
   disclosures can take the form of human-readable natural language
   explanations; alternatively, new technologies like P3P [[5]P3P]
   allow machine-readable privacy disclosures.  Unfortunately, some
   sites have no privacy policies posted whatsoever. [[6]FTC]

   A "privacy critic" [[7]Critic] utility that can warn users of some
   possible consequences of sending personal data to a Web site is a
   valuable tool.  Such a utility could be designed in many different
   ways.  This document describes one possible design, called Privacy
   Evaluator.  A defining feature of Privacy Evaluator is its use of
   preset heuristics, or "rules of thumb," to determine if a user is
   in the process of submitting personal data through an HTML form.
   This document also describes one existing prototype implementation of
   Privacy Evaluator. This prototype implementation, called PJPS, is a
   proof-of-concept.  A polished implementation of Privacy Evaluator
   would be more robust and would have a more polished user interface
   than PJPS. Preliminary and unscientific tests show that PJPS can
   detect the transmission of personal data correctly for 28 of 29
   randomly chosen Web sites.

   Privacy Evaluator describes a specific class of Web user agents
   (such as Web browsers) that automatically provide the user with a
   certain style of privacy information.  P3P is a language Web sites
   can use to disclose their privacy practices in a machine-readable
   way.  Privacy Evaluator can warn the user about a site's privacy
   not only when P3P-compliant sites are accessed, but even when
   non-P3P-compliant sites are accessed.  PJPS (Privacy Jigsaw Proxy
   Server) is the W3C [[8]W3C] prototype implementation of Privacy
   Evaluator.

   With Privacy Evaluator, when a user submits data through an HTML
   form to a site, an alert may appear warning the user of some
   possible consequences of submitting personal data to an unprotected
   Web site.  This alert will appear if the following two conditions
   are both met:

   1.  The Web page containing the HTML form does not have an adequate
   machine-readable privacy disclosure (such as a P3P disclosure) that
   would ensure the user's privacy.  PJPS would check that either the
   "id" field is "no", or that both the "recpnt" and "purp" fields are
   sufficiently low. [[9]P3P]

   2.  Privacy Evaluator believes that the data being submitted is
   "identifiable"; that is, it could be used to identify the user.
   PJPS would consider the data to be identifiable if the following
   two sub-conditions were both met:
       a.  The HTML form looks like it is soliciting the user's name
   or electronic mail address.  One way to determine this is to see if
   an input field key contains the substring "name" or "email".
   Another is to see if the Web page contains the phrases "first name"
   and "last name".  A third method is to see if the data the user
   entered looks like an email address.  These heuristics should match
   a majority of the English-language sites on the Web that capture
   personally identifiable information.

       b.  The name of the submit button does not look like the button
   for a search engine.  The way to determine this is to see if the
   submit value equals something other than "search" or "find".  If
   the submit button is labeled "search" or "find", it is less likely
   that the form is soliciting personally identifiable information
   about the user.  This heuristic makes it less likely that search
   engines will accidentally trigger a false alert.

   Producing this alert for sites without P3P that appear to be
   collecting identifiable data has two benefits.  First, users of
   Privacy Evaluator will get educated about the possible
   consequences of submitting personal data on the Web.  This will be
   especially helpful to non-American users in countries with strong
   privacy protection norms who do not fully realize that they are
   visiting a site located in a different country that does not offer
   privacy protection.  Second, Web sites will have an additional
   incentive to use a machine-readable privacy disclosure language
   like P3P.  A Web site that uses P3P and has an adequate privacy
   policy would be more likely to convince a Privacy Evaluator user to
   submit data than a site that does not use P3P.  With Privacy
   Evaluator, a Web site is never punished and is sometimes rewarded
   for using P3P, so it is never worse off for having used P3P.

   The arbitrarily chosen goal is that most users who surf the Web
   with Privacy Evaluator should have a "false negative" rate of under
   20% and a "false positive" rate of under 5%.  A false negative is
   when a Web site that does collect identifiable information
   mistakenly does not trigger an alert.  A false positive is when a
   Web site that does not collect identifiable information mistakenly
   does trigger an alert.

   Privacy Evaluator is not designed to prevent malicious Web site
   administrators from deliberately preventing the alert from
   appearing.

   These constraints should be loose enough that a working Privacy
   Evaluator implementation is easy to create, but tight enough that
   Privacy Evaluator is useful.  A Privacy Evaluator implementation
   should be tuned to the expected language of the Web sites that the
   user is likely to visit.  PJPS is designed to work well for
   English-language Web sites.

   Privacy Evaluator is designed to be privacy-friendly and
   non-intrusive.  Existing browsers that do not use P3P are
   non-intrusive, but not privacy-friendly.  A hypothetical user agent
   that blocked every non-P3P site on the Web would be
   privacy-friendly, but would not be non-intrusive.  Privacy
   Evaluator is privacy-friendly because the rate of false negatives
   is under 20%, and is non-intrusive because of the low rate of false
   positives.
Implementation Details:

   A Privacy Evaluator implementation can include a parser, a trust
   engine, a sniffer, and a user interface.  The trust engine has not
   been implemented in PJPS as of this writing.
   The parser module would need to look for a link in the HTML head to
   a separate document containing a P3P disclosure.  It would then
   need to follow this link, retrieve the P3P document, and parse it.
   The parser would need to understand either XML, RDF, P3P, or a
   relevant subset of P3P.  Conceivably the parser could be very crude
   and merely look for the P3P <STATEMENT> tag.
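
   As a rough illustration of the "very crude" parser mentioned above,
   the following Python sketch merely checks whether a retrieved
   disclosure document contains a <STATEMENT> tag.  The function name
   and the case-insensitive match are assumptions made for this
   example; it does not parse XML or RDF, and it is not the PJPS code.

```python
import re

# Assumption for illustration: match the tag name regardless of case and
# tolerate whitespace after "<". A real parser would use an XML library.
STATEMENT_TAG = re.compile(r"<\s*STATEMENT\b", re.IGNORECASE)


def has_p3p_statement(document_text):
    """Return True if the text appears to contain a P3P <STATEMENT> tag."""
    return STATEMENT_TAG.search(document_text) is not None
```

   A check this crude only establishes that some statement is present;
   deciding whether it is strong enough is left to the trust engine.
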
   The trust engine, which consists of a set of privacy preferences,
   would take the parsed P3P disclosure and would return a boolean
   stating whether the privacy statement is strong enough to suppress
   the P3P alert.  It produces this boolean by evaluating at least
   three fields: the "id" field, the "purp" field, and the "recpnt"
   field.  One possible implementation would be a database listing
   every combination of these enumerated values.  A simpler
   possibility would be to hardwire in that only the following
   proposals are acceptable:

     a.  proposals with "id" field equal to "no"; or

     b.  proposals with "purp" fields in the range 0 to 3 and "recpnt"
   fields in the range 0 to 1.  For example, a "recpnt" field equal to
   "0, 3" would be unacceptable to this trust engine.

   Alternatively, a very trusting trust engine could search the Web
   page for the mere presence of a P3P proposal or a link to a privacy
   policy, or even for a mention of the word "privacy" in any language
   anywhere in the HTML.
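
   The hardwired rule in items a and b above could be sketched in
   Python as follows.  This is only an illustration: the function
   names are hypothetical, the comma-separated integer encoding of the
   "purp" and "recpnt" fields is inferred from the "0, 3" example, and
   rejecting empty fields is an assumption of this sketch.

```python
def parse_codes(field):
    """Split a field such as "0, 3" into a list of integer codes."""
    return [int(part) for part in field.split(",") if part.strip()]


def is_acceptable(id_field, purp_field, recpnt_field):
    """Return True if the proposal is strong enough to suppress the alert."""
    # Item a: data that is not identifiable is always acceptable.
    if id_field.strip().lower() == "no":
        return True
    purp = parse_codes(purp_field)
    recpnt = parse_codes(recpnt_field)
    # Assumption: a proposal with missing fields is treated as unacceptable.
    if not purp or not recpnt:
        return False
    # Item b: every purpose code in 0..3 and every recipient code in 0..1.
    return all(0 <= code <= 3 for code in purp) and all(
        0 <= code <= 1 for code in recpnt
    )
```

   With this sketch, a proposal whose "recpnt" field is "0, 3" is
   rejected unless its "id" field is "no", matching the example above.
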
   The sniffer decides whether the information being transmitted looks
   identifiable.  It can use heuristics that analyze the data being
   transmitted.  For example, it can check whether one of the keys
   has "name" or "address" as a substring.  Given the data being sent
   through CGI and the contents of the originating Web page, the
   sniffer returns a boolean stating whether it thinks identifiable
   information is being sent.  If the sniffer decides that the data is
   identifiable, Privacy Evaluator should invoke the user interface to
   bring up an alert.

   The user interface's alert can consist of a dialogue containing a
   text which is read from a configuration file.  This text can be a
   warning that no adequate machine-readable privacy disclosure was
   found, and that there may be no guarantee that personal data
   submitted to the site will not be sold to other parties.  The text
   may also suggest the user look for a human-readable privacy
   disclosure.  This dialogue is similar in spirit to the warning
   issued by many browsers when sending data through an insecure
   channel that does not use HTTPS.  The user can elect to continue
   the transaction, or cancel.  Inside this dialogue a box can be
   checked if the user does not want to see this warning again.

   An alternative design decision would have been to produce an alert
   when a Web page is downloaded rather than when the form is
   submitted.  This would have had the disadvantage of bringing up
   alerts for Web pages that the user has no desire to submit data to
   anyway.  Therefore, the decision was made to only alert the user
   about that minority of Web pages where the user has actually filled
   in the Web form and is in the process of submitting data.  If the
   user is not submitting data, then the privacy policy of the Web
   page is not as relevant.

   PJPS runs as a proxy server and therefore cannot directly produce
   an alert dialogue on the user's computer in the way that a local
   application like a Web browser can.  PJPS could have been designed
   to produce an alert using Java, but this would have required the
   user's Web browser to support Java.  PJPS instead embeds the alert
   dialogue in the HTML document returned by the proxy.  Here is an
   example transaction where the user begins to send data to a site,
   PJPS produces an alert, and the user elects to ignore the alert and
   continue sending data to the Web site.

   Browser sends to PJPS proxy:  GET /foo.cgi?bar=buz

   PJPS proxy sends back a privacy alert embedded in a form:

   <FORM ACTION="/foo.cgi">
   <INPUT TYPE="hidden" NAME="data" VALUE="/foo.cgi?bar=buz">
   <INPUT TYPE="submit" NAME="submit" VALUE="go ahead anyway">
   </FORM>

   User clicks "go ahead anyway" and browser sends to PJPS proxy:

   GET /foo.cgi?data=%2Ffoo.cgi%3Fbar%3Dbuz&submit=go+ahead+anyway

   Proxy then sends on to Web server: GET /foo.cgi?bar=buz and returns
   the fetched Web document to the user.
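
   The wrap-and-unwrap step in this transaction can be sketched in
   Python.  This is a hedged illustration rather than the PJPS code
   (PJPS is built on Jigsaw): the function names, the warning text
   parameter, and the NAME="submit" attribute are assumptions.  The
   sketch builds the alert form around the original request and later
   recovers that request from the "data" parameter.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def build_alert_form(original_request, warning_text):
    """Wrap the original request in an HTML form that carries the alert."""
    path = urlsplit(original_request).path
    return (
        f"<P>{escape(warning_text)}</P>\n"
        f'<FORM ACTION="{escape(path, quote=True)}">\n'
        '<INPUT TYPE="hidden" NAME="data" '
        f'VALUE="{escape(original_request, quote=True)}">\n'
        '<INPUT TYPE="submit" NAME="submit" VALUE="go ahead anyway">\n'
        "</FORM>"
    )


def recover_original_request(confirmed_request):
    """Return the request stored in "data", or None if there is none."""
    query = urlsplit(confirmed_request).query
    values = parse_qs(query).get("data")
    return values[0] if values else None
```

   Because the original request travels inside the form itself, the
   proxy does not need to remember anything between the alert and the
   user's confirmation.
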
   With PJPS, if the user checks the box indicating not to show the
   dialogue again, a second dialogue may appear explaining that since
   this is a prototype, checking the box does not actually do
   anything.  In contrast, in a real non-prototype Privacy Evaluator
   implementation, checking the box would have disabled Privacy
   Evaluator alerts.  By not implementing this check box, this proxy
   is saved from having to keep state for each user.  Besides, PJPS
   would become very uninteresting after the box is checked.

   The dialogue should also have a help button, and ideally a link to
   an explanation of why exactly this document triggered the alert.

   PJPS is layered on top of the W3C Jigsaw [[10]Jigsaw] server and
   takes the form of a proxy server.  The alternative would have been
   to implement PJPS as a browser.  Implementation as a proxy server
   had two advantages.  First, development of PJPS on top of the
   Jigsaw proxy server was fast and easy, partly because Jigsaw
   already has an XML parser.  Second, a proxy server is more
   accessible; if an interested person wishes to see Privacy Evaluator
   in action, he or she would merely need to configure his or her
   existing browser to use our PJPS proxy at p3p.w3.org.  If this
   person were instead required to download, install, and run a
   browser, that would create a serious obstacle.  The main
   disadvantages of this proxy approach are worse response time, less
   UI control, and a reduction in user information.  The advantages of
   this proxy approach were judged to outweigh the disadvantages for
   the purposes of the prototype.  A widely deployed and polished
   implementation of Privacy Evaluator would probably need to be
   implemented within the browser rather than as a proxy.

   Because PJPS runs as a proxy, it cannot directly access the HTML
   page that the user submitted data from.  PJPS therefore relies on
   the "Referer" field to determine what HTML document produced the
   transaction, so that it can scan that document for "first name" and
   "last name."  This has two disadvantages.  First, in theory, a
   single URL may map to more than one document.  For example, posting
   two different sets of data to a single URL may yield two different
   return documents containing two different HTML forms.  Second, PJPS
   does not work correctly with browser configurations that do not
   emit the "Referer" field.  As of this writing, both Netscape and
   Microsoft browsers emit the "Referer" field by default.  A more
   sophisticated alternative would have been to keep a database of the
   "action" fields contained in Web pages.  For the sake of rapid
   development, PJPS lacks this sophisticated database.

   To speed development, several important aspects of P3P have been
   omitted in Privacy Evaluator.  HTTP support and the transmission of
   data solicited through P3P methods are elements that were deemed
   desirable but not necessary for Privacy Evaluator.  Privacy
   Evaluator also lacks a sophisticated trust engine and a way of
   downloading customized privacy preferences over the Web.  These are
   important items; nevertheless, they are not required for Privacy
   Evaluator.

   The implementation of PJPS will be considered a success if it meets
   the stated goals for false positives and false negatives, and does
   not crash, during user tests.  User tests could consist of two
   randomly chosen individuals who could be asked to browse a series
   of Web pages and submit data to those pages.  The pages could be
   determined by analyzing user trace data to find representative
   sites.  A tally would manually be kept of false positives and false
   negatives.  In addition, multiple people could use PJPS during the
   course of a week of ordinary Web browsing to verify there are no
   unexpected problems.  See the section on Implementation Status for
   information on some preliminary manual tests.

   The design of Privacy Evaluator will be considered a success if the
   following three criteria are met: the implementation of PJPS is a
   success as described above; Privacy Evaluator is useful; and
   Privacy Evaluator is usable.  Privacy Evaluator is useful if a
   significant percentage of user agent distributors, including ISPs,
   make plans to deploy Privacy Evaluator or a variant of Privacy
   Evaluator, and if users of those implementations generally evaluate
   them as useful.  Privacy Evaluator is sufficiently usable if user
   tests fail to reveal any showstopper user interface problems.
Details of Current PJPS Heuristics:

   Below is the current process for using the PJPS heuristics for
   determining if an attempted data transmission through an HTML form
   carries personally identifiable information:
   1.     (Search Rule) Does the submit button have a value like "find"
   or "search"?  If so, the transaction is NOT suspect.  If not, go to
   step 2.

   2.     (Key Rule) Does the CGI key in one of the INPUT elements
   have as a substring "name" or "email"?  If so, the transaction is
   suspect.  If not, go to step 3.  See the HTML specification
   [[11]HTML] for the syntax of HTML element tags.

   3.     (Text Rule) Does the full text of the HTML document (not
   just the tags, not just the form, but the entire HTML document)
   contain both the phrase "first name" AND the phrase "last name"?
   If so, the transaction is suspect.  If not, go to step 4.

   4.     (Value Rule) Does one of the values that the user typed in
   and is submitting contain the character "@"?  If so, the user is
   submitting an email address and the transaction is suspect.  If
   not, the transaction is NOT suspect.

   The string comparisons in all of these steps must be
   case-insensitive.  Rule 3, the Text Rule, could also look for
   synonyms such as "given name" and "family name".
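
   The four rules above can be sketched as a single Python function.
   This is an illustration under stated assumptions, not the PJPS
   source: the function and parameter names are hypothetical, the
   submitted form is modeled as a dictionary of CGI keys to values,
   comparisons are made after lower-casing, and runs of whitespace in
   the page text are collapsed before the phrase search.

```python
SEARCH_LABELS = {"search", "find"}
KEY_SUBSTRINGS = ("name", "email")


def is_suspect(submit_value, fields, page_text):
    """Apply the Search, Key, Text, and Value rules in order."""
    # Rule 1 (Search Rule): search-style submit buttons are never suspect.
    if submit_value.strip().lower() in SEARCH_LABELS:
        return False
    # Rule 2 (Key Rule): a CGI key containing "name" or "email".
    for key in fields:
        if any(substring in key.lower() for substring in KEY_SUBSTRINGS):
            return True
    # Rule 3 (Text Rule): both phrases anywhere in the full document text.
    text = " ".join(page_text.lower().split())
    if "first name" in text and "last name" in text:
        return True
    # Rule 4 (Value Rule): any submitted value containing "@".
    return any("@" in value for value in fields.values())
```

   For instance, is_suspect("Submit", {"Your_Name": "Joe"}, "") is
   True because of the Key Rule, while any form whose submit value is
   "Search" returns False at the first step.
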
   These four heuristics do not exhaust the set of all possible useful
   heuristics.  Other possible useful heuristics that are not used by
   PJPS include a more refined email match, a postal address match, a
   search for registration synonyms, and support for languages other
   than English.  A more refined email match, rather than looking for
   the simple presence of the "@" character, could do a pattern match
   on legal RFC822 [[12]RFC822] email addresses, and even try to look
   up the domain name of the entered email address to check for
   validity.  A postal address match, for users in the United States,
   could look for one of the two-letter state abbreviations.  A search
   through the Web page for registration synonyms would flag phrases
   like "user registration".  Support for non-English languages would
   involve developing separate heuristics for each language.
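
   A more refined email match might look like the following Python
   sketch.  The pattern is a deliberate simplification chosen for this
   example and is much narrower than the full RFC822 address grammar;
   the function name is hypothetical, and the domain name lookup
   mentioned above is left out.

```python
import re

# Assumption for illustration: exactly one "@", no whitespace, and at
# least one dot in the domain part. This is NOT the RFC 822 grammar.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Return True if the whole value has the shape local@domain.tld."""
    return SIMPLE_EMAIL.fullmatch(value.strip()) is not None
```

   Unlike the plain "@" test, this sketch accepts "Joe@foo.com" but
   rejects a value such as "meet @ noon", which would reduce false
   positives at the cost of a slightly more complex rule.
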
   If a transaction is suspect, Privacy Evaluator should produce a
   warning dialogue alerting the user unless Privacy Evaluator has
   found an adequate P3P disclosure protecting the privacy of the
   transaction.

   These heuristics are believed to satisfy the design goals of less
   than 5% false positives and less than 20% false negatives.  Tests
   could be developed to verify or disprove this belief.

   Below are some examples of the heuristics in action.

   Suppose Web form A has the following tag:

   <INPUT TYPE=submit VALUE="Search">

   Transactions produced by form A would NOT be suspect because of
   Rule 1, the "Search Rule."

   Suppose Web form B includes the following tag:

   <INPUT NAME="Your_Name">

   Transactions produced by form B would be suspect because of Rule
   2, the "Key Rule."  (Unless, of course, Rule 1 about "search" and
   "find" transactions not being suspect contradicted this.)

   Suppose Web page 1 includes the following text:

   Enter Your First Name:  <INPUT NAME="FN">
   Enter Your Last Name:   <INPUT NAME="LN">

   Transactions produced by page 1 would be suspect because of Rule 3,
   the "Text Rule." (Unless, of course, this contradicts Rule 1.)

   Suppose Web form C does not match any of the first three rules.
   Suppose further the user enters into one of the INPUT fields the
   text "Joe@foo.com".  When the user clicks the submit button, the
   transaction should be flagged as suspect because of Rule 4, the
   "Value Rule." (Unless, of course, this contradicts Rule 1.)
Interoperability with P3P:

   Privacy Evaluator implementations should interoperate with P3P
   implementations.  The simplest way to ensure this is to allow the
   trust engine functionality to be manually disabled when the user
   has a separate P3P utility running a more sophisticated trust
   engine.  A more complicated but more powerful solution is to feed
   the binary output of the Privacy Evaluator sniffer into a fully
   implemented trust engine.
Implementation Status:

   As of Oct 14, 1998, PJPS is up and running at p3p.w3.org:8080.  It
   has not been exhaustively tested and is known to work only with
   POST and not with GET CGI queries.  An unscientific test of the
   heuristics found that 8 out of 9 popular Web sites that collect
   personally identifiable information produce PJPS alerts.  20 out of
   20 randomly chosen Web sites of only average popularity that
   collect personally identifiable information produce PJPS alerts.
   This indicates a satisfyingly low rate of false negatives.  No
   false positives were observed.
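
   The arithmetic behind these figures can be checked with a few lines
   of Python.  All tested sites collect identifiable information, so a
   site that produced no alert is what this document defines as a
   false negative.  The function and variable names are assumptions
   made for this example.

```python
def miss_rate(alerting_sites, tested_sites):
    """Fraction of tested sites that raised no alert."""
    return (tested_sites - alerting_sites) / tested_sites


# (sites producing an alert, sites tested), as quoted in the text above.
popular = (8, 9)
average = (20, 20)

alerting = popular[0] + average[0]
tested = popular[1] + average[1]
overall = miss_rate(alerting, tested)

# Prints: 28 of 29 sites alerted; miss rate 3.4%
print(f"{alerting} of {tested} sites alerted; miss rate {overall:.1%}")
```

   A miss rate of about 3.4% is well under the 20% design goal, though
   a sample of 29 sites is too small to draw firm conclusions.
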
Mailing List:

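   Public comments and discussion about Privacy Evaluator or about
   PJPS should go to www-privacy-evaluator@w3.org.  Instructions for
   subscribing are available at
   [13]http://www19.w3.org/Archives/Public/www-privacy-evaluator/1998Oct/0000.html.
   Archives of this list are at the following URL:
   [14]http://lists.w3.org/Archives/Public/www-privacy-evaluator/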
Future Work:

   The heuristics suggested in this document should be systematically
   tested to determine the rate of false positives and false
   negatives.  Usability tests should be conducted to find the best
   way to communicate privacy information to users.

   PJPS does not work on .shtml, https, or GET CGI transactions.  The
   percentage of Web sites that collect personal data through such
   transactions is believed to be low.  This should be verified or
   refuted empirically, and if the percentage is sufficiently high,
   PJPS should be modified to support these transactions.

   A P3P trust engine should be added to PJPS.

   PJPS could be made more user-configurable by allowing users to
   configure sites that should not produce an alert.  For example,
   when an alert is produced, there could be a checkbox that makes
   PJPS stop producing alerts for that Web site.  Users should also be
   able to totally disable Privacy Evaluator functionality if they
   desire.

   PJPS could be ported to another language; possible candidates for a
   good first language to port to include French and Spanish.
   Discussion of internationalization issues is available in the
   thread starting at
   [15]http://lists.w3.org/Archives/Public/www-privacy-evaluator/1998Oct/0001.html.

   Privacy Evaluator could be extended to access third-party
   machine-readable information about privacy policies.  One method
   would be to use PICS to mark Web sites that a third party judges to
   have inadequate privacy protection.  A better method would be for
   P3P to be extended to allow third-party label bureaus to serve P3P
   disclosures.  For privacy reasons, these bureaus should be as close
   to the user as possible; if the bureau is small and just lists a
   few popular sites, it could be bundled in with Privacy Evaluator
   and sit on the user's computer.

   To discourage malicious Web site administrators from tuning their
   pages to not alert Privacy Evaluator's fixed heuristics, the
   heuristics could be made variable rather than fixed and could be
   downloaded daily from a central database of heuristics that could
   change to counter common workarounds by malicious site
   administrators.  It is unclear who would win this arms race between
   malicious Web site administrators and Privacy Evaluator.

   Privacy Evaluator is a design for building a user agent that can
   detect the transmission of personally identifiable information
   through HTML forms with what appears to be a large degree of
   accuracy.  PJPS is a proof of concept that shows a Privacy
   Evaluator is feasible.  When a user is in the process of
   transmitting personally identifiable information, an
   implementation of Privacy Evaluator can warn the user if the Web
   site does not have an adequate machine-readable privacy disclosure.
Versioning and Authorship:

   1.4 Nov 1 1998 Rolf Nelson additional input from Martin Duerst
   1.3 Oct 25 1998 Rolf Nelson additional input from Haym Hirsh,
   Marja-Riitta Koivunen, Eric Prud'hommeaux, Joseph Reagle, Daniel
   1.2 Oct 12 1998 Rolf Nelson additional input from Lorrie Cranor
   1.1 Sep 20 1998 Rolf Nelson additional input from Jason Catlett and
   Massimo Marchiori
   1.0 Aug 19 1998 Rolf Nelson original version, with input from Eric
   Prud'hommeaux, Joseph Reagle, Janne Saarela, Ralph Swick, Daniel
   Veillard.  Additional thanks to Dan Connolly, Jim Gettys and
   Marja-Riitta Koivunen.  Mistakes are mine, brilliant observations
   are theirs.
   PJPS, the Privacy Evaluator implementation, was coded amazingly
   quickly by Janne Saarela.

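   [Critic] [16]http://www.ics.uci.edu/~ackerman/pub/98i11/privacy-critics.pdf
   [FTC] "Privacy Online:  A Report to Congress,"
   [17]http://www.ftc.gov/reports/privacy3/toc.htm
   [HTML] "HTML 4.0 Specification," [18]http://www.w3.org/TR/REC-html40/
   [Jigsaw] "Jigsaw Overview," [19]http://www.w3.org/Jigsaw/
   [P3P] "Platform for Privacy Preferences P3P Project,"
   [20]http://www.w3.org/P3P/
   [RFC822] "Standard for the Format of ARPA Internet Text Messages,"
   [21]http://info.internet.isi.edu/in-notes/rfc/files/rfc822.txt
   [W3C] "About the World Wide Web Consortium,"
   [22]http://www.w3.org/Consortium/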
   To Do:  , validate as HTML compliant, table of contents
   [23]Copyright © 1998 [24]W3C ([25]MIT, [26]INRIA, [27]Keio), All
   Rights Reserved. W3C [28]liability, [29]trademark, [30]document use
   and [31]software licensing rules apply.
   [32]Rolf Nelson <[33]rnelson@w3.org>


   1. http://www.w3.org/
   2. mailto:rnelson@w3.org
   3. mailto:www-privacy-evaluator@w3.org
   4. http://lists.w3.org/Archives/Public/www-privacy-evaluator/
   5. http://www.w3.org/Privacy/19981101-evaluator.html#P3P
   6. http://www.w3.org/Privacy/19981101-evaluator.html#FTC
   7. http://www.w3.org/Privacy/19981101-evaluator.html#Critic
   8. http://www.w3.org/Privacy/19981101-evaluator.html#W3C
   9. http://www.w3.org/Privacy/19981101-evaluator.html#P3P
  10. http://www.w3.org/Privacy/19981101-evaluator.html#Jigsaw
  11. http://www.w3.org/Privacy/19981101-evaluator.html#HTML
  12. http://www.w3.org/Privacy/19981101-evaluator.html#RFC822
  13. http://www19.w3.org/Archives/Public/www-privacy-evaluator/1998Oct/0000.html
  14. http://lists.w3.org/Archives/Public/www-privacy-evaluator/
  15. http://lists.w3.org/Archives/Public/www-privacy-evaluator/1998Oct/0001.html
  16. http://www.ics.uci.edu/~ackerman/pub/98i11/privacy-critics.pdf
  17. http://www.ftc.gov/reports/privacy3/toc.htm
  18. http://www.w3.org/TR/REC-html40/
  19. http://www.w3.org/Jigsaw/
  20. http://www.w3.org/P3P/
  21. http://info.internet.isi.edu/in-notes/rfc/files/rfc822.txt
  22. http://www.w3.org/Consortium/
  23. http://www.w3.org/Consortium/Legal/ipr-notice.html#Copyright
  24. http://www.w3.org/
  25. http://www.lcs.mit.edu/
  26. http://www.inria.fr/
  27. http://www.keio.ac.jp/
  28. http://www.w3.org/Consortium/Legal/ipr-notice.html#Legal
  29. http://www.w3.org/Consortium/Legal/ipr-notice.html#W3C
  30. http://www.w3.org/Consortium/Legal/copyright-documents.html
  31. http://www.w3.org/Consortium/Legal/copyright-software.html
  32. http://www.w3.org/People/#nelson
  33. mailto:rnelson@w3.org

| Rolf Nelson (rolf@w3.org), Project Manager, W3C at MIT
|   "Try to learn something about everything
|             and everything about something."  --Huxley

Received on Sunday, 1 November 1998 14:48:32 UTC
