RE: Cookie linking v3

From: Dobbs, Brooks <bdobbs@doubleclick.net>
Date: Thu, 11 Dec 2003 10:27:40 -0500
Message-ID: <D464F551A951ED4E804B9713B519E6C902941B2B@NYC-EX101.doubleclick.net>
To: "'Giles Hogben'" <giles.hogben@jrc.it>, public-p3p-spec <public-p3p-spec@w3.org>
Cc: WILIKENS MARC ANDRE <marc.wilikens@jrc.it>


I agree with your first part, but you lost me on the triangulation from two
domains.  I don't think this type of triangulation is a very likely
scenario: Domains A and B are probably separate entities with separate web
servers and no practical means of establishing that the cookies they each
receive come from the same user (unless they have somehow coordinated to
set the same unique_id).  I see where you are going with the referrer, and
while that may give you a guess, it would be a far-fetched one.

I also agree with your analysis of domain level cookies.  I think this has
always been a real thorn in the side of P3P.  I will make a broad
generalization here, and feel free to shoot it down: most large domains
and nearly all corporate domains outsource at least one or two hosts on
their domain, and additionally maintain exceptionally little control over
their internal hosts (bobjones.company.com).  Here is another broad
generalization: many, many corporate domains set a domain level unique id
cookie which they claim to be pseudonymous.  I would venture a guess that
the majority of these folks also have a corporate intranet on the same
domain that requires authentication, usually via REMOTE_USER.  So now what
is being logged on the intranet?  REMOTE_USER and DOMAIN COOKIE!!!  Oops,
looks like the domain cookie is no longer anonymous.  I think that what all
this points to is that, sadly, true accuracy and compliance are beyond the
effort that most implementers are making at this time.
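
To make the intranet point concrete, here is a minimal sketch of the join
that becomes possible once both logs exist. The log records, field names
and user names are all invented for illustration; nothing here reflects a
real site's logging format.

```python
# Hypothetical log records; field names and values are invented.
public_log = [
    {"host": "www.company.com", "cookie_id": "u-1001", "path": "/products"},
    {"host": "www.company.com", "cookie_id": "u-2002", "path": "/jobs"},
]
intranet_log = [
    {"host": "intranet.company.com", "cookie_id": "u-1001", "remote_user": "bjones"},
]

def identify_visitors(public_log, intranet_log):
    """Pair each public-site hit with a REMOTE_USER seen with the same cookie."""
    # The domain level cookie is replayed to every host on the domain, so the
    # intranet log ties the "pseudonymous" id to an authenticated user name.
    cookie_to_user = {r["cookie_id"]: r["remote_user"] for r in intranet_log}
    return [(r["path"], cookie_to_user.get(r["cookie_id"])) for r in public_log]

identified = identify_visitors(public_log, intranet_log)
```

Any visitor whose cookie has also appeared on the authenticated intranet is
named; the others stay unknown (None) only until they log in once.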


Cookies can contain a maximum of 4 KB of data, which must be transmitted
across the network twice before it can be used. For this reason, cookies
tend to store only a number (or unique key) which links to a value in a
database. Data about a user is stored in a database and that record is given
a number (a unique key). Only this number is stored in the cookie on the
user's computer. When the user revisits the site which set the cookie, that
site can immediately have access to a potentially unlimited amount of
information about the user simply by looking up the number in the database
in which the number is a key. Linkability may be direct or indirect. For
example a key stored in one cookie may not link directly to a user's name
but instead the user's name may be deduced by examining two cookies replayed
by separate domains but linked by a unique referrer. 

Example 1.

Image A on Domain A replays a cookie linking to a record of the user's
street name and number.
Image B on Domain B replays a cookie which links to a record of the user's
home town.
Images A and B are displayed on pages with session IDs within Domain C.
By using the referrer URI (which contains a unique session key), these two
cookies can be linked together to give a unique address and, through another
database, the user's name.
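
A rough sketch of the linking step in Example 1 follows. It assumes that
Domains A and B pool their request logs and that the session key travels in
a query parameter called "sid"; the parameter name, the URLs and the records
are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def session_key(referrer):
    """Return the 'sid' query parameter of a referrer URI, or None if absent."""
    values = parse_qs(urlparse(referrer).query).get("sid")
    return values[0] if values else None

# Hypothetical records each domain has looked up from its own cookie key.
domain_a_hits = [{"referrer": "http://c.example/page1?sid=abc123",
                  "street": "12 High Street"}]
domain_b_hits = [{"referrer": "http://c.example/page2?sid=abc123",
                  "town": "Springfield"}]

def link_by_session(a_hits, b_hits):
    """Join the two logs on the session key carried in the referrer."""
    towns = {session_key(h["referrer"]): h["town"] for h in b_hits}
    linked = []
    for hit in a_hits:
        key = session_key(hit["referrer"])
        if key is not None and key in towns:
            linked.append((key, hit["street"], towns[key]))
    return linked

addresses = link_by_session(domain_a_hits, domain_b_hits)
```

The join itself is trivial; the hard part in practice is that the two
domains must actually exchange their logs for it to happen.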

With enough effort, similar but more sophisticated data mining techniques
may be applied to link even seemingly highly anonymised data with cookie
values. P3P applies the principle of proportionality to such linkability.
The specification of the data and purposes covered by a cookie should be
thought of in terms of the analysis which might reasonably be carried out on
such a cookie to achieve the stated purpose. For example, if a cookie is set
to track criminals' personal data, then it is reasonable that a considerable
effort might be put into database analysis. The cookie should therefore be
said to apply to personally identifiable data even if the data is actually
hashed in the database. If, on the other hand, the cookie is set in order to
track a session and data is stored in the database but anonymised by
hashing, then there is no need to state that the data is identifiable. This
type of anonymisation is in theory not secure because hashes have a 1-1
correspondence with IP addresses, for example, so by hashing all possible IP
addresses you can trace the original IP address. However, extending the
definition of linkability to this extent is neither practical nor
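
The hashing weakness described above can be sketched as a brute-force
search. This assumes an unsalted SHA-256 of the dotted-quad string, which is
only one possible scheme, and it searches a single /24 test network rather
than the full IPv4 space to keep the example quick.

```python
import hashlib
import ipaddress

def hash_ip(ip):
    """Unsalted hash of an IP address string (assumed scheme, for illustration)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def recover_ip(target_hash, network):
    """Hash every host address in the network until one matches the target."""
    for addr in ipaddress.ip_network(network).hosts():
        if hash_ip(str(addr)) == target_hash:
            return str(addr)
    return None

stored = hash_ip("192.0.2.77")            # what the database would hold
recovered = recover_ip(stored, "192.0.2.0/24")
```

The same loop over all of IPv4 is about four billion hashes, which is
tedious but feasible, so this kind of hashing hides the address only from
someone unwilling to make that effort.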

Third party cookies are cookies which are set by a domain other than the
page being viewed. This is done through embedded images as in Example 1 and
can even occur in emails and applications which use web services, such as
music players. While normally the information stored in one domain's cookie
cannot be accessed by another domain, third party cookies bypass this
mechanism by placing the same third party image in different domains' pages.
This allows tracking of users across different domains. The intention to
carry out such tracking activities through linking cookie keys across
different domains should also be declared.
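
As an illustration of the cross-domain tracking described here, the sketch
below groups requests for one third party image by cookie key. The request
list, cookie keys and host names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical requests for the same third party image:
# (cookie key replayed with the request, referrer of the embedding page).
tracker_requests = [
    ("k-42", "http://news.example/story"),
    ("k-42", "http://shop.example/cart"),
    ("k-77", "http://news.example/sport"),
]

def sites_per_cookie(requests):
    """Collect the first party hosts on which each cookie key was seen."""
    visited = defaultdict(set)
    for cookie_key, referrer in requests:
        visited[cookie_key].add(urlparse(referrer).hostname)
    return {key: sorted(hosts) for key, hosts in visited.items()}

profile = sites_per_cookie(tracker_requests)
```

Here the key k-42 is seen on both news.example and shop.example, which is
exactly the kind of linking across domains that should be declared.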

A connected problem is that a site SETTING a cookie may not be in control of
all the purposes to which a cookie is put ON REPLAY. For example some
domains set a domain level cookie (at the level *.xyz.com) which is replayed
to thousands of subhosts (a.xyz.com, b.xyz.com ...), whose data collection
practices are not under their control. Currently the solution of evaluating
cookies on replay has been discounted because of performance issues and data
protection issues linked to the act of storage of data on the user's
machine. Therefore P3P requires that entities publishing policies in
accordance with correct practice declare any potential purposes to which a
cookie might be put. It follows therefore that if a cookie is used for a new
and unforeseen purpose, it SHOULD be reset along with a fresh P3P policy.
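
To show why a domain level cookie reaches hosts outside the setter's
control, here is a deliberately simplified domain match. It ignores path,
port, the secure flag and the finer points of the cookie specifications,
and the host list is made up.

```python
def is_replayed_to(cookie_domain, request_host):
    """Simplified check: would a cookie set for cookie_domain be sent to request_host?"""
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    # Match the domain itself or any host that ends in ".<domain>".
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

# Hypothetical hosts, some run by third parties on behalf of xyz.com.
hosts = ["a.xyz.com", "b.xyz.com", "c.xyz.com", "www.other.com", "notxyz.com"]
receiving = [h for h in hosts if is_replayed_to(".xyz.com", h)]
```

Every host under xyz.com receives the cookie, whoever operates it, which is
why the policy published at set time has to cover all of their practices.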

Giles Hogben
European Commission Joint Research Centre
Institute for the Protection and Security of the Citizen Cybersecurity New
technologies for Combatting Fraud Unit TP 267 Via Enrico Fermi 
