Inconsistency in the definition of dynamic.clickstream.clientip and its sub-elements

    There are inconsistencies in the categories given for the client IP
address and its sub-elements in the base dataschema as defined in the CR
specification. The CR specification contains the following categories:

dynamic.clickstream.clientip: <computer/>
dynamic.clickstream.clientip.hostname: <uniqueid/>
dynamic.clickstream.clientip.partialhostname: <demographic/>
dynamic.clickstream.clientip.fullip: <uniqueid/>
dynamic.clickstream.clientip.partialip: <demographic/>

     It is the opinion of the specification WG that hostnames and full IP
addresses belong in the <computer/> category. The <computer/> category is
defined as follows:
Computer Information: Information about the computer system that the
individual is using to access the network -- such as the IP number, domain
name, browser type or operating system.
     This definition clearly covers IP addresses and hostnames. Thus the
dynamic.clickstream.clientip.hostname and
dynamic.clickstream.clientip.fullip elements must be in the <computer/>
category.

     In addition, the WG believes that <uniqueid/> is not an appropriate
description of a full IP address or hostname. The <uniqueid/> category is
defined as follows:
Unique Identifiers: Non-financial identifiers, excluding government-issued
identifiers, issued for purposes of consistently identifying the
individual. These include identifiers issued by a Web site or service.
     IP addresses and hostnames are not issued for the purposes of
consistently identifying an individual. IP addresses are issued for the
purpose of routing packets in a network, and hostnames are issued for the
purpose of giving easy-to-remember mnemonics for IP addresses.
     The argument could be made that IP addresses can be used to uniquely
identify an individual. This is true in some cases: computer systems which
have fixed IP addresses, and which connect directly to their destination,
can be identified consistently by their IP addresses. However, this is a
weak mapping to an individual. Some computer systems are used by multiple
individuals, and IP addresses identify a computer system only, not an
individual. In addition, the presence of proxies and firewalls in the
network means that a great many computer systems have their own IP address
masked from the destination with which they are speaking. Furthermore, many
computer systems (such as systems accessing the Internet through dialup
access) have dynamically-assigned addresses which cannot easily be linked
with an individual computer system. This makes it something of a stretch to
describe IP addresses as unique IDs.
     Current Web server practices also discourage the use of <uniqueid/> to
describe IP addresses. The overwhelming majority of Web servers currently
in use log the requests they receive. These logs almost always contain the
IP address of the computer system making the request, the URL requested,
the time of the request, and other information. Placing IP addresses into
the <uniqueid/> category would mean that nearly every Web site would need
to declare that it collects this category of information.
     Doing this significantly reduces the usefulness of the <uniqueid/>
category. If a user-agent chooses to look at the categories of information
collected by a site, rather than the individual data elements collected,
then that user-agent would be unable to discriminate between sites which
collect standard Web server access logs, and those which assign unique
persistent IDs (perhaps through cookies) to all visitors. It is our belief
that these two practices are perceived differently by the general Web-using
public, and therefore the P3P specification should reflect this
distinction.
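
     To illustrate the point, here is a minimal Python sketch of a
user-agent that looks only at categories. The element name
"example.visitorid", the two sample sites, and the function name are
hypothetical, invented for this example. They are not part of the base
dataschema.

```python
# Hypothetical sketch: a user-agent that inspects categories only.
# "example.visitorid" stands in for a site-assigned persistent ID (an assumption).

def declared_categories(elements, category_of):
    """Union of the categories of every data element a site collects."""
    cats = set()
    for element in elements:
        cats |= category_of[element]
    return cats

# Category assignments as in the CR specification, and as proposed in this note.
CR = {
    "dynamic.clickstream.clientip.fullip": {"uniqueid"},
    "example.visitorid": {"uniqueid"},
}
PROPOSED = {
    "dynamic.clickstream.clientip.fullip": {"computer"},
    "example.visitorid": {"uniqueid"},
}

log_only_site = ["dynamic.clickstream.clientip.fullip"]
tracking_site = ["dynamic.clickstream.clientip.fullip", "example.visitorid"]

results = {}
for label, assignment in (("CR", CR), ("proposed", PROPOSED)):
    results[label] = [
        "uniqueid" in declared_categories(site, assignment)
        for site in (log_only_site, tracking_site)
    ]
    print(label, results[label])
# prints: CR [True, True]
# prints: proposed [False, True]
```

Under the CR assignment both sites report <uniqueid/>, so a category-level
check cannot tell them apart. Under the proposed assignment only the site
that assigns a persistent ID does.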
     Since hostnames are directly linked to IP addresses by the DNS, and
the two can be freely converted from one to the other, all of the above
reasoning about IP addresses applies equally well to hostnames.

     The last inconsistency regards the categories assigned to
dynamic.clickstream.clientip. In P3P, categories must always "bubble
upwards" in dataschemas. Since a policy which declares collecting
structured element a.b.c implicitly includes all subelements (a.b.c.x,
a.b.c.y, a.b.c.z), all categories assigned to any of the sub-elements must
be assigned to their parent element. Therefore, since
dynamic.clickstream.clientip.fullip and
dynamic.clickstream.clientip.hostname are in the <computer/> category,
and dynamic.clickstream.clientip.partialip and
dynamic.clickstream.clientip.partialhostname are in the <demographic/>
category, their parent element (dynamic.clickstream.clientip) must
be in both the <computer/> and <demographic/> categories.
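
     The "bubble upwards" rule can be sketched in a few lines of Python.
This is only an illustration, assuming a plain dictionary stands in for
the dataschema. The function name bubble_up and the dictionary layout are
hypothetical, and the leaf categories are the ones proposed in this note.

```python
# Hypothetical sketch of the "bubble upwards" rule: every ancestor element
# receives the union of the categories of the elements beneath it.

LEAF_CATEGORIES = {
    "dynamic.clickstream.clientip.hostname": {"computer"},
    "dynamic.clickstream.clientip.partialhostname": {"demographic"},
    "dynamic.clickstream.clientip.fullip": {"computer"},
    "dynamic.clickstream.clientip.partialip": {"demographic"},
}

def bubble_up(leaf_categories):
    """Return a mapping that also assigns categories to all ancestors."""
    result = {name: set(cats) for name, cats in leaf_categories.items()}
    for name, cats in leaf_categories.items():
        parts = name.split(".")
        # Walk every proper prefix of the dotted name: a, a.b, a.b.c, ...
        for depth in range(1, len(parts)):
            parent = ".".join(parts[:depth])
            result.setdefault(parent, set()).update(cats)
    return result

categories = bubble_up(LEAF_CATEGORIES)
print(sorted(categories["dynamic.clickstream.clientip"]))
# prints: ['computer', 'demographic']
```

Note that the sketch also propagates categories to dynamic.clickstream and
dynamic, which here reflect only the four leaf elements listed above.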

     The end result is that the following categories would be applied to
these data elements:
dynamic.clickstream.clientip: <computer/>, <demographic/>
dynamic.clickstream.clientip.hostname: <computer/>
dynamic.clickstream.clientip.partialhostname: <demographic/>
dynamic.clickstream.clientip.fullip: <computer/>
dynamic.clickstream.clientip.partialip: <demographic/>

     -- Martin

Martin Presler-Marshall - Program Manager, Privacy Technology
E-mail: mpresler@us.ibm.com
Phone: (919) 254-7819 (tie-line 444-7819) Fax: (919) 254-6430 (tie-line
444-6430)

Received on Tuesday, 3 July 2001 14:50:28 UTC