Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25, ISSUE-31, ISSUE-34, ISSUE-49)

Breaking things down, there are two questions:

- Can you be tracked globally by IP and User Agent?  This is true in the
strongest sense if you have a very high-entropy User Agent.  In the current
live Panopticlick dataset, which is approximately representative of
privacy-conscious users, 4.5% of the 2 million user agent strings were
unique -- these browsers can have not only their reading habits but also
their location tracked by IP + UA.  Global trackability is also very high
for a computer that does not get moved from one network connection to
another.  I don't have numbers for that, but I'd estimate 5-25%, so together
with the unique-UA browsers we're talking about roughly 10-30% of browsers
being strongly globally trackable by IP + UA.

- There is a weaker kind of globally trackable, where a browser has several
IP addresses that it uses regularly, and the UA makes it unique at each of
those.  You're strongly trackable at each connection that you use regularly
(home, work, cafes, friends' houses).  Calculating this is more
complicated, since it depends on the distribution of anonymity sets at each
IP you use.  It will be very common for people to be trackable when they're
at home, and rather less common for them to be trackable at a cafe that
offers free wifi.
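
To make the "anonymity set at each IP" idea concrete, here is a rough
Python sketch.  It assumes a toy list of (browser, IP, UA) observations;
the browser labels, IP names and UA strings are all made up for the
illustration, and a real tracker would of course not have the browser
label to start with.  A browser counts as trackable at an IP when no other
browser shares its UA there.

```python
from collections import defaultdict


def anonymity_sets(observations):
    """Map each (ip, ua) pair to the set of distinct browsers seen with it."""
    sets = defaultdict(set)
    for browser, ip, ua in observations:
        sets[(ip, ua)].add(browser)
    return sets


def trackable_locations(observations):
    """For each browser, collect the IPs where its UA is unique (set size 1)."""
    result = defaultdict(set)
    for (ip, _ua), browsers in anonymity_sets(observations).items():
        if len(browsers) == 1:
            (browser,) = browsers
            result[browser].add(ip)
    return dict(result)


# Hypothetical observations: (browser label, IP, User Agent).
observations = [
    ("alice", "home-a", "UA-1"),
    ("alice", "cafe", "UA-1"),
    ("bob", "cafe", "UA-1"),
    ("bob", "home-b", "UA-1"),
    ("carol", "cafe", "UA-2"),
]

print(trackable_locations(observations))
```

In this toy data alice and bob share a UA, so they are distinguishable at
home but not at the cafe, while carol's rarer UA leaves her trackable even
there.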

Even though these privacy scenarios are not great, being trackable by ID
cookie is clearly a *lot* worse.  Remember that ID cookies are cumulative
with all of the above: IP + UA + ID cookie means that even if you delete
your cookies, your IP + UA can be used to link your old ID with your new
one on the server side.  Also for browsers that would have been "weakly"
globally trackable by IP+UA, it means that while the ID is being used to
resolve the browser's identity, the IP address can be used to follow its
location history.
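
As an illustration of the server-side linking described above, here is a
minimal Python sketch.  It assumes a hypothetical request log of
(IP, UA, cookie ID) tuples and assumes each (IP, UA) pair belongs to a
single browser, i.e. the "strongly trackable" case; the function name and
log layout are mine, not taken from any real ad server.

```python
def link_cookie_ids(log):
    """Link each cookie ID to the earliest ID seen with the same (ip, ua)."""
    first_id = {}  # (ip, ua) -> earliest cookie ID seen with that pair
    alias = {}     # cookie ID -> earliest ID it has been linked to
    for ip, ua, cookie_id in log:
        key = (ip, ua)
        if key not in first_id:
            # New pair: reuse a known alias if this ID was linked before.
            first_id[key] = alias.get(cookie_id, cookie_id)
        # A fresh ID arriving on a known pair inherits the old identity.
        alias.setdefault(cookie_id, first_id[key])
    return alias


# Hypothetical log: the user deletes cookies between the 2nd and 3rd request.
log = [
    ("203.0.113.7", "UA-X", "id-old"),
    ("203.0.113.7", "UA-X", "id-old"),
    ("203.0.113.7", "UA-X", "id-new"),
    ("198.51.100.2", "UA-Y", "id-other"),
]

links = link_cookie_ids(log)
print(links["id-new"])
```

Deleting the cookie only produces a new ID; because the IP + UA pair is
unchanged, the new ID is tied straight back to the old one.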

On 10 February 2012 08:30, Jonathan Mayer <jmayer@stanford.edu> wrote:

> Justin,
>
> I think you may be misreading the state of research on tracking through IP
> address + User-Agent string.  There is substantial evidence that some
> browsers can be tracked in that way some of the time.  I am not aware of
> any study that compares the global effectiveness of tracking through IP
> address + User-Agent string vs. an ID cookie; intuitively, the ID cookie
> should be far more effective.  The news story you cite glosses over
> important caveats in that paper's methodology; it is certainly not the case
> that "62% of the time, HTTP user-agent information alone can accurately tag
> a host."
>
> Jonathan
>
> On Feb 9, 2012, at 6:48 PM, Justin Brookman wrote:
>
> Sure.  As the spec currently reads, third-party ad networks are allowed to
> serve contextual ads on sites even when DNT:1 is on, yes?  In order to do
> this, they're going to get log data, user agent string, device info, IP
> address, referrer url, etc.  There is growing recognition that that
> information in and of itself can be used to uniquely identify devices over
> time (
> http://www.networkworld.com/news/2012/020212-microsoft-anonymous-255667.html)
> for profiling purposes.  It was my understanding that one of the primary
> arguments against allowing third parties to place unique identifiers on the
> client was because of the concern that they were going to be secretly
> tracking and building profiles using those cookies.  My point is that they
> will be able to do that regardless, with little external ability to audit.
> This system is going to rely to some extent on trust unless we are
> proposing to fundamentally rearchitect the web.
>
> The other argument that I've heard against using unique cookies for this
> purpose is valid, though to me less compelling: that even if just used for
> frequency capping, third parties are going to be able to amass data about
> the types of ads a device sees, from which you could surmise general
> information about the sites visited on that device (e.g., you are frequency
> capping a bunch of sports ads --> ergo, the operator of that device is
> probably visiting sports pages).  Everyone seems to agree that it would be
> improper for a company to use this information to profile (meta-profile?),
> but there are still concerns about data breach, illegitimate access, and
> government access of this potentially revealing information.  This concerns
> me too, but the shadow of my .url stream is to me considerably less privacy
> sensitive than my actual .url stream.  I could be willing to compromise on
> a solution that allowed for using cookies for frequency capping, if there
> was agreement on limiting to reasonable campaign length, rules against
> repurposing, and a requirement to make an accountable statement of
> adherence to the standard.  I would be interested to hear if it would be
> feasible to not register frequency caps for ads for sensitive categories of
> information (or if at all, cap client-side), though again, it's important
> to keep in mind that that data may well be collected and retained for other
> excepted purposes under the standard (e.g., fraud prevention) --- cookie or
> not.
>
> ------------------------------
> *From:* Jonathan Mayer [mailto:jmayer@stanford.edu]
> *To:* Justin Brookman [mailto:justin@cdt.org]
> *Cc:* public-tracking@w3.org
> *Sent:* Thu, 09 Feb 2012 18:32:19 -0500
> *Subject:* Re: Deciding Exceptions (ISSUE-23, ISSUE-24, ISSUE-25,
> ISSUE-31, ISSUE-34, ISSUE-49)
>
> Justin, could you explain what you mean here?
>
> Thanks,
> Jonathan
>
> On Feb 9, 2012, at 3:17 PM, Justin Brookman wrote:
>
> > the standard currently recognizes that third parties are frequently
> > going to be allowed to obtain uniquely-identifying user agent strings
> > despite the presence of a DNT:1 header
>
>
>


-- 
Peter

Received on Friday, 10 February 2012 17:33:44 UTC