Re: how does host B know that its visitor is the one that visited host A? from Jonathan Rees on 2011-08-15 (www-tag@w3.org from August 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 15 Aug 2011 15:47:09 -0400
To: Mischa Tuffield <mischa@mmt.me.uk>
Cc: www-tag@w3.org
Message-ID: <CACHXnapPh3PZ9p3nYfQ-H4Y6xE8U1-VRM0HyXGBED95i2oUGpg@mail.gmail.com>
On Mon, Aug 15, 2011 at 12:41 PM, Mischa Tuffield <mischa@mmt.me.uk> wrote:
> Hi Jonathan/All,
> Please excuse the top-post.
>
> As discussed in this thread it could be one of many things which made you
> see what you saw when you were browsing the web. I thought I would give some
> pointers to work highlighting some of the issues discussed in this thread:

> 1. Cookies and how "deleting cookies" is not enough anymore. Cookie
> re-spawning, via Flash storage, HTML5's local storage, or
> more recently ETags and JS, is an ongoing struggle. This is explained by
> Ashkan Soltani on this website [1]. The evercookie [2] got some press about
> 6 months ago when it described how to create persistent cookies. Wired
> magazine and Bruce Schneier have also talked about a new ETag-based
> approach for persisting cookies [3]. Quoting:
> ... "in addition to Flash and HTML5 LocalStorage, <SOMECOMPANY> was
> exploiting the browser cache to store persistent identifiers via stored
> Javascript and ETags" ...
> The above suggests that even if you choose to have no cookies, no local
> storage, and no Flash-based storage, you are still subject to tracking.
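
As a rough illustration of the ETag channel described in the quoted
passage, here is a minimal sketch, assuming a server that hands each new
visitor a unique ETag and reads it back from If-None-Match when the
browser revalidates its cached copy. The function name handle_request
and the known_ids set are hypothetical, and the Cache-Control value is
only one plausible choice; this is not the code of any company mentioned.

```python
import uuid


def handle_request(headers, known_ids):
    """Simulate one request for a cacheable resource.

    Returns (status, response_headers, visitor_id).
    """
    etag = headers.get("If-None-Match")
    if etag is not None:
        # The browser is revalidating its cached copy, so it sends back
        # the ETag it was given earlier. That value is the identifier.
        visitor_id = etag.strip('"')
        known_ids.add(visitor_id)
        return 304, {"ETag": etag}, visitor_id

    # First visit (or cache cleared): mint a fresh identifier and hide it
    # in the ETag. max-age=0 asks the browser to revalidate every time.
    visitor_id = uuid.uuid4().hex
    known_ids.add(visitor_id)
    response_headers = {
        "ETag": f'"{visitor_id}"',
        "Cache-Control": "private, max-age=0",
    }
    return 200, response_headers, visitor_id
```

No cookie is involved at any point: the identifier survives for as long
as the cached response does, which is why deleting cookies alone would
not remove it.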

Yes, we've already discussed this stuff on www-tag. Some of these
channels are going to be visible across tabs (origins) and some
aren't; the ones that are might have led to the leak I saw. But as
Alan described, there's no need for anything more than cookies to
explain the particular case in question.
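
To make the cookies-only explanation concrete, here is a minimal sketch,
assuming a third party whose content is embedded in pages on both host A
and host B. The Tracker class, the "uid" cookie name, and the
host-a.example / host-b.example URLs are all hypothetical; the Referer
value stands in for whatever tells the third party which page embedded it.

```python
import uuid
from http.cookies import SimpleCookie


class Tracker:
    """Third-party server embedded by several unrelated sites."""

    def __init__(self):
        # visitor id -> list of pages on which that visitor was seen
        self.visits = {}

    def handle(self, cookie_header, referer):
        """Process one embedded request; return (uid, set_cookie_value)."""
        cookie = SimpleCookie(cookie_header or "")
        if "uid" in cookie:
            # Returning browser: the cookie identifies it, whichever
            # first-party site happens to be embedding us this time.
            uid = cookie["uid"].value
            set_cookie = None
        else:
            # New browser: mint an identifier and ask it to store it.
            uid = uuid.uuid4().hex
            out = SimpleCookie()
            out["uid"] = uid
            set_cookie = out["uid"].OutputString()
        self.visits.setdefault(uid, []).append(referer)
        return uid, set_cookie
```

Under these assumptions, a visit to host A followed by a visit to host B
produces two entries under the same uid, so the third party can tell B
(or tailor what it shows on B) without any channel beyond an ordinary
cookie.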

> 2. Balachander Krishnamurthy [4] has done some excellent work highlighting
> how much information is circulated between the various ad networks, and how
> many of the ad networks are owned by a handful of companies. The slides for
> a plenary talk he gave at an IETF workshop summarise some truly awesome
> work, and are definitely worth a browse [5].

> 3. And finally, with regard to UA and browser fingerprinting, the EFF
> have an excellent tool [6] which highlights how one's browser UA is "most
> likely uniquely identifiable". T. Lowenthal had a position paper at the
> W3C's Web Tracking and User Privacy workshop [7], where he put it to the
> browser vendors to start protecting their users' privacy; one of his
> suggestions was "Fingerprint Uniqueness Reduction", which seems very
> sensible to say the least. That said, given that your experience was
> cross-browser, this probably wasn't what was happening to you.
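
For reference, a minimal sketch of the kind of fingerprinting the quoted
paragraph describes, assuming the server simply hashes a handful of
attributes the browser reveals. The fingerprint function and the choice
of attributes are hypothetical; real tools combine many more signals.

```python
import hashlib


def fingerprint(attrs):
    """Derive a short, stable identifier from browser-revealed attributes.

    attrs maps attribute names (e.g. header names) to their values.
    """
    # Sort by lower-cased name so the result does not depend on the order
    # in which the attributes were collected.
    canonical = "\n".join(
        f"{name.lower()}={attrs[name]}" for name in sorted(attrs, key=str.lower)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Two different browsers on the same machine send different User-Agent
values and so hash to different identifiers, which is why a fingerprint
of this kind would not link visits made from separate browsers.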

Thanks much - these are all useful resources and I'll keep them in
mind in future discussions of this kind of thing.

(I think I didn't express myself well; the behavior I saw was not
cross-browser. The cross-browser test I did was just to rule out IP
address as the identifying information.)

Jonathan

> Regards,
> Mischa *goes back to lurking ...
> [1] http://ashkansoltani.org/docs/respawn_redux.html
> [2] http://samy.pl/evercookie/
> [3] https://www.schneier.com/blog/archives/2011/08/new_undeletable.html
> [4] http://www2.research.att.com/~bala/papers/
> [5] http://www.ietf.org/proceedings/77/slides/plenaryt-5.pdf
> [6] https://panopticlick.eff.org/
> [7] http://www.w3.org/2011/track-privacy/papers/lowenthal_position-paper.pdf
> _________________________________
> Mischa Tuffield PhD
> Email: mischa@mmt.me.uk
> Homepage: http://mmt.me.uk/
> WebID: http://mmt.me.uk/foaf.rdf#mischa
>
> On 14 Aug 2011, at 15:45, Mukul Gandhi wrote:
>
> Hi Jonathan,
>
> On Fri, Aug 12, 2011 at 8:41 PM, Jonathan Rees <jar@creativecommons.org>
> wrote:
>
> How does this work? I.e. what are browser instances doing that leaks
> their identity to servers? Is it just a lucky guess based on
> User-agent or something?
>
> I believe that the "User-Agent" HTTP request header field is a
> reliable way for a server to know which user agent (usually
> a web browser) it is sending a response to.
>
> Here's an excerpt from the document
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, which explains
> this,
>
> <quote>
>
> 14.43 User-Agent
>
> The User-Agent request-header field contains information about the
> user agent originating the request. This is for statistical purposes,
> the tracing of protocol violations, and automated recognition of user
> agents for the sake of tailoring responses to avoid particular user
> agent limitations. User agents SHOULD include this field with
> requests. The field can contain multiple product tokens (section 3.8)
> and comments identifying the agent and any subproducts which form a
> significant part of the user agent. By convention, the product tokens
> are listed in order of their significance for identifying the
> application.
>
>       User-Agent     = "User-Agent" ":" 1*( product | comment )
>
> Example:
>
>       User-Agent: CERN-LineMode/2.15 libwww/2.17b3
>
> </quote>
>
> I think nearly every web browser sends this field (and its value) to
> the web server it is sending a request to.
>
> --
> Regards,
> Mukul Gandhi
Received on Monday, 15 August 2011 19:47:38 UTC