Re: confirm and fingerprinting issues from Matthias Schunter (Intel Corporation) on 2017-08-22 (public-tracking@w3.org from August 2017)

From: Matthias Schunter (Intel Corporation) <mts-std@schunter.org>
Date: Tue, 22 Aug 2017 10:39:23 +0200
To: public-tracking@w3.org
Message-ID: <a288501d-6bf5-1017-f98b-8f415ef4a874@schunter.org>
Hi Mike,

thanks for the clarification.

I now (hopefully) understand: Instead of pushing an identifier as a
whole (9437489), you push individual bits (bit1-0, bit2-1, bit3-1, ...).
Querying them then becomes efficient: only about 32 queries (one per bit)
are needed ;-(

Thus the "you can only query what you store" approach does not mitigate
this fingerprinting risk (querying 32 bits is efficient).
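
To make the efficiency argument concrete, here is a minimal simulation. It is only a sketch: ExceptionStore, storeIdAsBits and recoverId are hypothetical names, the store is a plain in-memory object standing in for the user agent's exception database, and the "bit<i>-<value>" key format simply mirrors the notation above rather than anything in the specification.

```typescript
// Hypothetical in-memory stand-in for the user agent's exception store.
// Only exact keys that were stored can be confirmed.
class ExceptionStore {
  private entries: { [key: string]: boolean } = {};
  public queries = 0;

  store(key: string): void {
    this.entries[key] = true;
  }

  // Counts every call so the cost of a lookup strategy is visible.
  confirm(key: string): boolean {
    this.queries += 1;
    return this.entries[key] === true;
  }
}

const BITS = 32;

// Push the identifier bit by bit: one entry "bit<i>-<value>" per position.
function storeIdAsBits(target: ExceptionStore, id: number): void {
  for (let i = 0; i < BITS; i++) {
    const bit = Math.floor(id / Math.pow(2, i)) % 2;
    target.store("bit" + i + "-" + bit);
  }
}

// Recover the identifier with exactly one confirm() per bit position.
function recoverId(source: ExceptionStore): number {
  let id = 0;
  for (let i = 0; i < BITS; i++) {
    if (source.confirm("bit" + i + "-1")) {
      id += Math.pow(2, i);
    }
  }
  return id;
}

const demoStore = new ExceptionStore();
storeIdAsBits(demoStore, 9437489);
const recovered = recoverId(demoStore);
console.log(recovered, demoStore.queries); // prints: 9437489 32
```

Every query in the sketch uses a key that may have been stored, so an exact-match rule for confirm does not slow it down: the cost is 32 lookups regardless of how many users exist.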

Your suggested mitigation is to disallow subresources from requesting
user-granted _site-specific_ exceptions (only the main site is allowed
to do so). They would still be allowed to request web-wide exceptions
(where this risk does not seem to exist).

This seems to be a workable and efficient solution.

Any thoughts?


Regards,
matthias

PS: Am I right that the main site could still use the site-specific UGE
approach for fingerprinting? Is there anything we can do to mitigate that?



On 22.08.2017 10:22, Mike O'Neill wrote:
> Hi Matthias,
> 
> That is not quite what I meant. The fingerprinting I identified would allow
> the subresource to assign a random number (up to 32 bits long in my
> example), because there are 32 sub-subresources (let's call them
> grandchildren of the first-party site):
> 
> b0.images.schunter.org
> b1.images.schunter.org
> b2.images.schunter.org
>                   .
>                   .
>                   .
> b31.images.schunter.org
> 
> Each grandchild represents one bit in the 32-bit string.
> 
> If an exception exists for a particular grandchild, that represents a 0 at
> that particular bit position. Otherwise the value of the bit is 1.
> 
> The value of each grandchild "bit" is communicated back to
> images.schunter.org by each grandchild detecting its DNT header (say by
> reading navigator.doNotTrack), then sending the 1-bit value in a message
> using the postMessage API.
> 
> Then images.schunter.org receives all these messages and assembles the
> original 32 bit string from them.
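
The scheme described above can be sketched as a small simulation. This is only an illustration under stated assumptions: it runs without a browser, the BitReport message shape and all function names are hypothetical, and it assumes the user agent sends DNT:1 to any grandchild that has no exception. In a real page each report would arrive via postMessage rather than through an array.

```typescript
const PARENT_HOST = "images.schunter.org";
const BIT_COUNT = 32;

// Hostname of the grandchild frame that carries bit i (b0 ... b31).
function grandchildHost(i: number): string {
  return "b" + i + "." + PARENT_HOST;
}

// Choose which grandchildren get an exception: an exception encodes a 0,
// its absence encodes a 1.
function hostsNeedingException(id: number): string[] {
  const hosts: string[] = [];
  for (let i = 0; i < BIT_COUNT; i++) {
    const bit = Math.floor(id / Math.pow(2, i)) % 2;
    if (bit === 0) {
      hosts.push(grandchildHost(i));
    }
  }
  return hosts;
}

// Hypothetical shape of the message each grandchild would post upward.
interface BitReport {
  index: number;
  dnt: string; // "0" when an exception applies, "1" otherwise (assumed)
}

// What grandchild i would report, given the stored exceptions.
function reportFor(i: number, stored: string[]): BitReport {
  const hasException = stored.indexOf(grandchildHost(i)) !== -1;
  return { index: i, dnt: hasException ? "0" : "1" };
}

// The parent assembles the identifier; arrival order does not matter
// because every report carries its own bit index.
function assembleId(received: BitReport[]): number {
  let id = 0;
  received.forEach(function (r) {
    if (r.dnt === "1") {
      id += Math.pow(2, r.index);
    }
  });
  return id;
}

const storedExceptions = hostsNeedingException(9437489);
const bitReports: BitReport[] = [];
for (let i = BIT_COUNT - 1; i >= 0; i--) {
  bitReports.push(reportFor(i, storedExceptions)); // deliberately reversed
}
console.log(assembleId(bitReports)); // prints: 9437489
```

Note that nothing in the sketch calls a confirm-style API: the bits are read purely from the DNT value each grandchild observes.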
> 
> Note, this does not need the confirm call, though it could. Restricting the
> confirm call does not fix the risk because the same information can be
> obtained via postMessage.
> 
> This is complicated, but it is just JavaScript. Once it is done it will be
> easy to reproduce. It gives subresources the ability to generate UIDs even
> when they are blocked from using cookies, e.g. on Safari. There are already
> other more complicated methods for doing this in the wild, one of the
> reasons for Apple's ITP in iOS 11.
> 
> 
> 
> Mike
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org] 
> Sent: 22 August 2017 07:44
> To: Michael O'Neill <michael.oneill@btinternet.com>; public-tracking@w3.org
> Cc: 'Roy T. Fielding' <fielding@gbiv.com>
> Subject: Re: confirm and fingerprinting issues
> 
> Hi Mike,
> 
> 
> thanks a lot for the analysis of fingerprinting.
> 
> If I understand correctly, a sub-resource (say images.schunter.org) can
> obtain an exception for its "tracker7289437923.images.schunter.org"
> where tracker7289437923 is unique to a user for this subdomain. Since
> tracker7289437923 is unique, your concern is that by learning that there
> is a UGE for tracker7289437923, the site knows what user is visiting.
> 
> I believe that this is not a severe fingerprinting risk for the
> following reason:
> 
> Assume that the web-site has registered a table of UGEs:
>   TRACKERID           NAME
>   tracker7289437923   Joe
>   tracker728laksdjh   Jim
>   trackerk823982089   Helen
>   ....
> 
> In theory, obtaining a line from this table allows fingerprinting.
> However, our "confirm" API only allows verifying whether a single line
> exists. I.e. I could indeed confirm whether I am talking to a given user:
> - if confirm("tracker7289437923.images.schunter.org") is true, then I am
> talking to Joe.
> 
> However, using the scheme to fingerprint larger numbers of users does not
> seem feasible: one needs to call the confirm() API once for each
> subdomain that corresponds to a potential user:
>   tracker7289437923 	
>   tracker728laksdjh	
>   trackerk823982089	
>   ....
> 
> Ensuring this was the rationale (AFAIR) for David Singer insisting that
> confirm must be called with the exact parameters of the store() call.
> 
> What do you think? If we agree that there is still a larger risk, we
> should investigate your potential resolution (which I have not checked
> in detail yet; since I am not 100% sure I see the risk).
> 
> Any feedback is welcome!
> 
> matthias
> 
> 
> 
> 
> On 21.08.2017 21:19, Michael O'Neill wrote:
>> I think the web-wide issue is fine with Roy's sentence:
>>
>> For each of the targets in a web-wide exception, a user agent must not store
>> the duplets and must reject the promise with a DOMException named
>> "SecurityError" unless the target domain matches both the document.domain of
>> the script's responsible document and the document.domain of the top-level
>> browsing context's active document [HTML5]. This effectively limits the API
>> for web-wide exceptions to the single target domain of the caller.
>>
>> This limits web-wide consent to the top-level browsing context which was how
>> it always was supposed to be.
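
The quoted rule amounts to a simple check, sketched below. The function names are hypothetical, and "matches" is assumed here to mean case-insensitive string equality, which the quoted sentence does not spell out. The function returns the DOMException name rather than throwing, purely to keep the example self-contained.

```typescript
// Assumed meaning of "matches": case-insensitive equality of domain strings.
function domainsMatch(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}

// Returns null when every target may be stored, otherwise the name of the
// DOMException the promise would be rejected with.
function checkWebWideTargets(
  targets: string[],
  scriptDomain: string,
  topLevelDomain: string
): string | null {
  const allAllowed = targets.every(function (target) {
    return domainsMatch(target, scriptDomain) &&
      domainsMatch(target, topLevelDomain);
  });
  return allAllowed ? null : "SecurityError";
}

// Top-level page asking for itself: allowed.
console.log(checkWebWideTargets(["example.com"], "example.com", "example.com"));
// Iframe from subresource.com embedded on example.com: rejected.
console.log(checkWebWideTargets(["subresource.com"], "subresource.com", "example.com"));
```

Under this reading an embedded iframe can never pass, because its own document.domain differs from that of the top-level browsing context.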
>>
>> But as the text is now, a subresource browsing context (aka an iframe) can
>> still specify a site-specific exception for itself and its own set of
>> targets. This could be a danger because it allows a third-party subresource
>> to invisibly create arbitrary exceptions for itself, which it can then use
>> to fingerprint the user agent. It would do this by creating a set of
>> subresource iframes and establishing UGEs for a random set of them.
>>
>> For example, subresource.com loads 32 child iframes b0.subresource.com,
>> b1.subresource.com, ..., b31.subresource.com.
>>
>> When it exists as a subresource on top-level site example.com for user Alice
>> it creates a UGE for targets bX.subresource.com, bY.subresource.com, ...,
>> bZ.subresource.com, i.e. a random 32-bit pattern unique to Alice.
>>
>> When Alice later revisits example.com DNT:0 will be sent in requests for the
>> subset of targets specified in the UGE. These subresources can then
>> communicate back to the parent subresource the value of DNT they have
>> received, using the postMessage API. Thus subresource.com can recognise
>> Alice without having to place a third-party cookie. It cannot do this for
>> sites other than example.com, but it is still a privacy risk.
>>
>> We do not have a use case for a subresource-initiated site-specific UGE, so
>> why do we need it? The easiest way to fix this is simply to adopt Roy's
>> wording for all UGEs, not just web-wide ones.
>>
>> For the other issue, making the confirm call (now called
>> Navigator.trackingExceptionExists) capable of confirming exceptions for
>> cookie rule subdomains as Navigator.storeTrackingException does, I suggest
>> the following derived from Roy's definition of "site" for
>> storeTrackingException, with a lone "*" illegal:
>>
>> site
>> The referring domain scope where an exception should be confirmed:
>> If site is undefined, null, or the empty string, the referring domain scope
>> defaults to the [site domain].
>> Otherwise, the referring domain scope is defined by a domain found in site
>> that is treated in the same way as the domain parameter to cookies
>> [RFC6265], allowing subdomains to be included with the prefix "*.". The
>> value can be set to a fully-qualified right-hand segment of the document
>> host name, up to one level below TLD. If such a domain scope cannot be
>> parsed then the user agent must reject the promise with the DOMException
>> named "SecurityError".
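
One way the proposed "site" handling could be parsed is sketched below. All names are hypothetical. Two simplifications are assumed: the TLD is taken to be the last label only (a real user agent would presumably consult something like the Public Suffix List), and an absent site yields a scope without subdomains. The function returns the string "SecurityError" instead of rejecting a promise, to keep the example self-contained.

```typescript
interface DomainScope {
  domain: string;
  includeSubdomains: boolean;
}

function parseSiteScope(
  site: string | null | undefined,
  documentHost: string,
  siteDomain: string
): DomainScope | "SecurityError" {
  // Undefined, null or empty: fall back to the site domain.
  if (site === undefined || site === null || site === "") {
    return { domain: siteDomain, includeSubdomains: false };
  }

  let domain = site.toLowerCase();
  let includeSubdomains = false;
  if (domain.slice(0, 2) === "*.") {
    includeSubdomains = true;
    domain = domain.slice(2);
  }

  // At least two labels keeps the scope one level below the (naive) TLD;
  // this also rejects a lone "*" and "*.com".
  const labels = domain.split(".");
  const wellFormed = labels.length >= 2 && labels.every(function (label) {
    return /^[a-z0-9-]+$/.test(label);
  });

  // The scope must be a right-hand segment of the document host name.
  const host = documentHost.toLowerCase();
  const suffix = "." + domain;
  const isRightHandSegment = host === domain ||
    (host.length > suffix.length &&
      host.slice(host.length - suffix.length) === suffix);

  if (!wellFormed || !isRightHandSegment) {
    return "SecurityError";
  }
  return { domain: domain, includeSubdomains: includeSubdomains };
}

console.log(parseSiteScope("*.schunter.org", "images.schunter.org", "schunter.org"));
console.log(parseSiteScope("*", "images.schunter.org", "schunter.org"));
```

The first call yields a scope for schunter.org including subdomains; the second is rejected because a lone "*" is not a valid domain scope.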
>>
>> Comments?
>>
>> Mike
>>
>>
>>
>>
> 
> 
>
Received on Tuesday, 22 August 2017 08:39:57 UTC