RE: confirm and fingerprinting issues from Mike O'Neill on 2017-08-24 (public-tracking@w3.org from August 2017)

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Thu, 24 Aug 2017 19:05:19 +0100
To: "'Matthias Schunter (Intel Corporation)'" <mts-std@schunter.org>, <public-tracking@w3.org>, "'Roy T. Fielding'" <fielding@gbiv.com>, "'Shane M Wiley'" <wileys@oath.com>
Message-ID: <10fe01d31d03$9353b3c0$b9fb1b40$@baycloud.com>
While restricting the API to top-level context stops it being used by bad
actors (to invisibly fingerprint), it also stops the use-case Shane has
identified of being able to assign consent to multiple domains. No longer
will it be possible to call the API from an iframe, so top level script will
not be able to dynamically create browsing contexts that do that.

I think the only way to fix the security weakness is to stop sub-resources
using the API, but it is very desirable to still allow the registering of
exceptions for other-origin (though same-party) domains. This will be useful
not just to larger sites.

I think both can be done as long as a check is made that the script-origin
controls the other domains. The security and privacy benefit of disallowing
subresources using the API far outweighs any threat from first-parties
getting it wrong.  
 
I spent today amending the API to show how this could be specified using the
same-party array:

https://w3c.github.io/dnt/drafts/samepartyawareapi.html#exceptions

See Section 6. It is in a new file to be web readable. It would be easy to
create a PR for it against the master branch.
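
As a rough illustration of the kind of check meant here (this is only a sketch, not the algorithm in the draft): assume the user agent already holds the "same-party" array from the site's tracking status resource, and accepts a request only from the top-level origin and only for targets that are the caller's own host or listed in that array. The function and parameter names are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(origin_or_host: str) -> str:
    """Return the lower-cased host of an origin URL or a bare host name."""
    value = origin_or_host if "//" in origin_or_host else "//" + origin_or_host
    return (urlsplit(value).hostname or "").lower()


def may_register_exception(script_origin, top_level_origin, targets, same_party):
    """Hypothetical check: may this script store an exception for `targets`?

    Assumed rules (not taken from the draft text):
      1. the calling script must run in the top-level context, so its host
         equals the top-level host (subresources are refused);
      2. every target is either the caller's own host or appears in the
         site's same-party array.
    """
    script_host = host_of(script_origin)
    if not script_host or script_host != host_of(top_level_origin):
        return False  # called from a subresource, or origin unusable
    allowed = {script_host} | {host_of(d) for d in same_party}
    return all(host_of(t) in allowed for t in targets)


same_party = ["cdn.example.net", "shop.example.org"]
print(may_register_exception(
    "https://example.com", "https://example.com",
    ["cdn.example.net"], same_party))
```

No extra round-trip is needed at call time under these assumptions, which is the efficiency argument for the same-party route over CORS/fetch.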

Another possible way to check that script origins control other origins is
to use CORS (or fetch), but this adds round-trips and would therefore be
slow. The same-party way will be a lot more efficient. We could add
CORS/fetch as belt and braces if people thought it necessary.

Please take the time to consider this before Monday's call. 


Mike


-----Original Message-----
From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org] 
Sent: 22 August 2017 11:59
To: public-tracking@w3.org
Subject: Re: confirm and fingerprinting issues

Hi Mike,


thanks for the clarification.

I believe your resolution should substantially reduce the fingerprinting
risk.

Any other concerns/objections?


Regards,
matthias



On 22.08.2017 11:31, Mike O'Neill wrote:
> Matthias, subresources are already denied making web-wide exceptions (by
> Roy's last change). My suggestion is to generalise his sentence to cover
> site-specific ones also.
> 
> Mike
> 
> -----Original Message-----
> From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org] 
> Sent: 22 August 2017 09:39
> To: public-tracking@w3.org
> Subject: Re: confirm and fingerprinting issues
> 
> Hi Mike,
> 
> thanks for the clarification.
> 
> I now (hopefully) understand: Instead of pushing an identifier as a
> whole (9437489), you push individual bits (bit1-0, bit2-1, bit3-1, ...).
> Then querying them gets efficient; only say 32 queries (one per bit)
> needed ;-(
> 
> Thus the "you can only query what you store" approach does not mitigate
> this fingerprinting risk (it is efficient to query 32 bits).
> 
> Your suggested mitigation is to disallow subresources from requesting
> user-granted _site-specific_ exceptions (only the main site is allowed
> to do so). They would still be allowed to request web-wide exceptions
> (where this risk does not seem to exist).
> 
> This seems to be a workable and efficient solution.
> 
> Any thoughts?
> 
> 
> Regards,
> matthias
> 
> PS: Am I right that the main site could still use site-specific UGE
> approach for fingerprinting? Anything we can mitigate for them?
> 
> 
> 
> On 22.08.2017 10:22, Mike O'Neill wrote:
>> Hi Matthias,
>>
>> That is not quite what I meant. The fingerprinting I identified would
>> allow the subresource to assign a random number (up to 32 bits long in
>> my example), because there are 32 sub-subresources (let's call them
>> grandchildren of the first-party site):
>>
>> b0.images.schunter.org
>> b1.images.schunter.org
>> b2.images.schunter.org
>>                   .
>>                   .
>>                   .
>> b31.images.schunter.org
>>
>> Each grandchild represents one bit in the 32 bit string.
>>
>> If an exception exists for a particular grandchild, that represents a 0
>> at that particular bit position.
>> Otherwise the value of the bit is 1.
>>
>> The value of each grandchild "bit" is communicated back to
>> images.schunter.org by each grandchild detecting its DNT header (say by
>> reading navigator.doNotTrack), then sending the 1 bit value in a message
>> using the postMessage API.
>>
>> Then images.schunter.org receives all these messages and assembles the
>> original 32 bit string from them.
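
The encoding described in the quoted text can be simulated in a few lines. This is only a sketch under assumptions: the browser's exception store is modelled as a plain set of host names, b0 is taken to be the least significant bit (the message does not say which end it is), and all function names are made up for illustration. No real browser API is called.

```python
import secrets

BITS = 32
PARENT = "images.schunter.org"  # the subresource from the example above


def grandchild(i: int) -> str:
    """Host name of the grandchild standing for bit i (b0 = lowest bit, assumed)."""
    return f"b{i}.{PARENT}"


def exceptions_for(uid: int) -> set:
    """Hosts to store exceptions for: an exception encodes a 0 bit."""
    return {grandchild(i) for i in range(BITS) if not (uid >> i) & 1}


def dnt_seen_by(host: str, exceptions: set) -> str:
    """DNT value a grandchild would observe: "0" under an exception, else "1"."""
    return "0" if host in exceptions else "1"


def reassemble(exceptions: set) -> int:
    """What the parent rebuilds from the one-bit messages of its grandchildren."""
    uid = 0
    for i in range(BITS):
        if dnt_seen_by(grandchild(i), exceptions) == "1":
            uid |= 1 << i
    return uid


uid = secrets.randbits(BITS)   # first visit: pick an identifier
stored = exceptions_for(uid)   # stands in for the stored exceptions
assert reassemble(stored) == uid  # later visit: identifier recovered
```

Only 32 one-bit reports are needed to recover the identifier, however many users exist, which is why limiting queries to exactly what was stored does not help.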
>>
>> Note, this does not need the confirm call, though it could. Restricting
>> the confirm call does not fix the risk because the same information can
>> be obtained via postMessage.
>>
>> This is complicated, but it is just JavaScript. Once it is done it will
>> be easy to reproduce. It gives subresources the ability to generate UIDs
>> even when they are blocked from using cookies, e.g. on Safari. There are
>> already other more complicated methods for doing this in the wild, one
>> of the reasons for Apple's ITP in iOS 11.
>>
>>
>>
>> Mike
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org]
>> Sent: 22 August 2017 07:44
>> To: Michael O'Neill <michael.oneill@btinternet.com>; public-tracking@w3.org
>> Cc: 'Roy T. Fielding' <fielding@gbiv.com>
>> Subject: Re: confirm and fingerprinting issues
>>
>> Hi Mike,
>>
>>
>> thanks a lot for the analysis of fingerprinting.
>>
>> If I understand correctly, a sub-resource (say images.schunter.org) can
>> obtain an exception for its "tracker7289437923.images.schunter.org"
>> where tracker7289437923 is unique to a user for this subdomain. Since
>> tracker7289437923 is unique, your concern is that by learning that there
>> is a UGE for tracker7289437923, the site knows what user is visiting.
>>
>> I believe that this is not a severe fingerprinting risk for the
>> following reason:
>>
>> Assume that the web-site has registered a table of UGEs
>>   TRACKERID		NAME
>>   tracker7289437923 	Joe
>>   tracker728laksdjh	Jim
>>   trackerk823982089	Helen
>>   ....
>>
>> In theory, obtaining a line from this table allows fingerprinting.
>> However, our "confirm" API only allows verifying whether a single line
>> exists, i.e. I could indeed confirm whether I am talking to a given user:
>> - if confirm("tracker7289437923.images.schunter.org") is true, then I am
>> talking to Joe.
>>
>> However, using the scheme to fingerprint larger numbers of users seems
>> not really feasible: One needs to call the confirm() API once for each
>> subdomain that corresponds to each potential user:
>>   tracker7289437923 	
>>   tracker728laksdjh	
>>   trackerk823982089	
>>   ....
>>
>> Ensuring this was the rationale (AFAIR) behind David Singer insisting that
>> confirm must be called with the exact parameters of the store() call.
>>
>> What do you think? If we agree that there is still a larger risk, we
>> should investigate your potential resolution (which I have not checked
>> in detail yet; since I am not 100% sure I see the risk).
>>
>> Any feedback is welcome!
>>
>> matthias
>>
>>
>>
>>
>> On 21.08.2017 21:19, Michael O'Neill wrote:
>>> I think the web-wide issue is fine with Roy's sentence:
>>>
>>> For each of the targets in a web-wide exception, a user agent must not
>>> store the duplets and must reject the promise with a DOMException named
>>> "SecurityError" unless the target domain matches both the
>>> document.domain of the script's responsible document and the
>>> document.domain of the top-level browsing context's active document
>>> [HTML5]. This effectively limits the API for web-wide exceptions to the
>>> single target domain of the caller.
>>>
>>> This limits web-wide consent to the top-level browsing context, which
>>> is how it was always supposed to be.
>>>
>>> But as the text is now, a subresource browsing context (aka an iframe)
>>> can still specify a site-specific exception for itself and its own set
>>> of targets. This could be a danger because it allows a third-party
>>> subresource to invisibly create arbitrary exceptions for itself, which
>>> it can then use to fingerprint the user agent. It would do this by
>>> creating a set of subresource iframes and establishing UGEs for a
>>> random set of them.
>>>
>>> For example, subresource.com loads 32 child iframes b0.subresource.com,
>>> b1.subresource.com, ..., b31.subresource.com.
>>>
>>> When it exists as a subresource on top-level site example.com for user
>>> Alice, it creates a UGE for targets bX.subresource.com,
>>> bY.subresource.com, ..., bZ.subresource.com, i.e. a random 32-bit
>>> pattern unique to Alice.
>>>
>>> When Alice later revisits example.com, DNT:0 will be sent in requests
>>> for the subset of targets specified in the UGE. These subresources can
>>> then communicate back to the parent subresource the value of DNT they
>>> have received, using the postMessage API. Thus subresource.com can
>>> recognise Alice without having to place a third-party cookie. It cannot
>>> do this for sites other than example.com, but it is still a privacy
>>> risk.
>>>
>>> We do not have a use case for a subresource-initiated site-specific
>>> UGE, so why do we need it? The easiest way to fix this is simply to
>>> adopt Roy's wording for all UGEs, not just web-wide ones.
>>>
>>> For the other issue, making the confirm call (now called
>>> Navigator.trackingExceptionExists) capable of confirming exceptions for
>>> cookie rule subdomains as Navigator.storeTrackingException does, I
>>> suggest the following, derived from Roy's definition of "site" for
>>> storeTrackingException, with a lone "*" illegal:
>>>
>>> site
>>> The referring domain scope where an exception should be confirmed:
>>> If site is undefined, null, or the empty string, the referring domain
>>> scope defaults to the [site domain].
>>> Otherwise, the referring domain scope is defined by a domain found in
>>> site that is treated in the same way as the domain parameter to cookies
>>> [RFC6265], allowing subdomains to be included with the prefix "*.". The
>>> value can be set to a fully-qualified right-hand segment of the document
>>> host name, up to one level below TLD. If such a domain scope cannot be
>>> parsed then the user agent must reject the promise with the DOMException
>>> named "SecurityError".
>>>
>>> Comments?
>>>
>>> Mike
>>>
>>>
>>>
>>>
>>
>>
>>
> 
> 
>
Received on Thursday, 24 August 2017 18:06:30 UTC