RE: confirm and fingerprinting issues from Mike O'Neill on 2017-08-28 (public-tracking@w3.org from August 2017)

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Mon, 28 Aug 2017 16:52:30 +0100
To: "'David Singer'" <singer@mac.com>, "'Matthias Schunter $Intel Corporation$'" <mts-std@schunter.org>
Cc: <public-tracking@w3.org>
Message-ID: <02a501d32015$a86fd620$f94f8260$@baycloud.com>
I think it started being aesthetic but ended up as functional and has probably got a bit out of hand. Other than just having spent a day implementing the new changes, I for one would not mind going back to the old API. I know it worked (other than the wrinkle you just threw in - but we can fix that).







-----Original Message-----
From: David Singer [mailto:singer@mac.com] 
Sent: 28 August 2017 16:04
To: Matthias Schunter (Intel Corporation) <mts-std@schunter.org>
Cc: public-tracking@w3.org
Subject: Re: confirm and fingerprinting issues

I don’t know why we are changing design in these (subtle?) ways at this point. Can you remind me what we were trying to fix?

> On Aug 25, 2017, at 2:49 , Matthias Schunter (Intel Corporation) <mts-std@schunter.org> wrote:
> 
> Hi Folks,
> 
> my goal would be to satisfy the requirements of the industry in this
> group ;-)
> 
> UNDERSTANDING THE PROBLEM
> 
> I would like to enhance our understanding before converging on a design.
> 
> I understand the fingerprinting risks of UGE for sub-domains
> (b1.site.com, b2.site.com, ...). Mitigating the risks would be desirable.
> 
> I do not understand why this same risk is not there for the main site.
> 
> I also do not understand why site-wide exceptions should not be allowed
> (I do not see their fingerprinting risk) and I agree that allowing
> google to, e.g., ask for a "youtube.com" UGE in an iframe is a desirable
> use case.
> 
> ENUMERATING POTENTIAL SOLUTIONS
> 
> [1. Keep the old design] - allow UGE also for sub-contexts. This is the
> default unless we reach consensus to change something.
> 
> This would allow web-wide and site-specific.
> 
> One way to mitigate the fingerprinting risk of specific sub-domains in
> this scenario would be to limit the number of patterns that can be
> registered (e.g. 5 patterns should constitute at most 5 bits). This can
> be done as an implementation recommendation.
> 
> [2. Disallow UGE for sub-contexts] This would IMHO break the Google
> use case above and would mitigate the fingerprinting risk for
> sub-contexts, but would still allow sites to use this bit-wise mechanism
> for fingerprinting.
> 
> [3. Only allow web-wide UGE for sub-contexts] This would mitigate
> fingerprinting for site-specific UGE. Since I did not understand why we
> wanted to disallow web-wide exceptions, I do not see a downside.
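
The pattern cap suggested under option 1 could look roughly like the following. This is only a sketch under stated assumptions: the class and method names are hypothetical, the cap is assumed to apply per calling site (the message does not say what it is scoped to), and a Python dictionary stands in for the user agent's exception database.

```python
class ExceptionStore:
    """Toy exception database that caps the patterns one site may register."""

    def __init__(self, max_patterns: int = 5) -> None:
        self.max_patterns = max_patterns  # hypothetical implementation limit
        self._patterns: dict[str, set[str]] = {}  # calling site -> patterns

    def store(self, site: str, pattern: str) -> bool:
        """Register a pattern; return False once the site has hit the cap."""
        patterns = self._patterns.setdefault(site, set())
        if pattern in patterns:
            return True  # already stored, nothing new to count
        if len(patterns) >= self.max_patterns:
            return False  # cap reached, refuse the new pattern
        patterns.add(pattern)
        return True

    def exists(self, site: str, pattern: str) -> bool:
        """Confirm only an exact pattern that this site stored earlier."""
        return pattern in self._patterns.get(site, set())
```

With the default cap, a sixth distinct pattern from the same site is refused while the first five remain confirmable.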
> 
> 
> DISCUSSION
> - I would appreciate if we continue populating the "understanding" and
> "potential solution" perspectives before picking a given solution.
> 
> By default (unless we reach a new consensus), we will stick to the old
> consensus and will not change the spec fundamentally. If we see a risk,
> I would add a note and see what feedback we receive.
> 
> 
> Any suggestions/input/feedback/solutions?
> 
> 
> Regards,
> matthias
> 
> 
> 
> 
> On 25.08.2017 10:03, Mike O'Neill wrote:
>> Hi Shane,
>> 
>> 
>> 
>> Only some industry.
>> 
>> 
>> 
>> The trend is for subresources not to be able to set cookies, e.g. in
>> Safari, especially with the new ITP, which effectively creates separate
>> silos for subresource cookies, i.e. they are not web-wide. If we allow
>> web-wide consent from iframes with no check I doubt most browsers will
>> implement it.
>> 
>> 
>> 
>> It also fails the “specific” test for lawful consent under EU DP/privacy
>> law.
>> 
>> 
>> 
>> 
>> 
>> Mike
>> 
>> *From:*Shane M Wiley [mailto:wileys@oath.com]
>> *Sent:* 25 August 2017 04:45
>> *To:* Mike O'Neill <michael.oneill@baycloud.com>
>> *Cc:* Matthias Schunter (Intel Corporation) <mts-std@schunter.org>;
>> public-tracking@w3.org; Roy T. Fielding <fielding@gbiv.com>
>> *Subject:* Re: confirm and fingerprinting issues
>> 
>> 
>> 
>> Working Group,
>> 
>> 
>> 
>> This will likely be a non-starter for industry then.  We specifically
>> reviewed this concept when the UGE API was first written and everyone
>> was supportive at that time ("NAI Opt-Out Model in reverse").  I'm
>> struggling to understand why a domain can set a cookie in an iFrame but
>> we're going to restrict setting an exception...?
>> 
>> 
>> 
>> - Shane
>> 
>> 
>> 
>> On Thu, Aug 24, 2017 at 2:18 PM, Mike O'Neill
>> <michael.oneill@baycloud.com> wrote:
>> 
>>    Shane.
>> 
>> 
>> 
>>    Nope, web-wide was specifically ruled out see:
>>     https://w3c.github.io/dnt/drafts/tracking-dnt.html#exception-javascript-api-store
>>    6th para before end of section
>> 
>> 
>> 
>>    “This effectively limits the API for web-wide exceptions to the
>>    single target domain of the caller”
>> 
>> 
>> 
>>    What we said Monday was that site-specific in an iframe was
>>    possible, but probably pointless.
>> 
>> 
>> 
>>    Mike
>> 
>> 
>> 
>> 
>> 
>>    *From:*Shane M Wiley [mailto:wileys@oath.com]
>>    *Sent:* 24 August 2017 20:49
>>    *To:* Mike O'Neill <michael.oneill@baycloud.com>
>>    *Cc:* Matthias Schunter (Intel Corporation) <mts-std@schunter.org>;
>>    public-tracking@w3.org; Roy T. Fielding <fielding@gbiv.com>
>>    *Subject:* Re: confirm and fingerprinting issues
>> 
>> 
>> 
>>    Mike,
>> 
>> 
>>    Agreed on web-wide for those scenarios but I thought we confirmed on
>>    Monday that those would work in an iFramed approach?
>> 
>> 
>> 
>>    - Shane
>> 
>> 
>> 
>>    On Thu, Aug 24, 2017 at 12:39 PM, Mike O'Neill
>>    <michael.oneill@baycloud.com> wrote:
>> 
>>        Shane,
>> 
>> 
>> 
>>        I think you need the web-wide exceptions for that and they are
>>        already disallowed for iframes (unless they are for cookie rule
>>        subdomains of the top-level domain). The site-specific exception
>>        is pointless for other-origin iframes, other than to specify
>>        same-party exceptions. It just lets you set a site-specific
>>        exception for the iframe domain when it is later visited as a
>>        first party.
>> 
>> 
>> 
>>        This way means you do not even have to use the iframes, it is
>>        much faster.
>> 
>> 
>> 
>>        Mike
>> 
>> 
>> 
>>        *From:*Shane M Wiley [mailto:wileys@oath.com]
>>        *Sent:* 24 August 2017 19:35
>>        *To:* Mike O'Neill <michael.oneill@baycloud.com>
>>        *Cc:* Matthias Schunter (Intel Corporation)
>>        <mts-std@schunter.org>; public-tracking@w3.org; Roy T.
>>        Fielding <fielding@gbiv.com>
>>        *Subject:* Re: confirm and fingerprinting issues
>> 
>> 
>> 
>>        Mike,
>> 
>> 
>> 
>>        But wouldn't this break the industry "opt-in" page concept
>>        though (similar to the current "opt-out" iFrame model)?
>> 
>> 
>> 
>>        - Shane
>> 
>> 
>> 
>>        On Thu, Aug 24, 2017 at 11:05 AM, Mike O'Neill
>>        <michael.oneill@baycloud.com> wrote:
>> 
>>            While restricting the API to top-level context stops it being
>>            used by bad actors (to invisibly fingerprint), it also stops
>>            the use-case Shane has identified of being able to assign
>>            consent to multiple domains. No longer will it be possible to
>>            call the API from an iframe, so top level script will not be
>>            able to dynamically create browsing contexts that do that.
>> 
>>            I think the only way to fix the security weakness is to stop
>>            sub-resources using the API, but it is very desirable to still
>>            allow the registering of exceptions for other-origin (though
>>            same-party) domains. This will be useful not just to larger
>>            sites.
>> 
>>            I think both can be done as long as a check is made that the
>>            script-origin controls the other domains. The security and
>>            privacy benefit of disallowing subresources using the API far
>>            outweighs any threat from first-parties getting it wrong.
>> 
>>            I spent today amending the API to show how this could be
>>            specified using the same-party array:
>> 
>>            https://w3c.github.io/dnt/drafts/samepartyawareapi.html#exceptions
>> 
>>            See Section 6. It is in a new file to be web readable. It would
>>            be easy to create a PR for it against the master branch.
>> 
>>            Another possible way to check that script origins control other
>>            origins is to use CORS (or fetch), but this adds round-trips
>>            and therefore would be slow. The same-party way will be a lot
>>            more efficient. We could add CORS/fetch as belt and braces if
>>            people thought it necessary.
>> 
>>            Please take the time to consider this before Monday's call.
>> 
>> 
>>            Mike
>> 
>> 
>>            -----Original Message-----
>>            From: Matthias Schunter (Intel Corporation)
>>            [mailto:mts-std@schunter.org]
>>            Sent: 22 August 2017 11:59
>>            To: public-tracking@w3.org
>>            Subject: Re: confirm and fingerprinting issues
>> 
>>            Hi Mike,
>> 
>> 
>>            thanks for the clarification.
>> 
>>            I believe your resolution should substantially reduce the
>>            fingerprinting risk.
>> 
>>            Any other concerns/objections?
>> 
>> 
>>            Regards,
>>            matthias
>> 
>> 
>> 
>>            On 22.08.2017 11:31, Mike O'Neill wrote:
>>> Matthias, subresources are already denied making web-wide
>>> exceptions (by Roy's last change). My suggestion is to generalise
>>> his sentence to cover site-specific also.
>>> 
>>> Mike
>>> 
>>> -----Original Message-----
>>> From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org]
>>> Sent: 22 August 2017 09:39
>>> To: public-tracking@w3.org
>>> Subject: Re: confirm and fingerprinting issues
>>> 
>>> Hi Mike,
>>> 
>>> thanks for the clarification.
>>> 
>>> I now (hopefully) understand: Instead of pushing an identifier as a
>>> whole (9437489), you push individual bits (bit1-0, bit2-1, bit3-1, ...).
>>> Then querying them gets efficient; only say 32 queries (one per bit)
>>> needed ;-(
>>> 
>>> Thus the "you can only query what you store" approach does not mitigate
>>> this fingerprinting risk (it is efficient to query 32 bits).
>>> 
>>> Your suggested mitigation is to disallow subresources from requesting
>>> user-granted _site-specific_ exceptions (only the main site is allowed
>>> to do so). They would still be allowed to request web-wide exceptions
>>> (where this risk does not seem to exist).
>>> 
>>> This seems to be a workable and efficient solution.
>>> 
>>> Any thoughts?
>>> 
>>> 
>>> Regards,
>>> matthias
>>> 
>>> PS: Am I right that the main site could still use the site-specific UGE
>>> approach for fingerprinting? Anything we can mitigate for them?
>>> 
>>> 
>>> 
>>> On 22.08.2017 10:22, Mike O'Neill wrote:
>>>> Hi Matthias,
>>>> 
>>>> That is not quite what I meant. The fingerprinting I identified would
>>>> allow the subresource to assign a random number (up to 32 bits long in
>>>> my example), because there are 32 sub-subresources (let's call them
>>>> grandchildren of the first-party site):
>>>> 
>>>> b0.images.schunter.org
>>>> b1.images.schunter.org
>>>> b2.images.schunter.org
>>>>                  .
>>>>                  .
>>>>                  .
>>>> b31.images.schunter.org
>>>> 
>>>> Each grandchild represents one bit in the 32 bit string.
>>>> 
>>>> If an exception exists for a particular grandchild, that represents a 0
>>>> at that particular bit position. Otherwise the value of the bit is 1.
>>>> 
>>>> The value of each grandchild "bit" is communicated back to
>>>> images.schunter.org by each grandchild detecting its DNT header (say by
>>>> reading navigator.doNotTrack), then sending the 1 bit value in a message
>>>> using the postMessage API.
>>>> 
>>>> Then images.schunter.org receives all these messages and assembles the
>>>> original 32 bit string from them.
>>>> 
>>>> Note, this does not need the confirm call, though it could. Restricting
>>>> the confirm call does not fix the risk because the same information can
>>>> be obtained via postMessage.
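
The bit-per-grandchild scheme described above can be simulated offline. The following is a minimal Python sketch, not code from the thread: the function names are hypothetical, a set of host names stands in for the exceptions stored by the user agent, and a plain lookup stands in for each grandchild reading navigator.doNotTrack and reporting its bit via postMessage.

```python
PARENT = "images.schunter.org"  # subresource host used in the example above
BITS = 32


def grandchild(i: int) -> str:
    """Host name of the grandchild that carries bit i (b0 ... b31)."""
    return f"b{i}.{PARENT}"


def exceptions_for(uid: int) -> set[str]:
    """Hosts that would be given an exception: every bit of uid that is 0."""
    return {grandchild(i) for i in range(BITS) if (uid >> i) & 1 == 0}


def dnt_seen_by(host: str, exceptions: set[str]) -> int:
    """DNT value a grandchild observes: 0 with an exception, otherwise 1."""
    return 0 if host in exceptions else 1


def reassemble(exceptions: set[str]) -> int:
    """Rebuild the identifier from the 32 one-bit reports."""
    uid = 0
    for i in range(BITS):
        uid |= dnt_seen_by(grandchild(i), exceptions) << i
    return uid
```

Storing exceptions_for(9437489) on a first visit and calling reassemble on the same set later gives back 9437489, without any cookie and without the confirm call.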
>>>> 
>>>> This is complicated, but it is just JavaScript. Once it is done it will
>>>> be easy to reproduce. It gives subresources the ability to generate UIDs
>>>> even when they are blocked from using cookies, e.g. on Safari. There are
>>>> already other more complicated methods for doing this in the wild, one
>>>> of the reasons for Apple's ITP in OS11.
>>>> 
>>>> 
>>>> 
>>>> Mike
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Matthias Schunter (Intel Corporation) [mailto:mts-std@schunter.org]
>>>> Sent: 22 August 2017 07:44
>>>> To: Michael O'Neill <michael.oneill@btinternet.com>; public-tracking@w3.org
>>>> Cc: 'Roy T. Fielding' <fielding@gbiv.com>
>>>> Subject: Re: confirm and fingerprinting issues
>>>> 
>>>> Hi Mike,
>>>> 
>>>> 
>>>> thanks a lot for the analysis of fingerprinting.
>>>> 
>>>> If I understand correctly, a sub-resource (say images.schunter.org) can
>>>> obtain an exception for its "tracker7289437923.images.schunter.org"
>>>> where tracker7289437923 is unique to a user for this subdomain. Since
>>>> tracker7289437923 is unique, your concern is that by learning that there
>>>> is a UGE for tracker7289437923, the site knows what user is visiting.
>>>> 
>>>> I believe that this is not a severe fingerprinting risk for the
>>>> following reason:
>>>> 
>>>> Assume that the web-site has registered a table of UGEs
>>>>  TRACKERID          NAME
>>>>  tracker7289437923  Joe
>>>>  tracker728laksdjh  Jim
>>>>  trackerk823982089  Helen
>>>>  ....
>>>> 
>>>> In theory, obtaining a line from this table allows fingerprinting.
>>>> However, our "confirm" API only allows verifying whether a single line
>>>> exists. I.e. I could indeed confirm whether I am talking to a given user:
>>>> - if confirm("tracker7289437923.images.schunter.org") is true, then I am
>>>> talking to Joe.
>>>> 
>>>> However, using the scheme to fingerprint larger numbers of users seems
>>>> not really feasible: One needs to call the confirm() API once for each
>>>> subdomain that corresponds to each potential user:
>>>>  tracker7289437923
>>>>  tracker728laksdjh
>>>>  trackerk823982089
>>>>  ....
>>>> 
>>>> Ensuring this was the rationale (AFAIR) that David Singer insisted that
>>>> confirm must be called with the exact parameters of the store() call.
>>>> 
>>>> What do you think? If we agree that there is still a larger risk, we
>>>> should investigate your potential resolution (which I have not checked
>>>> in detail yet; since I am not 100% sure I see the risk).
>>>> 
>>>> Any feedback is welcome!
>>>> 
>>>> matthias
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 21.08.2017 21:19, Michael O'Neill wrote:
>>>>> I think the web-wide issue is fine with Roy's sentence:
>>>>> 
>>>>> For each of the targets in a web-wide exception, a user agent must not
>>>>> store the duplets and must reject the promise with a DOMException named
>>>>> "SecurityError" unless the target domain matches both the
>>>>> document.domain of the script's responsible document and the
>>>>> document.domain of the top-level browsing context's active document
>>>>> [HTML5]. This effectively limits the API for web-wide exceptions to the
>>>>> single target domain of the caller.
>>>>> 
>>>>> This limits web-wide consent to the top-level browsing context which was
>>>>> how it always was supposed to be.
>>>>> 
>>>>> But as the text is now, a subresource browsing context (aka an iframe)
>>>>> can still specify a site-specific exception for itself and its own set
>>>>> of targets. This could be a danger because it allows a third-party
>>>>> subresource to invisibly create arbitrary exceptions for itself, which
>>>>> it can then use to fingerprint the user agent. It would do this by
>>>>> creating a set of subresource iframes and establishing UGEs for a
>>>>> random set of them.
>>>>> 
>>>>> For example, subresource.com loads 32 child iframes b0.subresource.com,
>>>>> b1.subresource.com, ..., b31.subresource.com.
>>>>> 
>>>>> When it exists as a subresource on top-level site example.com for user
>>>>> Alice it creates a UGE for targets bX.subresource.com,
>>>>> bY.subresource.com, ..., bZ.subresource.com, i.e. a random 32 bit
>>>>> pattern unique to Alice.
>>>>> 
>>>>> When Alice later revisits example.com DNT:0 will be sent in requests for
>>>>> the subset of targets specified in the UGE. These subresources can then
>>>>> communicate back to the parent subresource the value of DNT they have
>>>>> received, using the postMessage API. Thus subresource.com can recognise
>>>>> Alice without having to place a third-party cookie. It cannot do this
>>>>> for sites other than example.com, but it is still a privacy risk.
>>>>> 
>>>>> We do not have a use case for a subresource initiated site-specific UGE,
>>>>> so why do we need it? The easiest way to fix this is simply to adopt
>>>>> Roy's wording for all UGEs, not just web-wide ones.
>>>>> 
>>>>> For the other issue, making the confirm call (now called
>>>>> Navigator.trackingExceptionExists) capable of confirming exceptions for
>>>>> cookie rule subdomains as Navigator.storeTrackingException does, I
>>>>> suggest the following derived from Roy's definition of "site" for
>>>>> storeTrackingException, with a lone "*" illegal:
>>>>> 
>>>>> site
>>>>> The referring domain scope where an exception should be confirmed:
>>>>> If site is undefined, null, or the empty string, the referring domain
>>>>> scope defaults to the [site domain].
>>>>> Otherwise, the referring domain scope is defined by a domain found in
>>>>> site that is treated in the same way as the domain parameter to cookies
>>>>> [RFC6265], allowing subdomains to be included with the prefix "*.". The
>>>>> value can be set to a fully-qualified right-hand segment of the document
>>>>> host name, up to one level below TLD. If such a domain scope cannot be
>>>>> parsed then the user agent must reject the promise with the DOMException
>>>>> named "SecurityError"
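
The proposed handling of the "site" parameter can be sketched as follows. This is an illustration only, under stated assumptions: the function name and the SecurityError class are hypothetical stand-ins, the document host is used as the default in place of the [site domain], and "up to one level below TLD" is approximated by requiring at least two labels, where a real user agent would presumably consult a public suffix list.

```python
from typing import Optional, Tuple


class SecurityError(Exception):
    """Stand-in for the DOMException named "SecurityError"."""


def referring_scope(site: Optional[str], document_host: str) -> Tuple[str, bool]:
    """Return (domain, include_subdomains) for a confirm call, or raise."""
    if not site:  # undefined, null or the empty string
        return document_host, False  # assumption: document host as default
    include_subdomains = site.startswith("*.")
    domain = site[2:] if include_subdomains else site
    labels = domain.split(".")
    # Rejects a lone "*", stray wildcards, empty labels and bare TLDs.
    if "*" in domain or len(labels) < 2 or not all(labels):
        raise SecurityError(site)
    # The domain must be a right-hand segment of the document host name.
    if document_host != domain and not document_host.endswith("." + domain):
        raise SecurityError(site)
    return domain, include_subdomains
```

For a document on www.example.com, "*.example.com" resolves to the scope example.com with subdomains included, while "*", "com" and "other.org" are all rejected.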
>>>>> 
>>>>> Comments?
>>>>> 
>>>>> Mike
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>>        -- 
>> 
>>        - Shane
>> 
>> 
>> 
>>        Shane Wiley
>> 
>>        VP, Privacy
>> 
>>        Oath: A Verizon Company

Dave Singer

singer@mac.com
Received on Monday, 28 August 2017 15:53:26 UTC