Re: A First-Party List API for Site-Specific Exceptions (ISSUE-59, ISSUE-109, ISSUE-111, ISSUE-113, ISSUE-114) from Sid Stamm on 2012-03-14 (public-tracking@w3.org from March 2012)

From: Sid Stamm <sid@mozilla.com>
Date: Wed, 14 Mar 2012 15:51:15 -0700
To: Jonathan Mayer <jmayer@stanford.edu>
CC: Kevin Smith <kevsmith@adobe.com>, Tracking Protection Working Group WG <public-tracking@w3.org>
Message-ID: <4F6120E3.8010407@mozilla.com>

Hey Jonathan,

On 3/14/12 3:15 PM, Jonathan Mayer wrote:
> Here are a few concrete use cases:
> 
> 1) A publisher wants to distinguish itself as having good privacy
> practices.  Instead of requesting a blanket exception, it wants to
> request exceptions for a small number of well-known third parties.

This use case isn't directly relevant.  It identifies a need for
site-by-site exception *requests*, but not a list-based exception status
API.

> 2) A publisher wants to reflow its layout to exclude widgets and SSO
> providers that the user has not excepted.  (This was Ian Fette's
> example.)

This is a good use case.  Can we address it with multiple calls to the
request API instead of "gimme all exceptions"?

> 3) If certain web-wide exceptions are present, a publisher does not
> need to ask for new exceptions.

This isn't really a use case but rather a side-effect or desired
function.  Regardless, it identifies a desire for user agents to avoid
duplicate prompts to users (UI design) and doesn't require us to expose
a list to JS.  We could put non-normative text in there to suggest user
agents should shortcut publishers' requests that are unnecessary due to
web-wide exceptions, but the same info can be obtained from a non-list API.

> 4) A user wants to blacklist a third party in a manner that trumps a
> site-specific exception.

Again, I think we should let the browser UI deal with this.  This is
user-centric, not site-centric, so we shouldn't address it on the site side.

> In all of these scenarios, a first party needs to know the exception
> status for specific third parties.  There are, broadly, four
> proposals for how we might facilitate that:
> 
> a) polling with existing web technology (e.g. cached JavaScript with
> a Vary: DNT response header),
> 
> b) polling with requestSiteSpecificTrackingException (asks for the
> exception if not already granted or denied),
> 
> c) polling with a dedicated API (e.g.
> testSiteSpecificTrackingException), and
> 
> d) getting a list from a dedicated API (e.g.
> siteSpecificTrackingExceptions).
> 
> Polling approaches, especially (a), will be slower.  

JavaScript is fast.  Realistically, we're talking about maybe a thousand
API calls.  This won't be noticeable for implementations that don't block on
user input, or a user agent can roll them all up into one prompt batch
to speed it up.
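
A minimal sketch of polling approach (c), for illustration only.  It
assumes a hypothetical synchronous testSiteSpecificTrackingException(domain)
that returns a boolean; no such signature has been agreed on, so the
function is mocked here, and the third-party domain names other than
doubleclick.net are made up:

```javascript
// Hypothetical stand-in for option (c); the real API shape is undecided.
// Assumed to return true if the user has granted an exception for `domain`.
function makeMockTester(grantedDomains) {
  const granted = new Set(grantedDomains);
  return function testSiteSpecificTrackingException(domain) {
    return granted.has(domain);
  };
}

// Poll a known list of third parties and split them by exception status.
function pollExceptions(testFn, thirdParties) {
  const excepted = [];
  const notExcepted = [];
  for (const domain of thirdParties) {
    (testFn(domain) ? excepted : notExcepted).push(domain);
  }
  return { excepted, notExcepted };
}

// Example: a publisher that embeds three third parties (names illustrative).
const testFn = makeMockTester(["doubleclick.net"]);
const result = pollExceptions(testFn, [
  "doubleclick.net",
  "widgets.example",
  "sso.example",
]);
console.log(result.excepted);    // third parties the layout may include
console.log(result.notExcepted); // third parties to leave out of the layout
```

The publisher only asks about domains it already knows it embeds, so
nothing is enumerated beyond its own third-party list.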

> A list approach brings greater first-party fingerprinting risk.

Yep.

> (With polling, a first party would have to enumerate domains; that's
> more difficult, slower, more easily detected, and can be mitigated
> through frequency capping.) 

Yep.

> First-party fingerprinting risk in (c) and (d) could be greatly
> mitigated by requiring user consent for the API (e.g. "This website
> would like to learn about your tracking preferences for other
> websites. Allow/Deny").

Okay, I see.  That makes good sense.  Would sites still use this API if
there were a wait-for-user barrier there like this?  (I guess I'm asking
the web site operators out there, not you directly, Jonathan...)

-Sid

> 
> On Mar 14, 2012, at 11:49 AM, Sid Stamm wrote:
> 
>> I agree with Kevin that we should identify precisely the set of
>> use cases we're looking to address by resurrecting this idea.
>> 
>> But I still have fingerprinting concerns with the proposal (please
>> help me understand if I'm wrong).  Even if it's limited to 
>> first-party-context scripts, there's still the ability for
>> first-party sites to fingerprint using this list-based thing.
>> 
>> The adversaries in my fingerprinting concern are not the sites
>> trying to do the right thing with the suite of tools we're
>> developing here, but rather *other* sites that have no intention of
>> being friendly.  I know our work here is not intended to address
>> "bad actors", but we should not develop new technologies that make
>> it easier for bad actors to act badly.
>> 
>> I'm worried about shadyfirstpartysite.com re-identifying me based
>> on things like my browsing history between visits to their site.
>> I'm concerned that a first-party list-based API will provide 
>> shadyfirstpartysite.com with more bits of entropy that make it
>> easier to re-identify me across multiple visits to their site.
>> 
>> I'm sure you will argue that it's already trivial to fingerprint
>> users if you're a "bad actor", and I concede that point, but if
>> we're going to make it easier (or give it statistically stronger
>> confidence), we'd better have some really good use cases that we
>> cannot get from other, similar features that don't carry the same
>> fingerprinting "enhancement".
>> 
>> -Sid
>> 
>> On 3/14/12 11:38 AM, Kevin Smith wrote:
>>> Before we resurrect this topic, can we please address the fact
>>> that it would only work in a very minimal set of use cases and
>>> make sure it is worth our time.
>>> 
>>> This API only has value if we allow partial exceptions (to some
>>> but not all 3rd parties on a site).  As I have stated in other
>>> threads, there are many many problems with allowing partial
>>> exceptions.  There are also real privacy and business incentives
>>> to do so, and so it may be worth trying to work through some of
>>> the problems.  However, there is a single core problem that would
>>> need to be solved prior to considering any of the additional work
>>> which is:  How do you request exceptions for a potentially ever
>>> changing, dynamic, unknown list of 3rd parties which make up
>>> common advertising chains?
>>> 
>>> The cnn example below simply does not work.  It is worthless to
>>> grant doubleclick (more accurately DFP) an exception because
>>> doubleclick usually does not serve the ads.  The publisher's ad
>>> server's responsibility is to find the ads which will yield the
>>> highest CPM for the publisher.  The process of doing so may
>>> involve several different services and redirects, and these
>>> services may change per user, per page or even per time of day.
>>> Granting doubleclick the ability to track does not grant them the
>>> ability to perform their task unless the other steps in the chain
>>> also have an exception.
>>> 
>>> I am sure there are simple scenarios in which the 1st party will
>>> know all 3rd parties and may be willing to offer variable content
>>> based on which 3rd parties have exceptions.  I think before we
>>> talk about changing the functionality we should list some of
>>> these scenarios and determine whether it is worth our time or the
>>> browser's time to implement the API.
>>> 
>>> On a side note - if we can satisfactorily answer the above
>>> questions and feel that we do want to move forward with it, I
>>> agree with Jonathan that we can do so in a way which does not
>>> carry strong fingerprinting risks.  I think Jonathan's proposal
>>> would work.
>>> 
>>> -kevin
>>> 
>>> -----Original Message-----
>>> From: Jonathan Mayer [mailto:jmayer@stanford.edu]
>>> Sent: Wednesday, March 14, 2012 11:42 AM
>>> To: Tracking Protection Working Group WG
>>> Subject: A First-Party List API for Site-Specific Exceptions
>>> (ISSUE-59, ISSUE-109, ISSUE-111, ISSUE-113, ISSUE-114)
>>> 
>>> There have been renewed expressions of interest in a first-party
>>> API that lists which third parties have an exception.  Rationales
>>> include making it easy and fast for a first party to learn about
>>> exceptions (relative to alternatives that use existing web
>>> technology) and facilitating non-blanket exceptions and possibly
>>> third-party blacklisting by users.
>>> 
>>> The proposed siteSpecificTrackingExceptions API did just this; it
>>> was removed out of concern for third-party fingerprinting.  I'd
>>> like to bring back the proposal, but limit access to script
>>> running in the context of a top-level (first-party) origin.
>>> Browsers already have the necessary building blocks; they know
>>> the top-level origin and the context origin associated with
>>> script execution.
>>> 
>>> Here's a concrete example: A CNN visitor has granted a web-wide 
>>> exception to Google ("*", "doubleclick.net"), but nobody else.
>>> When a cnn.com script calls
>>> navigator.siteSpecificTrackingExceptions, it gets back
>>> ["doubleclick.net"].  If a yahoo.com script calls 
>>> navigator.siteSpecificTrackingExceptions, it gets no return
>>> value, and an error is thrown.
>>> 
>>> I don't see much marginal privacy risk to including this API.
>>> Third parties already have countless stateful and stateless ways
>>> of tracking a browser, including with scripts running in the
>>> context of both their own and first-party origins.  This is just
>>> another API for compliance monitors to keep tabs on.
>>> 
>>> As for a first party handing a fingerprintable exception list to
>>> a third party, the same reasoning applies.  A first party could
>>> just as easily pass a unique identifier (e.g. a first-party ID
>>> cookie).  We should be clear in the document that the usual
>>> limits on first parties handing data to third parties apply.
>>> 
>>> 
>>> 
> 
>
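
The cnn.com / yahoo.com example in the quoted proposal can be sketched
roughly as follows.  This is a mock under stated assumptions, not the
proposed API itself: the function takes the exception store, the top-level
origin, and the script's context origin as explicit arguments (a browser
would know these internally), and the store format of ["site", "third
party"] pairs is invented for illustration:

```javascript
// Mock of the proposed list API, restricted to first-party script.
// `topLevelOrigin` and `scriptOrigin` stand in for state the browser tracks.
function siteSpecificTrackingExceptions(store, topLevelOrigin, scriptOrigin) {
  if (scriptOrigin !== topLevelOrigin) {
    // Third-party script gets no list at all, per the proposal.
    throw new Error("exception list is only available to first-party script");
  }
  // Include web-wide ("*") grants and grants specific to this first party.
  return store
    .filter(([site]) => site === "*" || site === topLevelOrigin)
    .map(([, thirdParty]) => thirdParty);
}

// The user has granted one web-wide exception and nothing else.
const store = [["*", "doubleclick.net"]];

// A cnn.com script on a cnn.com page sees the list.
const fromCnn = siteSpecificTrackingExceptions(store, "cnn.com", "cnn.com");
console.log(fromCnn);

// A yahoo.com script embedded on the same page gets an error instead.
let yahooError = null;
try {
  siteSpecificTrackingExceptions(store, "cnn.com", "yahoo.com");
} catch (e) {
  yahooError = e;
}
console.log(yahooError !== null);
```

Note that the first-party restriction only limits who can read the list;
it does not change how many identifying bits the list carries for the
first party itself, which is the concern raised in this reply.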
Received on Wednesday, 14 March 2012 22:51:44 UTC