Re: Draft Text on First Parties and Third Parties (ACTION-34, ISSUE-10, ISSUE-26, ISSUE-88) from Jonathan Mayer on 2012-01-06 (public-tracking@w3.org from January 2012)

From: Jonathan Mayer <jmayer@stanford.edu>
Date: Thu, 5 Jan 2012 17:38:25 -0800
To: Heather West <heatherwest@google.com>
Cc: Justin Brookman <justin@cdt.org>, public-tracking@w3.org
Message-Id: <8EA90575-227C-4C01-82F7-2440DAA1B673@stanford.edu>
On Jan 5, 2012, at 2:39 PM, Heather West wrote:

> From our perspective, we have several issues with this latest draft as it stands, and no, we don't think it's workable. We need to make sure, as a group, that the language is clear and implementable if we hope to see any adoption of the standard. The current draft allows for enough vagueness that evolving and contradictory interpretations would be possible across multiple regulatory environments. 

Which parts of the text do you find vague?  We attempted to draft it quite tightly.

>  The initial version of this issue language was short and easy to understand, and I think that's one of the reasons that we all liked it conceptually. This is long, hard to understand and open to multiple interpretations.

There are only six sentences of operative text in the draft.  Here they are, broken out:

A "party" is any commercial, nonprofit, or governmental organization, a subsidiary or unit of such an organization, or a person, that an ordinary user would perceive to be a discrete entity for purposes of information collection and sharing. Domain names, branding, and corporate ownership may contribute to, but are not necessarily determinative of, user perceptions of whether two parties are distinct.

A "network interaction" is an HTTP request and response, or any other set of logically related network traffic.

A "first party" is any party, in a specific network interaction, that can infer with high probability that the user knowingly and intentionally communicated with it. Otherwise, a party is a third party.

A "third party" is any party, in a specific network interaction, that cannot infer with high probability that the user knowingly and intentionally communicated with it.

I'm really having difficulty seeing what's "long," "hard to understand," or "open to multiple interpretations."  Especially relative to most other proposals that have been made, including the online advertising industry self-regulatory principles.

> It also draws in many other active issues (definitional and otherwise)

I don't follow.  Tom and I addressed the ISSUEs we were tasked with covering - no more.

> and takes them in directions other than the one that our original discussions indicated.

Also don't follow.  This seemed to us a straightforward implementation of the "user expectations" test.

> An objective standard for what constitutes a first party is critical, because companies and individuals who adopt this standard publicly are rightly expected by both the general public and regulators to do what they say. But the potential for evolving and variant interpretations of user perception and common branding makes it unclear what exactly is being signed up for. I think we need a first-party definition that is based on ownership (and on adequately disclosing that ownership, whether in a privacy policy or in branding, logos, etc.). This is an objective standard that allows websites to clearly understand what they are signing up for when they adopt DNT.

My understanding from Santa Clara and after was that there was a near-consensus against a corporate ownership/control/affiliation test.  Tom and I articulated some of the reasons in our draft.  There's further discussion in the email threads "Proposed First Party definition" and "Summary of First Party vs. Third Party Tests."

>  User perception is useful to think about and certainly should impact the way that we approach the spec, but it's unworkable to ask companies, developers, and hobbyists to work based on a spec that is this subjective. Does this mean that a website consisting solely of python coding resources is evaluated on a different standard than a porn site, simply because their 'average user' is different?

We drafted the text for a site-by-site user audience.  The standard could instead specify the Internet as a whole, a specific geography (e.g. the country where the site is located), or any other subdivision.

But I don't think the distinction matters in practice.  The overwhelming majority of use cases remain very clear.  It doesn't matter whether you're StackOverflow or Playboy - users don't expect to share data with wholly independent advertising, analytics, and social services.

> And do we have any kind of indication that users do or don't understand the things we’re talking about? Perception and intention are vague and subjective. 

There have been quite a few academic and industry studies of what users understand about third-party web services.  Aleecia and others who have done this research - mind sharing a brief overview?

> I’m also a bit worried that a requirement for prominent branding would mean that diverse companies (think news websites) might be required to co-brand themselves with all the other news sites in the network, increasing consumer confusion. When you’re on Flickr it may be clear to the user that Yahoo uses data from Flickr as a first party, but when you’re on Yahoo, do you need to prominently co-brand the site as Flickr too? And what if you also brand the site with a third-party logo? Do they become first parties? 

These are among the reasons I do not favor a branding test.

> An additional use case that illustrates the complexities involved is URL shorteners. Assume that the user clicked on the shortened link. Why wouldn't they expect that they are interacting with that party, i.e., the link shortener? What about the goo.gl link shortener: is it branded in such a way that users know they are interacting with Google, even though that's not where they end up? How exactly do you conclude that bit.ly users don't interact with bit.ly? There are several scenarios here: the user does or doesn't see the URL; the user clicks on a link that is directed to the shortened URL, which either does or doesn't indicate that it's a shortened link (but is likely to indicate the final destination of the shortened link). These either need to be fleshed out, or we need to decide how to deal with shorteners as first parties.

Tom and I did not address URL shorteners in our draft.  ("ISSUE-97: A special rule for URL-shortening services remains an open issue and is not addressed in this proposal.")  I would support Justin's proposal for noting that a URL shortener is, in general, a third party.  If that's a point of controversy, then let's keep the separate ISSUE and hold it for later.

> How would a restriction on URL shorteners as redirection impact sites (often news sites) that redirect a human-readable URL (newssite.com/that-thing-happened/) to a machine-readable URL (newssite.com/423dfsgs59/) from a legacy CMS?

I don't understand this example.  For both URLs, News Site would be a first party.

> Finally, on the topic of mash-ups, I think the mashup idea needs to be fleshed out and accounted for, simply because the incorporation of content on websites is common and useful. Even if it makes up a small percentage of web traffic today, this is an area of innovation that will probably increase greatly over time.

While I'm skeptical that mashups will become much more common, I completely agree that we should address them.  Like URL shorteners, if they're a point of controversy, let's mark an ISSUE and hold it until we settle the far more frequent use cases.

> If Google, for example, wants to be DNT compliant, we need to account for this in the context of Google Reader.

Could you explain what you mean?  If a user visits Google Reader, it's a first party.  Under the proposals that have been advanced, Google would not be responsible for third-party RSS content (and whatever's embedded in it).

> And the many blogs I read (many of which have analytics and/or share buttons) are by and large going to assume that they are first parties, without concerning themselves with whether or not their content is being consumed via an aggregator. Figuring out where aggregators fit into this is key, and we should either say that a content feed that is proactively added by the user with the understanding that it will appear on the first-party site (like Reader) is first-party content, or that the first party is not responsible for the content of the page. 

I see two analytical approaches to news aggregators: 1) treat them as a possible multiple first party scenario, or 2) consider them a species of the "what the hell, someone went and embedded all my content" problem discussed on yesterday's call.

Whatever the analytical approach, and whatever the result, I'm not particularly concerned about the privacy implications.  At most a news site and its embedded content learn one additional fact about a user - that they use a news aggregator.

I'd propose that, like for URL shorteners and mashups, we take the group's temperature on how to treat news aggregators.  If there's not consensus, let's create a new ISSUE and reserve it for later.  We shouldn't delay consensus on the major issues over a few edge cases.

> Heather (and Sean)
> 
> On Thu, Jan 5, 2012 at 11:33 AM, Justin Brookman <justin@cdt.org> wrote:
> I would revise the definition of first party to "A first party is, in a specific network interaction, the operator of the domain with which the user intended to communicate."  I would remove the entire section about multiple first parties as I do not believe a realistic example has been presented where that would ever be the case.  In the example of the craigslist/Google Maps mashup, whichever of the two is the actual operator of the domain should be the first party and the other would be the third party (or, if an entirely different entity operates the mashup, as appears to be the case at HousingMaps.com, the operator of HousingMaps is the first party and craigslist and Google are third parties if they're present at all).  Third parties can still become first parties if their content is clearly branded and a user meaningfully interacts with the content.  Writing a spec for the extreme and unprecedented edge case facebookandmoviefonebothrunthisdomain.com will cause more uncertainty and invite abuse while not solving an actual problem.  Domains have one operator; until co-registration becomes an option, sticking with one first party makes sense.
> 
> I like David's proposed counterexample to 4.1(a).  I believe my above suggestion should take the place of his counterexample to 4.1(b) (though both are designed to achieve the same goal).
> 
> On the call, we seemed to agree that being under common corporate control with the site operator should be a necessary condition for an entity to be a first party (or a third party who gets permission to track).  Thus, I would revise the definition of party to: "A 'party' is any person or commercial, nonprofit, or governmental organization, as well as any person or organization that operates under the same corporate or governmental control as the party and [discoverability/branding/user perception --- whatever test we use]."
> 
> I will again make the argument that branding seems the more reasonable and concrete test here, and will provide the most certainty for users and companies, but I await Shane's pitch for why discoverability is sufficiently clear to users (or Jonathan's counterpitch on why "user perception" is sufficiently workable).
> 
> I would also add URL shortener services as a specific example of a third party with which the user was not intending to communicate.
> Justin Brookman
> Director, Consumer Privacy Project
> Center for Democracy & Technology
> 1634 I Street NW, Suite 1100
> Washington, DC 20006
> tel 202.407.8812
> fax 202.637.0969
> justin@cdt.org
> http://www.cdt.org
> @CenDemTech
> @JustinBrookman
> 
> On 1/4/2012 6:51 PM, Jonathan Robert Mayer wrote:
>> 
>> Understood. I took my own notes, and we'll work from the minutes. If others would like to write up their proposed changes, that would be most helpful.
>> 
>> Jonathan
>> 
>> On Jan 4, 2012, at 3:46 PM, David Singer <singer@apple.com> wrote:
>> 
>>> To be clear, I only provide the edits I personally suggested;  I think all of us were asked to be precise about what we were suggesting, and I didn't do anyone else's suggestions.
>>> 
>>> On Jan 4, 2012, at 15:42 , Jonathan Robert Mayer wrote:
>>> 
>>>> Thanks for taking notes. Tom and I will revise the text to incorporate what we heard on today's call. Much of the focus was on the edge cases of mashups and inadvertently embedded content, which strongly suggests to me that we're very close to consensus.
>>>> 
>>>> The two outstanding high-level concerns that I recall are:
>>>> 
>>>> 1) Are the standards we provide workable in practice? I believe close calls will be very rare, and only companies gaming the margin would have to consider surveying users. Heather was less sure. Heather, could you suggest a few common use cases that lead to a difficult analysis under the draft's standards?
>>>> 
>>>> 2) Shane suggested (and a few supported) moving to a user-is-able-to-discover-information standard for what's a party and what's a first or third party. Shane, could you briefly sketch what this standard might look like and give a few examples where it would work a different result from our user expectations standard?
>>>> 
>>>> Jonathan
>>>> 
>>>> On Jan 4, 2012, at 1:27 PM, David Singer <singer@apple.com> wrote:
>>>> 
>>>>> Here are my comments/suggestions, after this morning's call.
>>>>> 
>>>>> 1) section 2.1.  Make clear that the user is a party, or specifically say that the definition defines parties that may be 1st or 3rd.
>>>>>   also raise an issue for a clear definition of what falls into the 2nd party?? (e.g. software or other agents acting on the user's behalf??)
>>>>> 
>>>>> 2) section 2.1.  Consider adding the condition that two separate legal entities cannot be considered a single party (in our context).
>>>>> 
>>>>> 3) section 2.1.  Add an issue that we may want to strengthen the definition to the point where it is testable.
>>>>> 
>>>>> 4) section 4.1.  Make the definitions of what is a 1st party a list of conditions, all of which apply.
>>>>> 
>>>>> 5) section 4.1.  Add to the list of conditions:
>>>>>   a) the user must be directly aware of the existence and identity of a separate entity, prior to their interaction.
>>>>>   b) the user makes an independent choice to communicate/interact with the entity.
>>>>> 
>>>>> Counter-examples to (a) are a weather or other widget with no obvious branding or other evidence to show it came from another organization or entity; the user is not aware of a separate identity behind it.
>>>>> Counter-examples to (b) are where sites are mash-ups of unpredictable sources; the user, by visiting the mash-up, chose only the mashing site as the first party; until the user interacts further, the mashed sites are third parties (and rule (a) applies as well - the user must be aware that they are mashed in, and not sourced by the mashing site).
>>>>> 
>>>>> On Dec 22, 2011, at 15:25 , Jonathan Mayer wrote:
>>>>> 
>>>>>> Tom and I have worked for several weeks on a comprehensive draft of the sections delineating first parties and third parties.  We attempted to reflect the approaching-consensus discussion at Santa Clara and on the email list.  Our draft includes both operative standards language and non-normative explanation and examples.  The text is formatted with the W3C template to better resemble how it would appear in the final document; please note that this is not an Editor's Draft (as the template might suggest).
>>>>>> 
>>>>>> Jonathan
>>>>>> 
>>>>>> <parties-draft-jm-tl.html>
>>>>>> 
>>>>> 
>>>>> David Singer
>>>>> Multimedia and Software Standards, Apple Inc.
>>>>> 
>>>> 
>>> 
>>> David Singer
>>> Multimedia and Software Standards, Apple Inc.
>>> 
> 
Received on Friday, 6 January 2012 01:38:58 UTC