[Beacon] Last Call comments re: privacy and editorial suggestions from Nicholas Doty on 2014-07-29 (public-web-perf@w3.org from July 2014)

From: Nicholas Doty <npdoty@w3.org>
Date: Tue, 29 Jul 2014 15:39:12 -0700
To: public-web-perf@w3.org
Message-Id: <B329E3DE-9BAD-4AEF-8973-95F73F335391@w3.org>

Hi Web Performance Working Group,

I wanted to provide a few comments on the Beacon API during your Last Call review. The comments below were shared with and informed by discussions among the Privacy Interest Group, but please assume that any mistakes or misunderstandings are mine entirely.

I'd be happy to talk further about these questions and comments if that would be useful for you all. And I know the Privacy Interest Group [1] folks have been interested in conversations with the Web Perf WG in the past and might provide further expertise.

Thanks,
Nick

[1] http://www.w3.org/Privacy

## must honor headers?

> User agents MUST honor the HTTP headers (including, in particular, redirects and HTTP cookie headers),

This seems to be new in this version of the spec and I don't understand the reasoning behind it. Why MUST user agents honor all response headers? If (as I believe most user agents do) a user agent typically ignores Set-Cookie headers from different origins, is that user agent non-conformant with Beacon? This requirement seems unlikely to be followed, as it would introduce privacy risks.

## security considerations and CORS

What are the security considerations of this document? Is there an origin restriction on the POST URL? Should one be recommended? Does making background POST requests to other origins, with credentials included, increase the risk of CSRF attacks? (Maybe this risk is identical to the existing risk of submitting POST forms to other origins.) Are cross-origin POST requests with credentials necessary to satisfy the purpose of the Beacon specification? If not, why add the attack surface? I understand the group has already discussed using POST vs. GET, even though this is a request that may be repeated under error conditions. But use of POST also expands the methods attackers have for conducting CSRF attacks, since many server operations will require POST.
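
To make the origin-restriction question concrete, here is a minimal sketch of what a same-origin check on the beacon URL could look like. The function name `allowBeacon` and the policy itself are hypothetical; nothing in the Beacon draft requires or describes this behavior.

```typescript
// HYPOTHETICAL policy, not taken from the Beacon draft: permit a beacon
// only when its target shares the page's origin (scheme, host, and port).
function allowBeacon(pageUrl: string, beaconUrl: string): boolean {
  // Resolve a possibly relative beacon URL against the page URL first.
  const target = new URL(beaconUrl, pageUrl);
  return target.origin === new URL(pageUrl).origin;
}
```

Under this sketch, a relative URL such as `/log` would be allowed, while a beacon to a different host or a different scheme would be rejected.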

The CORS specification is listed in the References, but doesn't seem to be referred to in the text of the specification. Are user agents intended to follow the CORS cross-origin request model when making a beacon request to a different origin? If so, is preflight required because of the non-simple Beacon-Age header?
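
As an illustration of why the preflight question arises, the following is a simplified sketch of the CORS "simple request" test. It is not a full implementation of the CORS algorithm, the function names are my own, and the `Beacon-Age` value used in the usage note is made up.

```typescript
// Simplified sketch of the CORS "simple request" test; assumptions are
// noted inline and this is not a complete implementation.
const SIMPLE_METHODS = ["GET", "HEAD", "POST"];
const SIMPLE_HEADERS = ["accept", "accept-language", "content-language"];
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];

function isSimpleHeader(name: string, value: string): boolean {
  const lower = name.toLowerCase();
  if (SIMPLE_HEADERS.indexOf(lower) !== -1) return true;
  if (lower === "content-type") {
    // Compare only the media type, ignoring parameters such as charset.
    const mediaType = (value.split(";")[0] ?? "").trim().toLowerCase();
    return SIMPLE_CONTENT_TYPES.indexOf(mediaType) !== -1;
  }
  return false;
}

// True when a cross-origin request with this method and these
// author-set headers would require a preflight (OPTIONS) request.
function needsPreflight(method: string, headers: Record<string, string>): boolean {
  if (SIMPLE_METHODS.indexOf(method.toUpperCase()) === -1) return true;
  return Object.keys(headers).some(
    (name) => !isSimpleHeader(name, headers[name] ?? "")
  );
}
```

With this sketch, `needsPreflight("POST", { "Content-Type": "text/plain" })` is `false`, while `needsPreflight("POST", { "Beacon-Age": "0" })` is `true`, because `Beacon-Age` is not among the simple headers.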

If you haven't already, I suspect it would be worthwhile to follow up with the Web Security Interest Group or the Web Application Security Working Group to check with them about the potential CSRF threat and the use of CORS.

## privacy considerations

What are the privacy considerations of this document? For example, do users want or expect their agents to communicate data after they leave a page or close a window? Does the API give users control over this functionality or will that be handled by UA/site implementations outside the protocol? Is there a recommendation for how this data is handled when a user has toggled a private browsing mode?

Perhaps more specifically: will users be able to inspect requests made after a page is unloaded? For this and other instances (perhaps this would also apply to some APIs around service workers) where UAs make requests not associated with a visible window or tab, what guidance can we give implementers on enabling transparency or control?

Perhaps a section dedicated to privacy and security considerations would be helpful.

## editorial comments

Some requirements are placed on "the User Agent" and others on "user agents"; consistency would be better.

Sections 1 and 4.1 (both Introductions) seem duplicative. In both cases, this sentence is first:
> The Beacon specification defines an interface that web developers can use to asynchronously transfer small HTTP data from the User Agent to a web server.

Nothing in the specification limits the size of the data sent. In fact, analytics data aggregated over an entire session (since unloading seems like the primary use case) might be quite large. If the purpose is specific to small amounts of data -- which might itself address the basic privacy principle of data minimization -- then those requirements should be specified so that implementations have the same restrictions and sites are aware of them.
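
If a size restriction were specified, an implementation or a site could check a payload against it along these lines. This is only a sketch: the 64 KiB figure and the names `MAX_BEACON_BYTES` and `fitsBeaconLimit` are hypothetical, since the draft under review defines no maximum, and it assumes string data is sent encoded as UTF-8.

```typescript
// HYPOTHETICAL limit: the Beacon draft under review defines no maximum size.
const MAX_BEACON_BYTES = 64 * 1024;

// Length in bytes of a string once encoded as UTF-8 (assumed wire encoding).
function utf8ByteLength(payload: string): number {
  return new TextEncoder().encode(payload).length;
}

// True when the encoded payload is no larger than the given limit.
function fitsBeaconLimit(
  payload: string,
  maxBytes: number = MAX_BEACON_BYTES
): boolean {
  return utf8ByteLength(payload) <= maxBytes;
}
```

Measuring bytes rather than characters matters here, because non-ASCII characters occupy more than one byte in UTF-8.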

I think it would be more correct to refer to transferring data via HTTP rather than "HTTP data".

Web developers already have interfaces for asynchronously transferring data via HTTP, for example XMLHttpRequest, as you note. Perhaps a better summary would be: "This specification defines an interface that web developers can use to asynchronously transfer data from the user agent to a web server during or after the unloading of a page."

Received on Tuesday, 29 July 2014 22:39:22 UTC