Re: [Beacon] Off-line behavior, captive portals and error recovery

On 15.05.2014, at 14:21, Arvind Jain <arvind@google.com> wrote:

> Re. captive portal detection, I think that is out of scope. Captive portals behave in many different ways and putting something in this spec about that would be odd. If a user agent is able to detect reliably it is not "really" connected and is behind a captive portal, it should not send the beacons. It doesn't seem like we need to spell it out in this spec.

I don't find that odd at all. It's a data loss risk, and it would be good for the spec to address it or at the very least acknowledge this failure scenario in a non-normative section.

sendBeacon is also in a unique situation, because by design it doesn't allow authors to handle errors. With XHR or images authors can catch errors or unexpected server responses and react accordingly, so details of error handling can be left to the authors. But with sendBeacon the responsibility for good error handling lies solely with the UA, so IMHO it must be defined in the spec and can't be just hand-waved as "best effort".

The other spec I know of where the UA is given responsibility for error handling is Server-Sent Events:

http://www.whatwg.org/specs/web-apps/current-work/multipage/comms.html#processing-model-8


SSE's error handling can't be reused for this spec, since the two have different goals, so here's my suggestion for sendBeacon:

- HTTP 305 Use Proxy, 401 Unauthorized, and 407 Proxy Authentication Required should be treated transparently as for any other subresource.

- Other 4xx HTTP responses should be treated as a permanent error and the UA must not retry sending the beacon. (If the script gathering analytics has been deleted, the UA shouldn't be retrying unnecessarily, and it would make sense for authors to reject requests with status 403/410 without having to pretend that the request was accepted.)

- 5xx HTTP responses should be treated as a temporary error and the UA should retry. (This often includes proxy and load balancer errors that are temporary.)

- DNS lookup errors and TCP/IP connection errors should be treated as temporary errors and the UA should retry. The first version of the Server-Sent Events spec made the mistake of defining DNS errors as a permanent failure, but in practice DNS can be flaky when the device has a weak Internet connection.

- 3xx HTTP responses should be treated as a temporary error and the UA should retry. RFC 2616 says UAs MUST NOT automatically do POST after 301/302/303/307/308 responses, and sendBeacon doesn't have any use for the subsequent GET request, so there is no need for authors to ever use any of these statuses. However, this simple rule will catch captive portals that redirect all unknown URLs to the portal's homepage (this doesn't preclude more sophisticated captive portal detection by the browser or the OS, but at least gives some baseline).
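
To make the rules above concrete, here is a rough sketch of how a UA might classify the result of a beacon request. The names (Outcome, classify) are made up for illustration, None stands in for a DNS or TCP/IP failure, and treating 2xx as "delivered" is my assumption, since the list above only covers the error cases.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DELIVERED = "delivered"      # assumption: 2xx means the beacon was accepted
    TRANSPARENT = "transparent"  # handled like any other subresource
    PERMANENT = "permanent"      # permanent error, must not retry
    RETRY = "retry"              # temporary error, retry later


# Statuses the proposal leaves to normal subresource handling.
TRANSPARENT_STATUSES = {305, 401, 407}


def classify(status: Optional[int]) -> Outcome:
    """Map a beacon result to an outcome; None stands for a DNS or TCP/IP failure."""
    if status is None:
        return Outcome.RETRY
    # Checked first, because 305 is a 3xx and 401/407 are 4xx.
    if status in TRANSPARENT_STATUSES:
        return Outcome.TRANSPARENT
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if 300 <= status < 400:
        return Outcome.RETRY      # e.g. a captive portal redirect
    if 400 <= status < 500:
        return Outcome.PERMANENT
    if 500 <= status < 600:
        return Outcome.RETRY
    raise ValueError(f"status {status} is not covered by the proposal")
```

With this, a 302 from a captive portal is classified as retry, while a 410 from a removed analytics endpoint drops the beacon for good.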


By "UA should retry" I mean the UA should wait a period of time and try making the request again. The time between retries should increase exponentially, and of course UAs should be free to schedule retries when convenient, e.g. never wake the cellular radio for these requests.
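
As an illustration of the exponential schedule, a minimal sketch follows. The base delay and the cap are placeholder numbers I picked for the example, not values I'm proposing for the spec, and retry_delay is a hypothetical helper name.

```python
def retry_delay(attempt: int, base: float = 60.0, cap: float = 86400.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with each failed attempt and is capped so it cannot
    grow without bound. Both `base` and `cap` are illustrative placeholders.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))
```

This only gives the earliest time a retry may happen. A UA would still be free to delay further, for example until the radio is awake for some other reason.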

UAs should persist beacons across restarts. This is important on mobile, where the browser can be killed at any time. To minimize bias in analytics, the lifetime of a beacon should be defined in the spec rather than being left undefined and implementation-dependent, since it may otherwise vary wildly with the amount of memory in devices and operating systems' approach to multitasking.

At this point I don't have specific suggestions for how long UAs should keep beacons. It could be bikeshedded or derived from some data, e.g. by looking at how long users typically remain offline.
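
For what it's worth, persistence with a fixed lifetime could look roughly like the sketch below. Everything here is an assumption made for illustration: the JSON file, the "created" timestamp field, the function names, and especially the lifetime constant, which is a placeholder and not a proposed value.

```python
import json
import os
import tempfile
import time
from typing import Dict, List, Optional

# Placeholder only: the actual lifetime is exactly what remains to be decided.
HYPOTHETICAL_LIFETIME_SECONDS = 7 * 24 * 3600


def save_beacons(path: str, beacons: List[Dict]) -> None:
    """Write the pending queue via a temp file so a kill mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(beacons, handle)
    os.replace(tmp_path, path)  # atomic rename over the old queue file


def load_beacons(path: str, now: Optional[float] = None,
                 lifetime: float = HYPOTHETICAL_LIFETIME_SECONDS) -> List[Dict]:
    """Reload the queue after a restart, dropping beacons older than `lifetime`."""
    if now is None:
        now = time.time()
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        beacons = json.load(handle)
    return [b for b in beacons if now - b["created"] <= lifetime]
```

Because the expiry depends only on the beacon's age and not on memory pressure or process lifetime, every device would drop beacons at the same point.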

-- 
regards, Kornel

Received on Friday, 16 May 2014 23:51:19 UTC