[whatwg/fetch] Specify restriction for requests with keepalive set (#679)

See also: #662 

## Problem
We (Chrome) shipped the keepalive flag in the Fetch API with non-interoperable restrictions:

- If the renderer process is processing more than 9 requests with keepalive set, we reject a new request with keepalive set initiated by fetch().
- If the renderer process is processing more than 19 requests with keepalive set, we reject a new request with keepalive set.
  - Note: The difference between fetch() + keepalive and sendBeacon exists for historical reasons, and we are going to unify the restriction.
- If Chrome is processing more than 255 requests with keepalive set, we reject a new request with keepalive set.

These restrictions are very visible to web developers (making 10 fetch requests is very easy), so we would like to replace them with interoperable ones, if possible.
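
To make the three thresholds concrete, here is a minimal sketch of the admission check described in the list above. The function name, parameter names, and the `LIMITS` object are hypothetical; this is not Chrome's actual code, only the counting rules as stated.

```javascript
// Hypothetical model of the limits listed above; not Chrome's implementation.
const LIMITS = {
  fetchPerRenderer: 9,  // fetch() + keepalive is rejected above this count
  anyPerRenderer: 19,   // any request with keepalive set is rejected above this count
  perBrowser: 255,      // browser-wide cap
};

// Returns true if a new request with keepalive set may start.
// `rendererInflight` and `browserInflight` count the keepalive requests
// already being processed by the renderer process and by the whole browser.
function mayStartKeepalive({ viaFetch, rendererInflight, browserInflight }) {
  if (viaFetch && rendererInflight > LIMITS.fetchPerRenderer) return false;
  if (rendererInflight > LIMITS.anyPerRenderer) return false;
  if (browserInflight > LIMITS.perBrowser) return false;
  return true;
}
```

With ten keepalive requests already in flight in one renderer process, a further `fetch()` call with keepalive set is rejected under this model, while a request that is not initiated by `fetch()` is still admitted.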

### What’s special about keepalive?

Chrome is a multi-tab application, and many resources (CPU, memory, network, …) are associated with a tab. For example, it is easy to create a non-responding web application like this:

```
<html>
<head>
<script>while(true) {}</script>
</head>
</html>
```

But a user can free up the CPU resource by simply closing the tab without impacting other web pages (or tabs) negatively. That is also true for network resources. Web pages can issue many network requests, but they will be cancelled when the page is unloaded or killed forcibly.

Requests with keepalive set are an exception. As [stated in the fetch spec](https://fetch.spec.whatwg.org/#concept-fetch-group-terminate), requests with keepalive set survive page unload. This is a kind of resource leak from the above point of view: the request is still ongoing after the tab is closed, and we don’t have a means to abort it except for shutting down the entire browser.
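
For readers who have not used the flag, a request with keepalive set looks roughly like the following. This is a minimal sketch: the `/log` endpoint and the `sendFinalReport` helper are made up for illustration, and the `fetchImpl` parameter exists only so the snippet can be exercised outside a browser.

```javascript
// Hypothetical helper: sends a last report that should survive page unload.
// `fetchImpl` defaults to the global fetch; it is a parameter only for testing.
function sendFinalReport(payload, fetchImpl = globalThis.fetch) {
  return fetchImpl("/log", {
    method: "POST",
    body: JSON.stringify(payload),
    keepalive: true, // the request may outlive the document that issued it
  });
}

// In a page, this would typically be called from an unload-time handler, e.g.:
// addEventListener("pagehide", () => sendFinalReport({ reason: "pagehide" }));
```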

### Mitigations
Chrome has some mitigations for the problem.

#### Upload payload size [specced]
As written in the spec, we only allow 64 KiB of payload per fetch group at a time. Unfortunately, this restriction is easy to circumvent.
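
The specced check can be sketched roughly as follows. The class and method names are hypothetical and the bookkeeping is simplified; only the 64 KiB budget per fetch group comes from the spec.

```javascript
const KEEPALIVE_BUDGET_BYTES = 64 * 1024; // 64 KiB per fetch group

// Hypothetical stand-in for a fetch group's keepalive bookkeeping.
class FetchGroupKeepaliveBudget {
  constructor() {
    this.inflightBytes = 0; // body bytes of keepalive requests still in flight
  }

  // Returns true and reserves the bytes if the body fits in the budget;
  // false corresponds to the request failing with a network error.
  tryStart(bodyLength) {
    if (this.inflightBytes + bodyLength > KEEPALIVE_BUDGET_BYTES) return false;
    this.inflightBytes += bodyLength;
    return true;
  }

  // Called when a request finishes, releasing its share of the budget.
  finish(bodyLength) {
    this.inflightBytes -= bodyLength;
  }
}
```

Note that the budget bounds the bytes in flight at one time within one fetch group, not the total sent over the group's lifetime.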

#### Timeout [chrome only]
Requests with keepalive set are cancelled once 30 seconds have passed since the associated context (usually a frame) was destroyed.

#### Number of requests per process [chrome only]
The number of requests with keepalive set per renderer process is restricted in Chrome. This restriction is hard to circumvent, but we cannot put it in the spec because “renderer process” is a Chrome-only concept. There is also some uncertainty because a renderer process can sometimes be shared.

#### Number of requests per browser [chrome only]
The number of requests with keepalive set across the whole browser is also restricted in Chrome. This restriction is impossible to circumvent as long as you are using the browser.

#### Abort request after the response is received [chrome only]
Chrome aborts a request with keepalive when both of the following hold:

- The associated context has already been destroyed.
- The response has already been received.

This prevents mass download after the tab is closed.
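
The timeout and abort-after-response mitigations can both be expressed as one predicate over a request's state. The sketch below is illustrative only: the record fields and the function name are invented, and times are plain millisecond timestamps rather than anything Chrome uses internally.

```javascript
const KEEPALIVE_TIMEOUT_MS = 30 * 1000; // 30 seconds after context destruction

// `request` is a hypothetical record:
//   contextDestroyedAt: ms timestamp, or null while the context is alive
//   responseReceived:   true once the response has been received
function shouldCancelKeepalive(request, nowMs) {
  if (request.contextDestroyedAt === null) return false; // context still alive
  if (request.responseReceived) return true; // prevents mass download after close
  return nowMs - request.contextDestroyedAt >= KEEPALIVE_TIMEOUT_MS;
}
```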

### Interoperability
I think interoperability here is particularly valuable for two reasons.

One is to have a unified policy. Given the leaky nature of the flag, each implementer has to make a balanced decision between developers’ convenience and users’ expectation. Having diverse policies will confuse both developers and users.

The other is the difficulty of handling errors. The keepalive flag is expected to be used when the page is about to unload. In such circumstances, developers are unlikely to be able to detect and handle errors correctly. Having rigid, interoperable restrictions will benefit developers.

## Proposal

The policy we adopt should have the following properties:

- It is difficult for (evil) web developers to circumvent the restrictions without being noticed by end users.
- It should be aligned with reasonable end user expectations.

Here I propose the following:

1. Introduce KeepaliveContext which keeps track of the inflight requests with keepalive set.
   - A unit of related browsing contexts share a KeepaliveContext.
   - A dedicated worker shares the KeepaliveContext with its responsible browsing context.
   - A shared worker shares the KeepaliveContext with its responsible browsing context.
   - Service worker: TBD; see below.
1. Restrict the number of inflight requests with keepalive set asynchronously in a KeepaliveContext. Specify a number that all implementations should allow.
1. Restrict the total inflight body size of requests with keepalive set asynchronously in a KeepaliveContext. Note that this changes https://w3c.github.io/beacon/#sec-processing-model, which contains a synchronous check.
1. Cancel a request with keepalive set when a certain time period has passed since the fetch group termination.
1. Cancel a request with keepalive set when both of the following hold:
   - The associated fetch group has already been terminated.
   - The response has already been received.
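
As a rough illustration of items 1 to 3, a KeepaliveContext could look something like the sketch below. Only the name KeepaliveContext comes from the proposal. The `admit` method, the `release` callback, the field names, and the error message are assumptions, and the limits are constructor arguments because the proposal does not fix concrete numbers.

```javascript
// Hypothetical sketch of a KeepaliveContext; not a specified interface.
class KeepaliveContext {
  constructor({ maxRequests, maxBodyBytes }) {
    this.maxRequests = maxRequests;
    this.maxBodyBytes = maxBodyBytes;
    this.inflightRequests = 0;
    this.inflightBodyBytes = 0;
  }

  // Asynchronous admission: the caller learns the outcome through the
  // returned promise, never through a synchronous exception.
  // On success the promise resolves to a `release` callback.
  async admit(bodyLength) {
    if (this.inflightRequests + 1 > this.maxRequests ||
        this.inflightBodyBytes + bodyLength > this.maxBodyBytes) {
      throw new TypeError("keepalive limit exceeded");
    }
    this.inflightRequests += 1;
    this.inflightBodyBytes += bodyLength;
    let released = false;
    return () => {
      if (released) return; // releasing twice has no further effect
      released = true;
      this.inflightRequests -= 1;
      this.inflightBodyBytes -= bodyLength;
    };
  }
}

// Contexts that share a KeepaliveContext (for example a page and its
// dedicated workers) would hold a reference to the same instance.
```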

### Service Worker
A web developer can create multiple service workers by using iframes, so assigning one KeepaliveContext to each service worker is not an option. Here are some ideas:

#### Have a global KeepaliveContext used by all service workers
This should work, but it will be too restrictive.

#### Use the first service worker client’s KeepaliveContext
This can be leaky in some pathological cases, and it is a bit unintuitive.

#### Do nothing; Let the service worker keep itself alive
[The spec](https://w3c.github.io/ServiceWorker/#service-worker-lifetime) says:

> A user agent may terminate service workers at any time it:
>  - Has no event to handle.
>  - Detects abnormal operation: such as infinite loops and tasks exceeding imposed time limits (if any) while handling the events.

When detecting an abnormal operation, it makes sense to cancel the request even with keepalive set. That means that if we are interested in requests with keepalive set from the page, they should be protected by the service worker.
This option disables keepalive protection in service workers.
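
One way to read this option: a service worker that wants a request to complete keeps itself alive with `ExtendableEvent.waitUntil()` until the fetch settles, instead of relying on the keepalive flag. The following is a minimal sketch of that reading, not part of the proposal. The message shape, the `handleReportMessage` name, and the `fetchImpl` parameter (present so the function can be exercised outside a service worker) are all assumptions.

```javascript
// Hypothetical message handler for a service worker.
// `event` is expected to be an ExtendableMessageEvent whose data is { url, body }.
function handleReportMessage(event, fetchImpl = globalThis.fetch) {
  const { url, body } = event.data;
  // No keepalive flag here: the worker's lifetime is extended by waitUntil(),
  // and the user agent may still terminate the worker on abnormal operation.
  const done = fetchImpl(url, { method: "POST", body }).then(() => undefined);
  event.waitUntil(done);
  return done;
}

// In a service worker script this would be registered as:
// self.addEventListener("message", (event) => handleReportMessage(event));
```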

I (@yutakahirano) like the last option.

https://github.com/whatwg/fetch/issues/679

Received on Tuesday, 6 March 2018 09:28:41 UTC