[w3c/ServiceWorker] [Feature request] Allow keeping service worker alive (#1558)

We have a use case for Service Workers that sounds simple but turns out to be surprisingly complicated: in essence we have a map of URL to Blob that we want the Service Worker to serve in fetch events. (As it happens I first mentioned this in #1556.) This is to allow various features of local preview in our browser-based game development software [Construct 3](https://www.construct.net/).

My first (and, it turns out, naïve) approach was to have a client post the map of URL to Blob to the Service Worker. The SW stores this in a global variable, does lookups in the fetch event, and serves the corresponding blob if there's a match (else falls back to network etc.). But the browser is allowed to terminate the SW after a period of inactivity - in which case the map in the global variable is wiped, and fetches to those URLs subsequently fail (going to network and returning 404). Oops! In Chrome this idle timeout is 30 seconds, and we only realised after we shipped all this, when a user figured out that their network requests started failing after 30 seconds and filed a bug with us (https://github.com/Scirra/Construct-3-bugs/issues/4422).
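For concreteness, the naïve approach can be sketched roughly like this (the `urlMap` name, message shape, and `responseFor` helper are illustrative, not our actual code):

```javascript
// Module-level state in the service worker: lost whenever the SW is terminated.
const urlMap = new Map();

// Pure lookup helper: returns a Response for a known URL, or null to signal
// "fall back to the network".
function responseFor(url) {
  const blob = urlMap.get(url);
  return blob ? new Response(blob) : null;
}

// In the real service worker, listeners wire this up (sketch):
// self.addEventListener("message", (e) => {
//   if (e.data && e.data.type === "set-map") {
//     for (const [url, blob] of e.data.entries) urlMap.set(url, blob);
//   }
// });
// self.addEventListener("fetch", (e) => {
//   const r = responseFor(e.request.url);
//   if (r) e.respondWith(r); // else fall through to the network
// });
```

The failure mode is exactly that `urlMap` lives only as long as the worker process does.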

The problem is, how else can we implement this?

**Idea 1: save the map to storage.** We could write the map to IndexedDB. This won't work though, for several reasons:

- We could be serving a large amount of local content, corresponding to an entire game. The user may not have enough storage quota to save the whole map.
- In some browsers, private browsing or certain privacy settings make storage APIs throw on any attempt to use them, so they cannot be used at all in these modes.
- An error could occur while writing to storage. We know through our work with the editor that storage isn't reliable at scale - we see about 50 storage failures a day in our telemetry, with a mix of AbortError, DataError, UnknownError, QuotaExceededError, TimeoutError...
- Writing all this data to storage could be slow, degrading the user experience.

Basically we can't rely on storage, so let's eliminate that. The map will have to stay in memory.

**Idea 2: store the map on the client.** The client can keep its own map of URL to blob. Then in the SW fetch event, it can post to the client asking it if it has something to serve for that URL, and if it does the client can post back the blob to serve. (This seems weirdly circuitous, but it keeps the map on a client which will be kept alive.)

The problem here is: how do you know the client has loaded enough to answer a message? If it hasn't yet attached an event listener, the SW will simply never get a response. I guess it could use a timeout, but this is a performance-critical network path. What if the client needs to make several network requests before it can even start listening for messages? All of those requests will be delayed by the timeout. And to improve the chances the client can respond in time, even when it's busy, you really want as long a timeout as possible. There doesn't seem to be a good trade-off here.
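A rough sketch of what Idea 2 would look like, assuming a hypothetical `askClient` helper and an arbitrary `TIMEOUT_MS` - the timeout here is exactly the bad trade-off described above:

```javascript
// Arbitrary illustrative value: too short and a busy client misses real
// content; too long and every miss delays the network fallback by this much.
const TIMEOUT_MS = 2000;

// Resolve with the promise's value, or with null once the timeout elapses.
function withTimeout(promise, ms) {
  return Promise.race([
    promise,
    new Promise((resolve) => setTimeout(() => resolve(null), ms)),
  ]);
}

// Ask a client whether it has a blob for this URL over a MessageChannel.
// If the client never attached a listener, this promise never settles -
// hence the timeout wrapper above.
function askClient(client, url) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    port1.onmessage = (e) => resolve(e.data); // Blob, or null for "no match"
    client.postMessage({ type: "lookup", url }, [port2]);
  });
}

// In the fetch handler (sketch):
// e.respondWith((async () => {
//   const client = await self.clients.get(e.clientId);
//   const blob = client &&
//     await withTimeout(askClient(client, e.request.url), TIMEOUT_MS);
//   return blob ? new Response(blob) : fetch(e.request);
// })());
```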

**Idea 3: keep the service worker alive.** As long as the service worker stays alive, its map will be kept in memory. It turns out we can make the client keep the SW alive with a little postMessage dance: the client posts "keepalive" to the SW; the SW calls `waitUntil()` on the message event with a few seconds of timeout; and then posts back to the client, which then immediately sends back a "keepalive" message... This ensures the SW is basically permanently waiting on a `waitUntil()` promise. As soon as the client is closed, it stops sending "keepalive" messages, and so the SW will fall back to idle and so can be terminated (i.e. this doesn't keep the SW alive forever, only the duration of the client).
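The dance could be sketched like this (the message strings and the 10-second wait are illustrative choices, not our actual implementation):

```javascript
// Promise-based delay used to hold the message event open.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Service worker side: hold each "keepalive" message event open for a while
// via waitUntil(), then ping the client back (sketch):
// self.addEventListener("message", (e) => {
//   if (e.data === "keepalive") {
//     e.waitUntil(wait(10000).then(() => e.source.postMessage("keepalive-ack")));
//   }
// });
//
// Client side: on each ack, immediately send the next "keepalive", so the SW
// is essentially always inside a pending waitUntil() while the client lives.
// Once the client is closed, no further messages arrive and the SW goes idle.
// navigator.serviceWorker.addEventListener("message", (e) => {
//   if (e.data === "keepalive-ack")
//     navigator.serviceWorker.controller.postMessage("keepalive");
// });
```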

My concern with this is firstly that it feels like a hack, and secondly that the ability to keep a SW alive appears to have been treated as a bug in some cases, with steps taken to mitigate it.

So my suggestion is: why not add a way in the spec to indicate that the SW ought to be kept alive? It could be tied to a single client, or to all active clients, so the SW isn't kept alive permanently after its clients are closed. Then we'd have a clean way to get this behaviour without resorting to postMessage hacks, and we could rest assured browser makers won't regard it as some kind of bug and add mitigations that break this case again. It already seems possible to keep a SW alive so long as it has work to do (e.g. regular fetches that go through the SW fetch event), so in that sense this wouldn't add a new capability, only make it more convenient.

There appear to be other cases where keeping the SW alive would be useful, e.g. for streaming: #882, #1182

Or maybe there is some other approach that could cover this? Something like import maps but for all fetches would solve our use case too, but those currently only cover JS Modules (and are still in development). But it looks like there are several use cases for keeping a SW alive.

https://github.com/w3c/ServiceWorker/issues/1558

Received on Thursday, 17 December 2020 13:57:16 UTC