Re: [w3c/ServiceWorker] Editorial: Queue a task to resolve/reject promise or when fire an event. (PR #1755)

domenic left a comment (w3c/ServiceWorker#1755)

This diff looks locally reasonable to me. However, I am not caught up on the discussions about parallel queues vs. using the generic "in parallel", so I cannot say which is is best here. Maybe merging this first is good and then doing some overall work on using parallel queues later is the right approach.

---

To try to answer the more generic questions:

Ideally, specifications are written to match implementations:

- There are completely separate threads, which mostly do not touch.
  - There is one event loop thread per agent
  - There is a thread for each specific parallel queue
  - Every invocation of the generic "in parallel" can be thought of as spawning a new separate thread
- In particular, it's important to be very clear what objects and data structures are touchable from each spec thread
  - Objects with a strong connection to JavaScript, like promises or Web IDL interface objects, must only be touched from their corresponding event loop thread.
    - But see below for how we don't always perfectly follow this advice.
  - Simple primitive values like booleans or strings, or more complex "primitive-ish" values like sets of integers or Web IDL dictionaries/Infra maps whose values are only primitives, can be touched from any thread. However, you need to take care to avoid race conditions.
    - We usually pass these objects across threads implicitly, with no clear indication that we're doing so. E.g., by having "in parallel" steps close over a variable that was introduced in the event loop portion of an algorithm.
    - This is often fine because 80% of the time, these values just make a one-way trip from the event loop (where they came from the web developer as method arguments or similar) to the in parallel thread (where they get processed).
    - Sometimes we need to do something more complex. Then it's best to add informative notes that are explicit about how you're avoiding race conditions. E.g. see [this step in a spec I recently wrote](https://webmachinelearning.github.io/writing-assistance-apis/#create-an-ai-model-object:~:text=Let%20abortedDuringDownload%20be%20false.,but%20read%20from%20in%20parallel.).
  - Some objects represent "browser process" data structures, and as such you want to ensure you only touch them from "in parallel", or from a specific parallel queue.
    - Usually a specific parallel queue is better, but probably there are cases where it's unnecessary.
    - For example, the user agent's list of all shared worker global scopes is only accessed from the [shared worker manager](https://html.spec.whatwg.org/#shared-worker-manager) parallel queue. (But see below on `SharedWorkerGlobalScope`.)
- Sometimes there is an unfortunate mixing up between JavaScript objects and their "browser process" counterparts, which causes specs to end up touching those JS objects across multiple threads, even though this violates the above rules.
  - For example, there is a lot of information stored on the `Document` object in specifications, like URL or origin, which needs to be accessed from tons of different threads.
  - Or, the example linked above about shared worker manager is specified to look at the "list of all `SharedWorkerGlobalScope` objects", which violates the above rules, because `SharedWorkerGlobalScope` is a JavaScript object that should only exist on the event loop.
  - A fully rigorous spec for this would match implementations better. For example, it would have separate spec objects for `Document` and "browser-process document-related info". It would then need to specify the exact way in which this information is synchronized across the two representations. However, this is a lot of work, for unclear benefit.
  - An example of a spec which does this more correctly is Fetch, which separates [requests](https://fetch.spec.whatwg.org/#concept-request) from [`Request`s](https://fetch.spec.whatwg.org/#request-class).
    - However, the synchronization is kind of magical there, instead of explicit, so it's possible there are spec bugs or mismatches with the implementation, because the "requests" are accessed from multiple threads without explicit transfer algorithms, or explicit guards against racy multiple-writes.
    - But this is probably our best model today of how to separate things, while not making the spec excessively complicated.
    - The service worker spec seems to mostly follow this model.

---

How does this apply to the service worker spec, and to this PR?

Well, I think the main thing is to identify from which thread you intend "service worker registration" objects to be accessed. I get the impression it should be in parallel, not on an event loop thread. So that's why it's worth keeping something parallel-ish in `getRegistration()`, whether it's "in parallel" or a specific parallel queue. And then, you need to queue a task to resolve the promise. So that's why this PR looks reasonable to me.

Note that in this PR there is some thread crossing, where _registration_ is implicitly passed from "in parallel" (where it was created in step 8.1) to the event loop thread (in step 8.2.1 and 8.2.2). This is a bit sketchy, but mostly OK. In practice, what's probably happening here is a copy and snapshot: when "get the service worker registration object" consults _registration_'s installing worker in step 2.6, it's probably not doing some sort of cross-thread lookup at the actual _registration_ object, but instead looking at some key that was passed across threads in `getRegistration()` step 8.2.

A more fully rigorous version of this whole setup would have explicitly-obvious primitive keys (like string UUIDs, or something) that allow us to correspond between browser-process "service worker"/"service worker registration" objects and event loop `ServiceWorker`/`ServiceWorkerRegistration` objects. Then algorithms like `getRegistration()` would pass across those keys in a way that makes it clear we're taking a snapshot. Then back on the event loop thread we'd look up the right `ServiceWorker`/`ServiceWorkerRegistration` objects and use those to construct the return value we resolve the promise with. But that's a lot of work, for unclear gain. The current model of just using the identity of the "service worker"/"service worker registration" objects, instead of explicit primitive keys, and understanding that it's implicitly snapshotted when crossing thread boundaries, is reasonable enough.

If I were to suggest the most useful improvements to improving the spec's cross-thread hygiene, here would be my priority list:

1. PRs like this, which clarify which task source is used to resolve promises.
2. Notate every algorithm in Appendix A with whether they run "in parallel" or on the event loop. This might uncover some problems. (I've been doing this recently, e.g. [here](https://webmachinelearning.github.io/writing-assistance-apis/#supporting-creation:~:text=Assert:%20these%20steps%20are%20running%20on%20realm%E2%80%99s%20surrounding%20agent%E2%80%99s%20event%20loop.) and [here](https://webmachinelearning.github.io/writing-assistance-apis/#initialize-the-summarization-model:~:text=an%20AISummarizerCreateOptions%20options:-,Assert:%20these%20steps%20are%20running%20in%20parallel.,-Perform%20any%20necessary).)
3. Audit all "in parallel" algorithms to ensure they aren't touching JS objects.
4. Use parallel queues appropriately to avoid race conditions from separate "in parallel" situations.
5. Figure out the relationship between parallel queues and the spec's existing idea of "jobs". (Maybe @jakearchibald can comment on this, as I think he might have invented both concepts.) Potentially rebase "jobs" on parallel queues?
6. Consider some mechanism for being more clear about how and when things cross threads. (This is generally not a solved problem in the spec ecosystem. Above I discussed one method, of using explicitly primitive keys, but maybe there is something more lightweight.)

I think it is totally fine if only item 1 on this list gets done. Items like 2-4 can just be things you keep in mind when you write new spec text.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3c/ServiceWorker/pull/1755#issuecomment-2661941708
You are receiving this because you are subscribed to this thread.

Message ID: <w3c/ServiceWorker/pull/1755/c2661941708@github.com>

Received on Monday, 17 February 2025 04:18:38 UTC