Re: [whatwg/storage] Rethinking storage proxy map (#96)

The LocalStorage case being dealt with in https://github.com/whatwg/html/pull/5560 isn't synchronously dealing with the authoritative map; it's dealing with a replicated copy of the map, and that's largely hand-waved away via the "multiprocess" disclaimer.  Perhaps reducing the hand-waving would help clear up the error handling[1]?

I think the inescapable implementation reality is that there are always going to be at least 3 event loops involved for any storage endpoint, and it could be worth specifying this:
1. The event loop hosting the authoritative storage bottle map for the endpoint for the given bucket.  (Which may be different from the event loop for buckets on the same shelf or on different shelves, etc.)
2. One or more event loops processing I/O operations for the storage bottle map.  (Put another way, for performance reasons, implementations will not, and cannot, be required to block storage API decisions on disk I/O.)
3. The event loop for the agent where the API calls are happening.
4. (There might also be separate event loops for the authoritative storage bucket map and higher levels, but those don't matter for bottle map errors unless they are fatal.)

Although there will always be policy checks that can happen synchronously in the agent event loop, the reality is that most unexpected failures will happen in the I/O event loops, which will then want to notify the authoritative storage bottle map.
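
To make the flow concrete, here's a rough TypeScript sketch of the three loops and the failure path; everything in it (`EventLoop`, `reportBrokenBottle`, the endpoint string) is invented for illustration and isn't proposed spec text:

```ts
// Illustrative only: each event loop modeled as a task queue, with the I/O
// loop reporting an unexpected failure to the authoritative bottle map loop.
type Task = () => void;

class EventLoop {
  private queue: Task[] = [];
  constructor(readonly name: string) {}
  queueTask(task: Task) { this.queue.push(task); }
  runToCompletion() { while (this.queue.length > 0) this.queue.shift()!(); }
}

const agentLoop = new EventLoop("agent");                            // loop 3: where the API calls happen
const ioLoop = new EventLoop("bottle I/O");                          // loop 2: disk I/O for the bottle
const authoritativeLoop = new EventLoop("authoritative bottle map"); // loop 1: authoritative map

// Hypothetical "report a broken bottle" hook (see the mechanisms below).
function reportBrokenBottle(endpoint: string, reason: string): void {
  authoritativeLoop.queueTask(() => {
    // "process a broken bottle report" would consult bucket metadata here;
    // for now the outcome would always be "wipe".
    console.log(`bottle for ${endpoint} is broken (${reason}); wiping`);
  });
}

// An API call: synchronous policy checks stay on the agent loop, the actual
// write is handed to the I/O loop, and only unexpected failures travel onward.
agentLoop.queueTask(() => {
  // synchronous policy checks (quota, permissions, ...) happen here
  ioLoop.queueTask(() => {
    try {
      throw new Error("disk write failed"); // stand-in for a real I/O failure
    } catch (e) {
      reportBrokenBottle("localStorage", (e as Error).message);
    }
  });
});

agentLoop.runToCompletion();
ioLoop.runToCompletion();
authoritativeLoop.runToCompletion();
```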

Especially given that there's interest in the Storage Corruption Reporting use case ([explainer](https://github.com/wanderview/storage-corruption-reporting/blob/master/explainer.md), [issue in this repo](https://github.com/whatwg/storage/issues/75)), this async processing would make sense, as any corruption handlers would want to be involved in the middle of the process.

One might create the following mechanisms:
- **report a broken bottle**: Used by endpoints to report that something is wrong with the endpoint's storage bottle.
- **process a broken bottle report**: On the authoritative bucket event loop, consult the bucket metadata, which determines what action to take.  In the future this would allow a storage bucket corruption handler to get involved.  For now the decision would always be to "wipe"; in the future the action would instead be handed off to the corruption reporting mechanism, however that would work.
- **perform a bottle inventory**: Future work: Exposed by endpoints so that corruption handlers could get an idea of the damage.  This might take the form of returning an object with sets of map keys corresponding to: known fully retained map entries, known partially retained map entries, and known fully lost map entries.  It would also have a boolean that indicates whether there are map entries for which the keys were lost.  I suppose there could also be a set for map entries where the name was lost but some/all of the data was retained and a synthetic name was created.
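
If it helps to see the shape, a hypothetical TypeScript rendering of those mechanisms might look like the following (all names and fields are made up for illustration):

```ts
// "perform a bottle inventory" (future work): what a corruption handler might get back.
interface BottleInventory {
  fullyRetainedKeys: Set<string>;      // known fully retained map entries
  partiallyRetainedKeys: Set<string>;  // known partially retained map entries
  fullyLostKeys: Set<string>;          // known fully lost map entries
  someKeysLost: boolean;               // entries exist whose keys were lost entirely
  syntheticallyNamedKeys: Set<string>; // data survived, key did not; a synthetic name was assigned
}

// For now the only possible action; a corruption handler could widen this later.
type BrokenBottleAction = "wipe";

interface StorageBottleEndpoint {
  // "report a broken bottle": used by the endpoint when something is wrong with its bottle.
  reportBrokenBottle(reason: string): void;
  // "perform a bottle inventory" (future work): exposed for corruption handlers.
  performInventory?(): Promise<BottleInventory>;
}

// "process a broken bottle report": runs on the authoritative bucket event loop and
// consults bucket metadata to decide what to do.
function processBrokenBottleReport(bucketMetadata: { corruptionHandler?: unknown }): BrokenBottleAction {
  return "wipe";
}
```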

For all Storage endpoints, the question whenever any error occurs on the I/O loop, or when ingesting data provided by the I/O loop, is: **Does this break the bottle?**  For the "indexedDB", "caches", and "serviceWorkerRegistrations" endpoints there are already in-band API means of relaying I/O failures (respectively: fire an UnknownError or more specific error, reject the promise, reject the promise) and there's no need to break the bottle.  For "localStorage" and "sessionStorage" there's no good in-band way to signal the problem, but any transient inability to persist changes to disk can be mitigated by buffering, and when the transient inability becomes permanent, the bottle can be said to be broken.
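
As a rough sketch of that decision (the endpoint names come from the spec; everything else is invented):

```ts
type Endpoint =
  | "indexedDB"
  | "caches"
  | "serviceWorkerRegistrations"
  | "localStorage"
  | "sessionStorage";

type ErrorOutcome = "relay-in-band" | "keep-buffering" | "break-bottle";

// What to do when an error surfaces on the I/O loop (or while ingesting its data).
function onIoError(endpoint: Endpoint, failureIsPermanent: boolean): ErrorOutcome {
  switch (endpoint) {
    case "indexedDB":                  // fire an UnknownError or a more specific error
    case "caches":                     // reject the promise
    case "serviceWorkerRegistrations": // reject the promise
      return "relay-in-band";          // in-band API surface exists; no need to break the bottle
    case "localStorage":
    case "sessionStorage":
      // No good in-band signal: buffer while the failure is transient, and only
      // declare the bottle broken once the inability to persist is permanent.
      return failureIsPermanent ? "break-bottle" : "keep-buffering";
  }
}
```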

1: From a spec perspective (ignoring optimizations), Firefox's LocalStorage NextGen overhaul can be said to synchronously queue a task to make a snapshot of the authoritative bottle map on the authoritative bottle map event loop the first time the LocalStorage API is used in a given task on the agent event loop.  The snapshot is retained until the task and its micro-task checkpoint complete, at which point any changes made are sent to the authoritative bottle map in a task where they are applied.  This maintains run-to-completion consistency (but does not provide magical global consistency).  There are other possible implementations, like "snapshot at first use and broadcast changes", which could also be posed in terms of the event loops/task sources.
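
Purely to illustrate the footnote's timing, and not Firefox's actual code, the snapshot model could be phrased roughly like this (all names are invented; `queueOnAuthoritativeLoop` stands in for queueing a task on the authoritative bottle map event loop):

```ts
// One snapshot per agent-event-loop task, flushed once that task and its
// microtask checkpoint have completed.
let snapshot: Map<string, string> | null = null;   // per-task copy of the bottle map
const pendingChanges = new Map<string, string>();  // writes made against the snapshot

// Called on first LocalStorage API use within a task on the agent event loop.
function ensureSnapshot(authoritativeMap: Map<string, string>): void {
  if (snapshot === null) {
    // Spec-wise: obtained via a task on the authoritative bottle map event loop.
    snapshot = new Map(authoritativeMap);
  }
}

// localStorage.setItem against the snapshot; reads consult the snapshot too,
// which is what yields run-to-completion consistency within the task.
function setItem(key: string, value: string): void {
  snapshot?.set(key, value);
  pendingChanges.set(key, value);
}

// Called once the task and its microtask checkpoint complete: send the buffered
// changes to the authoritative bottle map in a task where they are applied.
function flushChanges(
  queueOnAuthoritativeLoop: (task: () => void) => void,
  authoritativeMap: Map<string, string>,
): void {
  const changes = new Map(pendingChanges);
  queueOnAuthoritativeLoop(() => {
    for (const [key, value] of changes) authoritativeMap.set(key, value);
  });
  snapshot = null;
  pendingChanges.clear();
}
```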

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/storage/issues/96#issuecomment-640330721

Received on Monday, 8 June 2020 02:45:31 UTC