Re: IndexedDB, what were the issues? How do we stop it from happening again?

On Wed, Mar 6, 2013 at 6:01 AM, Alex Russell <> wrote:
> I've avoided weighing in on this thread until I had more IDB experience.
> I've been wrestling with it on two fronts of late:
> A re-interpretation of the API based on Futures:
> A new async LocalStorage design + p(r)olyfill that's bootstrapped on IDB:
> While you might be right that it's unlikely that the API can be
> "simplified", I think it's trivial to extend it in ways that make it easier
> to reason about and use.
> This thread started out with a discussion of what might be done to keep
> IDB's perceived mistakes from reoccurring. Here's a quick stab at both an
> outline of the mistakes and what can be done to avoid them:
> Abuse of events
> The current IDB design models one-time operations using events. This can
> make sense insofar as events can occur zero or more times in the future, but
> it's not a natural fit. What does it mean for oncomplete to happen more than
> once? Is that an error? Are onsuccess and onerror exclusive? Can they both
> be dispatched for an operation? The API isn't clear. Events don't lead to
> good design here as they don't encapsulate these concerns. Similarly, event
> handlers don't chain. This is natural, as they could be invoked multiple
> times (conceptually), but it's not a good fit for data access. It's great
> that IDB is async, and events are the existing DOM model for this, but IDB's
> IDBRequest object is calling out for a different kind of abstraction. I'll
> submit Futures for the job, but others might work (explicit callback,
> whatever) so long as they maintain chainability + async.

Whether it's an "abuse" of events or not I guess is a matter of opinion.

DOM Events have always been used in situations where the Event fires
either 0 or 1 times. They've even been used in situations where an
eventual success/error is signaled. In particular, the "load" and
"error" events for Documents were among some of the first Events introduced.

That said, I agree that Events are generally a better fit for
situations where you have a recurring "thing" that happens.

And yes, if we had had Futures when we designed IDB it might have led
to a much easier to use API. We had a hunch that that was the case
when we designed the API. However there wasn't then, and there still
isn't, a standardized promise API.

We felt then, and I still feel that way now, that it would have been a
mistake to standardize a promise library as part of a database API.
This is why I've been pushing for someone to step up and take on
creating a standardized promise library that we can rely on for future APIs.
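To show the shape such a library could take, here is a minimal sketch — the names promisify and fakeRequest are hypothetical, and fakeRequest merely stands in for a real IDBRequest so the snippet runs outside the browser:

```javascript
// Sketch of a promise wrapper over the IDBRequest success/error pair.
// promisify and fakeRequest are hypothetical names; fakeRequest stands
// in for a real IDBRequest so the snippet is self-contained.
function promisify(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Minimal stand-in for an IDBRequest that succeeds asynchronously.
function fakeRequest(value) {
  const req = { onsuccess: null, onerror: null, result: undefined };
  setTimeout(() => { req.result = value; if (req.onsuccess) req.onsuccess(); }, 0);
  return req;
}

promisify(fakeRequest(42)).then(v => console.log(v)); // logs 42
```

The wrapper gives you chainability without changing the underlying API, which is why a shared promise standard matters more than baking one into IDB.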

> Implicitness
> IDB is implicit in a number of places that cause confusion for folks not
> intimately familiar with the contract(s) that IDB expects you to enter into.
> First, the use of events for delivery of notifications means that
> sequential-looking code that you might expect to have timing issues doesn't.
> Why not? Because IDB operates in some vaguely async way; you can't reason at
> all about events that have occurred in the past (they're not values, they're
> points in time).

You seem to be under the impression that Events are only used/intended
for situations when something will happen "zero or more times at some
point in the future". I agree that this is the scenario where Events
really shine, especially when that something is connected to the DOM.

However the way they are actually used in the DOM platform is as a
generic way of doing callbacks.

So despite the fact that IDB uses Events, it still has quite strict
requirements on the order in which they fire. Having a strict order of
delivering results was a quite intentional design decision.

> I can't find anywhere in the spec where the explicit
> guarantees about delivery timing are noted,

See in particular step 4 which guarantees that requests are run and
deliver their result in the order they were scheduled.
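To illustrate that guarantee with a toy model (my own sketch, not the spec's algorithm): results are buffered in request order and drained FIFO, so a fast request placed second still delivers after a slow one placed first.

```javascript
// Toy model (not the spec algorithm) of in-order result delivery:
// results are buffered per-request and drained FIFO, so a request that
// finishes early still delivers after earlier-placed requests.
const results = [];
const pending = []; // one slot per request, in the order they were placed

function place(work) {
  const slot = { done: false, value: undefined };
  pending.push(slot);
  work().then(v => { slot.done = true; slot.value = v; drain(); });
}

function drain() {
  while (pending.length > 0 && pending[0].done) {
    results.push(pending.shift().value);
  }
}

// A slow request placed first, a fast one placed second:
place(() => new Promise(r => setTimeout(() => r("slow"), 20)));
place(() => new Promise(r => setTimeout(() => r("fast"), 0)));
setTimeout(() => console.log(results), 50); // ["slow", "fast"]
```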

> so one could read IDB code that
> registers two callbacks as having a temporal dead-zone: a space in code
> where something might have happened but which your code might not have a
> chance to hear about. I realize that in practice this isn't the case; event
> delivery for these is asynchronous, but the soonest timing isn't defined:
> end of turn? next turn? end-of-microtask? This means that it's possible to
> have implementations that differ on delivery timing, astonishing those who
> register event handlers at the wrong time. This is partly the DOM-ish use of
> events for things they're not suited to, and partly a lack of specificity in
> the spec. Both can be fixed.

I agree that this is something that is vague right now. The intent is
that all results are delivered as new tasks. I.e. they are delivered
as separate turns. So it's definitely after end-of-turn and after
end-of-microtask. Whether it's next turn or some turn after that
depends on how fast the request finishes.
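The distinction can be seen with plain timers and promise callbacks — a rough sketch, using setTimeout as a stand-in for an IDB result task:

```javascript
// Rough sketch of the timing described above: an IDB result arrives as
// a new task (modeled here with setTimeout), which runs only after the
// current turn and all of its microtasks (Promise callbacks) are done.
const order = [];
setTimeout(() => order.push("task"), 0);               // new task, like an IDB result
Promise.resolve().then(() => order.push("microtask")); // end-of-microtask work
order.push("sync");                                    // rest of the current turn
setTimeout(() => console.log(order), 10); // ["sync", "microtask", "task"]
```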

> A related bit of implicitness is the transaction object. Auto-open and
> auto-close might be virtues, but they come with costs. When does a
> transaction auto-close?

See in particular step 7.
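A rough behavioral model (hypothetical code, not the spec's algorithm): the transaction is usable during the turn it was handed to you, and once control returns to the event loop with nothing pending it commits, after which further requests throw.

```javascript
// Hypothetical model (not the spec algorithm) of transaction auto-close:
// usable in the turn that created it; once control returns to the event
// loop with nothing pending, it "commits" and further requests throw.
function makeAutoClosingTransaction() {
  const tx = {
    active: true,
    request() {
      if (!tx.active) throw new Error("TransactionInactiveError");
      return "ok";
    },
  };
  setTimeout(() => { tx.active = false; }, 0); // "commit" after this turn
  return tx;
}

const tx = makeAutoClosingTransaction();
tx.request(); // fine: same turn
setTimeout(() => {
  try { tx.request(); } catch (e) { console.log(e.message); } // TransactionInactiveError
}, 10);
```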

> It's not clear from the spec; 4.2 says that a
> transaction must be inactive when control returns to the event loop, but
> gives no indication of what the nearest timing for that is. It's also not
> clear how to keep a transaction "alive" across turns (a basic need), create
> sub-transactions (a key feature of many transaction-oriented DBs), and
> detect that a transaction object is in something other than the "active"
> state. The last bit is particularly galling: you can have a handle to the
> object, but users can't ask for state they might want, despite the spec
> spending a great deal of time telling implementers that they must do this
> and that with this bit. If there's a principle at issue, it's the idea that
> specs -- particularly low-level APIs -- should not reserve to themselves
> state and information that they need but for which they don't immediately
> spot a user need.

I agree that we probably should expose the transaction state. I think
to an extent it was simply overlooked.

However my experience is that if you need to look at the transaction
state, you likely have code that will have race conditions if your
page is open in multiple tabs. This is a problem that your library at
[1] suffers from for example.
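For example (a hypothetical sketch, with a plain variable standing in for a database value): two "tabs" that each read a value, return to the event loop, and then write back will lose an update — exactly what holding a transaction across the read and write prevents.

```javascript
// Hypothetical sketch of the multi-tab race: `stored` stands in for a
// database value. Each "tab" reads in one turn and writes in a later
// turn; without a transaction spanning both, one increment is lost.
let stored = 0;

async function incrementWithoutTransaction() {
  const value = stored;                     // read
  await new Promise(r => setTimeout(r, 0)); // control returns to the event loop
  stored = value + 1;                       // write, based on a stale read
}

Promise.all([incrementWithoutTransaction(), incrementWithoutTransaction()])
  .then(() => console.log(stored)); // 1, not 2: one increment was lost
```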

> Confused collection interfaces
> IDB has a factory for databases and object stores and allows retrieval of
> them by name (asynchronously, which is good)...but doesn't provide a
> coherent Map interface onto them. By being DOM-ish and not JS-ish, IDB once
> again creates oddball JS objects that could pun with built-ins and therefore
> ease the learning curve, but doesn't. No, these aren't (synchronous) maps,
> but punning the API with ES6's Map type would go a long way.

I don't think Map existed at the time when we wrote the IDB spec. But
yes, we would probably be able to use it now.
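Something like the following sketch — AsyncMap is a hypothetical name, and the in-memory Map merely stands in for a real object store — shows what punning on Map's method names could look like:

```javascript
// Hypothetical sketch of "punning" with ES6 Map: the method names mirror
// Map (get/set/has/delete) but each returns a Promise. The in-memory Map
// here stands in for a real, asynchronous object store.
class AsyncMap {
  constructor() { this._store = new Map(); }
  get(key)        { return Promise.resolve(this._store.get(key)); }
  set(key, value) { this._store.set(key, value); return Promise.resolve(this); }
  has(key)        { return Promise.resolve(this._store.has(key)); }
  delete(key)     { return Promise.resolve(this._store.delete(key)); }
}

const m = new AsyncMap();
m.set("answer", 42).then(() => m.get("answer")).then(v => console.log(v)); // 42
```

Keeping the method names aligned with Map would let developers carry over what they already know, even though the return values are asynchronous.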

> Doubled API surface for sync version
> I assume I just don't understand why this choice was made, but the explosion
> of API surface area combined with the conditional availability of this
> version of the API make it an odd beast (to be charitable).

The sync API is intended for use in Workers. The Sync API is
dramatically easier to use and one of the goals with Workers is to
allow people to write sequential code that does IO.

Yes, promises make async code easier to work with. And things like
task.js make it easier still. However I don't think people will ever
say that something like [1] is as easy to use as localStorage, and
the reason for that is that localStorage is sync.

> The idea that this is all going to be wrapped up by libraries anyway

This was certainly not what I had in mind when designing the API.

However, note that we set out to create a fully featured database API.
I suspect we were always going to need both a simple API with the
absolute minimal amount of syntax needed to store a few key-value
pairs, and a full-featured API which allows the creation of
applications like searchable mail or calendar databases.

I was certainly always of the mind that doing the full-featured API
first was better since that solved both cases, though one of them
required a library.

If we'd done the simple API first, we probably would be nowhere near
as done with the full-featured API as we are now.

Another important point to bring up was that we gave ourselves a few
pretty hard-to-solve requirements. In particular:

* Make it easy to create pages that are race-free even if opened in
multiple tabs at the same time. Ideally it should be easier to create
a race-free page than a page that has race hazards.
* Encourage transactions to stay open for short periods of time, even
in the case of buggy code which on occasion could throw an exception
and thus fail to call a .commit() function.
* Make it hard to write pages that have timing issues. I.e. if someone
writes code which does asynchronous actions, the speed of the database
IO shouldn't cause the code to sometimes fail and sometimes succeed.

I actually think we succeeded tremendously well in fulfilling these
requirements. I don't know how I'd make the API easier to use while
still keeping those design requirements.

Right now I don't think it's possible to solve those requirements
while basing the API on Futures for example.


/ Jonas

Received on Saturday, 16 March 2013 11:04:15 UTC