Re: Futures and transactions

The issue isn't actually with Futures and transactions, but rather
between Futures and *locks*.

Any time you have a set of values protected by a lock, reading and
writing those values while holding the lock raises a set of tricky
issues. For example, if you allow someone to hold one lock while
grabbing additional locks, you can end up with deadlocks.

And if someone attempts to do a read-modify-write operation on a value
protected by the lock, but releases the lock for a short period of
time between the read and the write, you can end up with race
conditions.
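
For example, with a hypothetical acquireLock()/releaseLock()/
readValue()/writeValue() API (none of these are real functions), the
classic lost-update hazard looks like this:

  // Hypothetical lock API, for illustration only.
  acquireLock("counter", function() {
    var v = readValue("counter");    // read under the lock
    releaseLock("counter");          // lock released between read and write
    acquireLock("counter", function() {
      writeValue("counter", v + 1);  // another tab may have updated the value
      releaseLock("counter");        // in the meantime; its update is lost
    });
  });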

When we developed IndexedDB, which uses locks a lot, we had the following goals:

A) Make it easy to create pages that are race-free even if opened in
multiple tabs at the same time. Ideally it should be easier to create
a race-free page than a page that has race hazards.
B) Encourage locks to be held for short periods of time, even in the
case of buggy code which occasionally throws an exception and thus
fails to release the lock.
C) Make it hard to write pages that have timing issues. I.e. if
someone writes code which performs asynchronous actions, the speed of
the database IO shouldn't cause the code to sometimes fail and
sometimes succeed.
D) Make it hard, or even impossible, to create deadlocks. This
includes "asynchronous deadlocks", i.e. where the main event loop of
the app is running normally, but the application is indefinitely
waiting for the success callback of an asynchronous lock-grabbing
operation.

A is essentially the reason we're using locks at all.

In particular D is hard to solve. As soon as you add an explicit
function to grab a lock and another function for releasing it, you run
the risk of an "asynchronous deadlock": some code first grabs lock X
and then requests to grab lock Y, while another page from the same
origin grabs lock Y and then requests to grab lock X. Just like you
can use timeouts and exceptions to deal with normal deadlocks, you
could use timeouts and error callbacks to resolve this situation.
However, since this is a race condition, it's unlikely that developers
will run into it during development, and thus unlikely that it will be
properly handled.
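
Sketched out with a hypothetical requestLock(name, callback) API (not
a real API), it looks like this:

  // Code running in one page:
  requestLock("X", function() {
    requestLock("Y", function() { /* never runs if the other page holds Y */ });
  });

  // Code running in another page from the same origin:
  requestLock("Y", function() {
    requestLock("X", function() { /* never runs if the first page holds X */ });
  });

  // Both event loops keep running normally, but neither inner callback
  // ever fires: an "asynchronous deadlock".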

The solution that we ended up with in IndexedDB goes something like this:

1. When initially grabbing a lock, there's a well-defined, limited
period of time during which you can place asynchronous read and write
requests for the values protected by the lock.
2. Once the lock has been grabbed, asynchronous success or error
callbacks are fired for the previously scheduled requests.
3. Additional requests can *only* be scheduled during the success and
error callbacks from previous requests. However they can be placed
both during the callbacks for requests created during step 1, and the
ones created during step 3.
4. Once the last request has finished, its success or error callback
has fired, and no new requests were created during that callback, the
lock is automatically released.

I.e. in step 4 we release the lock once there are no more requests,
and we know that no more requests can be created since they can only
be created during callbacks from other requests.
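
Here's roughly how those steps look with the actual IndexedDB API
(assuming an open IDBDatabase db with an object store "store" that
uses out-of-line keys):

  var tx = db.transaction(["store"], "readwrite");
  var store = tx.objectStore("store");

  // Step 1: requests can be placed right after the transaction is created.
  var req = store.get("x");

  // Steps 2 and 3: further requests can only be placed from the success
  // or error callback of a previous request.
  req.onsuccess = function(event) {
    store.put((event.target.result || 0) + 1, "x");  // allowed: in a callback
  };

  // Step 4: once the last callback has run without scheduling anything
  // new, the lock is released and the transaction commits automatically.
  tx.oncomplete = function() { /* all values written */ };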

The reason that step 3 only allows additional requests to be scheduled
during callbacks from other requests is to satisfy goal C above. This
way, if you try to place additional requests from some unrelated
asynchronous callback, it will always fail, leading to bugs that are
consistently reproducible rather than timing-dependent. Such bugs are
likely to show up during development and so can be dealt with before
the code reaches users.
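
For example, the IndexedDB equivalent of the example quoted at the end
of this mail fails deterministically (same assumptions about db and
"store" as above):

  var tx = db.transaction(["store"], "readwrite");
  var store = tx.objectStore("store");

  store.get("x").onsuccess = function(event) {
    store.put((event.target.result || 0) + 1, "x");  // fine: request callback
  };

  setTimeout(function() {
    // Fails every time with a "transaction is not active" error,
    // regardless of how fast or slow the database IO happens to be.
    store.put(0, "x");
  }, 0);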

In IndexedDB the handling around step 1 is quite complicated and not
particularly clean. But for the sake of describing the problem at
hand, you can think of the well-defined, limited period during which
it's allowed to place the initial requests against the lock as an
initial callback which is called when the lock is acquired. It could
even be a callback that's called synchronously when the lock is first
requested, since all values are read and changed asynchronously anyway,
and so the actual work can happen once the lock has actually been
acquired.
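
Something like this hypothetical shape (requestLock and the store
argument are made up, purely for illustration):

  requestLock("mylock", function(store) {
    // This callback is the well-defined window for placing the
    // initial requests against the lock.
    store.get("x").onsuccess = function(event) {
      store.set("x", event.target.result + 1);  // allowed: request callback
    };
  });
  // Outside that callback (and outside request callbacks), placing
  // requests against the lock fails.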

In IndexedDB, releasing the lock also commits the transaction
associated with it, but all of the issues described here apply in
exactly the same way whenever you have values protected by a lock.

As was pointed out in this thread, you can reduce these issues by
snapshotting all the values protected by the lock and then immediately
releasing it. Conceptually this is what MVCC does, though it avoids
some of the performance cost that copying all the values would have.
However this doesn't actually change any of the problems discussed
above. It just means that you only hit them when you need to do
read-modify-write operations.
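
A rough sketch, reusing the hypothetical requestLock() from above plus
made-up snapshotValues() and display() functions:

  snapshotValues("mylock", function(values) {
    // The lock is already released; "values" is a consistent copy,
    // so pure reads are race-free.
    display(values.x);

    // But writing based on what we read puts us right back into the
    // read-modify-write problem: the read has to be redone under the lock.
    requestLock("mylock", function(store) {
      store.get("x").onsuccess = function(event) {
        store.set("x", event.target.result + 1);
      };
    });
  });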

In IndexedDB step 3 was fairly easy to implement. It was done by
keeping an internal "open-flag" per lock. This flag is normally set to
false. When we are about to fire a success or error callback we first
set the flag to true, then immediately fire the callback, then set the
flag back to false. So the only code that executes while the flag is
true is the callback.

Whenever a request is placed, we check the flag; if it is false, we
flag an error.
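
A minimal sketch of that pattern (the Lock object and its method names
are made up here, not the actual implementation code):

  function Lock() {
    this.openFlag = false;       // normally false
  }

  // Called by the implementation when a request's result is ready.
  Lock.prototype.fireCallback = function(request, result) {
    this.openFlag = true;        // only the callback runs while this is true
    try {
      request.onsuccess({ target: { result: result } });
    } finally {
      this.openFlag = false;
    }
  };

  // Called whenever page code places a new request against the lock.
  Lock.prototype.placeRequest = function(request) {
    if (!this.openFlag) {
      throw new Error("lock is not active");  // deterministic, easy to reproduce
    }
    // ...queue the request's asynchronous work here...
  };

The try/finally also lines up with goal B: an exception thrown from a
callback can't leave the flag stuck at true.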

Incidentally, WebSQL uses the exact same solution to this problem. The
only difference is in details around the syntax of firing callbacks.
But WebSQL uses the same steps 1-4.

The problem when trying to use the above solution to create an API
which uses locks and is based on Futures is the handling of the
open-flag.

Since Futures normally call their callbacks asynchronously, we can't
set the open-flag to true right before the call and set it back right
after the call. This means that we have a harder time enforcing step 3
described above.
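
Concretely (sketch only; lock, future and store are stand-ins),
wrapping the .then() registration doesn't help, because registering a
callback and invoking it are decoupled:

  lock.openFlag = true;
  future.then(function(result) {
    // Runs from a later task, long after the flag was reset below, so
    // any request placed here would be rejected.
    store.set("x", result + 1);
  });
  lock.openFlag = false;   // this runs before the callback above ever does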

The spec currently has an internal "synchronous flag" which lets an
implementation of the Future API work around this problem. I.e. as
long as the implementation of the lock-using API is also the
implementation of the Future API, it can use this flag to set the
open-flag to true, then call any callbacks registered using .then()
and .done(), and then set the open-flag back to false.
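
I.e. something along these lines inside the implementation
(pseudo-code; acceptSynchronously isn't a real method, it stands in
for running the resolve steps with the spec's internal synchronous
flag set):

  // Inside the implementation of the lock-based API, which also
  // controls how its own Futures get resolved:
  lock.openFlag = true;
  resolver.acceptSynchronously(result);  // runs .then()/.done() callbacks now
  lock.openFlag = false;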

So I guess the current solution is fine as long as either
* No JS libraries will want to implement APIs that use locks, or
* Such libraries are ok with not using the built-in Future API and
instead re-implementing the Future API themselves.

The reason this isn't normally a problem for Futures is that they are
normally designed for a world where any state or guarantees that the
callbacks depend on are passed in the result of the Future, rather
than having to be set on the surrounding world. This is certainly a
better design, but isn't always possible to use. I.e. we can't pass
anything in the result of the Future which allows the callback to
reopen the lock and change the values protected by it. Once the lock
has been released, the values are no longer protected and can be
changed by other pieces of code, or by other pages.

I wouldn't be surprised if we'll run into similar situations in the
future. I.e. other situations where we can't simply pass more context
through the Future result and instead have to rely on surrounding
world state. But I'm fine with crossing that bridge once we get there,
if we get there. And we always have the escape hatch of letting pages
implement the Future API themselves.

/ Jonas

On Mon, Apr 8, 2013 at 1:05 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
> Let's discuss this here. As I understand it the semantics of Indexed DB
> are such that in the following example the s.set() in the callback
> from the timeout fails because the active flag is false:
>
> transact(function(s) {
>   s.get("x").done(function(v) { s.set("x", v + 1) })
>   setTimeout(function() { s.set("x", 0) }, 0)
> })
>
> If that would not fail we would allow for race conditions.
>
> I think we should simply do the same with futures. There's a
> transaction future. When its init function is invoked (the function
> passed to transact() above) the transaction is active and futures can
> be added to it (the s.get() call). Those futures are special too and
> will have the transaction in scope when their callbacks are called
> (though only if the transaction is not closed yet). And after those
> futures' callbacks have been called, the transaction will be notified
> that callbacks have been invoked. At that point new futures could have
> been added to be in scope or all could be okay in the world.
>
> Now I think Jonas' problem with this model was that the futures would
> be special and not general purpose. I think it's unavoidable that in
> some cases futures need to be subclassed. What's important is that
> futures themselves don't break the general contract; they themselves
> keep representing the return value of an operation.
>
>
> --
> http://annevankesteren.nl/
