Re: [IndexedDB] Atomic schema changes

On Thu, Jun 24, 2010 at 1:48 AM, Jonas Sicking <jonas@sicking.cc> wrote:

> Hi All,
>
> In bug 9975 comment 1 [1] Nikunj pointed out that it is unclear how to
> make atomic changes to the schema of a database. For example adding an
> objectStore and a couple of indexes.
>
> While it actually currently is possible, it is quite quirky and so I
> think we need to find a better solution.
>
> One way this is already possible is by calling setVersion. When the
> success event fires for this request, it contains an implicitly
> created transaction which, while it is alive, holds a lock on the
> whole database an prevents any other interactions with the database.
>
> However setVersion is a fairly heavy operation. We have discussed a
> couple of different ways it can work, but it seems like there is
> agreement that any other open database connections will either have to
> be close manually (by for example the user leaving the page), or they
> will be close forcefully (by making any requests on them fail). I
> intend on sending a starting a separate thread on defining the details
> of setVersion.
>
> We might want to allow making smaller schema changes, such as adding a
> new objectStore or a new index, without requiring all other database
> connections to be closed. Further, it would be nice if "atomicness"
> was a default behavior as to avoid people accidentally creating race
> conditions.
>
> We've talked a bit about this at mozilla and have three alternative
> proposals. In all three proposals we suggest moving the
> createObjectStore to the Transaction interface (or possibly a new
> SchemaTransaction interface). The createIndex function remains on
> objectStore, but is defined to throw an exception if called outside a
> transaction which allows schema changes.
>
> Proposal A:
> Always require calls to setVersion for changes to the database schema.
> The success event fired on the setVersion request is a
> IDBTransactionEvent. The author can use the createObjectStore method
> on the transaction available on the event to create new object stores.
>
> Additionally, since we know that no one else currently has an open
> database connection, we can make creating objectStores synchronous.
> The implementation can probably still asynchronously fail to create an
> objectStore, due to diskspace or other hardware issues. This failure
> will likely only be detected asynchronously, but can be raised as a
> failure to commit the transaction as it is extremely rare.
>
> The code would look something like:
>
> if (db.version == "2.0") {
>  weAreDoneFunction();
> }
> db.setVersion("2.0").onsuccess = function(event) {
>  trans = event.transaction;
>  store1 = trans.createObjectStore("myStore1", ...);
>  store2 = trans.createObjectStore("myStore2", ...);
>  store1.createIndex(...);
>  store1.createIndex(...);
>  store2.createIndex(...);
>  trans.oncomplete = weAreDoneFunction;
> }
>
> Proposal B:
> Add a new type of transaction SCHEMA_CHANGE, in addition to READ and
> READ_WRITE. This transaction is required for any schema changes. As
> long as the transaction is open, no other schema changes can be done.
> The transaction is opened asynchronously using a new
> 'startSchemaTransaction' function. This ensures that no other
> modifications are attempted at the same time.
>
> Additionally, since we know that no one else currently is inside a
> SCHEMA_CHANGE transaction we can make creating objectStores
> synchronous. The implementation can probably still asynchronously fail
> to create an objectStore, due to diskspace or other hardware issues.
> This failure will likely only be detected asynchronously, but can be
> raised as a failure to commit the transaction as it is extremely rare.
>
> Code example:
>
> if (db.objectStoreNames.contains("myStore1")) {
>  weAreDoneFunction();
>  return;
> }
> db.startSchemaTransaction().onsuccess = function(event) {
>  // Have to check again as the objectStore could have been created
> before the callback fired
>  if (db.objectStoreNames.contains("myStore1")) {
>    weAreDoneFunction();
>    return;
>  }
>  trans = event.transaction;
>  store1 = trans.createObjectStore("myStore1", ...);
>  store2 = trans.createObjectStore("myStore2", ...);
>  store1.createIndex(...);
>  store1.createIndex(...);
>  store2.createIndex(...);
>  trans.oncomplete = weAreDoneFunction;
> }
>
>
> Proposal C:
> This is like proposal B, however the schema change transaction is
> started synchronously, same as other transactions (we could even reuse
> the existing transaction() function, however we'd have to ignore the
> storeNames argument).
>
> Since two different pages, running in different processes, could now
> start a schema-change transaction at the same time, and thus could
> call createObjectStore at the same time and attempt to create the same
> store, we have to keep createObjectStore asynchronous.
>
> Code example:
>
> if (db.objectStoreNames.contains("myStore1")) {
>  weAreDoneFunction();
>  return;
> }
> trans = db.startSchemaTransaction();
> trans.createObjectStore("myStore1", ...);
> trans.createObjectStore("myStore2", ...).onsuccess = function(event) {
>  store1 = trans.objectStore("myStore1");
>  store2 = trans.objectStore("myStore2");
>  store2 = trans.createObjectStore("myStore2", ...);
>  store1.createIndex(...);
>  store1.createIndex(...);
>  store2.createIndex(...);
>  trans.oncomplete = weAreDoneFunction;
> }
> trans.onerror = function(event) {
>  if (event.code == ERROR_OBJECT_STORE_ALREADY_EXISTS) {
>    weAreDoneFunction();
>  }
> }
>
>
> Note that in the last example, the onerror case is needed to handle
> the condition that two pages attempt to create the same objectStore at
> the same time. Because of this I'm really not a fan of Proposal C as
> we're effectively relying on the page to handle a race condition.
>
> Between the remaining two I think A is clearly simpler for the
> developer. Yes, it does mean that any schema changes will require
> closing all other tabs, but I'm not convinced this is a big deal.
>

To be clear, you mean closing the connections other tabs have, right?

I've thought a lot about this (since we first started talking about
setVersion) and I'm no longer convinced that a non-heavy-weight schema
changing mechanism is important for v1.  I also have some various other
thoughts:

I now worry that if we make it easy (or as easy) to do an "advanced" schema
change, we'll see a lot of apps breaking in weird ways and potentially
corrupting their own data.  (For example, v1 of the website might use a data
structure in a way that's incompatible with v2.  When the site changes, v1
continues using the data structure in the bad way thus corrupting data.)
 Intentionally breaking pages (until refresh) that aren't designed to handle
schema changes actually seems like a good thing.

But, I feel pretty strongly that a setVersion/schema change transaction
should not simply kill off anything else currently running.  The reason is
that it's not hard for apps to recover from a connection failing, but it is
hard to handle a transaction failing in an unpredictable way.  Especially
static transactions (which should rarely fail committing since serialization
can be guaranteed before the transaction starts).

I think options B and C add a lot of API surface area and implementational
complexity in return for a small improvement for power users but much more
confusion for normal users.  I think A is by far the simplest and is
adequate for v1.  We can then look at making something more complex for
v2--once we have some real-world experience.

I'm OK with making createObjectStore/createIndex synchronous.  It would
definitely make such code cleaner and I don't see a major downside, but at
the same time I feel like this API is starting to get kind of ad-hoc and
unintuitive to a new user.  Having the create and the remove functions
completely different and in different places seems weird.

So I agree with either A or leaving things as originally proposed by Jonas
in "[IndexDB] Proposal for async API changes".


> I know people have talked about creating temporary objectStores which
> are only needed for temporary computations. If that really is needed I
> suspect that the existing createObjectStore function is the wrong
> solution anyway as authors will have to deal with avoiding name
> collisions between temporary stores. We might want to consider adding
> a synchronous transaction.createTempObjectStore function which would
> always be available and which would create a nameless objectStore
> which is automatically deleted once a transaction ends. This might
> reduce the need for schema changes without version changes.
>
> / Jonas
>
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9975#c1


If we talk about temporary object stores, I agree that it should be done
with its own feature/method and thus it should be discussed in a forked
thread (not this one).

J

Received on Thursday, 24 June 2010 11:33:42 UTC