Re: [IndexedDB] Atomic schema changes

On Thu, Jun 24, 2010 at 4:32 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Thu, Jun 24, 2010 at 1:48 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> Hi All,
>>
>> In bug 9975 comment 1 [1] Nikunj pointed out that it is unclear how to
>> make atomic changes to the schema of a database. For example adding an
>> objectStore and a couple of indexes.
>>
>> While it actually currently is possible, it is quite quirky and so I
>> think we need to find a better solution.
>>
>> One way this is already possible is by calling setVersion. When the
>> success event fires for this request, it contains an implicitly
>> created transaction which, while it is alive, holds a lock on the
>> whole database an prevents any other interactions with the database.
>>
>> However setVersion is a fairly heavy operation. We have discussed a
>> couple of different ways it can work, but it seems like there is
>> agreement that any other open database connections will either have to
>> be close manually (by for example the user leaving the page), or they
>> will be close forcefully (by making any requests on them fail). I
>> intend on sending a starting a separate thread on defining the details
>> of setVersion.
>>
>> We might want to allow making smaller schema changes, such as adding a
>> new objectStore or a new index, without requiring all other database
>> connections to be closed. Further, it would be nice if "atomicness"
>> was a default behavior as to avoid people accidentally creating race
>> conditions.
>>
>> We've talked a bit about this at mozilla and have three alternative
>> proposals. In all three proposals we suggest moving the
>> createObjectStore to the Transaction interface (or possibly a new
>> SchemaTransaction interface). The createIndex function remains on
>> objectStore, but is defined to throw an exception if called outside a
>> transaction which allows schema changes.
>>
>> Proposal A:
>> Always require calls to setVersion for changes to the database schema.
>> The success event fired on the setVersion request is a
>> IDBTransactionEvent. The author can use the createObjectStore method
>> on the transaction available on the event to create new object stores.
>>
>> Additionally, since we know that no one else currently has an open
>> database connection, we can make creating objectStores synchronous.
>> The implementation can probably still asynchronously fail to create an
>> objectStore, due to diskspace or other hardware issues. This failure
>> will likely only be detected asynchronously, but can be raised as a
>> failure to commit the transaction as it is extremely rare.
>>
>> The code would look something like:
>>
>> if (db.version == "2.0") {
>>  weAreDoneFunction();
>> }
>> db.setVersion("2.0").onsuccess = function(event) {
>>  trans = event.transaction;
>>  store1 = trans.createObjectStore("myStore1", ...);
>>  store2 = trans.createObjectStore("myStore2", ...);
>>  store1.createIndex(...);
>>  store1.createIndex(...);
>>  store2.createIndex(...);
>>  trans.oncomplete = weAreDoneFunction;
>> }
>>
>> Proposal B:
>> Add a new type of transaction SCHEMA_CHANGE, in addition to READ and
>> READ_WRITE. This transaction is required for any schema changes. As
>> long as the transaction is open, no other schema changes can be done.
>> The transaction is opened asynchronously using a new
>> 'startSchemaTransaction' function. This ensures that no other
>> modifications are attempted at the same time.
>>
>> Additionally, since we know that no one else currently is inside a
>> SCHEMA_CHANGE transaction we can make creating objectStores
>> synchronous. The implementation can probably still asynchronously fail
>> to create an objectStore, due to diskspace or other hardware issues.
>> This failure will likely only be detected asynchronously, but can be
>> raised as a failure to commit the transaction as it is extremely rare.
>>
>> Code example:
>>
>> if (db.objectStoreNames.contains("myStore1")) {
>>  weAreDoneFunction();
>>  return;
>> }
>> db.startSchemaTransaction().onsuccess = function(event) {
>>  // Have to check again as the objectStore could have been created
>> before the callback fired
>>  if (db.objectStoreNames.contains("myStore1")) {
>>    weAreDoneFunction();
>>    return;
>>  }
>>  trans = event.transaction;
>>  store1 = trans.createObjectStore("myStore1", ...);
>>  store2 = trans.createObjectStore("myStore2", ...);
>>  store1.createIndex(...);
>>  store1.createIndex(...);
>>  store2.createIndex(...);
>>  trans.oncomplete = weAreDoneFunction;
>> }
>>
>>
>> Proposal C:
>> This is like proposal B, however the schema change transaction is
>> started synchronously, same as other transactions (we could even reuse
>> the existing transaction() function, however we'd have to ignore the
>> storeNames argument).
>>
>> Since two different pages, running in different processes, could now
>> start a schema-change transaction at the same time, and thus could
>> call createObjectStore at the same time and attempt to create the same
>> store, we have to keep createObjectStore asynchronous.
>>
>> Code example:
>>
>> if (db.objectStoreNames.contains("myStore1")) {
>>  weAreDoneFunction();
>>  return;
>> }
>> trans = db.startSchemaTransaction();
>> trans.createObjectStore("myStore1", ...);
>> trans.createObjectStore("myStore2", ...).onsuccess = function(event) {
>>  store1 = trans.objectStore("myStore1");
>>  store2 = trans.objectStore("myStore2");
>>  store2 = trans.createObjectStore("myStore2", ...);
>>  store1.createIndex(...);
>>  store1.createIndex(...);
>>  store2.createIndex(...);
>>  trans.oncomplete = weAreDoneFunction;
>> }
>> trans.onerror = function(event) {
>>  if (event.code == ERROR_OBJECT_STORE_ALREADY_EXISTS) {
>>    weAreDoneFunction();
>>  }
>> }
>>
>>
>> Note that in the last example, the onerror case is needed to handle
>> the condition that two pages attempt to create the same objectStore at
>> the same time. Because of this I'm really not a fan of Proposal C as
>> we're effectively relying on the page to handle a race condition.
>>
>> Between the remaining two I think A is clearly simpler for the
>> developer. Yes, it does mean that any schema changes will require
>> closing all other tabs, but I'm not convinced this is a big deal.
>
> To be clear, you mean closing the connections other tabs have, right?
> I've thought a lot about this (since we first started talking about
> setVersion) and I'm no longer convinced that a non-heavy-weight schema
> changing mechanism is important for v1.  I also have some various other
> thoughts:
> I now worry that if we make it easy (or as easy) to do an "advanced" schema
> change, we'll see a lot of apps breaking in weird ways and potentially
> corrupting their own data.  (For example, v1 of the website might use a data
> structure in a way that's incompatible with v2.  When the site changes, v1
> continues using the data structure in the bad way thus corrupting data.)
>  Intentionally breaking pages (until refresh) that aren't designed to handle
> schema changes actually seems like a good thing.
...
> I think options B and C add a lot of API surface area and implementational
> complexity in return for a small improvement for power users but much more
> confusion for normal users.  I think A is by far the simplest and is
> adequate for v1.  We can then look at making something more complex for
> v2--once we have some real-world experience.

It's a very good point that for v1 we can require that people change
the version if they want atomic schema changes. Count me as convinced.

> I'm OK with making createObjectStore/createIndex synchronous.  It would
> definitely make such code cleaner and I don't see a major downside, but at
> the same time I feel like this API is starting to get kind of ad-hoc and
> unintuitive to a new user.  Having the create and the remove functions
> completely different and in different places seems weird.
> So I agree with either A or leaving things as originally proposed by Jonas
> in "[IndexDB] Proposal for async API changes".

Well, there's no reason not to make the same changes to
removeObjectStore/removeIndex. Though the more I think with this
design createObjectStore and removeObjectStore can stay on IDBDatabase
and simply throw if called outside the appropriate transaction
callback.

Here is the revised proposal:

interface IDBDatabase {
    ...
    IDBObjectStore createObjectStore (in DOMString name, in optional
DOMString keyPath, in optional boolean autoIncrement);
    void removeObjectStore (in DOMString storeName);
    ...
};

interface IDBObjectStore {
    ...
    IDBIndex createIndex (in DOMString name, in DOMString keyPath, in
optional boolean unique);
    IDBRequest removeIndex (in DOMString indexName);
    ...
};

Where createObjecStore/createIndex throws if a objectStore or index of
the given name already exists, if the keyPath has an invalid syntax,
or if the function is called when not inside a version-change
transaction callback. And removeObjectStore/removeIndex throws if the
objectStore or index doesn't exist or if the function is called when
not inside a version-change transaction callback.

Throwing if not inside the right type of callback isn't a super clean
solution, but is really not that different from how add/put throws if
called when not inside a READ_WRITE transaction.


> But, I feel pretty strongly that a setVersion/schema change transaction
> should not simply kill off anything else currently running.  The reason is
> that it's not hard for apps to recover from a connection failing, but it is
> hard to handle a transaction failing in an unpredictable way.  Especially
> static transactions (which should rarely fail committing since serialization
> can be guaranteed before the transaction starts).

That might be a good idea for v1. I was planning on doing a separate
thread for setVersion, but maybe it's tied enough to the topic of
schema changes that it makes sense to bring up here.

What I suggest is that when setVersion is called, we fire
'versionchange' event on all other open IDBDatabase objects. This
event contains information of what the desired new version number is.
If no other IDBDatabase objects are open for the specific database, no
'versionchange' events are fired. This allows pages using the old
schema version to automatically save any pending data (for example any
draft emails) and display UI to the user suggesting that the tab be
closed. If possible without dataloss, the tab could even reload itself
to automatically load an updated version of the page which uses the
new schema version.

The 'versionchange' event would use an interface like

interface IDBVersionEvent : IDBEvent {
  readonly attribute string version;
};

Additionally, if there are open IDBDatabase objects, we fire a
'blocked' event at the IDBRequest object returned from the setVersion
call. This allows the page to display UI to the user asking the user
to close all other relevant tabs.

Once all other IDBDatabase objects are closed, we create a transaction
and fire the normal 'success' event.

While there are pending version change requests, no success events are
fired for calls to IDBFactory.open for the relevant database.

We might want to support forcefully killing off open IDBDatabase
objects in the future, but I think that can wait until after v1.

The main risk with this that I can see is if a page is written that
creates two separates IDBDatabase objects to the same database, and
then calls setVersion on the second. This would cause the page to
effectively deadlock itself. If the user closes the tab the version
upgrade is obviously cancelled, and if the page is reopened, it will
simply just end up in the same deadlocked state again. I can't see a
way out of this situation though, so I think we'll have to live with
it.

/ Jonas

Received on Friday, 25 June 2010 08:04:58 UTC