Re: [IndexedDB] Atomic schema changes

On Fri, Jun 25, 2010 at 9:04 AM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Thu, Jun 24, 2010 at 4:32 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> > On Thu, Jun 24, 2010 at 1:48 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> Hi All,
> >>
> >> In bug 9975 comment 1 [1] Nikunj pointed out that it is unclear how to
> >> make atomic changes to the schema of a database. For example adding an
> >> objectStore and a couple of indexes.
> >>
> >> While it actually currently is possible, it is quite quirky and so I
> >> think we need to find a better solution.
> >>
> >> One way this is already possible is by calling setVersion. When the
> >> success event fires for this request, it contains an implicitly
> >> created transaction which, while it is alive, holds a lock on the
> >> whole database an prevents any other interactions with the database.
> >>
> >> However setVersion is a fairly heavy operation. We have discussed a
> >> couple of different ways it can work, but it seems like there is
> >> agreement that any other open database connections will either have to
> >> be close manually (by for example the user leaving the page), or they
> >> will be close forcefully (by making any requests on them fail). I
> >> intend on sending a starting a separate thread on defining the details
> >> of setVersion.
> >>
> >> We might want to allow making smaller schema changes, such as adding a
> >> new objectStore or a new index, without requiring all other database
> >> connections to be closed. Further, it would be nice if "atomicness"
> >> was a default behavior as to avoid people accidentally creating race
> >> conditions.
> >>
> >> We've talked a bit about this at mozilla and have three alternative
> >> proposals. In all three proposals we suggest moving the
> >> createObjectStore to the Transaction interface (or possibly a new
> >> SchemaTransaction interface). The createIndex function remains on
> >> objectStore, but is defined to throw an exception if called outside a
> >> transaction which allows schema changes.
> >>
> >> Proposal A:
> >> Always require calls to setVersion for changes to the database schema.
> >> The success event fired on the setVersion request is a
> >> IDBTransactionEvent. The author can use the createObjectStore method
> >> on the transaction available on the event to create new object stores.
> >>
> >> Additionally, since we know that no one else currently has an open
> >> database connection, we can make creating objectStores synchronous.
> >> The implementation can probably still asynchronously fail to create an
> >> objectStore, due to diskspace or other hardware issues. This failure
> >> will likely only be detected asynchronously, but can be raised as a
> >> failure to commit the transaction as it is extremely rare.
> >>
> >> The code would look something like:
> >>
> >> if (db.version == "2.0") {
> >>  weAreDoneFunction();
> >> }
> >> db.setVersion("2.0").onsuccess = function(event) {
> >>  trans = event.transaction;
> >>  store1 = trans.createObjectStore("myStore1", ...);
> >>  store2 = trans.createObjectStore("myStore2", ...);
> >>  store1.createIndex(...);
> >>  store1.createIndex(...);
> >>  store2.createIndex(...);
> >>  trans.oncomplete = weAreDoneFunction;
> >> }
> >>
> >> Proposal B:
> >> Add a new type of transaction SCHEMA_CHANGE, in addition to READ and
> >> READ_WRITE. This transaction is required for any schema changes. As
> >> long as the transaction is open, no other schema changes can be done.
> >> The transaction is opened asynchronously using a new
> >> 'startSchemaTransaction' function. This ensures that no other
> >> modifications are attempted at the same time.
> >>
> >> Additionally, since we know that no one else currently is inside a
> >> SCHEMA_CHANGE transaction we can make creating objectStores
> >> synchronous. The implementation can probably still asynchronously fail
> >> to create an objectStore, due to diskspace or other hardware issues.
> >> This failure will likely only be detected asynchronously, but can be
> >> raised as a failure to commit the transaction as it is extremely rare.
> >>
> >> Code example:
> >>
> >> if (db.objectStoreNames.contains("myStore1")) {
> >>  weAreDoneFunction();
> >>  return;
> >> }
> >> db.startSchemaTransaction().onsuccess = function(event) {
> >>  // Have to check again as the objectStore could have been created
> >> before the callback fired
> >>  if (db.objectStoreNames.contains("myStore1")) {
> >>    weAreDoneFunction();
> >>    return;
> >>  }
> >>  trans = event.transaction;
> >>  store1 = trans.createObjectStore("myStore1", ...);
> >>  store2 = trans.createObjectStore("myStore2", ...);
> >>  store1.createIndex(...);
> >>  store1.createIndex(...);
> >>  store2.createIndex(...);
> >>  trans.oncomplete = weAreDoneFunction;
> >> }
> >>
> >>
> >> Proposal C:
> >> This is like proposal B, however the schema change transaction is
> >> started synchronously, same as other transactions (we could even reuse
> >> the existing transaction() function, however we'd have to ignore the
> >> storeNames argument).
> >>
> >> Since two different pages, running in different processes, could now
> >> start a schema-change transaction at the same time, and thus could
> >> call createObjectStore at the same time and attempt to create the same
> >> store, we have to keep createObjectStore asynchronous.
> >>
> >> Code example:
> >>
> >> if (db.objectStoreNames.contains("myStore1")) {
> >>  weAreDoneFunction();
> >>  return;
> >> }
> >> trans = db.startSchemaTransaction();
> >> trans.createObjectStore("myStore1", ...);
> >> trans.createObjectStore("myStore2", ...).onsuccess = function(event) {
> >>  store1 = trans.objectStore("myStore1");
> >>  store2 = trans.objectStore("myStore2");
> >>  store2 = trans.createObjectStore("myStore2", ...);
> >>  store1.createIndex(...);
> >>  store1.createIndex(...);
> >>  store2.createIndex(...);
> >>  trans.oncomplete = weAreDoneFunction;
> >> }
> >> trans.onerror = function(event) {
> >>  if (event.code == ERROR_OBJECT_STORE_ALREADY_EXISTS) {
> >>    weAreDoneFunction();
> >>  }
> >> }
> >>
> >>
> >> Note that in the last example, the onerror case is needed to handle
> >> the condition that two pages attempt to create the same objectStore at
> >> the same time. Because of this I'm really not a fan of Proposal C as
> >> we're effectively relying on the page to handle a race condition.
> >>
> >> Between the remaining two I think A is clearly simpler for the
> >> developer. Yes, it does mean that any schema changes will require
> >> closing all other tabs, but I'm not convinced this is a big deal.
> >
> > To be clear, you mean closing the connections other tabs have, right?
> > I've thought a lot about this (since we first started talking about
> > setVersion) and I'm no longer convinced that a non-heavy-weight schema
> > changing mechanism is important for v1.  I also have some various other
> > thoughts:
> > I now worry that if we make it easy (or as easy) to do an "advanced"
> schema
> > change, we'll see a lot of apps breaking in weird ways and potentially
> > corrupting their own data.  (For example, v1 of the website might use a
> data
> > structure in a way that's incompatible with v2.  When the site changes,
> v1
> > continues using the data structure in the bad way thus corrupting data.)
> >  Intentionally breaking pages (until refresh) that aren't designed to
> handle
> > schema changes actually seems like a good thing.
> ...
> > I think options B and C add a lot of API surface area and
> implementational
> > complexity in return for a small improvement for power users but much
> more
> > confusion for normal users.  I think A is by far the simplest and is
> > adequate for v1.  We can then look at making something more complex for
> > v2--once we have some real-world experience.
>
> It's a very good point that for v1 we can require that people change
> the version if they want atomic schema changes. Count me as convinced.
>

To be clear, I'm not sure we want to provide non-atomic schema changes at
this time either.  But that seems to be consistent with what you describe
below.


> > I'm OK with making createObjectStore/createIndex synchronous.  It would
> > definitely make such code cleaner and I don't see a major downside, but
> at
> > the same time I feel like this API is starting to get kind of ad-hoc and
> > unintuitive to a new user.  Having the create and the remove functions
> > completely different and in different places seems weird.
> > So I agree with either A or leaving things as originally proposed by
> Jonas
> > in "[IndexDB] Proposal for async API changes".
>
> Well, there's no reason not to make the same changes to
> removeObjectStore/removeIndex. Though the more I think with this
> design createObjectStore and removeObjectStore can stay on IDBDatabase
> and simply throw if called outside the appropriate transaction
> callback.
>
> Here is the revised proposal:
>
> interface IDBDatabase {
>    ...
>    IDBObjectStore createObjectStore (in DOMString name, in optional
> DOMString keyPath, in optional boolean autoIncrement);
>    void removeObjectStore (in DOMString storeName);
>    ...
> };
>
> interface IDBObjectStore {
>    ...
>    IDBIndex createIndex (in DOMString name, in DOMString keyPath, in
> optional boolean unique);
>    IDBRequest removeIndex (in DOMString indexName);
>    ...
> };
>
> Where createObjecStore/createIndex throws if a objectStore or index of
> the given name already exists, if the keyPath has an invalid syntax,
> or if the function is called when not inside a version-change
> transaction callback. And removeObjectStore/removeIndex throws if the
> objectStore or index doesn't exist or if the function is called when
> not inside a version-change transaction callback.
>
> Throwing if not inside the right type of callback isn't a super clean
> solution, but is really not that different from how add/put throws if
> called when not inside a READ_WRITE transaction.
>

Just to be clear, no async function (one that returns an IDBRequest) should
_ever_ throw.  (They should only call onerror.)  I assume you didn't mean
"add/put throws" literally?

> But, I feel pretty strongly that a setVersion/schema change transaction
> > should not simply kill off anything else currently running.  The reason
> is
> > that it's not hard for apps to recover from a connection failing, but it
> is
> > hard to handle a transaction failing in an unpredictable way.  Especially
> > static transactions (which should rarely fail committing since
> serialization
> > can be guaranteed before the transaction starts).
>
> That might be a good idea for v1. I was planning on doing a separate
> thread for setVersion, but maybe it's tied enough to the topic of
> schema changes that it makes sense to bring up here.
>
> What I suggest is that when setVersion is called, we fire
> 'versionchange' event on all other open IDBDatabase objects. This
> event contains information of what the desired new version number is.
> If no other IDBDatabase objects are open for the specific database, no
> 'versionchange' events are fired. This allows pages using the old
> schema version to automatically save any pending data (for example any
> draft emails) and display UI to the user suggesting that the tab be
> closed. If possible without dataloss, the tab could even reload itself
> to automatically load an updated version of the page which uses the
> new schema version.
>
> The 'versionchange' event would use an interface like
>
> interface IDBVersionEvent : IDBEvent {
>  readonly attribute string version;
> };
>

First of all, what I was originally advocating (sorry for not being clear)
is that we should kill the database connection but not until all active
transactions are complete.  Though we should probably block new transactions
from starting once setVersion is called.

But I really like your versionchange event idea regardless.  I agree that
letting the app sync any data that might be in memory (for example, a draft
email) is important.  And the idea that the web app could refresh itself (or
download new application code or something) seems pretty cool and useful.
 I'm fine with it firing on all frames except the one that initiated (like
storage events).  If we go with the "kill the connection once all active
transactions are done and block new ones from starting", we'd want to start
the blocking only after all versionchange events have finished.

The main reason that I like the idea of not stating the version change until
all active connections have closed is that not all apps will handle
versionchange.  My original idea was that we should just break such web apps
and let the user refresh, but now that you've pointed out the potential for
data loss I'm not sure that's an option.  Savvy web apps can kill all
existing database connections when they get the versionchange and thus avoid
stalling things.


> Additionally, if there are open IDBDatabase objects, we fire a
> 'blocked' event at the IDBRequest object returned from the setVersion
> call. This allows the page to display UI to the user asking the user
> to close all other relevant tabs.
>
> Once all other IDBDatabase objects are closed, we create a transaction
> and fire the normal 'success' event.
>
> While there are pending version change requests, no success events are
> fired for calls to IDBFactory.open for the relevant database.
>
> We might want to support forcefully killing off open IDBDatabase
> objects in the future, but I think that can wait until after v1.
>

Really?  I can't see an app like gmail ever asking users to close tabs.  I
bet they'd sooner run all the application logic in an iframe and navigate it
away when doing a schema change.

And I don't see many people correctly implementing a blocked event handler.
 If anything, it should be an error code.

It doesn't seem that hard to have an explicit way to tell the database
explicitly "OK, I'm done".  Or, at very least, we could make it so that when
there's an existing setVersion happening, all new connection requests stall.
 That way all pages can reload themselves but they won't connect to the
database until the upgrade is complete.

But really...all of this is really hacky.  I'm starting to wonder if we
should just kill the database connections on a setVersion as I originally
tried to suggest.

The main risk with this that I can see is if a page is written that
> creates two separates IDBDatabase objects to the same database, and
> then calls setVersion on the second. This would cause the page to
> effectively deadlock itself. If the user closes the tab the version
> upgrade is obviously cancelled, and if the page is reopened, it will
> simply just end up in the same deadlocked state again. I can't see a
> way out of this situation though, so I think we'll have to live with
> it.
>

I think we need to try to do better than this.

J

Received on Friday, 25 June 2010 09:44:25 UTC