Re: [IndexedDB] Atomic schema changes from Mikeal Rogers on 2010-06-25 (public-webapps@w3.org from April to June 2010)

From: Mikeal Rogers <mikeal.rogers@gmail.com>
Date: Fri, 25 Jun 2010 14:55:55 -0700
To: Jeremy Orlow <jorlow@chromium.org>
Cc: Jonas Sicking <jonas@sicking.cc>, Webapps WG <public-webapps@w3.org>
Message-ID: <AANLkTill2pc5ajMmd5AF7PLkuGAU82V6sjz_wyABq7x1@mail.gmail.com>
In IDBCouch I don't have a "schema" but I do have to maintain
consistency of the by-sequence index which is a similar problem to
validating schema state before these kinds of operations.

What I'm currently doing is just starting each write transaction with
a lookup to the end of the by-sequence index to make sure the
lastSequence I have is, in fact, the current one and another
tab/window hasn't updated it.

My plan for view generation is a similar problem and I plan to solve
it with a an objectStore of meta information about all of the views.
Storing the last known sequence and conflict resolution information
about replicas is also a similar problem and I'll solve it the same
way with a meta objectStore.

I don't see why schema information couldn't also be stored in a meta
objectStore at the end transactions that modify it and all of these
higher level APIs could just start their transaction with a validation
of the meta info. Rather than trying to keep the information globally
and updating it with an event you can just validate it at the
beginning of each transaction. The overhead is minimal and it seems,
to me at least, to be a little less error prone.

-Mikeal

On Fri, Jun 25, 2010 at 2:43 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Fri, Jun 25, 2010 at 9:04 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Thu, Jun 24, 2010 at 4:32 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Thu, Jun 24, 2010 at 1:48 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> In bug 9975 comment 1 [1] Nikunj pointed out that it is unclear how to
>> >> make atomic changes to the schema of a database. For example adding an
>> >> objectStore and a couple of indexes.
>> >>
>> >> While it actually currently is possible, it is quite quirky and so I
>> >> think we need to find a better solution.
>> >>
>> >> One way this is already possible is by calling setVersion. When the
>> >> success event fires for this request, it contains an implicitly
>> >> created transaction which, while it is alive, holds a lock on the
>> >> whole database an prevents any other interactions with the database.
>> >>
>> >> However setVersion is a fairly heavy operation. We have discussed a
>> >> couple of different ways it can work, but it seems like there is
>> >> agreement that any other open database connections will either have to
>> >> be close manually (by for example the user leaving the page), or they
>> >> will be close forcefully (by making any requests on them fail). I
>> >> intend on sending a starting a separate thread on defining the details
>> >> of setVersion.
>> >>
>> >> We might want to allow making smaller schema changes, such as adding a
>> >> new objectStore or a new index, without requiring all other database
>> >> connections to be closed. Further, it would be nice if "atomicness"
>> >> was a default behavior as to avoid people accidentally creating race
>> >> conditions.
>> >>
>> >> We've talked a bit about this at mozilla and have three alternative
>> >> proposals. In all three proposals we suggest moving the
>> >> createObjectStore to the Transaction interface (or possibly a new
>> >> SchemaTransaction interface). The createIndex function remains on
>> >> objectStore, but is defined to throw an exception if called outside a
>> >> transaction which allows schema changes.
>> >>
>> >> Proposal A:
>> >> Always require calls to setVersion for changes to the database schema.
>> >> The success event fired on the setVersion request is a
>> >> IDBTransactionEvent. The author can use the createObjectStore method
>> >> on the transaction available on the event to create new object stores.
>> >>
>> >> Additionally, since we know that no one else currently has an open
>> >> database connection, we can make creating objectStores synchronous.
>> >> The implementation can probably still asynchronously fail to create an
>> >> objectStore, due to diskspace or other hardware issues. This failure
>> >> will likely only be detected asynchronously, but can be raised as a
>> >> failure to commit the transaction as it is extremely rare.
>> >>
>> >> The code would look something like:
>> >>
>> >> if (db.version == "2.0") {
>> >>  weAreDoneFunction();
>> >> }
>> >> db.setVersion("2.0").onsuccess = function(event) {
>> >>  trans = event.transaction;
>> >>  store1 = trans.createObjectStore("myStore1", ...);
>> >>  store2 = trans.createObjectStore("myStore2", ...);
>> >>  store1.createIndex(...);
>> >>  store1.createIndex(...);
>> >>  store2.createIndex(...);
>> >>  trans.oncomplete = weAreDoneFunction;
>> >> }
>> >>
>> >> Proposal B:
>> >> Add a new type of transaction SCHEMA_CHANGE, in addition to READ and
>> >> READ_WRITE. This transaction is required for any schema changes. As
>> >> long as the transaction is open, no other schema changes can be done.
>> >> The transaction is opened asynchronously using a new
>> >> 'startSchemaTransaction' function. This ensures that no other
>> >> modifications are attempted at the same time.
>> >>
>> >> Additionally, since we know that no one else currently is inside a
>> >> SCHEMA_CHANGE transaction we can make creating objectStores
>> >> synchronous. The implementation can probably still asynchronously fail
>> >> to create an objectStore, due to diskspace or other hardware issues.
>> >> This failure will likely only be detected asynchronously, but can be
>> >> raised as a failure to commit the transaction as it is extremely rare.
>> >>
>> >> Code example:
>> >>
>> >> if (db.objectStoreNames.contains("myStore1")) {
>> >>  weAreDoneFunction();
>> >>  return;
>> >> }
>> >> db.startSchemaTransaction().onsuccess = function(event) {
>> >>  // Have to check again as the objectStore could have been created
>> >> before the callback fired
>> >>  if (db.objectStoreNames.contains("myStore1")) {
>> >>    weAreDoneFunction();
>> >>    return;
>> >>  }
>> >>  trans = event.transaction;
>> >>  store1 = trans.createObjectStore("myStore1", ...);
>> >>  store2 = trans.createObjectStore("myStore2", ...);
>> >>  store1.createIndex(...);
>> >>  store1.createIndex(...);
>> >>  store2.createIndex(...);
>> >>  trans.oncomplete = weAreDoneFunction;
>> >> }
>> >>
>> >>
>> >> Proposal C:
>> >> This is like proposal B, however the schema change transaction is
>> >> started synchronously, same as other transactions (we could even reuse
>> >> the existing transaction() function, however we'd have to ignore the
>> >> storeNames argument).
>> >>
>> >> Since two different pages, running in different processes, could now
>> >> start a schema-change transaction at the same time, and thus could
>> >> call createObjectStore at the same time and attempt to create the same
>> >> store, we have to keep createObjectStore asynchronous.
>> >>
>> >> Code example:
>> >>
>> >> if (db.objectStoreNames.contains("myStore1")) {
>> >>  weAreDoneFunction();
>> >>  return;
>> >> }
>> >> trans = db.startSchemaTransaction();
>> >> trans.createObjectStore("myStore1", ...);
>> >> trans.createObjectStore("myStore2", ...).onsuccess = function(event) {
>> >>  store1 = trans.objectStore("myStore1");
>> >>  store2 = trans.objectStore("myStore2");
>> >>  store2 = trans.createObjectStore("myStore2", ...);
>> >>  store1.createIndex(...);
>> >>  store1.createIndex(...);
>> >>  store2.createIndex(...);
>> >>  trans.oncomplete = weAreDoneFunction;
>> >> }
>> >> trans.onerror = function(event) {
>> >>  if (event.code == ERROR_OBJECT_STORE_ALREADY_EXISTS) {
>> >>    weAreDoneFunction();
>> >>  }
>> >> }
>> >>
>> >>
>> >> Note that in the last example, the onerror case is needed to handle
>> >> the condition that two pages attempt to create the same objectStore at
>> >> the same time. Because of this I'm really not a fan of Proposal C as
>> >> we're effectively relying on the page to handle a race condition.
>> >>
>> >> Between the remaining two I think A is clearly simpler for the
>> >> developer. Yes, it does mean that any schema changes will require
>> >> closing all other tabs, but I'm not convinced this is a big deal.
>> >
>> > To be clear, you mean closing the connections other tabs have, right?
>> > I've thought a lot about this (since we first started talking about
>> > setVersion) and I'm no longer convinced that a non-heavy-weight schema
>> > changing mechanism is important for v1.  I also have some various other
>> > thoughts:
>> > I now worry that if we make it easy (or as easy) to do an "advanced"
>> > schema
>> > change, we'll see a lot of apps breaking in weird ways and potentially
>> > corrupting their own data.  (For example, v1 of the website might use a
>> > data
>> > structure in a way that's incompatible with v2.  When the site changes,
>> > v1
>> > continues using the data structure in the bad way thus corrupting data.)
>> >  Intentionally breaking pages (until refresh) that aren't designed to
>> > handle
>> > schema changes actually seems like a good thing.
>> ...
>> > I think options B and C add a lot of API surface area and
>> > implementational
>> > complexity in return for a small improvement for power users but much
>> > more
>> > confusion for normal users.  I think A is by far the simplest and is
>> > adequate for v1.  We can then look at making something more complex for
>> > v2--once we have some real-world experience.
>>
>> It's a very good point that for v1 we can require that people change
>> the version if they want atomic schema changes. Count me as convinced.
>
> To be clear, I'm not sure we want to provide non-atomic schema changes at
> this time either.  But that seems to be consistent with what you describe
> below.
>
>>
>> > I'm OK with making createObjectStore/createIndex synchronous.  It would
>> > definitely make such code cleaner and I don't see a major downside, but
>> > at
>> > the same time I feel like this API is starting to get kind of ad-hoc and
>> > unintuitive to a new user.  Having the create and the remove functions
>> > completely different and in different places seems weird.
>> > So I agree with either A or leaving things as originally proposed by
>> > Jonas
>> > in "[IndexDB] Proposal for async API changes".
>>
>> Well, there's no reason not to make the same changes to
>> removeObjectStore/removeIndex. Though the more I think with this
>> design createObjectStore and removeObjectStore can stay on IDBDatabase
>> and simply throw if called outside the appropriate transaction
>> callback.
>>
>> Here is the revised proposal:
>>
>> interface IDBDatabase {
>>    ...
>>    IDBObjectStore createObjectStore (in DOMString name, in optional
>> DOMString keyPath, in optional boolean autoIncrement);
>>    void removeObjectStore (in DOMString storeName);
>>    ...
>> };
>>
>> interface IDBObjectStore {
>>    ...
>>    IDBIndex createIndex (in DOMString name, in DOMString keyPath, in
>> optional boolean unique);
>>    IDBRequest removeIndex (in DOMString indexName);
>>    ...
>> };
>>
>> Where createObjecStore/createIndex throws if a objectStore or index of
>> the given name already exists, if the keyPath has an invalid syntax,
>> or if the function is called when not inside a version-change
>> transaction callback. And removeObjectStore/removeIndex throws if the
>> objectStore or index doesn't exist or if the function is called when
>> not inside a version-change transaction callback.
>>
>> Throwing if not inside the right type of callback isn't a super clean
>> solution, but is really not that different from how add/put throws if
>> called when not inside a READ_WRITE transaction.
>
> Just to be clear, no async function (one that returns an IDBRequest) should
> _ever_ throw.  (They should only call onerror.)  I assume you didn't mean
> "add/put throws" literally?
>>
>> > But, I feel pretty strongly that a setVersion/schema change transaction
>> > should not simply kill off anything else currently running.  The reason
>> > is
>> > that it's not hard for apps to recover from a connection failing, but it
>> > is
>> > hard to handle a transaction failing in an unpredictable way.
>> >  Especially
>> > static transactions (which should rarely fail committing since
>> > serialization
>> > can be guaranteed before the transaction starts).
>>
>> That might be a good idea for v1. I was planning on doing a separate
>> thread for setVersion, but maybe it's tied enough to the topic of
>> schema changes that it makes sense to bring up here.
>>
>> What I suggest is that when setVersion is called, we fire
>> 'versionchange' event on all other open IDBDatabase objects. This
>> event contains information of what the desired new version number is.
>> If no other IDBDatabase objects are open for the specific database, no
>> 'versionchange' events are fired. This allows pages using the old
>> schema version to automatically save any pending data (for example any
>> draft emails) and display UI to the user suggesting that the tab be
>> closed. If possible without dataloss, the tab could even reload itself
>> to automatically load an updated version of the page which uses the
>> new schema version.
>>
>> The 'versionchange' event would use an interface like
>>
>> interface IDBVersionEvent : IDBEvent {
>>  readonly attribute string version;
>> };
>
> First of all, what I was originally advocating (sorry for not being clear)
> is that we should kill the database connection but not until all active
> transactions are complete.  Though we should probably block new transactions
> from starting once setVersion is called.
> But I really like your versionchange event idea regardless.  I agree that
> letting the app sync any data that might be in memory (for example, a draft
> email) is important.  And the idea that the web app could refresh itself (or
> download new application code or something) seems pretty cool and useful.
>  I'm fine with it firing on all frames except the one that initiated (like
> storage events).  If we go with the "kill the connection once all active
> transactions are done and block new ones from starting", we'd want to start
> the blocking only after all versionchange events have finished.
> The main reason that I like the idea of not stating the version change until
> all active connections have closed is that not all apps will handle
> versionchange.  My original idea was that we should just break such web apps
> and let the user refresh, but now that you've pointed out the potential for
> data loss I'm not sure that's an option.  Savvy web apps can kill all
> existing database connections when they get the versionchange and thus avoid
> stalling things.
>
>>
>> Additionally, if there are open IDBDatabase objects, we fire a
>> 'blocked' event at the IDBRequest object returned from the setVersion
>> call. This allows the page to display UI to the user asking the user
>> to close all other relevant tabs.
>>
>> Once all other IDBDatabase objects are closed, we create a transaction
>> and fire the normal 'success' event.
>>
>> While there are pending version change requests, no success events are
>> fired for calls to IDBFactory.open for the relevant database.
>>
>> We might want to support forcefully killing off open IDBDatabase
>> objects in the future, but I think that can wait until after v1.
>
> Really?  I can't see an app like gmail ever asking users to close tabs.  I
> bet they'd sooner run all the application logic in an iframe and navigate it
> away when doing a schema change.
> And I don't see many people correctly implementing a blocked event handler.
>  If anything, it should be an error code.
> It doesn't seem that hard to have an explicit way to tell the database
> explicitly "OK, I'm done".  Or, at very least, we could make it so that when
> there's an existing setVersion happening, all new connection requests stall.
>  That way all pages can reload themselves but they won't connect to the
> database until the upgrade is complete.
> But really...all of this is really hacky.  I'm starting to wonder if we
> should just kill the database connections on a setVersion as I originally
> tried to suggest.
>>
>> The main risk with this that I can see is if a page is written that
>> creates two separates IDBDatabase objects to the same database, and
>> then calls setVersion on the second. This would cause the page to
>> effectively deadlock itself. If the user closes the tab the version
>> upgrade is obviously cancelled, and if the page is reopened, it will
>> simply just end up in the same deadlocked state again. I can't see a
>> way out of this situation though, so I think we'll have to live with
>> it.
>
> I think we need to try to do better than this.
> J
Received on Friday, 25 June 2010 21:56:23 UTC