Re: [IndexedDB] Current editor's draft from Jonas Sicking on 2010-07-09 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 9 Jul 2010 09:31:25 -0700
To: Andrei Popescu <andreip@google.com>
Cc: Nikunj Mehta <nikunj@o-micron.com>, public-webapps <public-webapps@w3.org>
Message-ID: <AANLkTinU6PUK1QlXJirZw49qFjrHsbz9nQ70HFh7jQFp@mail.gmail.com>
On Fri, Jul 9, 2010 at 3:21 AM, Andrei Popescu <andreip@google.com> wrote:
> On Thu, Jul 8, 2010 at 8:27 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Thu, Jul 8, 2010 at 8:22 AM, Andrei Popescu <andreip@google.com> wrote:
>>> Hi Jonas,
>>>
>>> On Wed, Jul 7, 2010 at 8:08 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>> On Wed, Jul 7, 2010 at 10:41 AM, Andrei Popescu <andreip@google.com> wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 7, 2010 at 8:27 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>>>
>>>>>> On Tue, Jul 6, 2010 at 6:31 PM, Nikunj Mehta <nikunj@o-micron.com> wrote:
>>>>>> > On Wed, Jul 7, 2010 at 5:57 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>>> >>
>>>>>> >> On Tue, Jul 6, 2010 at 9:36 AM, Nikunj Mehta <nikunj@o-micron.com>
>>>>>> >> wrote:
>>>>>> >> > Hi folks,
>>>>>> >> >
>>>>>> >> > There are several unimplemented proposals on strengthening and
>>>>>> >> > expanding IndexedDB. The reason I have not implemented them yet is
>>>>>> >> > because I am not convinced they are necessary in toto. Here's my
>>>>>> >> > attempt at explaining why. I apologize in advance for not responding
>>>>>> >> > to individual proposals due to personal time constraints. I will
>>>>>> >> > however respond in detail on individual bug reports, e.g., as I did
>>>>>> >> > with 9975.
>>>>>> >> >
>>>>>> >> > I used the current editor's draft asynchronous API to understand
>>>>>> >> > where
>>>>>> >> > some of the remaining programming difficulties remain. Based on this
>>>>>> >> > attempt, I find several areas to strengthen, the most prominent of
>>>>>> >> > which is how we use transactions. Another is to add the concept of a
>>>>>> >> > catalog as a special kind of object store.
>>>>>> >>
>>>>>> >> Hi Nikunj,
>>>>>> >>
>>>>>> >> Thanks for replying! I'm very interested in getting this stuff sorted
>>>>>> >> out pretty quickly as almost all other proposals in one way or another
>>>>>> >> are affected by how this stuff develops.
>>>>>> >>
>>>>>> >> > Here are the main areas I propose to address in the editor's spec:
>>>>>> >> >
>>>>>> >> > 1. It is time to separate the dynamic and static scope transaction
>>>>>> >> > creation so that they are asynchronous and synchronous respectively.
>>>>>> >>
>>>>>> >> I don't really understand what this means. What are dynamic and static
>>>>>> >> scope transaction creation? Can you elaborate?
>>>>>> >
>>>>>> > This is the difference in the API in my email between openTransaction
>>>>>> > and
>>>>>> > transaction. Dynamic and static scope have been defined in the spec for
>>>>>> > a
>>>>>> > long time.
>>>>>>
>>>>>
>>>>> In fact, dynamic transactions aren't explicitly specified anywhere. They are
>>>>> just mentioned. You need some amount of guessing to find out what they are
>>>>> or how to create one (i.e. pass an empty list of store names).
>>>>
>>>> Yes, that has been a big problem for us too.
>>>>
>>>>>> Ah, I think I'm following you now. I'm actually not sure that we
>>>>>> should have dynamic scope at all in the spec, I know Jeremy has
>>>>>> expressed similar concerns. However if we are going to have dynamic
>>>>>> scope, I agree it is a good idea to have separate APIs for starting
>>>>>> dynamic-scope transactions from static-scope transactions.
>>>>>>
>>>>>
>>>>> I think it would simplify matters a lot if we were to drop dynamic
>>>>> transactions altogether. And if we do that,  then we can also safely move
>>>>> the 'mode' to parameter to the Transaction interface, since all the object
>>>>> stores in a static transaction can be only be open in the same mode.
>>>>
>>>> Agreed.
>>>>
>>>>>> >> > 2. Provide a catalog object that can be used to atomically add/remove
>>>>>> >> > object stores and indexes as well as modify version.
>>>>>> >>
>>>>>> >> It seems to me that a catalog object doesn't really provide any
>>>>>> >> functionality over the proposal in bug 10052? The advantage that I see
>>>>>> >> with the syntax proposal in bug 10052 is that it is simpler.
>>>>>> >>
>>>>>> >> http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052
>>>>>> >>
>>>>>> >> Can you elaborate on what the advantages are of catalog objects?
>>>>>> >
>>>>>> > To begin with, 10052 shuts down the "users" of the database completely
>>>>>> > when
>>>>>> > only one is changing its structure, i.e., adding or removing an object
>>>>>> > store.
>>>>>>
>>>>>> This is not the case. Check the steps defined for setVersion in [1].
>>>>>> At no point are databases shut down automatically. Only once all
>>>>>> existing database connections are manually closed, either by calls to
>>>>>> IDBDatabase.close() or by the user leaving the page, is the 'success'
>>>>>> event from setVersion fired.
>>>>>>
>>>>>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052#c0
>>>>>>
>>>>>> > How can we make it less draconian?
>>>>>>
>>>>>> The 'versionchange' event allows pages that are currently using the
>>>>>> database to handle the change. The page can inspect the new version
>>>>>> number supplied by the 'versionchange' event, and if it knows that it
>>>>>> is compatible with a given upgrade, all it needs to do is to call
>>>>>> db.close() and then immediately reopen the database using
>>>>>> indexedDB.open(). The open call won't complete until the upgrade is
>>>>>> finished.
>>>>>>
>>>>>
>>>>> I had a question here: why does the page need to call 'close'? Any pending
>>>>> transactions will run to completion and new ones should not be allowed to
>>>>> start if a VERSION_CHANGE transaction is waiting to start. From the
>>>>> description of what 'close' does in 10052, I am not entirely sure it is
>>>>> needed.
>>>>
>>>> The problem we're trying to solve is this:
>>>>
>>>> Imagine an editor which stores documents in indexedDB. However in
>>>> order to not overwrite the document using temporary changes, it only
>>>> saves data when the user explicitly requests it, for example by
>>>> pressing a 'save' button.
>>>>
>>>> This means that there can be a bunch of potentially important data
>>>> living outside of indexedDB, in other parts of the application, such
>>>> as in textfields and javascript variables.
>>>>
>>>> If we were to automatically close all other open IDBDatabase objects
>>>> when IDBDatabase.setVersion is closed, that would mean that the
>>>> application by default risks losing data. Without a 'versionchange'
>>>> event, there is little the application can do to ensure that
>>>> IDBDatabase object is closed under it, preventing it from saving data
>>>> that lived outside the database. So while run-to-completion and
>>>> allowing existing transactions to finish ensures that the database
>>>> stays in a consistent state, it does not prevent dataloss on an
>>>> application level.
>>>>
>>>> But even with the 'versionchange' event, we're only solving part of
>>>> the problem. First of all it requires that people listen to it and act
>>>> appropriately by saving information in indexedDB. Second, it requires
>>>> that the application is able to synchronously store all the data in
>>>> indexedDB before returning from the 'versionchange' event handler.
>>>>
>>>> By instead not closing databases automatically, we do the safe thing
>>>> by default. And by adding the IDBDatabase.close() function, we allow
>>>> pages to asynchronously interact with the user to ask the user if he
>>>> wants to save the data. Or to perform other asynchronous actions as
>>>> part of saving the data. Once all data has been saved, the application
>>>> can call IDBDatabase.close().
>>>>
>>>> Alternatively, and by default, the setVersion callback simply won't
>>>> fire until all other IDBDatabase connections are closed by the user
>>>> closing other tabs.
>>>>
>>>
>>> Ah, I see now, thanks for the explanation.
>>>
>>> My thinking was that the "VERSION_CHANGE" transaction would simply
>>> kick in like any other transaction. However, you're right: that would
>>> not work if the application would not be able to save its state
>>> immediately in the "versionchange" handler. By the time the user
>>> confirms the changes, the VERSION_CHANGE transaction could have kicked
>>> in and changed the schema.
>>
>> Exactly.
>>
>>>>>> > Secondly, I don't see how that
>>>>>> > approach can produce atomic changes to the database.
>>>>>>
>>>>>> When the transaction created in step 4 of setVersion defined in [1] is
>>>>>> created, only one IDBDatabase object to the database is open. As long
>>>>>> as that transaction is running, no requests returned from
>>>>>> IDBFactory.open will receive a 'success' event. Only once the
>>>>>> transaction is committed, or aborted, will those requests succeed.
>>>>>> This guarantees that no other IDBDatabase object can see a partial
>>>>>> update.
>>>>>>
>>>>>> Further, only once the transaction created by setVersion is committed,
>>>>>> are the requested objectStores and indexes created/removed. This
>>>>>> guarantees that the database is never left with a partial update.
>>>>>>
>>>>>> That means that the changes are atomic, right?
>>>>>>
>>>>>
>>>>> Sounds atomic IMHO.
>>>>
>>>> One thing that I forgot to mention. If the setVersion transaction is
>>>> aborted for some reason, for example due to disk errors or due to
>>>> explicit calls to .abort(), all changes are rolled back, including all
>>>> calls to createObjectStore/createIndex/removeObjectStore/removeIndex.
>>>>
>>>>>> > Thirdly, we shouldn't
>>>>>> > need to change version in order to perform database changes.
>>>>>>
>>>>>> First off, note that if the upgrade is compatible, you can just pass
>>>>>> the existing database version to setVersion. So no version *change* is
>>>>>> actually needed.
>>>>>>
>>>>>> Second, I don't think there is much difference between
>>>>>>
>>>>>> var txn = db.transaction();
>>>>>> db.openCatalog(txn).onsuccess = ...
>>>>>>
>>>>>> vs
>>>>>>
>>>>>> db.setVersion("5").onsuccess = ...
>>>>>>
>>>>>> I don't see that telling people that they have to use the former is a big
>>>>>> win.
>>>>>>
>>>>>>
>>>>>> The problem that I see with the catalog proposal, if I understand it
>>>>>> correctly, is that it means that a page that has a IDBDatabase object
>>>>>> open has to always be prepared for calls to
>>>>>> openObjectStore/openTransaction failing. I.e. the page can't ever know
>>>>>> that another page was opened which at any point created a catalog and
>>>>>> removed an objectStore. This forces pages to at every single call
>>>>>> either check that the version is still the same, or that each and
>>>>>> every call to openObjectStore/openTransaction succeeds. This seems
>>>>>> very error prone to me.
>>>>>>
>>>>>
>>>>> I guess you could pass an event ("schemachange") as soon as the catalog
>>>>> transaction is committed? This somewhat similar to the "versionchange" event
>>>>> fired by setVersion(). At the same time, if setVersion() can be used to
>>>>> provide atomic schema changes, I don't see what the catalog buys us.
>>>>
>>>> Agreed. And even a "schemachange" event requires that people listen to
>>>> it. I.e. it's unsafe by default, and only becomes safe once pages take
>>>> explicit action.
>>>>
>>>>>> Looking at your example, it also seems like it contains a race
>>>>>> condition. There is a risk that when someone opens a database, the
>>>>>> first transaction, which uses a catalog to create the necessary
>>>>>> objectStores and indexes, is committed, but the second transaction,
>>>>>> which populates the objectStores with data, has not yet started.
>>>>>>
>>>>>> > Finally, I am
>>>>>> > not sure why you consider the syntax proposal simpler. Note that I am
>>>>>> > not
>>>>>> > averse to the version change event notification.
>>>>>>
>>>>>> Compare to how your code would look like with the proposals in bugs
>>>>>> 9975 and 10052:
>>>>>>
>>>>>> var db;
>>>>>> var dbRequest = indexedDB.open("parts", 'Part database');
>>>>>> dbRequest.onsuccess = function(event) {
>>>>>>  db = event.result;
>>>>>>  if (db.version != "1") {
>>>>>>    versionRequest = db.setVersion("1");
>>>>>>    versionRequest.ontimeout = function(event) {
>>>>>>      throw new Error("timeout while waiting for other DBs to close");
>>>>>>    };
>>>>>>    versionRequest.onsuccess = function(event) {
>>>>>>      event.transaction.onerror = function(event) {
>>>>>>        throw new Error("Failed to set up database due to error: " +
>>>>>> event.message);
>>>>>>      };
>>>>>>      db.setVersion("1");
>>>>>
>>>>>
>>>>> Do you need to call setVersion again?
>>>>
>>>> Oops, my bad, please remove that line.
>>>>
>>>>>> >> One of our main points was to make getting objectStore
>>>>>> >> objects a synchronous operation as to avoid having to nest multiple
>>>>>> >> levels of asynchronous calls. Compare
>>>>>> >>
>>>>>> >> var req = db.openObjectStore("foo", trans);
>>>>>> >> req.onerror = errorHandler;
>>>>>> >> req.onsuccess = function(e) {
>>>>>> >>  var fooStore = e.result;
>>>>>> >>  var req = fooStore.get(12);
>>>>>> >>  req.onerror = errorHandler;
>>>>>> >>  req.onsuccess = resultHandler;
>>>>>> >> }
>>>>>> >>
>>>>>> >> to
>>>>>> >>
>>>>>> >> var fooStore = db.openObjectStore("foo", trans);
>>>>>> >> var req = fooStore.get(12);
>>>>>> >> req.onerror = errorHandler;
>>>>>> >> req.onsuccess = resultHandler;
>>>>>> >>
>>>>>> >>
>>>>>> >> I also don't understand the advantage of having the transaction as an
>>>>>> >> argument to openObjectStore rather than having openObjectStore live on
>>>>>> >> transaction. Compare
>>>>>> >>
>>>>>> >> db.openObjectStore("foo", trans);
>>>>>> >>
>>>>>> >> to
>>>>>> >>
>>>>>> >> trans.openObjectStore("foo");
>>>>>> >>
>>>>>> >> I also don't understand the meaning of specifying a mode when a
>>>>>> >> objectStore is opened, rather than specifying the mode when the
>>>>>> >> transaction is created.
>>>>>> >
>>>>>> > Have you reviewed the examples? Different object stores in a transaction
>>>>>> > are
>>>>>> > used in different modes, and that requires us to identify the mode when
>>>>>> > opening the object store. This also increases concurrency. This is
>>>>>> > particularly useful for dynamic transactions.
>>>>>>
>>>>>> I'm following you better now. I do see how this can work for dynamic
>>>>>> transactions where locks are not acquired upon creation of the
>>>>>> transaction. But I don't see how this makes sense for static
>>>>>> transactions. And it indeed seems like you are not using this feature
>>>>>> for static transactions.
>>>>>>
>>>>>
>>>>> I don't think it's even possible with the current API since
>>>>> openTransaction() takes a list of objectStore names but a single mode.
>>>>
>>>> Indeed. We could allow static transactions to use different lock
>>>> levels for different objectStores, all specified when the
>>>> IDBTransaction is initially created. It's just a matter of syntax to
>>>> the IDBDatabase.transaction() function. However so far I've thought
>>>> that we should leave this for v2. But if people would feel more easy
>>>> about dropping dynamic transactions if we add this ability, then I
>>>> would be ok with it.
>>>>
>>>>>> If it is the case that specifying a mode when opening an objectStore
>>>>>> only makes sense on dynamic transactions, then I think we should only
>>>>>> expose that argument on dynamic transactions.
>>>>>>
>>>>>> Now that I understand your proposal better, I don't understand how
>>>>>> IDBTransaction.objectStore works for dynamically scoped transactions
>>>>>> in your proposal. It seems to require synchronously grabbing a lock
>>>>>> which I thought we wanted to avoid at all cost.
>>>>>>
>>>>>
>>>>> This is rather confusing: is IDBTransaction::objectStore() creating an
>>>>> object store, now? If yes, then it must lock it synchronously. If it just
>>>>> returns an object store that was previously added to the transaction, what
>>>>> is the 'mode' parameter for?
>>>>
>>>> Looking at Nikunj's example code, it seems like you can request a new
>>>> objectStore to be locked using IDBTransaction.objectStore(). Not sure
>>>> if that is a bug or not though?
>>>>
>>>
>>> Not sure either. I see the following examples:
>>>
>>> 1)
>>> var txnRequest = db.openTransaction(); //requesting to lock the entire
>>> database exclusively
>>> txnRequest.onsuccess = function(event) {
>>> var txn = txnRequest.transaction;
>>> var part = txn.objectStore("part", IDBObjectStore.READ_WRITE);
>>>
>>> So by the time txnRequest.onsuccess runs, the entire database is
>>> exclusively locked. I don't then follow why we pass READ_WRITE to
>>> txn.objectStore()...we already have a lock on that store.
>>>
>>> 2)
>>> var txn = db.transaction(); //synchronous because requesting locks as I go along
>>> var parts = txn.objectStore("part", IDBObjectStore.READ_ONLY);
>>>
>>> 'parts' is an IDBObjectStore, not an IDBRequest. So is 'parts' locked
>>> at this point? If it is, then objectStore must've synchronously
>>> acquired that lock, as you said. If it isn't, then the following call
>>> to 'parts.get()' must first acquire the lock. So it's either
>>> unreasonable or confusing.
>>>
>>> Anyway, I see the point the example is trying to make (i.e. using
>>> object stores in different modes inside the same transaction). I think
>>> we can live (for v1) with solving such examples with a READ_WRITE
>>> static transaction and potentially losing some concurrency.
>>
>> I think that is ok too. Though like I said, if it allows makes people
>> more comfortable with dropping dynamic transactions, then I'm fine
>> with allowing more granular control for static transactions.
>>
>>>>>> >> Unless we're planning on making all
>>>>>> >> transactions dynamic (I hope not), locks have to be grabbed when the
>>>>>> >> transaction is created, right? If a transaction is holding a READ_ONLY
>>>>>> >> lock for a given objectStore, then attempting to open that objectStore
>>>>>> >> as READ_WRITE should obviously fail. Consecutively, if a transaction
>>>>>> >> is holding a READ_WRITE lock for a given objectStore, then opening
>>>>>> >> that objectStore as READ_ONLY doesn't seem to have any benefit over
>>>>>> >> opening it as READ_WRITE. In short, I can't see any situation when
>>>>>> >> you'd want to open an objectStore in a different mode than what was
>>>>>> >> used when the transaction was created.
>>>>>> >>
>>>>>> >> Finally, I would stronly prefer to have READ_ONLY be the default
>>>>>> >> transaction type if none is specified by the author. It is better to
>>>>>> >> default to what results in higher performance, and have people notice
>>>>>> >> when they need to switch to the slower mode. This is because people
>>>>>> >> will very quickly notice if they use READ_ONLY when they need to use
>>>>>> >> READ_WRITE, since the code will never work. However if people use
>>>>>> >> READ_WRITE when all they need is READ_ONLY, then the only effect is
>>>>>> >> likely to be an application that runs somewhat slower, when they will
>>>>>> >> unlikely detect.
>>>>>> >
>>>>>> > This approach is also likely to cause exceptions upon put, remove, and
>>>>>> > add.
>>>>>> > I would prefer to not cause exceptions as the default behavior.
>>>>>>
>>>>>> If we use READ_WRITE as default behavior then it's extremely likely
>>>>>> that people will use the wrong lock type and not realize. The downside
>>>>>> will be that sites will run less concurrently, and thus slower, than
>>>>>> they could. Another downside is that authors should specify lock-type
>>>>>> more often, for optimal performance, if we think that READ_ONLY is
>>>>>> more common.
>>>>>>
>>>>>> If we are using READ_ONLY as default behavior, then it's extremely
>>>>>> likely that people will use the wrong lock type, notice that their
>>>>>> code isn't working, and fix it. The downside is that people will have
>>>>>> to fix their bugs. Another downside is that authors will have to
>>>>>> specify lock-type more often if we think that READ_WRITE is more
>>>>>> common.
>>>>>>
>>>>>> To me the downsides of using READ_WRITE as a default are much worse
>>>>>> than the downsides of using READ_ONLY.
>>>>
>>>> Andrei: would be great to hear your opinion on this.
>>>>
>>>
>>> It seems very wasteful to use READ_WRITE as the default for the
>>> reasons you mentioned. The downside of having write operations throw
>>> with the default transaction mode can easily be overcome if the
>>> exception message is clear and tells people what they have to do. On
>>> the other hand, tuning application performance by scanning your code
>>> to see if you could turn some transactions to READ_ONLY seems rather
>>> painful.
>>
>> My feeling exactly.
>>
>> There seems to be much agreement on the proposals in bug 9975 and
>> 10052. Most of the disagreements seems to be stemming from
>> misunderstandings how the proposal works, such as weather atomic
>> schema transactions are possible, and if the synchronous transaction
>> API requires synchronously grabbing locks.
>>
>> I propose that we go ahead and implement in the specification the
>> changes proposed in bug 9975 and 10052 and use that as basis for
>> further discussions. We've already found a few issues in those
>> proposals, such as the lack of support for dynamic transactions (if we
>> want to keep those), but I think it will be easier to discuss the
>> changes that are needed in terms of changes needed, rather than the
>> significantly different API that is in the draft now.
>>
>
> Agreed. I think your proposal in 9975 should be the basis for the v1
> of this spec.

Just to be clear. I expect that we'll need to do significant changes
even on top of our proposal. I'm sure there are things that we have
missed, and that there are things that can be done better than the way
we've proposed. However it seems like it's closer to what we want than
what we currently have, and so we should use it as basis for further
discussions.

>> If this sounds ok to everyone, I can write these changes up monday.
>>
>
> Sounds ok, I will try to help you with this.

Awesome!

/ Jonas
Received on Friday, 9 July 2010 16:32:25 UTC