Re: [IndexedDB] Current editor's draft

On Thu, Jul 8, 2010 at 8:27 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Thu, Jul 8, 2010 at 8:22 AM, Andrei Popescu <andreip@google.com> wrote:
>> Hi Jonas,
>>
>> On Wed, Jul 7, 2010 at 8:08 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> On Wed, Jul 7, 2010 at 10:41 AM, Andrei Popescu <andreip@google.com> wrote:
>>>>
>>>>
>>>> On Wed, Jul 7, 2010 at 8:27 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>>
>>>>> On Tue, Jul 6, 2010 at 6:31 PM, Nikunj Mehta <nikunj@o-micron.com> wrote:
>>>>> > On Wed, Jul 7, 2010 at 5:57 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>> >>
>>>>> >> On Tue, Jul 6, 2010 at 9:36 AM, Nikunj Mehta <nikunj@o-micron.com>
>>>>> >> wrote:
>>>>> >> > Hi folks,
>>>>> >> >
>>>>> >> > There are several unimplemented proposals on strengthening and
>>>>> >> > expanding IndexedDB. The reason I have not implemented them yet is
>>>>> >> > that I am not convinced they are necessary in toto. Here's my
>>>>> >> > attempt at explaining why. I apologize in advance for not responding
>>>>> >> > to individual proposals due to personal time constraints. I will
>>>>> >> > however respond in detail on individual bug reports, e.g., as I did
>>>>> >> > with 9975.
>>>>> >> >
>>>>> >> > I used the current editor's draft asynchronous API to understand
>>>>> >> > where some of the remaining programming difficulties lie. Based on this
>>>>> >> > attempt, I find several areas to strengthen, the most prominent of
>>>>> >> > which is how we use transactions. Another is to add the concept of a
>>>>> >> > catalog as a special kind of object store.
>>>>> >>
>>>>> >> Hi Nikunj,
>>>>> >>
>>>>> >> Thanks for replying! I'm very interested in getting this stuff sorted
>>>>> >> out pretty quickly as almost all other proposals in one way or another
>>>>> >> are affected by how this stuff develops.
>>>>> >>
>>>>> >> > Here are the main areas I propose to address in the editor's spec:
>>>>> >> >
>>>>> >> > 1. It is time to separate the dynamic and static scope transaction
>>>>> >> > creation so that they are asynchronous and synchronous respectively.
>>>>> >>
>>>>> >> I don't really understand what this means. What are dynamic and static
>>>>> >> scope transaction creation? Can you elaborate?
>>>>> >
>>>>> > This is the difference in the API in my email between openTransaction
>>>>> > and
>>>>> > transaction. Dynamic and static scope have been defined in the spec for
>>>>> > a
>>>>> > long time.
>>>>>
>>>>
>>>> In fact, dynamic transactions aren't explicitly specified anywhere. They are
>>>> just mentioned. You need some amount of guessing to find out what they are
>>>> or how to create one (i.e. pass an empty list of store names).
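>>>> Under that reading, creating one presumably looks something like this
>>>> (a guess; nothing in the draft confirms it):
>>>>
>>>>   var txn = db.transaction([]);  // empty list of store names = dynamic?
>>>>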
>>>
>>> Yes, that has been a big problem for us too.
>>>
>>>>> Ah, I think I'm following you now. I'm actually not sure that we
>>>>> should have dynamic scope at all in the spec, I know Jeremy has
>>>>> expressed similar concerns. However if we are going to have dynamic
>>>>> scope, I agree it is a good idea to have separate APIs for starting
>>>>> dynamic-scope transactions from static-scope transactions.
>>>>>
>>>>
>>>> I think it would simplify matters a lot if we were to drop dynamic
>>>> transactions altogether. And if we do that, then we can also safely move
>>>> the 'mode' parameter to the Transaction interface, since all the object
>>>> stores in a static transaction can only be opened in the same mode.
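>>>>
>>>> Something along these lines, where the syntax is purely illustrative:
>>>>
>>>>   var txn = db.transaction(["parts", "orders"], IDBObjectStore.READ_WRITE);
>>>>   txn.objectStore("parts").get(12);   // every store shares the
>>>>   txn.objectStore("orders").get(12);  // transaction's single mode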
>>>
>>> Agreed.
>>>
>>>>> >> > 2. Provide a catalog object that can be used to atomically add/remove
>>>>> >> > object stores and indexes as well as modify version.
>>>>> >>
>>>>> >> It seems to me that a catalog object doesn't really provide any
>>>>> >> functionality over the proposal in bug 10052? The advantage that I see
>>>>> >> with the syntax proposal in bug 10052 is that it is simpler.
>>>>> >>
>>>>> >> http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052
>>>>> >>
>>>>> >> Can you elaborate on what the advantages are of catalog objects?
>>>>> >
>>>>> > To begin with, 10052 shuts down the "users" of the database completely
>>>>> > when
>>>>> > only one is changing its structure, i.e., adding or removing an object
>>>>> > store.
>>>>>
>>>>> This is not the case. Check the steps defined for setVersion in [1].
>>>>> At no point are databases shut down automatically. Only once all
>>>>> existing database connections are manually closed, either by calls to
>>>>> IDBDatabase.close() or by the user leaving the page, is the 'success'
>>>>> event from setVersion fired.
>>>>>
>>>>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052#c0
>>>>>
>>>>> > How can we make it less draconian?
>>>>>
>>>>> The 'versionchange' event allows pages that are currently using the
>>>>> database to handle the change. The page can inspect the new version
>>>>> number supplied by the 'versionchange' event, and if it knows that it
>>>>> is compatible with a given upgrade, all it needs to do is to call
>>>>> db.close() and then immediately reopen the database using
>>>>> indexedDB.open(). The open call won't complete until the upgrade is
>>>>> finished.
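>>>>>
>>>>> Roughly, a sketch (registering via an onversionchange attribute and
>>>>> reading event.version are assumptions on my part, not spec text):
>>>>>
>>>>>   db.onversionchange = function(event) {
>>>>>     // we know we can handle an upgrade to event.version
>>>>>     db.close();
>>>>>     indexedDB.open("parts", 'Part database').onsuccess = function(e) {
>>>>>       db = e.result;  // doesn't fire until the upgrade has finished
>>>>>     };
>>>>>   };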
>>>>>
>>>>
>>>> I had a question here: why does the page need to call 'close'? Any pending
>>>> transactions will run to completion and new ones should not be allowed to
>>>> start if a VERSION_CHANGE transaction is waiting to start. From the
>>>> description of what 'close' does in 10052, I am not entirely sure it is
>>>> needed.
>>>
>>> The problem we're trying to solve is this:
>>>
>>> Imagine an editor which stores documents in indexedDB. However in
>>> order to not overwrite the document using temporary changes, it only
>>> saves data when the user explicitly requests it, for example by
>>> pressing a 'save' button.
>>>
>>> This means that there can be a bunch of potentially important data
>>> living outside of indexedDB, in other parts of the application, such
>>> as in textfields and javascript variables.
>>>
>>> If we were to automatically close all other open IDBDatabase objects
>>> when IDBDatabase.setVersion is called, that would mean that the
>>> application by default risks losing data. Without a 'versionchange'
>>> event, there is little the application can do when its IDBDatabase
>>> object is closed under it, which prevents it from saving data that
>>> lives outside the database. So while run-to-completion and allowing
>>> existing transactions to finish ensures that the database stays in a
>>> consistent state, it does not prevent data loss at the application
>>> level.
>>>
>>> But even with the 'versionchange' event, we're only solving part of
>>> the problem. First of all it requires that people listen to it and act
>>> appropriately by saving information in indexedDB. Second, it requires
>>> that the application is able to synchronously store all the data in
>>> indexedDB before returning from the 'versionchange' event handler.
>>>
>>> By instead not closing databases automatically, we do the safe thing
>>> by default. And by adding the IDBDatabase.close() function, we allow
>>> pages to asynchronously interact with the user and ask whether they
>>> want to save the data, or to perform other asynchronous actions as
>>> part of saving the data. Once all data has been saved, the application
>>> can call IDBDatabase.close().
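>>>
>>> For example (sketch; the onversionchange attribute is assumed, and
>>> askUserToSave stands in for whatever asynchronous UI and save logic
>>> the page has):
>>>
>>>   db.onversionchange = function(event) {
>>>     askUserToSave(function onSaved() {
>>>       db.close();  // only now can the setVersion request complete
>>>     });
>>>   };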
>>>
>>> Alternatively, and by default, the setVersion callback simply won't
>>> fire until all other IDBDatabase connections are closed by the user
>>> closing other tabs.
>>>
>>
>> Ah, I see now, thanks for the explanation.
>>
>> My thinking was that the "VERSION_CHANGE" transaction would simply
>> kick in like any other transaction. However, you're right: that would
>> not work if the application could not save its state immediately in
>> the "versionchange" handler. By the time the user
>> confirms the changes, the VERSION_CHANGE transaction could have kicked
>> in and changed the schema.
>
> Exactly.
>
>>>>> > Secondly, I don't see how that
>>>>> > approach can produce atomic changes to the database.
>>>>>
>>>>> When the transaction described in step 4 of setVersion in [1] is
>>>>> created, only one IDBDatabase object to the database is open. As long
>>>>> as that transaction is running, no requests returned from
>>>>> IDBFactory.open will receive a 'success' event. Only once the
>>>>> transaction is committed, or aborted, will those requests succeed.
>>>>> This guarantees that no other IDBDatabase object can see a partial
>>>>> update.
>>>>>
>>>>> Further, only once the transaction created by setVersion is committed,
>>>>> are the requested objectStores and indexes created/removed. This
>>>>> guarantees that the database is never left with a partial update.
>>>>>
>>>>> That means that the changes are atomic, right?
>>>>>
>>>>
>>>> Sounds atomic IMHO.
>>>
>>> One thing that I forgot to mention. If the setVersion transaction is
>>> aborted for some reason, for example due to disk errors or due to
>>> explicit calls to .abort(), all changes are rolled back, including all
>>> calls to createObjectStore/createIndex/removeObjectStore/removeIndex.
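>>>
>>> Schematically (store name illustrative, and I'm assuming
>>> createObjectStore hangs off IDBDatabase as in the 10052 proposal):
>>>
>>>   var versionRequest = db.setVersion("2");
>>>   versionRequest.onsuccess = function(event) {
>>>     db.createObjectStore("parts");  // tentative until the commit
>>>     event.transaction.abort();      // rolls the creation back as well
>>>   };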
>>>
>>>>> > Thirdly, we shouldn't
>>>>> > need to change version in order to perform database changes.
>>>>>
>>>>> First off, note that if the upgrade is compatible, you can just pass
>>>>> the existing database version to setVersion. So no version *change* is
>>>>> actually needed.
>>>>>
>>>>> Second, I don't think there is much difference between
>>>>>
>>>>> var txn = db.transaction();
>>>>> db.openCatalog(txn).onsuccess = ...
>>>>>
>>>>> vs
>>>>>
>>>>> db.setVersion("5").onsuccess = ...
>>>>>
>>>>> I don't see that telling people that they have to use the former is a big
>>>>> win.
>>>>>
>>>>>
>>>>> The problem that I see with the catalog proposal, if I understand it
>>>>> correctly, is that it means that a page that has an IDBDatabase object
>>>>> open always has to be prepared for calls to
>>>>> openObjectStore/openTransaction failing. I.e. the page can never know
>>>>> whether another page was opened which at some point created a catalog
>>>>> and removed an objectStore. This forces pages, at every single call,
>>>>> either to check that the version is still the same, or to check that
>>>>> each and every call to openObjectStore/openTransaction succeeds. This
>>>>> seems very error prone to me.
>>>>>
>>>>
>>>> I guess you could fire an event ("schemachange") as soon as the catalog
>>>> transaction is committed? This is somewhat similar to the "versionchange"
>>>> event fired by setVersion(). At the same time, if setVersion() can be used
>>>> to provide atomic schema changes, I don't see what the catalog buys us.
>>>
>>> Agreed. And even a "schemachange" event requires that people listen to
>>> it. I.e. it's unsafe by default, and only becomes safe once pages take
>>> explicit action.
>>>
>>>>> Looking at your example, it also seems like it contains a race
>>>>> condition. There is a risk that when someone opens a database, the
>>>>> first transaction, which uses a catalog to create the necessary
>>>>> objectStores and indexes, is committed, but the second transaction,
>>>>> which populates the objectStores with data, has not yet started.
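>>>>>
>>>>> I.e. with a catalog the setup splits into two transactions, roughly
>>>>> (sketch; the catalog calls are as in the comparison above, the rest
>>>>> is elided):
>>>>>
>>>>>   var txn1 = db.transaction();
>>>>>   db.openCatalog(txn1).onsuccess = ...  // create objectStores/indexes
>>>>>
>>>>>   var txn2 = db.transaction();
>>>>>   ...                                    // populate them; another page
>>>>>                                          // can come in between the two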
>>>>>
>>>>> > Finally, I am
>>>>> > not sure why you consider the syntax proposal simpler. Note that I am
>>>>> > not
>>>>> > averse to the version change event notification.
>>>>>
>>>>> Compare this to how your code would look with the proposals in bugs
>>>>> 9975 and 10052:
>>>>>
>>>>> var db;
>>>>> var dbRequest = indexedDB.open("parts", 'Part database');
>>>>> dbRequest.onsuccess = function(event) {
>>>>>  db = event.result;
>>>>>  if (db.version != "1") {
>>>>>    versionRequest = db.setVersion("1");
>>>>>    versionRequest.ontimeout = function(event) {
>>>>>      throw new Error("timeout while waiting for other DBs to close");
>>>>>    };
>>>>>    versionRequest.onsuccess = function(event) {
>>>>>      event.transaction.onerror = function(event) {
>>>>>        throw new Error("Failed to set up database due to error: " +
>>>>>                         event.message);
>>>>>      };
>>>>>      db.setVersion("1");
>>>>
>>>>
>>>> Do you need to call setVersion again?
>>>
>>> Oops, my bad, please remove that line.
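>>>
>>> With that line removed, the success handler would just go on to the
>>> actual setup, e.g. (store name illustrative):
>>>
>>>   versionRequest.onsuccess = function(event) {
>>>     event.transaction.onerror = errorHandler;
>>>     db.createObjectStore("part");
>>>   };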
>>>
>>>>> >> One of our main points was to make getting objectStore
>>>>> >> objects a synchronous operation as to avoid having to nest multiple
>>>>> >> levels of asynchronous calls. Compare
>>>>> >>
>>>>> >> var req = db.openObjectStore("foo", trans);
>>>>> >> req.onerror = errorHandler;
>>>>> >> req.onsuccess = function(e) {
>>>>> >>  var fooStore = e.result;
>>>>> >>  var req = fooStore.get(12);
>>>>> >>  req.onerror = errorHandler;
>>>>> >>  req.onsuccess = resultHandler;
>>>>> >> }
>>>>> >>
>>>>> >> to
>>>>> >>
>>>>> >> var fooStore = db.openObjectStore("foo", trans);
>>>>> >> var req = fooStore.get(12);
>>>>> >> req.onerror = errorHandler;
>>>>> >> req.onsuccess = resultHandler;
>>>>> >>
>>>>> >>
>>>>> >> I also don't understand the advantage of having the transaction as an
>>>>> >> argument to openObjectStore rather than having openObjectStore live on
>>>>> >> transaction. Compare
>>>>> >>
>>>>> >> db.openObjectStore("foo", trans);
>>>>> >>
>>>>> >> to
>>>>> >>
>>>>> >> trans.openObjectStore("foo");
>>>>> >>
>>>>> >> I also don't understand the meaning of specifying a mode when a
>>>>> >> objectStore is opened, rather than specifying the mode when the
>>>>> >> transaction is created.
>>>>> >
>>>>> > Have you reviewed the examples? Different object stores in a transaction
>>>>> > are
>>>>> > used in different modes, and that requires us to identify the mode when
>>>>> > opening the object store. This also increases concurrency. This is
>>>>> > particularly useful for dynamic transactions.
>>>>>
>>>>> I'm following you better now. I do see how this can work for dynamic
>>>>> transactions where locks are not acquired upon creation of the
>>>>> transaction. But I don't see how this makes sense for static
>>>>> transactions. And it indeed seems like you are not using this feature
>>>>> for static transactions.
>>>>>
>>>>
>>>> I don't think it's even possible with the current API since
>>>> openTransaction() takes a list of objectStore names but a single mode.
>>>
>>> Indeed. We could allow static transactions to use different lock
>>> levels for different objectStores, all specified when the
>>> IDBTransaction is initially created. It's just a matter of adding
>>> syntax to the IDBDatabase.transaction() function. However, so far I've
>>> thought that we should leave this for v2. But if people would feel
>>> more comfortable about dropping dynamic transactions if we add this
>>> ability, then I would be ok with it.
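>>>
>>> For example, the scope argument could carry per-store modes (syntax
>>> entirely hypothetical):
>>>
>>>   db.transaction({ part:  IDBObjectStore.READ_ONLY,
>>>                    order: IDBObjectStore.READ_WRITE });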
>>>
>>>>> If it is the case that specifying a mode when opening an objectStore
>>>>> only makes sense on dynamic transactions, then I think we should only
>>>>> expose that argument on dynamic transactions.
>>>>>
>>>>> Now that I understand your proposal better, I don't understand how
>>>>> IDBTransaction.objectStore works for dynamically scoped transactions
>>>>> in your proposal. It seems to require synchronously grabbing a lock
>>>>> which I thought we wanted to avoid at all cost.
>>>>>
>>>>
>>>> This is rather confusing: is IDBTransaction::objectStore() creating an
>>>> object store, now? If yes, then it must lock it synchronously. If it just
>>>> returns an object store that was previously added to the transaction, what
>>>> is the 'mode' parameter for?
>>>
>>> Looking at Nikunj's example code, it seems like you can request a new
>>> objectStore to be locked using IDBTransaction.objectStore(). Not sure
>>> if that is a bug or not though?
>>>
>>
>> Not sure either. I see the following examples:
>>
>> 1)
>> var txnRequest = db.openTransaction(); // requesting to lock the entire
>>                                        // database exclusively
>> txnRequest.onsuccess = function(event) {
>> var txn = txnRequest.transaction;
>> var part = txn.objectStore("part", IDBObjectStore.READ_WRITE);
>>
>> So by the time txnRequest.onsuccess runs, the entire database is
>> exclusively locked. I don't then follow why we pass READ_WRITE to
>> txn.objectStore()...we already have a lock on that store.
>>
>> 2)
>> var txn = db.transaction(); //synchronous because requesting locks as I go along
>> var parts = txn.objectStore("part", IDBObjectStore.READ_ONLY);
>>
>> 'parts' is an IDBObjectStore, not an IDBRequest. So is 'parts' locked
>> at this point? If it is, then objectStore must've synchronously
>> acquired that lock, as you said. If it isn't, then the following call
>> to 'parts.get()' must first acquire the lock. So it's either
>> unreasonable or confusing.
>>
>> Anyway, I see the point the example is trying to make (i.e. using
>> object stores in different modes inside the same transaction). I think
>> we can live (for v1) with solving such examples with a READ_WRITE
>> static transaction and potentially losing some concurrency.
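>>
>> I.e. the examples above would collapse into something like (the second
>> store name is hypothetical):
>>
>>   var txn = db.transaction(["part", "order"], IDBObjectStore.READ_WRITE);
>>
>> even when one of the stores is only ever read from.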
>
> I think that is ok too. Though like I said, if it makes people more
> comfortable with dropping dynamic transactions, then I'm fine with
> allowing more granular control for static transactions.
>
>>>>> >> Unless we're planning on making all
>>>>> >> transactions dynamic (I hope not), locks have to be grabbed when the
>>>>> >> transaction is created, right? If a transaction is holding a READ_ONLY
>>>>> >> lock for a given objectStore, then attempting to open that objectStore
>>>>> >> as READ_WRITE should obviously fail. Conversely, if a transaction
>>>>> >> is holding a READ_WRITE lock for a given objectStore, then opening
>>>>> >> that objectStore as READ_ONLY doesn't seem to have any benefit over
>>>>> >> opening it as READ_WRITE. In short, I can't see any situation when
>>>>> >> you'd want to open an objectStore in a different mode than what was
>>>>> >> used when the transaction was created.
>>>>> >>
>>>>> >> Finally, I would strongly prefer to have READ_ONLY be the default
>>>>> >> transaction type if none is specified by the author. It is better to
>>>>> >> default to what results in higher performance, and have people notice
>>>>> >> when they need to switch to the slower mode. This is because people
>>>>> >> will very quickly notice if they use READ_ONLY when they need to use
>>>>> >> READ_WRITE, since the code will never work. However if people use
>>>>> >> READ_WRITE when all they need is READ_ONLY, then the only effect is
>>>>> >> likely to be an application that runs somewhat slower, which they
>>>>> >> are unlikely to detect.
>>>>> >
>>>>> > This approach is also likely to cause exceptions upon put, remove, and
>>>>> > add.
>>>>> > I would prefer to not cause exceptions as the default behavior.
>>>>>
>>>>> If we use READ_WRITE as default behavior then it's extremely likely
>>>>> that people will use the wrong lock type and not realize. The downside
>>>>> will be that sites will run less concurrently, and thus slower, than
>>>>> they could. Another downside is that authors will have to specify the
>>>>> lock type more often, for optimal performance, if we think that
>>>>> READ_ONLY is more common.
>>>>>
>>>>> If we are using READ_ONLY as default behavior, then it's extremely
>>>>> likely that people will use the wrong lock type, notice that their
>>>>> code isn't working, and fix it. The downside is that people will have
>>>>> to fix their bugs. Another downside is that authors will have to
>>>>> specify lock-type more often if we think that READ_WRITE is more
>>>>> common.
>>>>>
>>>>> To me the downsides of using READ_WRITE as a default are much worse
>>>>> than the downsides of using READ_ONLY.
>>>
>>> Andrei: would be great to hear your opinion on this.
>>>
>>
>> It seems very wasteful to use READ_WRITE as the default for the
>> reasons you mentioned. The downside of having write operations throw
>> with the default transaction mode can easily be overcome if the
>> exception message is clear and tells people what they have to do. On
>> the other hand, tuning application performance by scanning your code
>> to see if you could turn some transactions to READ_ONLY seems rather
>> painful.
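>>
>> Concretely (sketch only; the exact transaction() signature is whatever
>> comes out of bug 9975):
>>
>>   var txn = db.transaction(["parts"]);          // READ_ONLY by default
>>   txn.objectStore("parts").get(12);             // fine
>>   txn.objectStore("parts").put({name: "bolt"}); // fails loudly, telling the
>>                                                 // author to ask for READ_WRITE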
>
> My feeling exactly.
>
> There seems to be much agreement on the proposals in bugs 9975 and
> 10052. Most of the disagreements seem to stem from misunderstandings
> of how the proposals work, such as whether atomic schema transactions
> are possible, and whether the synchronous transaction API requires
> synchronously grabbing locks.
>
> I propose that we go ahead and make the changes proposed in bugs 9975
> and 10052 in the specification and use that as the basis for further
> discussion. We've already found a few issues in those proposals, such
> as the lack of support for dynamic transactions (if we want to keep
> those), but I think it will be easier to discuss what is still needed
> in terms of changes to those proposals, rather than in terms of the
> significantly different API that is in the draft now.
>

Agreed. I think your proposal in 9975 should be the basis for v1 of
this spec.

> If this sounds ok to everyone, I can write these changes up Monday.
>

Sounds ok, I will try to help you with this.

Thanks,
Andrei

Received on Friday, 9 July 2010 10:22:06 UTC