Re: [IndexedDB] Current editor's draft from Jonas Sicking on 2010-07-10 (public-webapps@w3.org from July to September 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 9 Jul 2010 18:14:15 -0700
To: Nikunj Mehta <nikunj@o-micron.com>
Cc: Andrei Popescu <andreip@google.com>, public-webapps <public-webapps@w3.org>
Message-ID: <AANLkTim1gD3slNvUsnZX1vGOrv5EIvDQkJ9Vc8k8ZWG0@mail.gmail.com>
On Fri, Jul 9, 2010 at 11:44 AM, Nikunj Mehta <nikunj@o-micron.com> wrote:
>
> On Jul 8, 2010, at 12:38 AM, Jonas Sicking wrote:
>
>> On Wed, Jul 7, 2010 at 10:41 AM, Andrei Popescu <andreip@google.com> wrote:
>>>
>>>
>>> On Wed, Jul 7, 2010 at 8:27 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>
>>>> On Tue, Jul 6, 2010 at 6:31 PM, Nikunj Mehta <nikunj@o-micron.com> wrote:
>>>>> On Wed, Jul 7, 2010 at 5:57 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>>>
>>>>>> On Tue, Jul 6, 2010 at 9:36 AM, Nikunj Mehta <nikunj@o-micron.com> wrote:
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> There are several unimplemented proposals on strengthening and
>>>>>>> expanding IndexedDB. The reason I have not implemented them yet is
>>>>>>> because I am not convinced they are necessary in toto. Here's my
>>>>>>> attempt at explaining why. I apologize in advance for not responding
>>>>>>> to individual proposals due to personal time constraints. I will
>>>>>>> however respond in detail on individual bug reports, e.g., as I did
>>>>>>> with 9975.
>>>>>>>
>>>>>>> I used the asynchronous API in the current editor's draft to
>>>>>>> understand where the remaining programming difficulties lie. Based
>>>>>>> on this attempt, I find several areas to strengthen, the most
>>>>>>> prominent of which is how we use transactions. Another is to add
>>>>>>> the concept of a catalog as a special kind of object store.
>>>>>>
>>>>>> Hi Nikunj,
>>>>>>
>>>>>> Thanks for replying! I'm very interested in getting this stuff sorted
>>>>>> out pretty quickly as almost all other proposals in one way or another
>>>>>> are affected by how this stuff develops.
>>>>>>
>>>>>>> Here are the main areas I propose to address in the editor's spec:
>>>>>>>
>>>>>>> 1. It is time to separate the dynamic and static scope transaction
>>>>>>> creation so that they are asynchronous and synchronous respectively.
>>>>>>
>>>>>> I don't really understand what this means. What are dynamic and static
>>>>>> scope transaction creation? Can you elaborate?
>>>>>
>>>>> This is the difference in the API in my email between openTransaction
>>>>> and transaction. Dynamic and static scope have been defined in the spec
>>>>> for a long time.
>>>>
>>>
>>> In fact, dynamic transactions aren't explicitly specified anywhere. They are
>>> just mentioned. You need some amount of guessing to find out what they are
>>> or how to create one (i.e. pass an empty list of store names).
>>
>> Yes, that has been a big problem for us too.
>>
>>>> Ah, I think I'm following you now. I'm actually not sure that we
>>>> should have dynamic scope at all in the spec, I know Jeremy has
>>>> expressed similar concerns. However if we are going to have dynamic
>>>> scope, I agree it is a good idea to have separate APIs for starting
>>>> dynamic-scope transactions from static-scope transactions.
>>>>
>>>
>>> I think it would simplify matters a lot if we were to drop dynamic
>>> transactions altogether. And if we do that, then we can also safely move
>>> the 'mode' parameter to the Transaction interface, since all the object
>>> stores in a static transaction can only be opened in the same mode.
>>
>> Agreed.
>>
>>>>>>> 2. Provide a catalog object that can be used to atomically add/remove
>>>>>>> object stores and indexes as well as modify version.
>>>>>>
>>>>>> It seems to me that a catalog object doesn't really provide any
>>>>>> functionality over the proposal in bug 10052? The advantage that I see
>>>>>> with the syntax proposal in bug 10052 is that it is simpler.
>>>>>>
>>>>>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052
>>>>>>
>>>>>> Can you elaborate on what the advantages are of catalog objects?
>>>>>
>>>>> To begin with, 10052 shuts down the "users" of the database completely
>>>>> when only one is changing its structure, i.e., adding or removing an
>>>>> object store.
>>>>
>>>> This is not the case. Check the steps defined for setVersion in [1].
>>>> At no point are databases shut down automatically. Only once all
>>>> existing database connections are manually closed, either by calls to
>>>> IDBDatabase.close() or by the user leaving the page, is the 'success'
>>>> event from setVersion fired.
>>>>
>>>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10052#c0
>>>>
>>>>> How can we make it less draconian?
>>>>
>>>> The 'versionchange' event allows pages that are currently using the
>>>> database to handle the change. The page can inspect the new version
>>>> number supplied by the 'versionchange' event, and if it knows that it
>>>> is compatible with a given upgrade, all it needs to do is to call
>>>> db.close() and then immediately reopen the database using
>>>> indexedDB.open(). The open call won't complete until the upgrade is
>>>> finished.
>>>>
>>>
>>> I had a question here: why does the page need to call 'close'? Any pending
>>> transactions will run to completion and new ones should not be allowed to
>>> start if a VERSION_CHANGE transaction is waiting to start. From the
>>> description of what 'close' does in 10052, I am not entirely sure it is
>>> needed.
>>
>> The problem we're trying to solve is this:
>>
>> Imagine an editor which stores documents in indexedDB. However in
>> order to not overwrite the document using temporary changes, it only
>> saves data when the user explicitly requests it, for example by
>> pressing a 'save' button.
>>
>> This means that there can be a bunch of potentially important data
>> living outside of indexedDB, in other parts of the application, such
>> as in textfields and javascript variables.
>>
>> If we were to automatically close all other open IDBDatabase objects
>> when IDBDatabase.setVersion is called, that would mean that the
>> application by default risks losing data. Without a 'versionchange'
>> event, there is little the application can do to keep its IDBDatabase
>> object from being closed under it, which would prevent it from saving
>> data that lived outside the database. So while run-to-completion and
>> allowing existing transactions to finish ensures that the database
>> stays in a consistent state, it does not prevent dataloss on an
>> application level.
>>
>> But even with the 'versionchange' event, we're only solving part of
>> the problem. First of all it requires that people listen to it and act
>> appropriately by saving information in indexedDB. Second, it requires
>> that the application is able to synchronously store all the data in
>> indexedDB before returning from the 'versionchange' event handler.
>>
>> By instead not closing databases automatically, we do the safe thing
>> by default. And by adding the IDBDatabase.close() function, we allow
>> pages to asynchronously interact with the user to ask the user if he
>> wants to save the data. Or to perform other asynchronous actions as
>> part of saving the data. Once all data has been saved, the application
>> can call IDBDatabase.close().
>>
>> Alternatively, and by default, the setVersion callback simply won't
>> fire until all other IDBDatabase connections are closed by the user
>> closing other tabs.
>
> Would every page need to understand the versionchange protocol in addition to understanding transactions, so as not to starve itself? IOW, every page has to be prepared to listen for versionchange event in order to allow some page to upgrade the database. In the event there is a non-cooperating page, upgrades would fail. It seems like the only safe thing a page can do to respond to versionchange event is cease whatever it is doing, save the application state, and wait to be notified to continue. In short an approach that requires far greater coordination than we have evidence to suggest would be acceptable to programmers of different levels of sophistication.

This is mostly correct, yes. Upgrades can also happen once the user
closes other tabs running the application. But note the comments below.

> It feels much "simpler" to just have another kind of object, e.g., catalog, and the same concepts of transactions to avoid stepping on each other's toes. I have already explained why adding an object store or index should not require anyone to discontinue using the database. Removing an object store or index only needs one to wait to stop using the affected object store or index.
>
> Moreover, the catalog object proposal is more concurrent, if that brings more attractiveness to the approach.

The catalog proposal introduces an IMHO far more dangerous protocol.
Applications have to understand that at any point when they don't
have a transaction open on a given objectStore, that objectStore can
go away.

Worse, the application has no recourse: if an objectStore goes away,
any data kept in the application, such as an edited blog post, can't
be saved and so is lost as soon as the application is closed. I.e.
there is no way for an application to protect itself from
incompatible schema changes.

Even worse, there is no way for an application to safely do a version
upgrade that involves a schema-incompatible change. It will always run
the risk that other pages are open and holding data which won't be
saved in the database, resulting in dataloss. Additionally, those
pages might be making modifications to the remaining objectStores,
unaware that there has been a change in schema. This would cause not
just dataloss, but also data corruption.

So in short, these are the things that I think are worse with the
catalog proposal:

* There is no way for an application to protect itself from
incompatible version changes. This can lead to dataloss if the
application is holding data temporarily outside of the database and
the objectStore that the application intended to use is removed, for
example while the user is editing a blog post that is saved to the
database only on request.
* Likewise, there is no way for a page wanting to upgrade the schema
to do so without risking dataloss in other pages that are currently
using the database and are holding data outside the database.
* Applications are forced to check at the start of *every* transaction
that an incompatible schema change hasn't happened since the last
transaction finished.
* If an application forgets to check the version at the start of any
writing transaction, this can result in data corruption due to using an
objectStore with a changed schema.
* Uses asynchronous functions for creating and removing objectStores
and indexes.
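
To make the third and fourth points concrete, here is a rough sketch of
the check every write would need under a catalog-style design. It is an
in-memory model only, not IndexedDB itself: CatalogDb, removeStore and
safeWrite are hypothetical names invented for this illustration.

```javascript
// Toy in-memory model; none of these names come from the IndexedDB draft.
class CatalogDb {
  constructor() {
    this.version = 1;
    this.stores = new Map([["posts", new Map()]]);
  }
  // Models another page changing the schema through a catalog at any time.
  removeStore(name) {
    this.stores.delete(name);
    this.version += 1;
  }
}

// A page remembers the version it was written against and must re-check it
// at the start of every write, because the schema can change in between.
function safeWrite(db, expectedVersion, storeName, key, value) {
  if (db.version !== expectedVersion) {
    return { ok: false, reason: "schema changed" };
  }
  const store = db.stores.get(storeName);
  if (!store) {
    return { ok: false, reason: "store missing" };
  }
  store.set(key, value);
  return { ok: true };
}

const catalogDb = new CatalogDb();
const seenVersion = catalogDb.version;
const firstWrite = safeWrite(catalogDb, seenVersion, "posts", 1, "draft");
catalogDb.removeStore("posts"); // another page removes the store mid-session
const secondWrite = safeWrite(catalogDb, seenVersion, "posts", 1, "edited draft");
console.log(firstWrite.ok, secondWrite.ok, secondWrite.reason);
```

Even with the check in place, the edited draft in the second call has
nowhere to go, which is the dataloss case from the first point.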

On the flip side, the setVersion proposal has the following downsides:

* Requires that an application handle the 'versionchange' event if it
wants to allow existing pages to remain open while a new page upgrades
the database schema.
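
As a rough illustration of that handshake, the sketch below models it in
memory. It is not the real API: VersionedDb and ToyConnection are
hypothetical classes, events are delivered synchronously for brevity,
and only the names setVersion, close and 'versionchange' are taken from
the proposal in bug 10052.

```javascript
// Toy in-memory model of the handshake; not the real IndexedDB API.
class VersionedDb {
  constructor() {
    this.version = 1;
    this.connections = new Set();
    this.pending = null; // at most one waiting upgrade in this sketch
  }
  open() {
    const conn = new ToyConnection(this);
    this.connections.add(conn);
    return conn;
  }
  maybeRunUpgrade() {
    if (!this.pending) return;
    const { requester, version, onsuccess } = this.pending;
    // The upgrade may run only once the requester is the last open connection.
    const othersOpen = [...this.connections].some((c) => c !== requester);
    if (othersOpen) return;
    this.pending = null;
    this.version = version;
    onsuccess(version);
  }
}

class ToyConnection {
  constructor(db) {
    this.db = db;
    this.onversionchange = null;
  }
  setVersion(version, onsuccess) {
    this.db.pending = { requester: this, version, onsuccess };
    // Notify the other connections; nothing is closed automatically.
    for (const conn of [...this.db.connections]) {
      if (conn !== this && conn.onversionchange) conn.onversionchange(version);
    }
    this.db.maybeRunUpgrade();
  }
  close() {
    this.db.connections.delete(this);
    this.db.maybeRunUpgrade();
  }
}

const versionedDb = new VersionedDb();
const editorPage = versionedDb.open(); // cooperating page
const idlePage = versionedDb.open(); // page that ignores 'versionchange'
const upgradingPage = versionedDb.open();
const log = [];

// The cooperating page saves its out-of-database state, then closes itself.
editorPage.onversionchange = (newVersion) => {
  log.push("editor saved draft before version " + newVersion);
  editorPage.close();
};

upgradingPage.setVersion(2, (v) => log.push("upgrade to " + v + " ran"));
const ranBeforeTabClosed = versionedDb.version === 2;
idlePage.close(); // models the user closing that tab
console.log(ranBeforeTabClosed, versionedDb.version, log);
```

In this model the upgrade waits for the page that ignores the event and
runs only once that page is closed, so nothing is taken away from a
page that has not agreed to it.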

IMHO the setVersion proposal clearly has fewer, and less critical, downsides.

/ Jonas
Received on Saturday, 10 July 2010 01:15:08 UTC