Re: [WebStorage] Concerns on spec section 'Processing Model' from Aaron Boodman on 2009-07-24 (public-webapps@w3.org from July to September 2009)

From: Aaron Boodman <aa@google.com>
Date: Fri, 24 Jul 2009 15:57:14 -0700
To: "Nikunj R. Mehta" <nikunj.mehta@oracle.com>
Cc: Ian Hickson <ian@hixie.ch>, public-webapps WG <public-webapps@w3.org>, Laxmi Narsimha Rao Oruganti <Laxmi.Oruganti@microsoft.com>
Message-ID: <278fd46c0907241557j2cf86f21s8271fcf85c528480@mail.gmail.com>

On Fri, Jul 24, 2009 at 2:54 PM, Nikunj R. Mehta<nikunj.mehta@oracle.com> wrote:
>
> On Jul 24, 2009, at 2:06 PM, Aaron Boodman wrote:
>
>> On Fri, Jul 24, 2009 at 1:54 PM, Nikunj R. Mehta<nikunj.mehta@oracle.com>
>> wrote:
>>>
>>> Experience has shown that there is no easy way out when dealing with
>>> transactions, and locking at the whole database level is no solution to
>>> failures.
>>
>> The thing that makes the web browser environment different and
>> interesting is that multiple independent applications can end up
>> having access to the same database if they run in the same origin.
>
> Applications have the ability to specify which database they want to use. So
> I don't see problems in apps sharing an origin.

Right, but say two Gmail tabs are opened independently. They both say
they want to access the "messages" database. They have no way to know
about each other except through shared storage (postMessage does not
work across multiple independent tabs).

Now they can conflict with each other. There is no way for the
developer to deal with this problem other than retrying or
implementing another concurrency system on top of shared storage. This
seems bad.

>> This could be multiple instances of the same app (e.g. multiple Gmail
>> windows) or just different apps that happen to be on the same origin
>> (many Google apps run on www.google.com).
>
> When running multiple instances of the same application, or when different
> applications share the same data, you are beginning to deal with multi-user
> applications (even though it may be the same security principal). In
> multi-user applications, database transactions are the same as what they are
> on the server. Applications have no choice but to be careful in performing
> transactions. Let me illustrate this with an example.
>
> Say that I had a spreadsheet app. The value of a cell was displayed to the
> user as X. Now, I go in to one tab A and say "add five to X". I also go in
> to B and say "add five to X". One of those operations will have to fail
> because it finds that the version of X is not what it was when the
> transaction started out. Even if you put a lock on the entire database, you
> can't avoid that problem.

The issue of the data changing between the time when it was displayed
to the user and the time when an update is started is different than
the problem of the data changing while a multi-step update (a
transaction) is in progress.

The first problem is well known and understood by client-side web
developers because the web is stateless and the same can occur between
contacts with the server. It's also pretty self-evident that if you
copy data out of storage and into the UI, the two can change
independently.
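
As a rough illustration of this first problem, here is a minimal sketch
of the "add five to X" case with a version check. It assumes a
hypothetical in-memory store (createStore, read, and writeIfUnchanged
are made-up names, not part of any browser API) standing in for storage
shared by two tabs:

```javascript
// Hypothetical in-memory stand-in for shared storage; not a browser API.
function createStore(initial) {
  var cell = { value: initial, version: 0 };
  return {
    // Return a snapshot of the cell, as a tab would when rendering its UI.
    read: function () {
      return { value: cell.value, version: cell.version };
    },
    // Write only if the cell has not changed since expectedVersion was read.
    writeIfUnchanged: function (expectedVersion, newValue) {
      if (cell.version !== expectedVersion) {
        return false; // someone else updated the cell first
      }
      cell.value = newValue;
      cell.version += 1;
      return true;
    }
  };
}

var store = createStore(10);

// Both tabs display X while it is still 10.
var seenByTabA = store.read();
var seenByTabB = store.read();

// Each tab then tries to "add five to X" based on what it displayed.
var tabAOk = store.writeIfUnchanged(seenByTabA.version, seenByTabA.value + 5);
var tabBOk = store.writeIfUnchanged(seenByTabB.version, seenByTabB.value + 5);
```

Tab A's write succeeds and tab B's is rejected because the version
moved on after B read it. B has to re-read and decide what to do, and
no amount of locking inside the store changes that.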

The second problem is not well known by the same people and would be
surprising. Up until recently there was no local storage except
cookies and some proprietary things, and both were synchronous.
Because all browsers until recently were single-threaded, this
effectively meant that clients had storage-wide locks (there were
actually bugs with this in Firefox+cookies, I am told, but cookies
were not frequently used in a way that exposed it).

> It seems that the way the spec is written, novice programmers would be led
> to either
>
> 1. face lost updates because they assume the browser locks the entire
> database, and so they won't bother to do their own analysis of whether data
> has changed since the last time they saw it.

Some very novice users will not realize that their UI and local store
can change independently. Others will, but won't realize that there can
be multiple copies of their apps.

I think the current design is a good trade-off because addressing that
problem would essentially mean binding the UI to the datastore, which
would introduce gigantic API complexity making it basically not
workable. And many developers will understand the problem from
experience with the web.

> 2. create single-instance-only apps, i.e., hold a write lock on the
> database forever since they don't want to deal with version checks.

I don't think you understand the spec - it isn't actually possible to
hold the lock forever. Locks aren't an explicit part of the API, but are
implicit and released automatically when functions return.

Take a look at the transaction method again:

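db.transaction(function(tx) {
  // tx is the transaction object passed to the callback
  tx.executeSql(strSql, [], function(tx, result) {
    // runs once the statement has completed
  });
});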

The transaction is implicitly released when the last SQL statement is
completed (or fails). The only way you can keep this transaction open
is to execute more SQL.
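
To make that concrete, here is a hedged sketch of a two-step update
that stays inside one transaction by queuing the second statement from
the first statement's callback. The `cells` table, its columns, and the
function name addFiveToCell are hypothetical; `db` is assumed to be a
Database object as returned by openDatabase():

```javascript
// Sketch only: the `cells` table and its columns are hypothetical.
// `db` is assumed to be a Database object as returned by openDatabase().
function addFiveToCell(db, cellId, onSuccess, onError) {
  db.transaction(function (tx) {
    // Step 1: read the current value inside the transaction.
    tx.executeSql(
      'SELECT value FROM cells WHERE id = ?',
      [cellId],
      function (tx, result) {
        if (result.rows.length === 0) {
          return; // no such cell; no more SQL is queued, so the transaction ends
        }
        var current = result.rows.item(0).value;
        // Step 2: queuing another statement from this callback keeps the
        // same transaction open until that statement has completed.
        tx.executeSql(
          'UPDATE cells SET value = ? WHERE id = ?',
          [current + 5, cellId]
        );
      }
    );
  }, onError, onSuccess);
}
```

Once the UPDATE finishes and nothing else has been queued, the
transaction is over. There is no call the application could make to
hold it open indefinitely.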

>> Because these apps are isolated from each other, they have no way to
>> cooperate to reduce conflicts. They also have no control over whether
>> there are multiple copies of themselves (the user controls this).
>
> Sorry, but there is postMessage, localStorage, and the database itself. What
> do you mean these apps are isolated and have no way to cooperate?

postMessage can't be used across independent tabs. Even if it could,
it is asynchronous and most vendors would be reluctant to put more in
the spec that guarantees tabs are on the same thread. So you are
reduced to very awkward ways of cooperating -- using the database
itself as a queue or for master election, or designing a separate
transaction system between tabs which might be on separate threads,
using an asynchronous API. Or you just accept that any statement can
fail and retry everything. Or your app is just buggy if multiple
instances are open.

>> Therefore if the platform does not protect against this, basically any
>> statement can fail due to conflict. This was a big problem with Gears,
>> and led to applications having to go to crazy contortions to do things
>> like master election.
>
> I will assert that the platform can't solve all transaction failures by
> changing the granularity of concurrency control. Otherwise, we wouldn't have
> database designers going to extremes with serializability protocols.

Again, I think concurrency of logical operations between the UI and
the datastore is a different problem than concurrency of actual
transactions against the data store. Nothing can realistically be done
about the former, and that's OK because it is well-known. We can do
something about the latter, and you haven't proposed any reason why we
shouldn't.

>> When we designed the HTML5 version of the database API we specifically
>> tried to avoid it.
>>
>
> I am perplexed that you expect a poor programmer to understand transactions
> but not understand recovery.

You don't need to understand transactions deeply to use the proposed
API, you just need to call the methods. It is not possible to use it
in a way that will lead to conflicts. That is the point.

>> I do not agree that database-level locking is a big problem for web
>> applications.
>
> Our problem is not with databases doing database-level locking. Our problem
> is that such behavior is a MUST.

I think it is very desirable for it to appear to the developer that
writes to the local datastore are atomic. Lots of complexity falls out
if this is not true. In some models (non-SQL) it may be easier to
arrange a large update in the application layer and commit it all at
once. In SQL, this is less true so it is important to provide an API that
makes conflicts impossible while a multi-step update is in progress.
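
A toy simulation of what goes wrong without that guarantee, assuming a
plain object as the store and two-step updates (read X, then write
X + 5). The helper names makeUpdate, runInterleaved, and runSerialized
are invented for this sketch and do not correspond to any browser API:

```javascript
// Each update is two steps: read X, then write X + 5.
function makeUpdate(store) {
  var seen;
  return [
    function () { seen = store.x; },
    function () { store.x = seen + 5; }
  ];
}

// Interleaved: steps from both updates alternate, as if no lock were held
// while a multi-step update is in progress.
function runInterleaved(a, b) {
  for (var i = 0; i < a.length; i++) {
    a[i]();
    b[i]();
  }
}

// Serialized: the second update starts only after the first has finished,
// which is the effect a database-wide lock has on transactions.
function runSerialized(a, b) {
  a.forEach(function (step) { step(); });
  b.forEach(function (step) { step(); });
}

var unlocked = { x: 10 };
runInterleaved(makeUpdate(unlocked), makeUpdate(unlocked));

var locked = { x: 10 };
runSerialized(makeUpdate(locked), makeUpdate(locked));
```

In the interleaved run both updates read 10, so one increment is lost
and X ends at 15. In the serialized run the second update sees the
first one's write and X ends at 20.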

Perhaps your real issue is that the current API does not work well for
non-SQL data stores. That is likely very true, but it wasn't designed
for that use case. I would argue that the existing key/value API that
is also in HTML5 may be a better fit for those models.

- a
Received on Friday, 24 July 2009 22:57:54 UTC