
Re: [WebStorage] Concerns on spec section 'Processing Model'

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 29 Jul 2009 19:53:51 +0000 (UTC)
To: Laxmi Narsimha Rao Oruganti <Laxmi.Oruganti@microsoft.com>, "Nikunj R. Mehta" <nikunj.mehta@oracle.com>, Aaron Boodman <aa@google.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <Pine.LNX.4.62.0907291903150.3189@hixie.dreamhostps.com>
On Thu, 16 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:
>
> - Why is the spec mandating a transaction to take an *exclusive write 
> lock on the entire database*?

It is to avoid the possibility that two transactions will conflict and 
then have one be forced to roll back.

For example, we want to make sure that there is no way that a Web page 
will ever have a failure if it opens a transaction that does:

   read from row A.
   read from row B.
   read from row C.
   write to row A.
   write to row B.
   write to row C.

...even if the page is opened twice, and the two pages both do this 
transaction at the same time.

Since client-side databases have very low contention, taking a lock on the 
entire database, rather than requiring that the author explicitly lock 
specific rows or columns, is good enough.
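
To make this concrete, here is a minimal sketch in plain JavaScript (not 
the Web SQL API itself; the `Database` object, its lock queue, and the 
`bumpAll` transaction body are all invented for illustration) of why a 
single exclusive lock keeps the scenario above conflict-free: the two 
submissions of the same read-then-write transaction simply serialize, and 
neither ever has to roll back.

```javascript
// Hypothetical model of the "one exclusive write lock per database" rule.
// Transactions queue for the single lock, run to completion, then release
// it, so two identical read/write transactions serialize instead of
// conflicting.
function Database() {
  this.rows = { A: 0, B: 0, C: 0 };
  this.queue = [];       // transaction bodies waiting for the exclusive lock
  this.locked = false;
}

Database.prototype.transaction = function (body) {
  this.queue.push(body);
  this.drain();
};

Database.prototype.drain = function () {
  while (!this.locked && this.queue.length) {
    this.locked = true;             // exclusive lock on the whole database
    var body = this.queue.shift();
    body(this.rows);                // runs with no other writer present
    this.locked = false;            // commit and release the lock
  }
};

var db = new Database();
function bumpAll(rows) {            // read rows A, B, C; then write A, B, C
  var a = rows.A, b = rows.B, c = rows.C;
  rows.A = a + 1; rows.B = b + 1; rows.C = c + 1;
}
db.transaction(bumpAll);            // "page opened twice": the same
db.transaction(bumpAll);            // transaction submitted twice
console.log(db.rows.A, db.rows.B, db.rows.C); // each incremented twice
```

With row-level locking, the two interleaved read/write sequences could 
deadlock or force a rollback; with the whole-database lock they cannot.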


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
>
> If you want to provide an application programmer with a limited degree 
> of freedom from a certain class of errors, then there is a different 
> solution. It is called isolation level [1]. When opening a transaction, 
> just provide the required isolation level. Heck, if you'd like, make 
> "SERIALIZABLE" the default value.

This doesn't prevent rollback in the above case as far as I can tell.


> But don't disallow other possibilities or create the illusion of silver 
> bullets.

Why not?


> > The exclusive lock model described in the spec is just a model, it 
> > isn't intended to actually require an exclusive lock. If an 
> > implementation can get the same result using some other mechanism, 
> > that's fine.
> 
> The spec says:
> [[
> If the mode is read/write, the transaction must have an exclusive write lock
> over the entire database
> ]]
> 
> Therefore, correct me if I am wrong, but the spec prohibits the following:
> 
> If an implementation of the Database object allows more than one 
> transaction to write to a database while another transaction has a write 
> lock on the same database, it is a failure.

Assuming you mean that you would cause one of the two transactions above 
to fail, then this is not a black-box equivalent implementation of what 
the spec says, and it is therefore not conforming.


> If so, then I want to formally object to that spec text because it is overly
> restrictive on implementers as well as on application programmers.

Do you have any alternative proposal that doesn't risk either transaction 
failing in the above example?


> > > 3. A read-only transaction includes inside it a read-write 
> > > transaction.
> > 
> > This isn't possible with the current asynchronous API as far as I can 
> > tell. With the synchronous API, it would hang trying to open the 
> > read-write transaction for however long it takes the UA to realise 
> > that the script that is trying to get the read-write transaction is 
> > the same one as the one that has an open read-only transaction, and 
> > then it would fail with error code 7.
> 
> Then again the spec is too restrictive because application programmers 
> need the ability to upgrade their lock from read-only to read-write

What are the use cases for this?


> and an application should never deadlock itself.

I would expect implementations to catch this case, but even if they 
didn't, it would time out eventually anyway. I can add explicit text 
saying that this should check to see if there is a synchronous read/write 
transaction open in the same thread; would that help?


> Therefore, I formally object to the spec disallowing an application to 
> upgrade its database lock.

We're not disallowing it, we're just not providing that feature in this 
version of the API. We have to start small, otherwise implementations 
won't be able to catch up with the spec.

Even in future versions, though, without good use cases I don't see why we 
would provide this. Databases on the client side are used in quite 
different ways than databases on the server side.


> > > Experience has shown that there is no easy way out when dealing with 
> > > transactions, and locking at the whole database level is no solution 
> > > to failures.
> > 
> > It's not supposed to be a solution to failures, it's supposed to be, 
> > and is, as far as I can tell, a way to make unpredictable, transient, 
> > intermittent, and hard-to-debug concurrency errors into guaranteed, 
> > easy-to-debug errors.
> 
> How is a timeout an easy-to-debug error?

It always happens. You walk through with the debugger, and every time, at 
the same place, you get the same timeout.


> What is the meaning of a guaranteed error?

Not intermittent. Not transient. Not "Heisenberg" bugs.


> How is a guaranteed error better than its opposite?

It's far easier to debug.


> Do you have any facts to back this up?

Back what up? That intermittent bugs are harder to debug than reliably 
reproducible bugs?


> > > > I think this is an important invariant, because otherwise script 
> > > > writers _will_ shoot themselves in the foot.
> > > 
> > > Even if the transaction lock doesn't fail, how would one deal with 
> > > other transaction failures?
> > 
> > I don't understand the relevance. If there's a hardware error, 
> > retrying isn't going to help. If there's a concurrency error, the only 
> > solution will be to design complex locking semantics outside the API, 
> > which would be a terrible burden to place on Web authors.
> 
> As I explained in my simple example of updating a spreadsheet cell, 
> users cannot avoid complex semantics when dealing with concurrency and 
> sharing in the face of consistency needs. It is an end-to-end 
> reliability requirement (in the same sense as that used by Saltzer, Reed 
> and Clark), and unavoidable for all but the unreliable systems.

No, in this case we really can avoid concurrency bugs, just by locking the 
entire database each time you do a write.


> > > > These aren't professional database developers; Web authors span 
> > > > the gamut of developer experience from the novice who is writing 
> > > > code more by luck than by knowledge all the way to the UI designer 
> > > > who wound up stuck with the task for writing the UI logic but has 
> > > > no professional background in programming, let alone concurrency in 
> > > > databases.
> > > 
> > > This is a strong reason to avoid SQL in the front-end.
> > 
> > I understand that SQL is not a popular solution for everyone, yes. 
> > Hopefully other solutions will be proposed (so far none have been 
> > proposed that are serious contenders.)
> 
> I beg to differ; other solutions have been proposed, and one (B-tree 
> APIs) is supported by a number of the members of this WG. It has not been 
> put up in a W3C document, if that is what you mean by a proposal. I have 
> it on my plate and will get to it soon.

B-Trees really aren't going to be able to address the use cases that a 
full SQL system does. For example, it really isn't practical to do complex 
queries or full-text-search using B-Trees.


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
> > 
> > Take a look at the transaction method again:
> > 
> > db.transaction(function(tx) {
> >  tx.executeSql(strSql, function() {
> > 
> >  });
> > });
> > 
> > The transaction is implicitly released when the last sql statement is 
> > completed (or fails). The only way you can keep this transaction open 
> > is to execute more SQL.
> 
> If I put in a timer or another asynchronous call inside the block and that
> block used the variable tx, wouldn't it force the implementation to continue
> holding the database lock?

No.


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
> 
> A transaction is not complete until I either commit or rollback the 
> transaction, which I can choose to do as late as I want to, e.g., at 
> window.onclose.

No. In the asynchronous API, the transaction closes as soon as you return 
from the last callback without adding a statement.

With the synchronous API, you are correct, of course.
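
To make the implicit-commit rule concrete, here is a hedged sketch in 
plain JavaScript that models the queue-draining behavior (the 
`Transaction` object and `runTransaction` driver are invented names, and a 
real engine executes SQL asynchronously rather than in a synchronous 
loop): each statement callback may enqueue more statements, and once the 
queue runs dry after the last callback returns, the transaction commits. 
Holding `tx` in a closure, e.g. for a later timer, does not keep it open.

```javascript
// Hypothetical model of the asynchronous processing model: statements
// queue on the transaction; each result callback runs while the algorithm
// blocks, and may enqueue more statements. When the queue is empty after
// the last callback returns, the transaction commits implicitly.
function Transaction() {
  this.queue = [];
  this.committed = false;
}

Transaction.prototype.executeSql = function (sql, cb) {
  if (this.committed) throw new Error("transaction already committed");
  this.queue.push({ sql: sql, cb: cb });
};

function runTransaction(body) {
  var tx = new Transaction();
  body(tx);                          // transaction callback enqueues statements
  while (tx.queue.length) {
    var stmt = tx.queue.shift();     // execute the next statement...
    if (stmt.cb) stmt.cb(tx);        // ...its callback may enqueue more
  }
  tx.committed = true;               // queue drained: implicit commit
  return tx;
}

var tx = runTransaction(function (t) {
  t.executeSql("SELECT 1", function (t2) {
    t2.executeSql("SELECT 2");       // keeps the transaction open one more step
  });
});
console.log(tx.committed);           // true: committed once callbacks ran dry
// A later timer that captured "tx" would find it already committed:
// setTimeout(function () { tx.executeSql("SELECT 3"); }, 0); // would throw
```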


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
> 
> The spec is also silent about what happens if I put a wait by making another
> asynchronous call inside my transaction callback logic.

You can't put a wait by doing an asynchronous call. Asynchronous calls 
aren't, well, synchronous.


> The processing model in 4.3.2 simply says that the SQL statements are queued
> up. It is unclear what if anything happens if the database runs out of
> statements to execute if the transaction logic takes time to add another
> statement to the queue before the database decides to commit. Am I wrong or is
> this an ambiguous, but correct interpretation?

It's not possible to add to the queue except from within a callback that 
is blocking the algorithm, so I don't see what you mean. Could you show 
some sample code showing what you mean?


> Those who are worried about throwing complexity of transaction recovery 
> on Web programmers should perhaps also be worried about the insane 
> complexity of asynchronous transaction programming, that no one in the 
> world should have to learn. The mainstream database developers don't 
> have to deal with that. Why should poor Web programmers have to suffer 
> this?

I don't buy that callbacks are that complicated (certainly nowhere near as 
complicated as concurrency problems), but in any case we have the 
synchronous API for Workers if people prefer that model.


> Moreover, with an asynchronous database the spec doesn't allow an 
> application to rollback a transaction

Throwing an exception from a callback will rollback the transaction.
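
As a sketch of that rollback behavior (again a toy model in plain 
JavaScript, with invented names; a real engine would roll back at the SQL 
level rather than by restoring an object snapshot): if a statement 
callback throws, the engine discards the transaction's writes instead of 
committing them.

```javascript
// Hypothetical model of rollback-on-exception: a throwing callback aborts
// the transaction, and the pre-transaction state is restored.
function runTransaction(rows, body) {
  var snapshot = Object.assign({}, rows);  // state to restore on rollback
  var queue = [];
  var tx = { executeSql: function (sql, cb) { queue.push(cb); } };
  try {
    body(tx);
    while (queue.length) {
      var cb = queue.shift();
      if (cb) cb(tx);                      // may throw to abort
    }
    return "committed";
  } catch (e) {
    Object.assign(rows, snapshot);         // roll back all writes
    return "rolled back";
  }
}

var rows = { A: 0 };
var result = runTransaction(rows, function (tx) {
  tx.executeSql("UPDATE ...", function () {
    rows.A = 99;                           // write inside the transaction
    throw new Error("abort");              // trigger the rollback
  });
});
console.log(result, rows.A);               // "rolled back" 0
```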



> This is yet another case of creating a storage API that is different 
> from what traditional database developers are used to.

We're really not trying to make an API that is familiar to traditional 
database developers. If anything, given how complex such APIs 
traditionally are, that may in fact be an anti-goal.


> There seems to be a pattern of ignoring good API practices when 
> interacting with a database and it appears intentional. Am I wrong in my 
> interpretation?

Certainly we're not trying to ignore good API practices, though we may 
disagree on what is "good".


On Sun, 26 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:
>
> It is a lot easier if the *model* is called out as hints rather than 
> putting it in normal text.  A spec carries a lot of weight for 
> interoperability, and every line is expected to be thorough and clear; I 
> am sure you know this fact very well.  Can we convert the "exclusive 
> lock" stuff into a hint and state the requirement as "A database system 
> should allow only one writer transaction per given database"?

It is unfortunately much more difficult to reason about such "models" than 
about a set of steps that overtly state what should happen, IMHO. The end 
result is the same, though -- you don't have to implement what it says, so 
long as what you implement cannot be distinguished from what it says. You 
don't have to have an exclusive lock so long as it's not possible for the 
two transactions mentioned earlier to ever fail.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 29 July 2009 19:54:31 GMT
