RE: [WebStorage] Concerns on spec section 'Processing Model'

On Fri, 24 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:
>
> Let me probe this further to get clarity.
> 
> > As I understand it, with what is specced now, if you try to get a 
> > write transaction lock, it will only fail if it times out, which would 
> > probably be a symptom of a more serious bug anyway. There's never 
> > going to be a forced rollback; once you have got a transaction lock, 
> > you are not going to ever have it fail on you unexpectedly.
> 
> My understanding of your requirement is "Database should allow only one 
> active writer transaction".  How the database systems achieve this need 
> not be explained.

Sure, so long as the implementation is black-box indistinguishable from 
what the spec says, it can do whatever it wants.


> Note that, this need not be achieved only by acquiring an exclusive lock 
> on the database file.  Think about a database implementation which is 
> not a single file based (Log + Checkpoint design model) where there is 
> one data file and a couple of log files.  Spec-ing that they have to 
> hold exclusive lock on database file is ambiguous between data file and 
> log file.  If you take BDB JE as an example, they don't even have data 
> file.  Their model is a sequence of log files.

The exclusive lock model described in the spec is just that, a model; it 
isn't actually intended to require an exclusive lock. If an implementation 
can get the same result using some other mechanism, that's fine.
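To illustrate, the "one active writer" guarantee can live entirely in the 
engine's scheduler, with no lock on any file at all. Here is a hedged 
sketch of one such mechanism (the class and names are invented for 
illustration, not from the spec): write transactions are chained on a 
promise queue, so each one begins only after the previous one has 
committed or failed, which is black-box indistinguishable from holding an 
exclusive lock.

```javascript
// Hypothetical sketch (not from the spec): serialize write transactions
// with a promise chain instead of an exclusive lock on any file.
class TxQueue {
  constructor() {
    this.tail = Promise.resolve(); // completion of the last queued writer
  }
  // Run fn as the sole active write transaction: it begins only after
  // every previously queued write transaction has committed or failed.
  write(fn) {
    const run = this.tail.then(fn);
    this.tail = run.catch(() => {}); // keep the chain alive on failure
    return run;
  }
}

// Two scripts both ask for a write transaction; the second one simply
// waits until the first commits -- the observable behavior the spec wants.
const q = new TxQueue();
const log = [];
const tx1 = q.write(async () => {
  log.push("tx1 start");
  await new Promise(r => setTimeout(r, 10)); // long-running write
  log.push("tx1 commit");
});
const tx2 = q.write(async () => {
  log.push("tx2 start");
  log.push("tx2 commit");
});
Promise.all([tx1, tx2]).then(() => console.log(log.join(" | ")));
// prints "tx1 start | tx1 commit | tx2 start | tx2 commit"
```

A log-structured store like BDB JE could sit behind exactly this kind of 
scheduler and still satisfy the spec's model.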


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
> 
> Database developers (whether experienced DBAs or newcomer WebApp 
> programmers) identify the data set they are using through statements 
> they execute (within or outside transactions). It is the database's job 
> to find out which records are being used.

Sure.


> The concepts of transaction processing apply no matter the granularity 
> of a data item, whether it is a record or a disk block, or a whole file. 
> There are many kinds of failures (and yes, failures are always 
> unpredictable) [1]. Let's focus on failures arising from concurrency 
> control enforcement, which is probably the one most people worry about 
> from a programming perspective. In the following discussion, I use the 
> term locking, even though other protocols have been developed and are 
> in use, to guarantee serializability, i.e., correct interleaving of 
> concurrent transactions.
> 
> A knowledgeable database programmer would read the smallest set of data 
> in a transaction so as to avoid locking the entire database for 
> concurrent operations. Moreover, this approach also minimizes 
> starvation, i.e., the amount of time a program would need to wait to 
> obtain permission to exclusively access data.
> 
> Transactions can fail even if locking occurs at the whole database 
> level. As an example, consider this situation:
> 
> 1. A read-only transaction is timed out because some read-write transaction
> went on for too long.
> 2. A read-write transaction is timed out because some read-only transaction
> went on for too long.

These are the only failure modes possible currently, I believe.


> 3. A read-only transaction includes inside it a read-write transaction. 

This isn't possible with the current asynchronous API as far as I can 
tell. With the synchronous API, it would hang trying to open the 
read-write transaction for however long it takes the UA to realise that 
the script that is trying to get the read-write transaction is the same 
one as the one that has an open read-only transaction, and then it would 
fail with error code 7.
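That self-deadlock can be modeled directly: the write lock can never be 
granted while the same synchronous script still holds the read lock, so 
the request must eventually fail. A hedged sketch follows (the class and 
method names are invented; a real UA would block until its timeout 
elapses rather than fail immediately, as modeled here):

```javascript
// Hypothetical model (not the actual UA mechanism): a synchronous script
// that requests a write transaction while it still holds a read lock can
// never be granted the lock, so the request fails with error code 7.
class ModelDb {
  constructor() { this.readers = 0; }
  readTransaction(fn) {
    this.readers++; // take a shared read lock
    try { return fn(); } finally { this.readers--; }
  }
  transaction(fn) {
    // A write lock needs readers === 0, but the caller is synchronous:
    // nobody can release the read lock while we wait. A real UA would
    // hang until its timeout; this model fails immediately instead.
    if (this.readers > 0) {
      const err = new Error("TIMEOUT_ERR");
      err.code = 7; // the error code mentioned above
      throw err;
    }
    return fn();
  }
}

const db = new ModelDb();
db.readTransaction(() => {
  try {
    db.transaction(() => { /* never reached */ });
  } catch (e) {
    console.log("failed with code", e.code); // prints "failed with code 7"
  }
});
```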


> Experience has shown that there is no easy way out when dealing with 
> transactions, and locking at the whole database level is no solution to 
> failures.

It's not supposed to be a solution to failures, it's supposed to be, and 
is, as far as I can tell, a way to make unpredictable, transient, 
intermittent, and hard-to-debug concurrency errors into guaranteed, 
easy-to-debug errors.


On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:
> 
> > There's never going to be a forced rollback; once you have got a 
> > transaction lock, you are not going to ever have it fail on you 
> > unexpectedly.
> 
> Even if you have a transaction lock,
> 
> 1. the application logic could cause an exception
> 2. the application finds an unacceptable data condition and needs to rollback
> the transaction

Sure, but both of those are under the control of the author.


> 3. face a disk failure

This is an exceptional situation from which there is no good recovery. It 
isn't an expected situation resulting from a complicated API.


> 4. encounter a bug in the underlying software

We can't do anything to prevent these in the spec.


> In any of these cases, how would the application code be expected to 
> recover?

In the first two and the last one, the author can debug the problem and 
fix or work around the bug. In the case of hardware failure, there is no 
sane recovery model.

These are very different from concurrency bugs.


> > I think this is an important invariant, because otherwise script 
> > writers _will_ shoot themselves in the foot.
> 
> Even if the transaction lock doesn't fail, how would one deal with other 
> transaction failures?

I don't understand the relevance. If there's a hardware error, retrying 
isn't going to help. If there's a concurrency error, the only solution 
will be to design complex locking semantics outside the API, which would 
be a terrible burden to place on Web authors.


> > These aren't professional database developers; Web authors span the 
> > gamut of developer experience from the novice who is writing code more 
> > by luck than by knowledge all the way to the UI designer who wound up 
> > stuck with the task of writing the UI logic but has no professional 
> > background in programming, let alone concurrency in databases.
> 
> This is a strong reason to avoid SQL in the front-end.

I understand that SQL is not a popular solution for everyone, yes. 
Hopefully other solutions will be proposed (so far none have been proposed 
that are serious contenders).


> > We can't be firing unexpected exceptions when their users happen to 
> > open two tabs to the same application at the same time, leaving data 
> > unsaved.
> 
> So you'd much rather tell an application user that they should close one 
> of the two tabs since they can't obtain a read-write lock in both.

They can obtain a read-write lock in both, it's just that one of them will 
pause until the other has completed.
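That pause-then-proceed behavior can be sketched as a simple FIFO write 
lock (again a hypothetical model, with invented names, not the actual UA 
implementation): both tabs are granted the lock eventually; the second 
just waits its turn, and no error is ever raised.

```javascript
// Hypothetical sketch: the second tab's write transaction is deferred,
// not rejected. The lock hands itself out in FIFO order.
class WriteLock {
  constructor() { this.free = true; this.waiters = []; }
  acquire() {
    if (this.free) { this.free = false; return Promise.resolve(); }
    return new Promise(resolve => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); else this.free = true;
  }
}

const lock = new WriteLock();
const events = [];
async function tab(name, workMs) {
  await lock.acquire();               // pauses until the lock is granted
  events.push(`${name} acquired`);
  await new Promise(r => setTimeout(r, workMs)); // the write transaction
  events.push(`${name} released`);
  lock.release();
}
Promise.all([tab("tab1", 10), tab("tab2", 0)])
  .then(() => console.log(events.join(" | ")));
// prints "tab1 acquired | tab1 released | tab2 acquired | tab2 released"
```

Neither tab ever sees a failure; from the user's point of view the second 
tab is merely a little slower to save.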


> This is no different from telling the user that undesirable things would 
> happen if they hit the back button, which was widely prevalent in 
> applications used from Web browsers not too long ago. And those 
> programmers knew nothing about HTTP. The solution was - knuckle down and 
> understand safe and unsafe methods and statelessness.

Just because we screwed up once, doesn't mean we should screw up again.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 24 July 2009 21:53:43 UTC