- From: Nikunj R. Mehta <nikunj.mehta@oracle.com>
- Date: Fri, 24 Jul 2009 13:54:27 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: public-webapps WG <public-webapps@w3.org>, Laxmi Narsimha Rao Oruganti <Laxmi.Oruganti@microsoft.com>
On Jul 23, 2009, at 6:24 PM, Ian Hickson wrote:

> On Thu, 16 Jul 2009, Nikunj R. Mehta wrote:
>>
>> The spec should not restrict implementations to any one level of
>> concurrency unless there are specific undesirable effects.
>>
>> Restricting the database to a single writer means that if there are
>> separate workers or background threads working to update
>> non-overlapping portions, then they have to wait for the lone
>> current writer. Implementations can certainly compete to produce the
>> level of concurrency that developers need. Specifically, I propose
>> that the following text
>>
>> [[
>> If the mode is read/write, the transaction must have an exclusive
>> write lock over the entire database. If the mode is read-only, the
>> transaction must have a shared read lock over the entire database.
>> The user agent should wait for an appropriate lock to be available.
>> ]]
>>
>> be replaced with the following text
>>
>> [[
>> Multiple read-only transactions may share the same data as long as
>> there is no transaction attempting to write the data being read. The
>> user agent must wait for transactions that are reading some data
>> before allowing a read/write transaction on the same data to
>> continue.
>> ]]
>
> Since there's no way for the author to say ahead of time which rows
> or cells the transactions are going to use, how can you do the above
> without ending up with some transactions failing unpredictably?

Database developers (whether experienced DBAs or newcomer WebApp
programmers) identify the data set they are using through the
statements they execute (within or outside transactions). It is the
database's job to find out which records are being used. The concepts
of transaction processing apply no matter the granularity of a data
item, whether it is a record, a disk block, or a whole file. There are
many kinds of failures (and yes, failures are always unpredictable) [1].
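[Editorial note: the proposed wording amounts to per-item shared/exclusive (reader-writer) locking rather than one lock over the whole database. The sketch below is purely illustrative and not from the thread; the names SharedExclusiveLock and LockTable are hypothetical. It shows how two readers can share one record while a writer proceeds on a non-overlapping record.]

```python
import threading
from collections import defaultdict

class SharedExclusiveLock:
    """Minimal reader-writer lock: many concurrent readers, one writer.
    Writers wait for active readers; readers wait for an active writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0     # number of active read-only holders
        self._writer = False  # True while a read/write holder is active

    def acquire_read(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

class LockTable:
    """Hypothetical per-record lock table: each transaction locks only
    the records it actually touches, not the entire database."""
    def __init__(self):
        self._locks = defaultdict(SharedExclusiveLock)

    def lock_for(self, key):
        return self._locks[key]
```

With this structure, a read/write transaction on "row2" is never delayed by readers of "row1" -- the concurrency the original mail argues the spec text should permit.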
Let's focus on failures arising from concurrency control enforcement,
which is probably the one most people worry about from a programming
perspective. In the following discussion, I use the term locking, even
though other protocols have been developed and are in use, to guarantee
serializability, i.e., correct interleaving of concurrent transactions.

A knowledgeable database programmer would read the smallest set of data
in a transaction so as to avoid locking the entire database for
concurrent operations. Moreover, this approach also minimizes
starvation, i.e., the amount of time a program must wait to obtain
permission to exclusively access data.

Transactions can fail even if locking occurs at the whole-database
level. As an example, consider these situations:

1. A read-only transaction times out because some read-write
   transaction went on for too long.
2. A read-write transaction times out because some read-only
   transaction went on for too long.
3. A read-write transaction nested inside a read-only transaction runs
   concurrently with other read-only transactions. Those read-only
   transactions do not complete before the read-write transaction's
   timeout expires, and the read-write transaction fails.

Experience has shown that there is no easy way out when dealing with
transactions, and locking at the whole-database level is no solution to
failures.

I hope that answers your question.

Nikunj
http://o-micron.blogspot.com

[1] Fundamentals of Database Systems, 4th ed., Elmasri and Navathe,
Section 17.1.4, 2003
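[Editorial note: the first timeout scenario above can be sketched minimally in Python, outside the thread itself. A single threading.Lock stands in for "an exclusive write lock over the entire database"; the sleep durations are arbitrary illustrative values. Even with whole-database locking, the read-only transaction fails with a timeout.]

```python
import threading
import time

# One lock standing in for "exclusive lock over the entire database".
db_lock = threading.Lock()

def long_read_write_transaction():
    # Scenario 1: a read/write transaction holds the whole-database
    # lock for longer than other transactions will wait.
    with db_lock:
        time.sleep(0.5)  # simulated long-running update

writer = threading.Thread(target=long_read_write_transaction)
writer.start()
time.sleep(0.1)  # let the writer grab the lock first

# A read-only transaction times out waiting for the database lock --
# a failure mode the application must still be prepared to handle.
read_succeeded = db_lock.acquire(timeout=0.1)
if read_succeeded:
    db_lock.release()
writer.join()
print("read-only transaction succeeded:", read_succeeded)
```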
Received on Friday, 24 July 2009 20:56:53 UTC