SPARUL and Transactions from Orri Erling on 2009-07-14 (public-rdf-dawg@w3.org from July to September 2009)

From: Orri Erling <erling@xs4all.nl>
Date: Tue, 14 Jul 2009 19:06:01 +0200
To: <public-rdf-dawg@w3.org>
Message-Id: <200907141706.n6EH6uRk044720@smtp-vbr5.xs4all.nl>
 

 

All

 

Concerning transactions and SPARUL, I would make the following comments:

 

In most of our use of SPARUL we are dealing with bulk load and bulk update
situations that have no concurrency or transactional requirements.

 

Virtuoso supports full transactions up to serializable isolation with RDF as
well as relational data.  With SPARUL, we use this rather seldom.

 

The reason is that one can easily run out of memory for rollback data if
updating millions of rows, which is not uncommon with SPARUL, for example if
it is being used for materializing entailment.

 

If there is a limit to transaction size, as there will be in systems which
must keep rollback state, this will easily be hit, and it is very difficult
to split large insert-select combinations into smaller chunks.  Thus we have
made a row autocommit mode which commits every now and then on its own
initiative.  Internally, serializable isolation is still needed in order not
to insert the same thing twice from two threads, but such things are not
visible to the user.
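
To make the idea of committing "every now and then" concrete, here is a
minimal sketch in Python.  It is not how Virtuoso implements row autocommit;
it uses an in-memory SQLite table as a stand-in for a triple store, and the
names make_store, bulk_insert_with_periodic_commit and commit_every are
hypothetical, chosen only for this illustration.  The primary key plus
INSERT OR IGNORE stands in for the internal guarantee that the same thing is
not inserted twice.

```python
import sqlite3


def make_store():
    """Create an in-memory table standing in for a triple store (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))"
    )
    return conn


def bulk_insert_with_periodic_commit(conn, rows, commit_every=1000):
    """Insert rows, committing after every `commit_every` rows.

    Returns the number of commits issued. Because work is committed as it
    goes, only the rows since the last commit can be rolled back.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for s, p, o in rows:
        # The primary key plus OR IGNORE keeps a repeated triple from being
        # stored twice.
        cur.execute(
            "INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)",
            (s, p, o),
        )
        pending += 1
        if pending >= commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final partial batch.
        conn.commit()
        commits += 1
    return commits
```

With a scheme like this the rollback state is bounded by commit_every rows
rather than by the size of the whole load, at the price of the load as a whole
no longer being atomic.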

 

If the SPARQL protocol should say anything about transactions, we would
suggest it contain a switch for disabling any atomicity.  This would
explicitly state that rollback information need not be kept, i.e. the system
can commit as often as it wants, and that no repeatability of read applies.

 

 

A reasonable default would be for the update to be atomic, saying nothing of
read repeatability.  A transaction of course would not encompass anything
except the content of a single POST request.  And even this should be
possible to disable for purposes of bulk operations.
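
As a rough illustration of that default and of switching it off, the sketch
below applies the inserts of one request either as a single all-or-nothing
transaction or row by row.  It is only a sketch under stated assumptions:
SQLite stands in for the store, and apply_request and its atomic flag are
hypothetical names, not part of any SPARQL protocol text or of Virtuoso.

```python
import sqlite3


def make_store():
    """In-memory table standing in for a triple store (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)"
    )
    return conn


def apply_request(conn, triples, atomic=True):
    """Apply the inserts of a single request.

    atomic=True:  all inserts succeed together or none is kept.
    atomic=False: each insert is committed on its own; failures are skipped.
    Returns True if every insert succeeded.
    """
    sql = "INSERT INTO triples (s, p, o) VALUES (?, ?, ?)"
    if atomic:
        try:
            # The connection context manager commits on success and rolls
            # back if an exception escapes the block.
            with conn:
                for t in triples:
                    conn.execute(sql, t)
        except sqlite3.Error:
            return False
        return True

    ok = True
    for t in triples:
        try:
            conn.execute(sql, t)
            conn.commit()  # no rollback state is kept across rows
        except sqlite3.Error:
            conn.rollback()
            ok = False
    return ok
```

The transaction here never spans more than the one call, which mirrors the
point that it should not encompass anything beyond a single request.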

 

 

In our experience, bulk copying of data is much more common than any
resource-committing transaction such as an update of a balance on an
account.  In fact I do not know that we would have done the latter at any
time in RDF.

 

I suggest issues of transactions be relegated to implementations and to
connection-based APIs.  In such situations connection options can be used
for isolation, exclusive read and such things which are needed in
transactional applications.

 

 

 

 

Orri

            

 

            

 

Received on Tuesday, 14 July 2009 17:07:33 UTC