RE: The other concensus problem from Assaf Arkin on 2003-01-10 (www-ws-arch@w3.org from January 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Fri, 10 Jan 2003 12:59:20 -0800
To: "Mark Baker" <distobj@acm.org>
Cc: <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNAEFNDAAA.arkin@intalio.com>
> On Thu, Jan 09, 2003 at 03:07:46PM -0800, Assaf Arkin wrote:
> > A lot of the comments made with regards to the HTTP GET/PUT
> approach so far
> > seem to mirror these algorithms,
>
> That's no coincidence! 8-)

It might also explain why I hold this approach in such high regard. I think
there's a lot to be gained from its application and in more than one case I
would elect to use it over "other ways of doing things". Of course, I always
do cost/benefit analysis first to determine which approach to use for which
application.


> > though for the most part I would say that
> > the discussion has ignored some of the safety mechanism that is
> required for
> > reliability, such as round identifiers and failure detection.
>
> That's true, we have ignored those in the discussion.  But consider;
>
> http://www.w3.org/1999/04/Editing/
>
> which is a solution to the "lost update" problem, i.e. where subsequent
> rounds modify the actions of prior rounds.  Etags aren't round
> identifiers, but are state identifiers which can be used similarly.

I only read it briefly, but it seems to do exactly that. It offers WebDAV a
way to manage concurrent changes to documents without having to hold locks
for a long period of time. So if you are doing operations that take a
significant amount of time (e.g. editing a document), you would want to use
this approach. On the other hand, if you are doing an operation that takes
an insignificant amount of time (e.g. moving a document around) you get
better performance by holding locks. So instead of doing GET/PUT/GET/PUT/PUT
you would simply do a MOVE. A higher level, non-idempotent operations but
one for which you can guarantee success with less message passing.

Not idempotent in the sense that you actually move "a" document from
location X to location Y, multiple applications does not lead to the
document you thought was at location X being at location Y. It can be made
to be idempotent, but that's just shifting the problem away, eventually
someone has to do a non-idempotent operation. I consider that in some
applications simplification of the client is achieved if the server provides
non-idempotent operations rather than having the client manage a sequence of
idempotent operations, so for the purpose of this discussion I am trying to
illustrate this conceptually.


> What do you mean by "traditional coordination protocols"?  Most of the
> coordination protocols I know about are quite aware of these issues.

Transaction managers and MOMs use protocols that do not deal well with
failure. "Traditional" may be a bad euphamism and passed no judgement, it
just refers to the fact that these protocols were designed decades before
there was significant research to propose alternative algorithms.

When you commit a transaction using X/Open XA (or any similar protocol like
DTS, OTS, etc) you have to talk to all the participants. If a participant
fails, you have to wait for the participant to come back online before you
can conclude the coordination of the transaction. You can decide on the
outcome before (rollback in this case), but you can't complete the
transaction.

In this class of algorithms if a participant goes down you would still be
able to complete the transaction. When the participant come back online it
would detect the status of the transaction from what other participants
report.

So why is it not used in transaction protocols? For one, because in practice
it doesn't save you much, the cost/benefit works against it in simplifying
the way transaction managers work. For another, because in order to tolerate
n failures you need at least 2*n+1 members and sometimes 4*n members (in the
case of transactions 2*n+1 is enough since the outcome can be decided
unileterally). So you need at least three transaction managers to tolerate
failure of one. That's actually where you would use 2pc. But if you have
only two transaction managers you could do 1pc which is more efficient.

Again, it all boils down to cost/benefit. You optimize for the case of
no-failure and you can leverage 1pc, but you pay for it when failure does
occur. You can optimize for case of failure, but you would pay for it when
failure doesn't occur (the common case).

arkin

>
> > There's a whole class of use cases where you would want to use these
> > algorithms, and definitely a good learning opportunity for the
> WS community.
>
> Agreed!
>
> MB
> --
> Mark Baker.   Ottawa, Ontario, CANADA.        http://www.markbaker.ca
> Web architecture consulting, technical reports, evaluation & analysis
>
Received on Friday, 10 January 2003 16:00:01 UTC