RE: Reliable Messaging - Summary of Threads from Cutler, Roger (RogerCutler) on 2002-12-14 (www-ws-arch@w3.org from December 2002)

From: Cutler, Roger (RogerCutler) <RogerCutler@ChevronTexaco.com>
Date: Sat, 14 Dec 2002 15:27:15 -0600
To: "Martin Chapman" <martin.chapman@oracle.com>, "Assaf Arkin" <arkin@intalio.com>, www-ws-arch@w3.org
Message-ID: <7FCB5A9F010AAE419A79A54B44F3718E01624863@bocnte2k3.boc.chevrontexaco.net>
I'm sorry, now I'm really confused.  I thought that some people call it
synchronous when the thread blocks and asynchronous when it fires off
the request and then goes about its business doing other things.
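A minimal Python sketch of the blocking/non-blocking distinction described above. `remote_call` is a hypothetical stand-in for a network round trip. The first caller's thread waits for the reply. The second hands the request to a thread pool, gets a future back immediately, and can do other work before collecting the reply.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def remote_call(request: str) -> str:
    """Hypothetical stand-in for a request/response over the network."""
    time.sleep(0.05)  # simulated network + processing latency
    return f"reply to {request}"


def blocking_style(request: str) -> str:
    # The calling thread waits here until the reply (or an exception) arrives.
    return remote_call(request)


def non_blocking_style(pool: ThreadPoolExecutor, request: str) -> Future:
    # The request is handed off; the caller gets a Future back at once
    # and is free to do other work before asking for the result.
    return pool.submit(remote_call, request)


if __name__ == "__main__":
    print(blocking_style("A"))
    with ThreadPoolExecutor() as pool:
        fut = non_blocking_style(pool, "B")
        print("doing other work while B is in flight")
        print(fut.result())
```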
 
I've gotta try to find that other definition of synchronous that was
posted months ago.  It was somewhat abstract and I think it had a
different flavor.  Does anyone remember what that was?
 
-----Original Message-----
From: Martin Chapman [mailto:martin.chapman@oracle.com] 
Sent: Friday, December 13, 2002 6:59 PM
To: Cutler, Roger (RogerCutler); 'Assaf Arkin'; www-ws-arch@w3.org
Subject: RE: Reliable Messaging - Summary of Threads


I personally have three different definitions of asynchronous which are
more or less orthogonal to each other and can be combined.
    1. async programming model - this is where your thread blocks at
the application level until a reply or fault is received. 
    2. async transport - where the reply comes back on a different
transport connection from the request.
    3. time independent - the sender and the receiver do not have to be
up and running at the same time for the communication to happen 
       (i.e. some form of intermediary or queue is involved)
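Definitions 2 and 3 can be illustrated with a minimal Python sketch. It assumes in-memory `queue.Queue` objects as stand-ins for the durable queues an intermediary would hold. `send_request`, `run_receiver_once` and `collect_replies` are hypothetical names. The reply travels on a separate queue (a different "connection" from the request) and is matched to its request by a correlation id. The receiver does not have to be running when the request is sent.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Message:
    correlation_id: str
    body: str


# In-memory stand-ins for durable queues held by an intermediary (assumption:
# a real deployment would use a persistent message store).
request_queue = queue.Queue()
reply_queue = queue.Queue()


def send_request(body: str) -> str:
    """Sender: enqueue and return at once; the receiver need not be running."""
    cid = str(uuid.uuid4())
    request_queue.put(Message(cid, body))
    return cid


def run_receiver_once() -> None:
    """Receiver, started later: drain waiting requests and reply on a separate
    queue, tagging each reply with the request's correlation id."""
    while not request_queue.empty():
        msg = request_queue.get()
        reply_queue.put(Message(msg.correlation_id, msg.body.upper()))


def collect_replies() -> dict:
    """Sender: match replies to earlier requests by correlation id."""
    replies = {}
    while not reply_queue.empty():
        msg = reply_queue.get()
        replies[msg.correlation_id] = msg.body
    return replies


if __name__ == "__main__":
    cid = send_request("order 42")   # receiver is not up yet
    run_receiver_once()              # receiver comes up later
    print(collect_replies()[cid])    # prints ORDER 42
```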
 
Whether these meet other people's definitions I am not sure, but it would
be good to get some agreed definitions for the architecture.
 
Martin.

	-----Original Message-----
	From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org] On Behalf Of Cutler, Roger (RogerCutler)
	Sent: Friday, December 13, 2002 1:28 PM
	To: Assaf Arkin; www-ws-arch@w3.org
	Subject: RE: Reliable Messaging - Summary of Threads
	
	
	Thanks for the support.
	 
	One thing this note reminded me of -- I have seen a number of
different definitions of "synchronous" floating around this group.  In
fact, if my memory serves, there are three major ones.  One concentrates
on the idea that a call "blocks" if it is synchronous, another has a
complicated logic that I cannot recall and the third (contained in one
of the references on the two army problem) concentrates on the length of
time it takes for a message to arrive.  The formality of all of these
definitions indicates to me that all have had considerable thought put
into them and that all are, in their context, "correct".  They are,
however, also different.
	 
	-----Original Message-----
	From: Assaf Arkin [mailto:arkin@intalio.com] 
	Sent: Friday, December 13, 2002 2:27 PM
	To: Cutler, Roger (RogerCutler); www-ws-arch@w3.org
	Subject: RE: Reliable Messaging - Summary of Threads
	
	
	 

		3 - There is concern about the "two army" problem, which
essentially says that it is not possible, given certain assumptions
about the types of interactions, for all parties in the communication to
reliably reach consensus about what has happened.  I have been trying to
encourage the objective of documenting the scenarios that can come up,
their relative importance, and possible mitigating factors or
strategies.  I haven't seen people violently disagreeing, but I wouldn't
call this a consensus point of view.  I consider the ebXML spec weak
in its treatment of the two-army problem.

		The two army problem assumes you are using a
non-reliable medium for all your communication and proves that it is
impossible for the sender to reach confidence that the message has
arrived and is processed in 100% of cases.
		 
		You can increase your level of confidence by using
message + ack and being able to resend a message and receive a duplicate
ack. That gets you close to 100%, but not quite there. Still, it means
that in most cases the efficient solution (using asynchronous messaging)
would work, and so it presents a viable option.
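A minimal Python sketch of message + ack with resend. `LossyChannel`, `Receiver` and `send_with_retries` are hypothetical names, and the channel is assumed to drop each transmission independently with a fixed probability. The receiver remembers message ids, so a resend produces a duplicate ack without delivering the message to the application twice.

```python
import random


class LossyChannel:
    """Hypothetical unreliable medium: each transmission is dropped with probability `loss`."""

    def __init__(self, loss: float, seed: int = 0):
        self.loss = loss
        self.rng = random.Random(seed)

    def delivers(self) -> bool:
        return self.rng.random() >= self.loss


class Receiver:
    def __init__(self):
        self.seen = set()      # message ids already accepted
        self.delivered = []    # bodies handed to the application, once each

    def on_message(self, msg_id: int, body: str) -> int:
        # Re-ack duplicates, but deliver each id to the application only once.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.delivered.append(body)
        return msg_id  # the ack carries the id being acknowledged


def send_with_retries(channel, receiver, msg_id, body, max_attempts=5) -> bool:
    """Return True if an ack came back. False means 'unknown', not 'not delivered'."""
    for _ in range(max_attempts):
        if channel.delivers():                          # request leg
            ack = receiver.on_message(msg_id, body)
            if channel.delivers() and ack == msg_id:    # ack leg
                return True
        # No ack: either leg may have been lost, so resend with the same id.
    return False
```

A `False` result does not mean the message was lost: the request may have arrived and only the acks were dropped. More attempts shrink that residual uncertainty but never remove it, which is the gap the two-army argument points at.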
		 
		In my opinion it is sufficient for a low level protocol
to give you that level of reliability. And that capability is generic
enough that we would want to address it at the protocol level in a
consistent manner, so we reduce at least one level of complexity for the
service developer. It is also supported by a variety of transport
protocols and mediums.
		 
		This still doesn't mean you can get two distributed
services to properly communicate with each other in all cases. A
problem arises if the message was not received (and is not
processed), if the message was received but no ack was received (and it
is processed), or if the message was received and an ack was received
but the message is still not processed.
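The three cases can be tabulated in a short Python sketch. The `Outcome` names are hypothetical, and a fourth, successful case is added for contrast. Grouping outcomes by what the sender can observe shows that neither "ack" nor "no ack" tells the sender whether the message was processed.

```python
from enum import Enum


class Outcome(Enum):
    NOT_RECEIVED = "message lost; not processed"
    ACK_LOST = "received and processed; ack lost"
    ACKED_NOT_PROCESSED = "received and acked; processing failed later"
    OK = "received, acked and processed"  # added for contrast


def sender_observes(outcome: Outcome) -> str:
    """What the sender can actually see for each outcome."""
    return "ack" if outcome in (Outcome.ACKED_NOT_PROCESSED, Outcome.OK) else "no ack"


def processed(outcome: Outcome) -> bool:
    return outcome in (Outcome.ACK_LOST, Outcome.OK)


# Group outcomes by the sender's view: each view maps to both True and False,
# so the ack alone cannot tell the sender whether processing happened.
views = {}
for o in Outcome:
    views.setdefault(sender_observes(o), set()).add(processed(o))
```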
		 
		That problem is not unique to asynchronous messaging; in
fact, it also presents itself when synchronous messaging is used. With
synchronous messaging you have 100% confidence that a message was
received, but no confidence that it will be processed. Furthermore, you
may fail before you are able to persist that information, in which case
your confidence is lost.
		 
		If you do not depend on the result of the message being
processed, then you would simply regard each message that is sent as
being potentially processed. You use the ack/resend mechanism as a way
to increase the probability that the message indeed reaches its
destination, so the majority of your messages will be received and processed.
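Under the simplifying assumption that every transmission is lost independently with the same probability `loss`, the effect of resending can be worked out directly (the helper names are hypothetical). After n attempts the receiver has at least one copy with probability 1 - loss^n, while the sender has also seen an ack with probability 1 - (1 - (1 - loss)^2)^n. Both approach 1 but never reach it while loss > 0.

```python
def p_received_within(n_attempts: int, loss: float) -> float:
    """Probability that at least one of n copies reaches the receiver."""
    return 1 - loss ** n_attempts


def p_ack_within(n_attempts: int, loss: float) -> float:
    """Probability that at least one attempt gets the message through AND its ack back."""
    round_trip = (1 - loss) ** 2          # both legs must survive
    return 1 - (1 - round_trip) ** n_attempts
```

For example, with `loss = 0.1` and three attempts, the receiver gets the message with probability 0.999 and the sender sees an ack with probability about 0.993.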
		 
		I argue that using ack/resend you could reach the same
level of confidence that the message will be processed as if you were
using a synchronous protocol, but could do so more efficiently.
		 
		If you do depend on the message being processed, then
you are in a different class of problem, and simply having a reliable
protocol is not sufficient since it does not address the possibility
that the message was received, acked but not processed. It in fact
presents the same problem that would arise when synchronous protocols
are used.
		 
		This is best solved at a higher layer. There are two
possible solutions, both of which are based on the need to reach a
consensus between two systems. One solution is based on a two-phase
commit protocol, which could be extended to use asynchronous patterns. A
more efficient solution in terms of message passing would be to use
state transitions that coordinate through the exchange of well defined
messages. This could be modeled using a choreography language.
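For reference, the classic two-phase commit protocol can be sketched in a few lines of Python. This is a minimal in-memory sketch with hypothetical `Participant` and `two_phase_commit` names. It omits the durable logging, timeouts and coordinator-failure handling a real implementation needs, and it does not show the asynchronous extension mentioned above.

```python
from typing import List


class Participant:
    """Hypothetical participant that votes, then applies the coordinator's decision."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: a real participant would durably log its vote before replying.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1 (voting): ask every participant; any "no" forces an abort.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (completion): a real coordinator logs the decision before sending it.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```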
		 
		Since this is outside the scope of this discussion I
will not go into details, but if anyone is interested I would recommend
looking at protocols for handling failures in distributed systems (in
particular Paxos). In my understanding these protocols are applicable
to modeling at the choreography-language level and are more efficient than
using transactional protocols and two-phase commit.
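For readers following the Paxos pointer, here is a minimal sketch of single-decree Paxos as Lamport describes it, in Python. It is an illustration under heavy assumptions, not the approach proposed in this thread. Acceptors are in-memory objects, messages are direct method calls, and there is no networking, persistence or leader election. It does show the key safety rule: once a majority has accepted a value, a later proposer with a higher ballot must adopt that value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1          # highest ballot promised so far
    accepted_ballot: int = -1   # ballot of the accepted value (-1 if none)
    accepted_value: Any = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Any]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None  # reject: a higher or equal ballot was already promised

    def accept(self, ballot: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if no majority formed."""
    majority = len(acceptors) // 2 + 1
    promises = [r for r in (a.prepare(ballot) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None
    # Adopt the value with the highest previously accepted ballot, if any.
    prior_ballot, prior_value = max(promises, key=lambda r: r[0])
    if prior_ballot >= 0:
        value = prior_value
    accepts = sum(a.accept(ballot, value) for a in acceptors)
    return value if accepts >= majority else None
```

With three fresh acceptors, `propose(accs, 1, "A")` chooses `"A"`. A later `propose(accs, 2, "B")` also returns `"A"`, because the second proposer learns of the accepted value in phase 1 and must carry it forward.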
		 
		My only point here was to highlight that a solution
involving ack/resend is sufficient to give you the same level of
confidence that a message would be processed as if you were using a
synchronous operation, and that solutions for achieving 100% confidence
are required whether you are using asynchronous or synchronous
messaging. 
		 
		This is in support of Roger's recommendation for adding
ack support to XMLP. 
		 
		 regards,
		 arkin
Received on Saturday, 14 December 2002 16:27:49 UTC