RE: Reliable Messaging - Summary of Threads from Ricky Ho on 2002-12-14 (www-ws-arch@w3.org from December 2002)

From: Ricky Ho <riho@cisco.com>
Date: Sat, 14 Dec 2002 00:26:37 -0800
To: "Assaf Arkin" <arkin@intalio.com>, <www-ws-arch@w3.org>
Message-Id: <4.3.2.7.2.20021214001937.02718130@franklin.cisco.com>
Arkin, can you elaborate on your point that "using a sync transport protocol, 
there will be no possibility of message loss"? Here is an example.

Node A sends a message to node B using a synchronous transport protocol, HTTP POST:

A opens a TCP connection to B successfully.
A sends a stream of request data (in HTTP format) to B.
Suddenly, the TCP connection drops.

How does A know if B has received the request message or not ?
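The scenario above can be modeled in a few lines. This is a hypothetical in-memory simulation, not real socket code: `simulate_post` and `Outcome` are illustrative names, and the one-byte response is a simplifying assumption. It shows that A observes the same thing whether the connection dropped before or after B received the whole request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    client_view: str        # all that node A can observe
    server_processed: bool  # what actually happened at node B


def simulate_post(request: bytes, drop_after: Optional[int]) -> Outcome:
    """Model A sending one HTTP POST to B over a TCP connection that may drop.

    drop_after is the number of bytes transferred before the connection
    drops (None means it never drops). As a simplifying assumption, the
    response is a single byte sent after the whole request.
    """
    total = len(request) + 1  # request bytes plus the one-byte response
    if drop_after is None or drop_after >= total:
        return Outcome(client_view="200 OK", server_processed=True)
    # B processes the request only if every request byte arrived.
    processed = drop_after >= len(request)
    # A sees only a broken connection; B's state is invisible to it.
    return Outcome(client_view="connection dropped", server_processed=processed)
```

For example, with the 11-byte request `b"POST /order"`, dropping after 4 bytes and dropping after 11 bytes both give A "connection dropped", but only in the second case has B processed the request.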

Best regards,
Ricky

At 08:03 PM 12/13/2002 -0800, Assaf Arkin wrote:
>The two army problem is concerned with the possibility of message loss. 
>Message loss could occur when you are using an asynchronous transport 
>protocol (in most literature the term would be "medium"; "protocol" is a 
>more generic term that would even cover a choreography).
>
>Although you can have an asynchronous API for performing an operation, 
>that API is between you and a messaging engine and typically you would use 
>in-process calls or some synchronous transport, so there's no possibility 
>of message loss. You can tell without a doubt whether the messaging engine 
>is going to send the message or not.
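The point that an asynchronous API does not imply an uncertain hand-off can be sketched as follows, assuming a hypothetical in-process `MessagingEngine` with a bounded outbox. `send_async` returns at once, yet the caller knows for certain whether the engine accepted the message, because `queue.Queue.put_nowait` either succeeds or raises `queue.Full`. The real network send is stubbed out as an append to a list.

```python
import queue
import threading


class MessagingEngine:
    """Hypothetical in-process messaging engine with a bounded outbox."""

    def __init__(self, capacity: int = 100) -> None:
        self._outbox: "queue.Queue[str]" = queue.Queue(maxsize=capacity)
        self.delivered: list[str] = []
        # Background worker performs the (stubbed) asynchronous delivery.
        self._worker = threading.Thread(target=self._deliver_loop, daemon=True)
        self._worker.start()

    def send_async(self, message: str) -> None:
        # Plain in-process call: either the engine accepts the message or
        # queue.Full is raised. The caller is never left guessing.
        self._outbox.put_nowait(message)

    def _deliver_loop(self) -> None:
        while True:
            msg = self._outbox.get()
            self.delivered.append(msg)  # stand-in for the real network send
            self._outbox.task_done()

    def flush(self) -> None:
        # Block until every accepted message has been handed to the worker.
        self._outbox.join()
```

Whatever happens after the hand-off (the network leg) is where loss can occur; the hand-off itself has a definite outcome.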
>
>Even if the operation you are doing is asynchronous, you can use a 
>synchronous protocol such as HTTP POST to deliver the message, in which 
>case there is no possibility of message loss. But you can also use an 
>asynchronous protocol such as SMTP or UDP, in which case the message could 
>be lost on the way to its destination. "Lost" has a loose definition: a 
>message that gets garbled, delayed or routed to the wrong place is 
>considered lost.
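The loose definition of "lost" can be made concrete with a receiver-side check. This is a minimal sketch with hypothetical names (`Envelope`, `make_envelope`, `is_effectively_lost`); it assumes the sender attaches a CRC-32 checksum and a send timestamp, that the clocks are synchronized, and that the application chooses a `max_delay` beyond which a late message is useless.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Envelope:
    destination: str
    body: bytes
    crc: int        # CRC-32 of body, computed by the sender
    sent_at: float  # send time in seconds (assumes synchronized clocks)


def make_envelope(destination: str, body: bytes, sent_at: float) -> Envelope:
    return Envelope(destination, body, zlib.crc32(body), sent_at)


def is_effectively_lost(env: Envelope, my_address: str,
                        received_at: float, max_delay: float) -> bool:
    """Apply the loose definition: wrong place, garbled, or too late all count as lost."""
    if env.destination != my_address:
        return True  # routed to the wrong place
    if zlib.crc32(env.body) != env.crc:
        return True  # garbled in transit
    if received_at - env.sent_at > max_delay:
        return True  # delayed past the point where it is still useful
    return False
```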
>
>Addressing message loss is therefore a problem of the protocol you use and 
>not of the operation you perform. So in my opinion it is outside the scope 
>of the WSDL abstract operation definition but within the scope of specific 
>protocol bindings, and it would definitely help if the protocol layer 
>(XMLP) could address it, relieving us of the need to define ack operations.
>
>arkin
>-----Original Message-----
>From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On 
>Behalf Of Cutler, Roger (RogerCutler)
>Sent: Friday, December 13, 2002 1:28 PM
>To: Assaf Arkin; www-ws-arch@w3.org
>Subject: RE: Reliable Messaging - Summary of Threads
>
>Thanks for the support.
>
>One thing this note reminded me of -- I have seen a number of different 
>definitions of "synchronous" floating around this group.  In fact, if my 
>memory serves, there are three major ones.  One concentrates on the idea 
>that a call "blocks" if it is synchronous, another has a complicated logic 
>that I cannot recall and the third (contained in one of the references on 
>the two army problem) concentrates on the length of time it takes for a 
>message to arrive.  The formality of all of these definitions indicates to 
>me that all have had considerable thought put into them and that all are, 
>in their context, "correct".  They are, however, also different.
>
>-----Original Message-----
>From: Assaf Arkin [mailto:arkin@intalio.com]
>Sent: Friday, December 13, 2002 2:27 PM
>To: Cutler, Roger (RogerCutler); www-ws-arch@w3.org
>Subject: RE: Reliable Messaging - Summary of Threads
>
>
>
>3 - There is concern about the "two army" problem, which essentially says 
>that it is not possible, given certain assumptions about the types of 
>interactions, for all parties in the communication to reliably reach 
>consensus about what has happened.  I have been trying to encourage the 
>objective of documenting the scenarios that can come up, their 
>relative importance, and possible mitigating factors or strategies.  I 
>haven't seen people violently disagreeing but I wouldn't call this a 
>consensus point of view.  I consider the ebXML spec weak in discussing 
>the two-army problem.
>
>The two army problem assumes you are using a non-reliable medium for all 
>your communication and proves that it is impossible for the sender to 
>reach confidence that the message has arrived and is processed in 100% of 
>cases.
>
>You can increase your level of confidence by using message + ack and being 
>able to resend a message and receive a duplicate ack. That gets you close 
>to 100% but not quite there; still, it means that in most cases the 
>efficient solution (asynchronous messaging) would work, and so it 
>presents a viable option.
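The message + ack + resend scheme can be sketched as below. It assumes each message carries a unique ID, so the receiver can recognize a resend, process it only once, and still send a duplicate ack. `LossyChannel`, `Receiver` and `send_reliably` are hypothetical names, and independent random losses are a modeling assumption, not a property of any particular transport.

```python
import random


class LossyChannel:
    """Hypothetical unreliable medium: drops each transmission with probability loss_rate."""

    def __init__(self, loss_rate: float, rng: random.Random) -> None:
        self.loss_rate = loss_rate
        self.rng = rng

    def carries(self) -> bool:
        return self.rng.random() >= self.loss_rate


class Receiver:
    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.processed: list[str] = []

    def on_message(self, msg_id: str, body: str) -> str:
        # Process each message ID at most once, but acknowledge every copy,
        # so a resend of an already-processed message still earns an ack.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(body)
        return msg_id  # the ack names the message it acknowledges


def send_reliably(msg_id: str, body: str, receiver: Receiver,
                  channel: LossyChannel, max_attempts: int) -> bool:
    """Resend until an ack comes back or attempts run out; True means acked."""
    for _ in range(max_attempts):
        if not channel.carries():
            continue  # message lost on the way out
        ack = receiver.on_message(msg_id, body)
        if channel.carries() and ack == msg_id:
            return True  # ack made it back
    # No ack: the receiver may or may not have processed the message.
    return False
```

A `False` result is the residual gap: the sender cannot tell whether its messages or only the acks were lost, so the receiver may already have processed the message.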
>
>In my opinion it is sufficient for a low level protocol to give you that 
>level of reliability. And that capability is generic enough that we would 
>want to address it at the protocol level in a consistent manner, so we 
>reduce at least one level of complexity for the service developer. It is 
>also supported by a variety of transport protocols and mediums.
>
>This still doesn't mean you can get two distributed services to properly 
>communicate with each other in all cases. A problem arises if the message 
>was not received (and is not processed), if the message was received but 
>no ack was received (and it is processed), or if the message was received 
>and an ack was received but the message is still not processed.
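The failure cases above can be enumerated mechanically. This hypothetical sketch lists every consistent combination of "received", "ack arrived" and "processed" together with what the sender can actually observe.

```python
from enum import Enum
from itertools import product


class SenderView(Enum):
    ACKED = "ack received"
    NO_ACK = "no ack received"


def sender_view(received: bool, ack_arrived: bool) -> SenderView:
    # An ack can only exist if the message was received first.
    return SenderView.ACKED if received and ack_arrived else SenderView.NO_ACK


def scenarios():
    """Yield (received, ack_arrived, processed, sender's view) for every consistent case."""
    for received, ack_arrived, processed in product([False, True], repeat=3):
        if not received and (ack_arrived or processed):
            continue  # impossible: nothing to acknowledge or process
        yield received, ack_arrived, processed, sender_view(received, ack_arrived)
```

Of the five consistent cases, the three with no ack include both processed and unprocessed outcomes, and so do the two with an ack; neither observation tells the sender whether the message was processed.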
>
>That problem is not unique to asynchronous messaging, in fact it also 
>presents itself when synchronous messaging is used. With synchronous 
>messaging you have 100% confidence that a message was received, but no 
>confidence that it will be processed. Furthermore, you may fail before you 
>are able to persist that information, in which case your confidence is lost.
>
>If you do not depend on the result of the message being processed, then you 
>would simply regard each message that is sent as being potentially 
>processed. You use the ack/resend mechanism as a way to increase the 
>probability that the message indeed reaches its destination, so that a 
>majority of your messages will be received and processed.
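The "increase the probability" argument can be quantified under a simple assumption: every transmission, message or ack, is lost independently with the same probability p. Over n attempts the message fails to arrive only if all n copies are lost, so it arrives with probability 1 - p^n; an attempt yields an ack only if both legs survive, so the sender holds an ack with probability 1 - (1 - (1 - p)^2)^n. Both approach 1 as n grows but never reach it while p > 0. The function names are illustrative.

```python
def p_delivered(p: float, n: int) -> float:
    """Probability that at least one of n attempts reaches the receiver."""
    return 1 - p ** n


def p_confirmed(p: float, n: int) -> float:
    """Probability that the sender gets at least one ack within n attempts."""
    round_trip_ok = (1 - p) ** 2  # message and its ack both survive
    return 1 - (1 - round_trip_ok) ** n
```

For example, with p = 0.1 and n = 3 the message arrives with probability 0.999, while the sender has an ack with probability about 0.993; the difference, about 0.006, is the probability that the message arrived but no ack made it back.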
>
>I argue that using ack/resend you could reach the same level of confidence 
>that the message will be processed as if you were using a synchronous 
>protocol, but could do so more efficiently.
>
>If you do depend on the message being processed, then you are in a 
>different class of problem, and simply having a reliable protocol is not 
>sufficient since it does not address the possibility that the message was 
>received, acked but not processed. It in fact presents the same problem 
>that would arise when synchronous protocols are used.
>
>This is best solved at a higher layer. There are two possible solutions, 
>both of which are based on the need to reach a consensus between two 
>systems. One solution is based on a two-phase commit protocol, which could 
>be extended to use asynchronous patterns. A more efficient solution in 
>terms of message passing would be to use state transitions that coordinate 
>through the exchange of well defined messages. This could be modeled using 
>a choreography language.
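For reference, the two-phase commit option reduces to a simple core: a coordinator collects a vote from each participant, then sends every participant the same commit-or-abort decision. This is a minimal sketch with hypothetical names; messages are modeled as direct method calls, and it deliberately omits the coordinator's durable decision log, timeouts and recovery, which is where a real implementation's difficulty lies.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant; can_commit decides its prepare vote."""

    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> Vote:
        # Phase 1: vote YES only if the local work can be made durable.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect every participant's vote.
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    # Phase 2: send the single, shared outcome to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

Replacing the direct calls with asynchronous messages would give the variant Arkin mentions; the state-transition alternative he describes is not shown here.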
>
>Since this is outside the scope of this discussion I will not go into 
>details, but if anyone is interested I would recommend looking at 
>protocols for handling failures in distributed systems (in particular 
>Paxos). In my understanding these protocols are applicable to modeling at 
>the choreography-language level and are more efficient than using transactional 
>protocols and two-phase commit.
>
>My only point here was to highlight that a solution involving ack/resend 
>is sufficient to give you the same level of confidence that a message 
>would be processed as if you were using a synchronous operation, and that 
>solutions for achieving 100% confidence are required whether you are using 
>asynchronous or synchronous messaging.
>
>This is in support of Roger's recommendation for adding ack support to XMLP.
>
>  regards,
>  arkin
>
Received on Saturday, 14 December 2002 03:27:12 UTC