RE: Reliable Messaging - Summary of Threads from Ricky Ho on 2002-12-15 (www-ws-arch@w3.org from December 2002)

From: Ricky Ho <riho@cisco.com>
Date: Sat, 14 Dec 2002 17:09:08 -0800
To: "Assaf Arkin" <arkin@intalio.com>, <www-ws-arch@w3.org>
Message-Id: <4.3.2.7.2.20021214170633.0269bb60@franklin.cisco.com>
What you say is correct! But only at the TCP packet level, not at the 
message level.

To some degree, I feel our previous RM handshaking discussion is 
re-implementing the TCP handshake at the message level.
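
To make the message-level handshake concrete, here is a minimal sketch of ack/resend with duplicate detection. It is an illustration only, not a proposal for the spec: the names Receiver, send_reliably and the scripted "drops" list are hypothetical, and a real sender would rely on timeouts rather than a scripted fault list.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Processes each message id at most once, but acks every copy it sees."""
    seen: set = field(default_factory=set)
    processed: list = field(default_factory=list)

    def deliver(self, msg_id, body):
        if msg_id not in self.seen:      # duplicate detection
            self.seen.add(msg_id)
            self.processed.append(body)
        return msg_id                    # ack (sent again for duplicates)


def send_reliably(receiver, msg_id, body, drops, max_attempts=5):
    """Resend until an ack comes back.

    `drops` is a scripted list of faults, one per attempt:
    "request" (message lost), "ack" (ack lost) or None (no loss).
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        fault = drops[attempt - 1] if attempt <= len(drops) else None
        if fault == "request":
            continue                     # message never arrived; retry
        ack = receiver.deliver(msg_id, body)
        if fault == "ack":
            continue                     # delivered, but the sender cannot tell
        if ack == msg_id:
            return attempt
    return None


receiver = Receiver()
attempts = send_reliably(receiver, "m-1", "order 42", ["request", "ack", None])
print(attempts, receiver.processed)
```

In this run the first copy is lost, the second is delivered but its ack is lost, and the third is recognized as a duplicate and only acked, so the message is processed once even though it was sent three times.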

Rgds, Ricky

At 11:58 AM 12/14/2002 -0800, Assaf Arkin wrote:
>Ricky,
>
>TCP takes care of that.
>
>IP is a basic packet routing protocol that sends individual packets from 
>one machine to another. IP has message loss. A message may not arrive at 
>its destination. At the IP level the sender does not know whether the 
>message has arrived, and the receiver doesn't know a message was sent, so 
>there's no corrective action that will be taken.
>
>TCP is an elaborate protocol on top of IP that provides connection-based 
>messaging. TCP uses IP which means packets sent from A to B may be lost, 
>may be received out of order, and may be received multiple times. TCP does 
>the ordering of the packets, retransmission, acks, etc.
>
>So it goes something along these lines (not exactly, but it's been a while 
>since I read the TCP spec):
>
>Node A opens connection to Node B.
>Node A starts sending a message to Node B.
>Node A identifies each packet by its order in the message.
>Node A identifies the last packet.
>If Node B does not receive a packet it asks for retransmission.
>If Node B does receive the packet it lets Node A know (this is only 
>critical for the last packet)
>
>Keep in mind that Node A and Node B keep communicating with each other all 
>the time, sending "is alive" messages back and forth to determine if the 
>connection is still open. So even if there's no application traffic 
>between A and B, there's a lot of chatter going over the wire. If A 
>doesn't hear from B after a while, then A assumes the connection is down 
>(and vice versa).
>
>The TCP/IP stack can use the negative acks (retransmit request) in 
>combination with the is-alive chatter (positive acks) to tell the 
>application whether the message has been received or not.
>
>arkin
>
>Arkin, can you elaborate on your point that "using a sync transport protocol, 
>there will be no possibility of message loss"? Here is an example.
>
>Node A sends a message to node B using a sync transport protocol, HTTP POST:
>
>A opens a TCP connection to B successfully.
>A sends a stream of request data (in the HTTP format) to B.
>Suddenly, the TCP connection drops.
>
>How does A know whether B has received the request message or not?
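
The ambiguity in this scenario can be shown with a toy simulation. This is a sketch under simplifying assumptions: the function name and the drop_point labels are made up for illustration, and no real socket or HTTP library is involved.

```python
def post_over_dropping_connection(drop_point):
    """Simulate one request/response exchange where the connection may drop.

    drop_point: "before_delivery", "after_delivery" or None (hypothetical labels).
    Returns (what_the_sender_observes, receiver_got_request).
    """
    receiver_got_request = False
    if drop_point == "before_delivery":
        # The connection dropped before B read the whole request.
        return "connection dropped", receiver_got_request
    receiver_got_request = True          # B read the complete request
    if drop_point == "after_delivery":
        # The connection dropped before the response reached A.
        return "connection dropped", receiver_got_request
    return "200 OK", receiver_got_request


seen_a, got_a = post_over_dropping_connection("before_delivery")
seen_b, got_b = post_over_dropping_connection("after_delivery")
print(seen_a == seen_b, got_a, got_b)
```

A observes the same thing in both cases, while B's state differs, so A cannot decide from the dropped connection alone whether the request was received.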
>
>Best regards,
>Ricky
>
>At 08:03 PM 12/13/2002 -0800, Assaf Arkin wrote:
>>The two army problem is concerned with the possibility of message loss. 
>>Message loss could occur when you are using an asynchronous transport 
>>protocol, though in most literature the term would be medium, where 
>>protocol is a more generic term that would even cover a choreography.
>>
>>Although you can have an asynchronous API for performing an operation, 
>>that API is between you and a messaging engine and typically you would 
>>use in-process calls or some synchronous transport, so there's no 
>>possibility of message loss. You can tell without a doubt whether the 
>>messaging engine is going to send the message or not.
>>
>>Even if the operation you are doing is asynchronous, you can use a 
>>synchronous protocol such as HTTP POST to deliver the message in which 
>>case there is no possibility for message loss. But you can also use an 
>>asynchronous protocol such as SMTP or UDP, in which case the message 
>>could be lost on the way to its destination. Lost has a loose definition: 
>>a message that gets garbled, delayed or routed to the wrong place is 
>>considered lost.
>>
>>Addressing message loss is therefore a problem of the protocol you use 
>>and not the operation you perform. So in my opinion that is outside the 
>>scope of WSDL abstract operation definition, but in the scope of specific 
>>protocol bindings, and it would definitely help if the protocol layer 
>>(XMLP) could address that, relieving us of the need to define ack operations.
>>
>>arkin
>>-----Original Message-----
>>From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On 
>>Behalf Of Cutler, Roger (RogerCutler)
>>Sent: Friday, December 13, 2002 1:28 PM
>>To: Assaf Arkin; www-ws-arch@w3.org
>>Subject: RE: Reliable Messaging - Summary of Threads
>>
>>Thanks for the support.
>>One thing this note reminded me of -- I have seen a number of different 
>>definitions of "synchronous" floating around this group.  In fact, if my 
>>memory serves, there are three major ones.  One concentrates on the idea 
>>that a call "blocks" if it is synchronous, another has a complicated 
>>logic that I cannot recall and the third (contained in one of the 
>>references on the two army problem) concentrates on the length of time it 
>>takes for a message to arrive.  The formality of all of these definitions 
>>indicates to me that all have had considerable thought put into them and 
>>that all are, in their context, "correct".  They are, however, also 
>>different.
>>-----Original Message-----
>>From: Assaf Arkin [mailto:arkin@intalio.com]
>>Sent: Friday, December 13, 2002 2:27 PM
>>To: Cutler, Roger (RogerCutler); www-ws-arch@w3.org
>>Subject: RE: Reliable Messaging - Summary of Threads
>>
>>3 - There is concern about the "two army" problem, which essentially says 
>>that it is not possible, given certain assumptions about the types of 
>>interactions, for all parties in the communication to reliably reach 
>>consensus about what has happened.  I have been trying to encourage the 
>>objective of documenting the scenarios that can come up and their 
>>relative importance and possibly mitigating factors or strategies.  I 
>>haven't seen people violently disagreeing but I wouldn't call this a 
>>consensus point of view.  I consider the ebXML spec as weak in discussing 
>>the two-army problem.
>>The two army problem assumes you are using a non-reliable medium for all 
>>your communication and proves that it is impossible for the sender to 
>>reach confidence that the message has arrived and is processed in 100% of 
>>cases.
>>You can increase your level of confidence by using message + ack and 
>>being able to resend a message and receive a duplicate ack. That gets 
>>you close to 100% but not quite there, but it means that in most cases 
>>the efficient solution (using asynchronous messaging) would work, and so 
>>presents a viable option.
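
A rough calculation shows why resending gets close to 100% without reaching it. This is a sketch under a strong assumption that is not in the thread: every one-way transmission is lost independently with the same probability. The function name ack_probability is hypothetical. One attempt yields an ack only if both the message and the ack get through, so n attempts yield at least one ack with probability 1 - (1 - (1 - p)^2)^n.

```python
def ack_probability(loss_rate, attempts):
    """Probability that the sender has seen at least one ack.

    Assumes independent losses with the same rate in both directions.
    """
    round_trip_ok = (1.0 - loss_rate) ** 2      # message and ack both arrive
    return 1.0 - (1.0 - round_trip_ok) ** attempts


for n in (1, 3, 10):
    print(n, ack_probability(0.1, n))
```

For any nonzero loss rate the value grows with each resend but stays below 1 for every finite number of attempts.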
>>In my opinion it is sufficient for a low level protocol to give you that 
>>level of reliability. And that capability is generic enough that we would 
>>want to address it at the protocol level in a consistent manner, so we 
>>reduce at least one level of complexity for the service developer. It is 
>>also supported by a variety of transport protocols and mediums.
>>This still doesn't mean you can get two distributed services to properly 
>>communicate with each other in all cases. A problem arises if either the 
>>message was not received (and is not processed), a message was received 
>>but no ack was received (and is processed) or a message was received and an 
>>ack was received but the message is still not processed.
>>That problem is not unique to asynchronous messaging, in fact it also 
>>presents itself when synchronous messaging is used. With synchronous 
>>messaging you have 100% confidence that a message was received, but no 
>>confidence that it will be processed. Furthermore, you may fail before 
>>you are able to persist that information, in which case your confidence 
>>is lost.
>>If you do not depend on the result of the message being processed then 
>>you would simply regard each message that is sent as being potentially 
>>processed. You use the ack/resend mechanism as a way to increase the 
>>probability that the message indeed reaches its destination, so a 
>>majority of your messages will be received and.
>>I argue that using ack/resend you could reach the same level of 
>>confidence that the message will be processed as if you were using a 
>>synchronous protocol, but could do so more efficiently.
>>If you do depend on the message being processed, then you are in a 
>>different class of problem, and simply having a reliable protocol is not 
>>sufficient since it does not address the possibility that the message was 
>>received, acked but not processed. It in fact presents the same problem 
>>that would arise when synchronous protocols are used.
>>This is best solved at a higher layer. There are two possible solutions, 
>>both of which are based on the need to reach a consensus between two 
>>systems. One solution is based on a two-phase commit protocol, which 
>>could be extended to use asynchronous patterns. A more efficient solution 
>>in terms of message passing would be to use state transitions that 
>>coordinate through the exchange of well defined messages. This could be 
>>modeled using a choreography language.
>>Since this is outside the scope of this discussion I will not go into 
>>details, but if anyone is interested I would recommend looking at 
>>protocols for handling failures in distributed systems (in particular 
>>Paxos). In my understanding these protocols are applicable for modeling 
>>at the choreography language and are more efficient than using 
>>transactional protocols and two-phase commit.
>>My only point here was to highlight that a solution involving ack/resend 
>>is sufficient to give you the same level of confidence that a message 
>>would be processed as if you were using a synchronous operation, and that 
>>solutions for achieving 100% confidence are required whether you are 
>>using asynchronous or synchronous messaging.
>>This is in support of Roger's recommendation for adding ack support to XMLP.
>>  regards,
>>  arkin
Received on Saturday, 14 December 2002 20:09:50 UTC