- From: Assaf Arkin <arkin@intalio.com>
- Date: Fri, 13 Dec 2002 20:03:59 -0800
- To: "Cutler, Roger (RogerCutler)" <RogerCutler@ChevronTexaco.com>, <www-ws-arch@w3.org>
- Message-ID: <IGEJLEPAJBPHKACOOKHNMEOACOAA.arkin@intalio.com>
The two army problem is concerned with the possibility of message loss. Message loss can occur when you are using an asynchronous transport protocol. (Most of the literature says "medium" here; "protocol" is the more generic term and would even cover a choreography.)

Although you can have an asynchronous API for performing an operation, that API sits between you and a messaging engine, and typically you would reach it through in-process calls or some synchronous transport, so there is no possibility of message loss. You can tell without a doubt whether the messaging engine is going to send the message or not.

Even if the operation you are performing is asynchronous, you can use a synchronous protocol such as HTTP POST to deliver the message, in which case there is no possibility of message loss. But you can also use an asynchronous protocol such as SMTP or UDP, in which case the message could be lost on the way to its destination. "Lost" has a loose definition: a message that gets garbled, delayed or routed to the wrong place is considered lost.

Addressing message loss is therefore a problem of the protocol you use and not of the operation you perform. So in my opinion it is outside the scope of the WSDL abstract operation definition but in the scope of specific protocol bindings, and it would definitely help if the protocol layer (XMLP) could address it, relieving us of the need to define ack operations.

arkin

-----Original Message-----
From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org] On Behalf Of Cutler, Roger (RogerCutler)
Sent: Friday, December 13, 2002 1:28 PM
To: Assaf Arkin; www-ws-arch@w3.org
Subject: RE: Reliable Messaging - Summary of Threads

Thanks for the support. One thing this note reminded me of -- I have seen a number of different definitions of "synchronous" floating around this group. In fact, if my memory serves, there are three major ones.
One concentrates on the idea that a call "blocks" if it is synchronous, another has a complicated logic that I cannot recall, and the third (contained in one of the references on the two army problem) concentrates on the length of time it takes for a message to arrive. The formality of all of these definitions indicates to me that all have had considerable thought put into them and that all are, in their context, "correct". They are, however, also different.

-----Original Message-----
From: Assaf Arkin [mailto:arkin@intalio.com]
Sent: Friday, December 13, 2002 2:27 PM
To: Cutler, Roger (RogerCutler); www-ws-arch@w3.org
Subject: RE: Reliable Messaging - Summary of Threads

3 - There is concern about the "two army" problem, which essentially says that it is not possible, given certain assumptions about the types of interactions, for all parties in the communication to reliably reach consensus about what has happened. I have been trying to encourage the objective of documenting the scenarios that can come up and their relative importance and possibly mitigating factors or strategies. I haven't seen people violently disagreeing but I wouldn't call this a consensus point of view.

I consider the ebXML spec as weak in discussing the two-army problem.

The two army problem assumes you are using a non-reliable medium for all your communication and proves that it is impossible for the sender to reach confidence that the message has arrived and is processed in 100% of cases.

You can increase your level of confidence by using message + ack and being able to resend a message and receive a duplicate ack. That gets you close to 100% but not quite there. It does mean that in most cases the efficient solution (using asynchronous messaging) would work, and so it presents a viable option.

In my opinion it is sufficient for a low level protocol to give you that level of reliability.
And that capability is generic enough that we would want to address it at the protocol level in a consistent manner, so that we remove at least one level of complexity for the service developer. It is also supported by a variety of transport protocols and mediums.

This still doesn't mean you can get two distributed services to properly communicate with each other in all cases. A problem arises if the message was not received (and is not processed), if the message was received but no ack was received (and it is processed), or if the message was received and an ack was received but the message is still not processed.

That problem is not unique to asynchronous messaging; in fact it also presents itself when synchronous messaging is used. With synchronous messaging you have 100% confidence that a message was received, but no confidence that it will be processed. Furthermore, you may fail before you are able to persist that information, in which case your confidence is lost.

If you do not depend on the result of the message being processed, then you would simply regard each message that is sent as being potentially processed. You use the ack/resend mechanism as a way to increase the probability that the message indeed reaches its destination, so a majority of your messages will be received. I argue that using ack/resend you could reach the same level of confidence that the message will be processed as if you were using a synchronous protocol, but could do so more efficiently.

If you do depend on the message being processed, then you are in a different class of problem, and simply having a reliable protocol is not sufficient, since it does not address the possibility that the message was received and acked but not processed. It in fact presents the same problem that would arise when synchronous protocols are used.

This is best solved at a higher layer. There are two possible solutions, both of which are based on the need to reach a consensus between two systems.
One solution is based on a two-phase commit protocol, which could be extended to use asynchronous patterns. A more efficient solution in terms of message passing would be to use state transitions that coordinate through the exchange of well defined messages. This could be modeled using a choreography language.

Since this is outside the scope of this discussion I will not go into details, but if anyone is interested I would recommend looking at protocols for handling failures in distributed systems (in particular Paxos). In my understanding these protocols can be modeled in a choreography language and are more efficient than using transactional protocols and two-phase commit.

My only point here was to highlight that a solution involving ack/resend is sufficient to give you the same level of confidence that a message would be processed as if you were using a synchronous operation, and that solutions for achieving 100% confidence are required whether you are using asynchronous or synchronous messaging.

This is in support of Roger's recommendation for adding ack support to XMLP.

regards,
arkin
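
The ack/resend argument in this message can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not anything defined by XMLP, WSDL or ebXML: `LossyChannel`, `Receiver`, `send_with_retry` and `prob_no_ack` are hypothetical names, each loss is assumed to be independent, and the receiver is assumed to process a message the moment it arrives (so the "received and acked but not processed" case is deliberately left out).

```python
import random


class LossyChannel:
    """Hypothetical unreliable medium: drops each message with probability `loss`."""

    def __init__(self, loss, seed=0):
        self.loss = loss
        self.rng = random.Random(seed)

    def deliver(self):
        # True if the message survives the trip, False if it is "lost".
        return self.rng.random() >= self.loss


class Receiver:
    """Hypothetical receiver that processes each message id at most once."""

    def __init__(self):
        self.seen = set()
        self.processed = []  # message ids, in processing order

    def receive(self, msg_id):
        # A resent (duplicate) message is acked again but not processed twice.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(msg_id)
        return "ack"


def send_with_retry(msg_id, channel, receiver, max_attempts):
    """Return the attempt on which an ack arrived, or None if none did."""
    for attempt in range(1, max_attempts + 1):
        if not channel.deliver():   # message lost on the way out
            continue
        receiver.receive(msg_id)
        if channel.deliver():       # ack survived the way back
            return attempt
    # The sender cannot tell "never received" from "received, ack lost".
    return None


def prob_no_ack(loss, attempts):
    """Chance that no ack arrives in `attempts` tries (independent losses)."""
    round_trip_ok = (1 - loss) ** 2
    return (1 - round_trip_ok) ** attempts
```

Under these assumptions, `prob_no_ack(0.1, 10)` is below one in a million but is never exactly zero for any loss rate above zero, which is the "close to 100% but not quite there" point. A `None` result from `send_with_retry` leaves the sender unable to say whether the receiver processed the message, which is the residual uncertainty the two army problem describes.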
Received on Friday, 13 December 2002 23:04:42 UTC