RE: Reliable Messaging - Summary of Threads

  3 - There is concern about the "two army" problem, which essentially says
that it is not possible, given certain assumptions about the types of
interactions, for all parties in the communication to reliably reach
consensus about what has happened.  I have been trying to encourage the
objective of documenting the scenarios that can come up, their relative
importance, and possible mitigating factors or strategies.  I
haven't seen people violently disagreeing but I wouldn't call this a
consensus point of view.  I consider the ebXML spec as weak in discussing
the two-army problem.

  The two-army problem assumes you are using an unreliable medium for all
your communication and proves that the sender cannot be certain, in 100% of
cases, that the message has arrived and been processed.

  You can increase your level of confidence by using message + ack and being
able to resend a message and receive a duplicate ack. That gets you close
to 100% but not quite there. It does mean that in most cases the efficient
solution (using asynchronous messaging) would work, and so it presents a
viable option.
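
  To make the message + ack + resend idea concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
names Receiver, LossyChannel and send_with_resend are hypothetical, the
channel drops transmissions according to a scripted pattern rather than at
random, and nothing here is taken from ebXML or XMLP.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver that remembers message ids to detect resends."""
    seen: set = field(default_factory=set)
    processed: list = field(default_factory=list)

    def deliver(self, msg_id, payload):
        # Process only the first copy, but ack every copy so that a lost
        # ack can be replaced by a duplicate ack on resend.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(payload)
        return ("ack", msg_id)


class LossyChannel:
    """Drops transmissions according to a scripted pattern (True = drop)."""

    def __init__(self, drops):
        self._drops = iter(drops)

    def passes(self):
        # Once the script is exhausted, nothing more is dropped.
        return not next(self._drops, False)


def send_with_resend(msg_id, payload, receiver, channel, max_attempts=5):
    """Return the attempt on which an ack arrived, or None if none did."""
    for attempt in range(1, max_attempts + 1):
        if not channel.passes():   # message lost on the way out
            continue
        ack = receiver.deliver(msg_id, payload)
        if not channel.passes():   # ack lost on the way back
            continue
        if ack == ("ack", msg_id):
            return attempt
    return None
```

  With LossyChannel([True, False, True]) the first attempt loses the
message, the second loses the ack, and the third succeeds, while the
receiver processes the payload only once. If the sender runs out of
attempts it gets None back and cannot tell whether the message was
processed, which is the residual uncertainty described above.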

  In my opinion it is sufficient for a low level protocol to give you that
level of reliability. And that capability is generic enough that we would
want to address it at the protocol level in a consistent manner, so we
reduce at least one level of complexity for the service developer. It is
also supported by a variety of transport protocols and mediums.

  This still doesn't mean you can get two distributed services to properly
communicate with each other in all cases. A problem arises in any of three
cases: the message was not received (and is not processed); the message was
received but no ack was received (and it is processed); or the message was
received and an ack was received, but the message is still not processed.
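
  The cases above, plus the successful one, can be tabulated in a few lines
of Python. This is only a sketch: Outcome and
possible_processing_states are hypothetical names, and the list holds
just the outcomes named in this message. It shows that what the sender
observes (ack or no ack) does not determine whether the message was
processed.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    received: bool
    ack_seen: bool
    processed: bool


OUTCOMES = [
    Outcome(False, False, False),  # message never received
    Outcome(True, False, True),    # processed, but the ack was lost
    Outcome(True, True, False),    # received and acked, yet not processed
    Outcome(True, True, True),     # the successful case
]


def possible_processing_states(ack_seen):
    """Processing results consistent with what the sender observed."""
    return {o.processed for o in OUTCOMES if o.ack_seen == ack_seen}
```

  Both possible_processing_states(True) and
possible_processing_states(False) contain True and False, so the ack
alone never settles the question.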

  That problem is not unique to asynchronous messaging; in fact, it also
presents itself when synchronous messaging is used. With synchronous
messaging you have 100% confidence that a message was received, but no
confidence that it will be processed. Furthermore, you may fail before you
are able to persist that information, in which case your confidence is lost.

  If you do not depend on the result of the message being processed, then
you would simply regard each message that is sent as being potentially
processed. You use the ack/resend mechanism as a way to increase the
probability that the message indeed reaches its destination, so a majority
of your messages will be received.

  I argue that using ack/resend you could reach the same level of confidence
that the message will be processed as if you were using a synchronous
protocol, but could do so more efficiently.

  If you do depend on the message being processed, then you are in a
different class of problem, and simply having a reliable protocol is not
sufficient since it does not address the possibility that the message was
received, acked but not processed. It in fact presents the same problem that
would arise when synchronous protocols are used.

  This is best solved at a higher layer. There are two possible solutions,
both of which are based on the need to reach a consensus between two
systems. One solution is based on a two-phase commit protocol, which could
be extended to use asynchronous patterns. A more efficient solution in terms
of message passing would be to use state transitions that coordinate through
the exchange of well defined messages. This could be modeled using a
choreography language.
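
  As a rough illustration of state transitions coordinated through well
defined messages, here is a small Python sketch. The states, message kinds
and the Party class are all hypothetical; a real choreography would define
its own, and this sketch does not address recovery after a crash.

```python
# Allowed transitions: (current state, message kind) -> next state.
TRANSITIONS = {
    ("idle", "request"): "requested",
    ("requested", "accept"): "accepted",
    ("requested", "reject"): "rejected",
    ("accepted", "confirm"): "completed",
}


class Party:
    """One side of the exchange, tracking where the conversation stands."""

    def __init__(self):
        self.state = "idle"
        self.applied = set()

    def on_message(self, msg_id, kind):
        # A resent message that was already applied causes no new transition.
        if msg_id in self.applied:
            return self.state
        next_state = TRANSITIONS.get((self.state, kind))
        if next_state is None:
            raise ValueError(f"{kind!r} is not valid in state {self.state!r}")
        self.state = next_state
        self.applied.add(msg_id)
        return self.state
```

  Because each party only moves forward on a message that is valid for its
current state, a duplicate caused by a resend is harmless, and an
unexpected message is detected instead of silently applied.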

  Since this is outside the scope of this discussion I will not go into
details, but if anyone is interested I would recommend looking at protocols
for handling failures in distributed systems (in particular Paxos). In my
understanding these protocols can be modeled in a choreography language
and are more efficient than using transactional protocols and two-phase
commit.

  My only point here was to highlight that a solution involving ack/resend
is sufficient to give you the same level of confidence that a message would
be processed as if you were using a synchronous operation, and that
solutions for achieving 100% confidence are required whether you are using
asynchronous or synchronous messaging.

  This is in support of Roger's recommendation for adding ack support to
XMLP.

   regards,
   arkin

Received on Friday, 13 December 2002 15:27:20 UTC