RE: Reliable Messaging - Summary of Threads from Assaf Arkin on 2002-12-16 (www-ws-arch@w3.org from December 2002)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 16 Dec 2002 11:06:29 -0800
To: "Burdett, David" <david.burdett@commerceone.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNAEABCPAA.arkin@intalio.com>
  Assaf

  >>>This brings us back to the fact that while the protocol assures no
message loss it does not guarantee that a message will be processed<<<

  This is why in LEVEL 1  - Simple Reliable Messaging I said ...

  >>>The minimum semantic of the acknowledgement message is that the
original message has been received and persisted and therefore, barring
catastrophes, it should not be lost and therefore will be processed.<<<

  TCP/IP does not give you this.

  David


  True. But TCP/IP delivers the message to the application for immediate
processing, so there is no need for persistence. If you crash upon receipt
of the message you will not produce a reply, indicating the message was not
processed.

  The need to process the message as it is received created a problem when
processing is extensive or best done independently of the receipt, hence the
benefit of using asynchronous messaging.

  We need to qualify that "processing" in this case means delivery to the
application that is using the RM, so the message is not lost after an ack is
sent (does not disappear and is not garbled). The RM does not actually need
to persist the message if it can deliver the message to the application in a
way that would assure the message could get processed (e.g. the message is
delivered immediately and the application persists it). But that's just
another way in which the RM guarantees persistence of the message.

  The fact that a message is persisted does not mean the application will be
able to process it, it only means the message would not be lost after an ack
has been sent, so the application will always receive it for processing.

  arkin

    -----Original Message-----
    From: Assaf Arkin [mailto:arkin@intalio.com]
    Sent: Saturday, December 14, 2002 5:22 PM
    To: Ricky Ho; www-ws-arch@w3.org
    Subject: RE: Reliable Messaging - Summary of Threads


    Although TCP does not care about the message, it still works very well
at the message level for two reasons:

    1. You specify the message size, that way you know exactly how many
bytes (or packets) you expect to receive and you don't finish until you
receive all of them.

    2. The TCP stack will tell you if the connection goes down even if it
just managed to deliver the last packet. In the application you will
unknowingly block until the TCP stack can determine the delivery status of
the last message.

    In some cases you cannot tell what the message size is (this happens a
lot for HTML pages). In this case the Web server tells you the message has
ended by closing the connection, but it can detect when the client gets
disconnected before receiving the entire message. So the server does not
experience message loss.

    Of course, this does not mean the the client got the message. The
connection could be terminate before the server finishes sending it, or the
client could just die after receiving the last packet, having received the
entire message and losing it by crashing.

    This brings us back to the fact that while the protocol assures no
message loss it does not guarantee that a message will be processed. Message
loss means that the sender cannot tell whether the message was received or
not, if the message is received and then lost or the client cannot receive
the entire message that is a different problem.

    arkin

      -----Original Message-----
      From: Ricky Ho [mailto:riho@cisco.com]
      Sent: Saturday, December 14, 2002 5:09 PM
      To: Assaf Arkin; www-ws-arch@w3.org
      Subject: RE: Reliable Messaging - Summary of Threads


      What you say is correct !  But only at the TCP packet level, not the
message level.

      To some degree, I feel our previous RM handshaking discussion is
re-implementing the TCP handshaking at the message level

      Rgds, Ricky

      At 11:58 AM 12/14/2002 -0800, Assaf Arkin wrote:

        Ricky,

        TCP takes care of that.

        IP is a basic packet routing protocol that sends individual packets
from one machine to another. IP has message loss. A message may not arrive
at its destination. At the IP level the sender does not know whether the
message has arrived, and the received doesn't know a message was sent, so
there's no corrective action that will be taken.

        TCP is an elaborate protocol on top of IP that provides,
connection-based messaging. TCP uses IP which means packets sent from A to B
may be lost, may be received out of order, and may be received multiple
times. TCP does the ordering of the packets, retransmission, acks, etc.

        So it goes something along these lines (not exactly, but it's been a
while since I read the TCP spec):

        Node A opens connection to Node B.
        Node A starts sending a message to Node B.
        Node A identifies each packet by its order in the message.
        Node A identifiers the last packet.
        If Node B does not receive a packet it asks for retransmission.
        If Node B does receive the packet it lets Node A know (this is only
critical for the last packet)

        Keep in mind that Node A and Node B keep communicating with each
other all the time, sending "is alive" messages back and forth to determine
if the connection is still open. So even if there's no application traffic
between A and B, there's a lot of chatter going over the wire. If A doesn't
hear from B after a while, then A assumes the connection is down (and vice
versa).

        The TCP/IP stack can use the negative acks (retransmit request) in
combination with the is-alive chatter (positive acks) to tell the
application whether the message has been received or not.

        arkin
          Arkin, can you elaborate your point that "using a sync transport
protocol, there will be no possibility of message loss" ??  here is an
example.


          Node A sends a message to node B using a sync transport protocol
HTTP POST


          A open a TCP connection to B successfully.
          A send a stream of request data (in the HTTP format) to B.
          Suddenly, the TCP connection drops.


          How does A know if B has received the request message or not ?


          Best regards,
          Ricky


          At 08:03 PM 12/13/2002 -0800, Assaf Arkin wrote:
            The two army problem is concerned with the possibility of
message loss. Message loss could occur when you are using an asynchronous
transport protocol, though in most literature the term would be medium,
where protocol is a more generic term that would even cover a choreography.
            Although you can have an asynchronous API for performing an
operation, that API is between you and a messaging engine and typically you
would use in-process calls or some synchronous transport, so there's no
possibility of message loss. You can tell without a doubt whether the
messaging engine is going to send the message or not.
            Even if the operation you are doing is asynchronous, you can use
a synchronous protocol such as HTTP POST to deliver the message in which
case there is no possibility for message loss. But you can also use an
asynchronous protocol such as SMTP or UDP, in which case the message could
be lost on the way to its definition. Lost has a loose definition, a message
that gets garbled, delayed or routed to the wrong place is considered lost.
            Addressing message loss is therefore a problem of the protocol
you use and not the operation you perform. So in my opinion that is outside
the scope of WSDL abstract operation definition, but in the scope of
specific protocol bindings, an it would definitely help if the protocol
layer (XMLP) could address that relieving us of the need to define ack
operations.
            arkin
            -----Original Message-----
            From: www-ws-arch-request@w3.org
[mailto:www-ws-arch-request@w3.org]On Behalf Of Cutler, Roger (RogerCutler)
            Sent: Friday, December 13, 2002 1:28 PM
            To: Assaf Arkin; www-ws-arch@w3.org
            Subject: RE: Reliable Messaging - Summary of Threads


            Thanks for the support.
            One thing this note reminded me of -- I have seen a number of
different definitions of "synchronous" floating around this group.  In fact,
if my memory serves, there are three major ones.  One concentrates on the
idea that a call "blocks" if it is synchronous, another has a complicated
logic that I cannot recall and the third (contained in one of the references
on the two army problem) concentrates on the length of time it takes for a
message to arrive.  The formality of all of these definitions indicates to
me that all have had considerable thought put into them and that all are, in
their context, "correct".  They are, however, also different.
            -----Original Message-----
            From: Assaf Arkin [mailto:arkin@intalio.com]
            Sent: Friday, December 13, 2002 2:27 PM
            To: Cutler, Roger (RogerCutler); www-ws-arch@w3.org
            Subject: RE: Reliable Messaging - Summary of Threads




            3 - There is concern about the "two army" problem, which
essentially says that it is not possible, given certain assumptions about
the types of interactions, for all parties in the communication to reliably
reach consensus about what has happened.  I have been trying to encourage
the objective of documenting the scenarios that can come up in and their
relative importance and possibly mitigating factors or strategies.  I
haven't seen people violently disagreeing but I wouldn't call this a
consensus point of view.  I consider the ebXML spec as weak in discussing
the two-army problem.
            The two army problem assumes you are using a non-reliable medium
for all your communication and proves that it is impossible for the sender
to reach confidence that the message has arrived and is processed in 100% of
cases.
            You can increase your level of confidence by using message + ack
and being able to resend a message and receive a duplicate ack. That get's
you close to a 100% but not quite there, but it means that in most cases the
efficient solution (using asynchronous messaging) would work, and so
presents a viable option.
            In my opinion it is sufficient for a low level protocol to give
you that level of reliability. And that capability is generic enough that we
would want to address it at the protocol level in a consistent manner, so we
reduce at least one level of complexity for the service developer. It is
also supported by a variety of transport protocols and mediums.
            This still doesn't mean you can get two distributed services to
propertly communicate with each other in all cases. A problem arises if
either the message was not received (and is not processed), a message was
received but no ack recevied (and is processed) or a message was received
and an ack was received but the message is still not processed.
            That problem is not unique to asynchronous messaging, in fact it
also presents itself when synchronous messaging is used. With synchronous
messaging you have 100% confidence that a message was received, but no
confidence that it will be processed. Furthermore, you may fail before you
are able to persist that information, in which case your confidence is lost.
            If you do not depend on the result of the message being
processed than you would simply regard each message that is sent as being
potentially processed. You use the ack/resend mechanism as a way to increase
the probability that the message indeed reaches its destination, so a
majority of your messages will be received and.
            I argue that using ack/resend you could reach the same level of
confidence that the message will be processed as if you were using a
synchronous protocol, but could do so more efficiently.
            If you do depend on the message being processes, then you are in
a different class of problem, and simply having a reliable protocol is not
sufficient since it does not address the possibility that the message was
received, acked but not processed. It in fact presents the same problem that
would arise when synchronous protocols are used.
            This is best solved at a higher layer. There are two possible
solutions, both of which are based on the need to reach a concensus between
two systems. One solution is based on a two-phase commit protocol, which
could be extended to use asynchronous patterns. A more efficient solution in
terms of message passing would be to use state transitions that coordinate
through the exchange of well defined messages. This could be modeled using a
choreography language.
            Since this is outside the scope of this discussion I will not go
into details, but if anyone is interested I would recommend looking at
protocols for handling failures in distributed systems (in particular
Paxos). In my understanding these protocols are applicable for modeling at
the choreography language and are more efficient than using transactional
protocols and two-phase commit.
            My only point here was to highlight that a solution involving
ack/resend is sufficient to give you the same level of confidence that a
message would be processed as if you were using a synchronous operation, and
that solutions for achieving 100% confidence are required whether you are
using asynchronous or synchronous messaging.
            This is in support of Roger's recommendation for adding ack
support to XMLP.
             regards,
             arkin
Received on Monday, 16 December 2002 14:07:22 UTC