RE: Different Levels of Reliable Messaging from Burdett, David on 2002-12-16 (www-ws-arch@w3.org from December 2002)

From: Burdett, David <david.burdett@commerceone.com>
Date: Mon, 16 Dec 2002 10:53:41 -0800
To: "'Ricky Ho'" <riho@cisco.com>, "Burdett, David" <david.burdett@commerceone.com>, www-ws-arch@w3.org
Message-ID: <C1E0143CD365A445A4417083BF6F42CC053D153A@C1plenaexm07.commerceone.com>

Ricky
 
<Ricky> 
This is what our disagreement is.  I think the sender should report to the
application that delivery is "in-doubt".  (not "failed")
</Ricky>
 
Not always. You actually have two use cases:
1. The message **cannot** be sent, e.g. because the outbound network
connection is down. In this case the message delivery failed because it could
not even be sent.
2. The message was sent but no acknowledgement message was received - this
is, as you say "in-doubt", although the probability of success, in this
case, should be low.
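
As a rough illustration of the two cases above, here is a minimal Python sketch of how a sender might classify the outcome it reports to the application. The names DeliveryStatus and classify are hypothetical and are not taken from any RM specification.

```python
from enum import Enum


class DeliveryStatus(Enum):
    """Outcomes a sender could report to the application (hypothetical names)."""
    DELIVERED = "delivered"
    FAILED = "failed"
    IN_DOUBT = "in-doubt"


def classify(transmitted: bool, ack_received: bool) -> DeliveryStatus:
    """Map what the sender observed onto a reported status."""
    if not transmitted:
        # Case 1: the message never left the sender, so failure is certain.
        return DeliveryStatus.FAILED
    if ack_received:
        return DeliveryStatus.DELIVERED
    # Case 2: sent, but no acknowledgement came back.
    return DeliveryStatus.IN_DOUBT
```

Only the first case lets the sender state a failure with certainty; the second leaves it unable to tell a lost message from a lost ack.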
 
<Ricky>
I used to call this "Reliable 1-to-M messaging with atomicity". ...
...
</Ricky>
 
I agree, but I think this is something that should be layered on top.
 
David


-----Original Message-----
From: Ricky Ho [mailto:riho@cisco.com]
Sent: Saturday, December 14, 2002 4:36 PM
To: Burdett, David; www-ws-arch@w3.org
Subject: RE: Different Levels of Reliable Messaging


Response embedded in <Ricky/>




<DB2>I think the expiration time is important for RM, and here's why. 

All RM is based on waiting for an ack and repeatedly resending the original
message if you don't get one. The problem is when to stop resending and
give up. At some point you have to stop, but when? There are two ways of
doing this.

1. Stop after a fixed number of retries (which is what ebXML MS does), or 
2. Stop only after the message can no longer validly be processed. 

The problem with the first approach is that if you say stop sending after 3
retries (i.e. 4 sends of the message in total), then it is still quite
possible that, if the destination system was down, and then came back up it
could pick up the message and process it - this is quite normal behavior.
You could then get into the situation where:

1. The sender sent the message but gave up resending after, say, 20 minutes as
no reply was received, then 
2. The sender reports to the application that delivery failed 


<Ricky> 
This is what our disagreement is.  I think the sender should report to the
application that delivery is "in-doubt".  (not "failed")
</Ricky>



3. The destination restarts, finds the message, sends the ack and starts
processing it. 
4. The sender receives the ack and has to tell the application that the
message for which it had just reported a delivery failure had actually been
received and was processed - not a desirable outcome

Alternatively by specifying a time out and basing the retries on that you
know, with a high degree of certainty, that even if the message is picked
up, it won't be processed and therefore you are much less likely to have to
report a delivery failure and then have to reverse it.
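
The two stopping rules can be sketched side by side in Python. This is only an illustration under stated assumptions: send_once, now, sleep, expires_at, retry_interval and the FakeClock helper are all hypothetical names, and send_once is assumed to return True when an ack arrives for that attempt.

```python
from typing import Callable, Tuple


def send_with_retry_limit(send_once: Callable[[], bool],
                          max_retries: int) -> Tuple[bool, int]:
    """Approach 1: stop after a fixed number of retries."""
    attempts = 0
    for _ in range(max_retries + 1):  # 3 retries means 4 sends in total
        attempts += 1
        if send_once():
            return True, attempts
    return False, attempts


def send_until_expired(send_once: Callable[[], bool],
                       now: Callable[[], float],
                       sleep: Callable[[float], None],
                       expires_at: float,
                       retry_interval: float) -> Tuple[bool, int]:
    """Approach 2: keep resending until the message has expired."""
    attempts = 0
    while now() < expires_at:
        attempts += 1
        if send_once():
            return True, attempts
        sleep(retry_interval)
    return False, attempts


def may_process(now: float, expires_at: float) -> bool:
    """Receiver-side check: an expired message is not processed."""
    return now < expires_at


class FakeClock:
    """Simulated clock so the sketch runs without real waiting."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


if __name__ == "__main__":
    clock = FakeClock()
    # The destination never answers; the message expires at t = 60.
    acked, attempts = send_until_expired(lambda: False, clock.now,
                                         clock.sleep, 60.0, 20.0)
    print(acked, attempts, may_process(clock.now(), 60.0))
```

With the second rule, by the time the sender gives up the receiver-side check also rejects the message, so a late pickup is much less likely to contradict the failure already reported.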


<Ricky>
Let's say the message has been picked up before the time expired, and the
receiver site goes down before the ACK message is sent.
Now even though the sender keeps resending the message until the time expires,
he still cannot conclude that delivery failed.
</Ricky>



<DB2>What you describe in this example is a Business Process. It is NOT, in
my opinion, reliable messaging as you make the return of one ack dependent
on the receipt of two other acks. The bottom line is that you can only do
transaction processing if you KNOW that complete rollback of the state at
the sender and receiver is possible. Sometimes it is, and sometimes it isn't
which is why you have to determine how you do the recovery at the
application level.</DB2>


<Ricky>
I used to call this "Reliable 1-to-M messaging with atomicity".  If
successful, all of the destinations will get the message.  If it fails, none
of the destinations will get the message, and the failure will be reported to
the sender.

This has nothing to do with the receiving application.  The receiving
application doesn't need to roll back (it actually hasn't processed the
message).  The rollback concept is at the RM level.
</Ricky>




By the way, you have raised some very good points David ! 

Best regards, 
Ricky

Received on Monday, 16 December 2002 13:53:29 UTC