RE: Different Levels of Reliable Messaging - Intermediaries from Newcomer, Eric on 2002-12-13 (www-ws-arch@w3.org from December 2002)

From: Newcomer, Eric <Eric.Newcomer@iona.com>
Date: Fri, 13 Dec 2002 12:46:20 -0500
To: "Burdett, David" <david.burdett@commerceone.com>, "Ugo Corda" <UCorda@SeeBeyond.com>, <www-ws-arch@w3.org>
Message-ID: <DCF6EF589A22A14F93DFB949FD8C4AB2BA172A@amereast-ems1.IONAGLOBAL.COM>
Yes, the end-to-end ack is the one that's important. That seems consistent with how intermediaries relate to the content of the message: at least in the SOAP spec, they are only allowed to process headers, and are therefore in effect transparent to the relationship between the content of the message and the endpoints.

-----Original Message-----
From: Burdett, David [mailto:david.burdett@commerceone.com]
Sent: Friday, December 13, 2002 12:47 AM
To: Ugo Corda; www-ws-arch@w3.org
Subject: RE: Different Levels of Reliable Messaging - Intermediaries



Ugo,

One idea you suggested was that you could do end-to-end reliable messaging if each of the hops used reliable messaging. Although this is technically true, where it falls down is that the sender of the original message might not always *know* that there are multiple hops, perhaps because the path being followed is determined dynamically. Even if they know there are multiple hops, they might not know whether each hop is reliable. So, although your suggestion can work in some cases, it won't always.

END-TO-END ACKS ARE THE ONLY GUARANTEE 
-------------------------------------- 
So to be *sure* that a message has been delivered, the Acknowledgement Message MUST come from the ultimate receiver of the message. However, identifying the true "ultimate receiver" is not always obvious: you can argue that it should be the (first?) SOAP node at the destination, or perhaps the application running behind it. This issue is discussed later.

However, doing end-to-end acks over multiple hops using different transport protocols can be problematic, because the transport protocol used on each hop may run at a different speed.

DIFFERENT SPEEDS OF TRANSPORT PROTOCOLS 
--------------------------------------- 
If you are doing a single hop then, since you know the *speed* of the transport protocol being used from beginning to end, you can estimate how long it "should" take for the acknowledgement to arrive. Then, if the acknowledgement takes longer than this, you can resend the original message following the Level 1 Simple Reliable Messaging Approach described in my earlier email.

As said earlier, if you have multiple hops then this doesn't always work, since each hop could use a transport protocol that works at a different speed. For example, the first hop could be HTTP and the next SMTP. What makes it worse is that the sender of the original message might not even know that there is another hop, or that it goes over SMTP.
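
To make the problem concrete, here is a minimal Python sketch, assuming the sender sizes its timeout from the only hop it knows about and that the end-to-end ack comes back over the same path. The latencies (1 second for HTTP, 10 minutes for SMTP), the safety factor, and the names `Hop` and `round_trip` are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hop:
    transport: str
    one_way_seconds: float  # assumed typical one-way latency for this hop

def round_trip(hops):
    # The message goes out over every hop and the end-to-end ack comes back the same way.
    return 2 * sum(h.one_way_seconds for h in hops)

SAFETY_FACTOR = 3  # hypothetical margin the sender adds to its estimate

# The sender only knows about the first hop, so it sizes its timeout from that alone.
first_hop = Hop("HTTP", 1.0)
sender_timeout = SAFETY_FACTOR * round_trip([first_hop])

# The real path has a second, much slower hop that the sender cannot see.
actual_path = [first_hop, Hop("SMTP", 600.0)]
actual_rtt = round_trip(actual_path)

# If the real round trip exceeds the timeout, the sender resends a message that was never lost.
spurious_resend = actual_rtt > sender_timeout
print(sender_timeout, actual_rtt, spurious_resend)
```

With these assumed numbers the sender would give up waiting after 6 seconds, long before an ack that needs roughly 20 minutes could possibly arrive.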

CALCULATING THE TIMEOUT 
----------------------- 
The simplest way around this problem is to just do an end-to-end ack, BUT base the "timeout" value on when the message expires, as indicated by its "expires at" time. So, for example, if the original message had to be delivered within 1 hour, you might retry if the acknowledgement message had not been received within 10 minutes, giving 6 attempts to deliver the message before "giving up".

If you think about it, the "expires at" HAS to be set to a value that allows sufficient time for the original message to be sent and the acknowledgement to be received; otherwise the message cannot be sent with any success at all. So this approach is pretty safe.
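
The expiry-based timeout can be sketched in a few lines of Python. This assumes attempts are spread evenly over the message lifetime, which is one reading of the 1-hour / 10-minute / 6-attempt example; `retry_schedule` is a hypothetical helper, not part of any specification.

```python
from datetime import datetime, timedelta

def retry_schedule(sent_at: datetime, expires_at: datetime, attempts: int) -> list[datetime]:
    """Spread `attempts` sends evenly over the message lifetime.

    The first send happens at sent_at; each later send happens one interval
    after the previous one if no end-to-end ack has arrived by then.
    """
    if attempts < 1:
        raise ValueError("need at least one attempt")
    if expires_at <= sent_at:
        raise ValueError("message has already expired")
    interval = (expires_at - sent_at) / attempts  # e.g. 1 hour / 6 = 10 minutes
    return [sent_at + i * interval for i in range(attempts)]

sent = datetime(2002, 12, 13, 12, 0)
times = retry_schedule(sent, sent + timedelta(hours=1), attempts=6)
print([t.strftime("%H:%M") for t in times])
```

The last attempt goes out at 12:50, so its ack window closes exactly at the 13:00 expiry; the sender never needs to know how many hops or which transports lie in between.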

EARLY DETECTION OF NODE FAILURE 
------------------------------- 
However, what this approach does not do is provide early warning that an intermediate node has failed. For example, if you are sending the message by HTTP, then with the timeout value described earlier you will not know that delivery of the message was impossible until you have exhausted all 6 retries.

IF (note the capitals) this is a problem, then the solution is to use intermediate acknowledgement messages in addition to the end-to-end ack, where the intermediate ack is targeted (e.g. using the SOAP actor) at the "next" node in the network.
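
As a rough illustration of targeting with the SOAP 1.1 actor attribute, the Python sketch below builds an envelope with two header blocks. The envelope namespace and the special "next" actor URI come from SOAP 1.1; the `rm` namespace and the `AckRequested` / `HopAckRequested` element names are hypothetical placeholders, since no reliable messaging header was defined.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR_NEXT = "http://schemas.xmlsoap.org/soap/actor/next"  # SOAP 1.1 "next" actor
RM_NS = "urn:example:reliable-messaging"  # hypothetical namespace for the ack requests

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("rm", RM_NS)

def build_envelope(message_id: str, want_hop_ack: bool) -> ET.Element:
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
    # End-to-end ack request: no actor attribute, so it is aimed at the ultimate receiver.
    e2e = ET.SubElement(header, f"{{{RM_NS}}}AckRequested")
    e2e.set("messageId", message_id)
    if want_hop_ack:
        # Intermediate ack request: aimed at whichever node receives the message next.
        hop = ET.SubElement(header, f"{{{RM_NS}}}HopAckRequested")
        hop.set("messageId", message_id)
        hop.set(f"{{{SOAP_ENV}}}actor", ACTOR_NEXT)
    ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    return env

print(ET.tostring(build_envelope("msg-1", want_hop_ack=True), encoding="unicode"))
```

Under SOAP 1.1 processing rules a node must not forward a header block that was targeted at it, so each intermediary that wants an ack from its own next hop would insert a fresh `HopAckRequested` before forwarding.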

INTERMEDIATE ACKS 
----------------- 
Since the intermediate ack is targeted at the next node, you can usually use a timeout value that is based on the speed of the transport protocol. For example, if you are using HTTP, then you might want to retry after 1 minute; if you are using SMTP, then maybe every 30 minutes. Then work out a reasonable number of retries to do before "giving up", say 3 or 4.

If you do this, then (again given the timeout values above) you would know that the message could not be sent after a much shorter time, e.g. 3-4 minutes for HTTP.
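
The arithmetic behind "3-4 minutes for HTTP" can be written out as a small Python sketch, assuming each try waits one full retry interval for the intermediate ack. The interval table simply copies the example figures above; `time_to_detect_failure` is a hypothetical helper.

```python
from datetime import timedelta

# Per-hop retry intervals from the examples above; illustrative, not a recommendation.
HOP_RETRY_INTERVAL = {
    "HTTP": timedelta(minutes=1),
    "SMTP": timedelta(minutes=30),
}

def time_to_detect_failure(transport: str, tries: int) -> timedelta:
    """Roughly how long a node waits before giving up on one hop.

    Each try waits one retry interval for the intermediate ack, so the
    total is simply interval * tries.
    """
    return HOP_RETRY_INTERVAL[transport] * tries

# For comparison, the end-to-end scheme: 6 tries at 10-minute intervals.
END_TO_END_DETECTION = timedelta(minutes=10) * 6

for transport in ("HTTP", "SMTP"):
    for tries in (3, 4):
        print(transport, tries, time_to_detect_failure(transport, tries))
print("end-to-end", END_TO_END_DETECTION)
```

With these numbers an HTTP hop gives up after 3-4 minutes, compared with a full hour for the end-to-end timeout alone.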

However, the complexity does not end there. It's not much use if only the intermediate node knows that it cannot send the original message any further; the sender of the original message needs to know. This means that the intermediate node that detected the delivery failure HAS to send another message back to the sender of the original message to notify them of the failure.
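
A minimal Python sketch of what the intermediate node has to do, assuming `forward` reports whether the next node's intermediate ack arrived in time and `notify_sender` delivers a failure message upstream. Both callables and the `relay` function are hypothetical stand-ins for real transport code.

```python
from typing import Callable

def relay(message_id: str,
          forward: Callable[[str], bool],
          notify_sender: Callable[[str, str], None],
          max_tries: int = 3) -> bool:
    """Try to pass a message to the next node; report failure upstream if it never gets through.

    `forward` returns True when the next node's intermediate ack arrives in time.
    `notify_sender` sends a delivery-failure message back to the original sender.
    """
    for _ in range(max_tries):
        if forward(message_id):
            return True
    # Giving up silently is useless: the original sender must be told.
    notify_sender(message_id, f"could not forward after {max_tries} tries")
    return False

failures = []
ok = relay("msg-42", forward=lambda m: False,
           notify_sender=lambda m, reason: failures.append((m, reason)))
print(ok, failures)
```

Note that `notify_sender` is itself just another message send, which is exactly where the question of whether failure notices must also be sent reliably comes from.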

So what intermediate acks really provide is an optimisation that provides much earlier warning of delivery failure - albeit at a fairly large increase in complexity.

Also remember that if end-to-end acks are used as well as intermediate acks, and Level 3 (Reliable Messaging with Recovery) is in use, then it is quite possible that this combination could cause another retry at sending the message some time later.

OTHER FACTORS TO CONSIDER 
------------------------- 
However, there are some issues with using intermediate acks which you need to think about:
1. You get two acks, not one - you can now receive both the intermediate ack and the end-to-end ack, which adds to the complexity.

2. The Intermediate Ack *should* also be targeted at the next node in the network; otherwise the sender of the original message would get an ack back from each and every intermediate node in the path.

3. What do you do if the delivery failure message does not get through? Should you send the delivery failure message reliably ... and what do you do if that doesn't work either?

4. If the nodes in the network that are being targeted are SOAP nodes, then there might be other, non-SOAP systems between the SOAP nodes that connect using different transport protocols at different speeds. This makes it harder to calculate the timeout value for intermediate acks reliably.
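
Point 1 means the sender has to keep the two kinds of ack apart. The Python sketch below assumes a hypothetical `OutboundMessage` record on the sender side; the rule that only an end-to-end ack ends the retries follows the text, while stopping retries on a reported failure is an assumed policy choice.

```python
from enum import Enum
from typing import Optional

class AckKind(Enum):
    INTERMEDIATE = "intermediate"  # sent by the next node only
    END_TO_END = "end-to-end"      # sent by the ultimate receiver

class OutboundMessage:
    """Hypothetical sender-side record for one message awaiting acks."""

    def __init__(self, message_id: str):
        self.message_id = message_id
        self.hop_acked = False
        self.delivered = False
        self.failure_reason: Optional[str] = None

    def on_ack(self, kind: AckKind) -> None:
        if kind is AckKind.INTERMEDIATE:
            # Only shows that the first hop worked; delivery is still unproven.
            self.hop_acked = True
        else:
            # Only an end-to-end ack proves delivery.
            self.delivered = True

    def on_delivery_failure(self, reason: str) -> None:
        # Failure notice sent back by an intermediate node.
        if not self.delivered:
            self.failure_reason = reason

    def should_retry(self) -> bool:
        # Assumed policy: stop once delivered or once a failure has been reported.
        return not self.delivered and self.failure_reason is None

msg = OutboundMessage("msg-7")
msg.on_ack(AckKind.INTERMEDIATE)
print(msg.should_retry())  # True: a hop ack is not proof of delivery
msg.on_ack(AckKind.END_TO_END)
print(msg.should_retry())  # False
```

If the failure notice never arrives (point 3), this sender simply falls back to the expiry-based end-to-end timeout.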

ULTIMATE RECEIVER 
----------------- 
The other issue with end-to-end acks is knowing what the end actually is. If, for example, the last SOAP node in the path actually consumes the payload/body of the SOAP message, then there isn't much of a problem. But if it does not, then the message could be lost after the SOAP node has passed it on to the application that actually processes it. This is why the Acknowledgement Message described in the original email in this thread can optionally provide additional information on the validation of the message and its hand-off to the application.
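
One way to picture those optional fields is the Python sketch below. The field names `validated` and `handed_to_app` are hypothetical, since the actual Acknowledgement Message format is defined in the earlier email and not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Acknowledgement:
    message_id: str
    received: bool = True                 # the last SOAP node got the message
    validated: Optional[bool] = None      # optional: payload passed validation
    handed_to_app: Optional[bool] = None  # optional: passed on to the application

    def confirms_processing(self) -> bool:
        """True only if the receiver explicitly says the application has the message."""
        return self.received and self.handed_to_app is True

basic = Acknowledgement("msg-9")
full = Acknowledgement("msg-9", validated=True, handed_to_app=True)
print(basic.confirms_processing(), full.confirms_processing())
```

A sender that cares about the application, not just the last SOAP node, would treat an ack without the hand-off information as weaker evidence of delivery.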

BOTTOM LINE 
----------- 
I think that doing end-to-end acks, with the timeout based on the expiry time of the message, is the best approach to use. Intermediate acks only provide early warning of a problem; they aren't a substitute for end-to-end acks. So I would recommend that intermediate acks are left out of scope ... if this ever becomes an activity.

Thoughts? 

David 




-----Original Message----- 
From: Ugo Corda [mailto:UCorda@SeeBeyond.com]
Sent: Thursday, December 12, 2002 5:56 PM 
To: Burdett, David; www-ws-arch@w3.org 
Subject: RE: Different Levels of Reliable Messaging 


David, 

I am glad you mention the case of intermediaries, because I have been thinking for a while about how that affects Reliable Messaging.

In the case of security, the usual justification for addressing it at the SOAP header level, instead of just using HTTPS, is that you need that level if you want end-to-end security. Otherwise intermediate nodes would need access to security-sensitive information in the message in order to relay it from hop to hop, and those nodes might not be authorized to do that.

But in the case of reliable messaging, it seems that you should be able to use, for example, SOAP over HTTPR on one hop, and SOAP over JMS on the next hop, and still be able to support reliable messaging end-to-end. (The message goes reliably from A to C iff it goes reliably from A to B and from B to C - for example, B waits until it gets the transport-level ack from C before sending its transport-level ack to A). 
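
The chained-ack scheme described here (B acks A only after C has acked B) can be sketched in Python as below. `Node`, its `up` flag, and the in-memory `stored` list are hypothetical simplifications of real transports such as HTTPR or JMS.

```python
from typing import Optional

class Node:
    def __init__(self, name: str, downstream: Optional["Node"] = None, up: bool = True):
        self.name = name
        self.downstream = downstream
        self.up = up
        self.stored: list[str] = []

    def receive(self, message: str) -> bool:
        """Return the transport-level ack (True) or no ack (False)."""
        if not self.up:
            return False
        if self.downstream is None:
            # Final node: keep the message and ack immediately.
            self.stored.append(message)
            return True
        # Intermediate node: ack upstream only after the downstream hop has acked.
        return self.downstream.receive(message)

c = Node("C")
b = Node("B", downstream=c)
ack_ok = b.receive("hello")    # A sends to B; B acks only after C acks

c_down = Node("C", up=False)
b2 = Node("B", downstream=c_down)
ack_down = b2.receive("hello")  # C unreachable, so A gets no ack from B
print(ack_ok, c.stored, ack_down)
```

This only gives end-to-end assurance if every node on the path follows the same rule, which is the assumption David questions in his reply above.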

In fact, I think this was the rationale when IBM designed HTTPR, so that you could go from Internet to intranet (and vice versa), using SOAP over HTTPR on the Internet, and then switching to SOAP over MQSeries (or other MOM) once inside the intranet. 

Your previous message seems to imply that this approach would not be sufficient for end-to-end reliable messaging. Could you please elaborate?

Thank you, 
Ugo
Received on Friday, 13 December 2002 12:46:55 UTC