RE: Different Levels of Reliable Messaging - Intermediaries

Ugo,

One idea you suggested was that you could do end-to-end reliable messaging
if each of the hops used reliable messaging. Although this is technically
true, where it falls down is that the sender of the original message might
not always *know* that there are multiple hops, as the path being followed
might be determined dynamically. Even if they know there are multiple hops,
they might not know whether each hop is reliable. So, although your
suggestion can work in some cases, it won't always.

END-TO-END ACKS ARE THE ONLY GUARANTEE
--------------------------------------
So to be *sure* that a message has been delivered, the Acknowledgement
Message MUST come from the ultimate receiver of the message. However,
identifying the true "ultimate receiver" is not always straightforward, as
you can argue that it should be the (first?) SOAP node at the destination or
perhaps the application running behind it - this issue is discussed later.

However, doing end-to-end acks over multiple hops using different transport
protocols can be problematic because of the different speeds of the
transport protocols that could be used in each hop.

DIFFERENT SPEEDS OF TRANSPORT PROTOCOLS
---------------------------------------
If you are doing a single hop then, since you know the *speed* of the
transport protocol being used from beginning to end, you can estimate how
long it "should" be before the acknowledgement arrives. Then, if the
acknowledgement takes longer than this, you can resend the original message
following the Level 1 Simple Reliable Messaging Approach described in my
earlier email.

As I said earlier, if you have multiple hops then this doesn't always work
since each hop could involve a transport protocol that works at a different
speed. For example, the first hop could be HTTP and the next SMTP. What makes
it worse is that the sender of the original message might not even know
that there is another hop and that it is going over SMTP.

CALCULATING THE TIMEOUT
-----------------------
The simplest way around this problem is to just do an end-to-end ack BUT
base the "timeout" value on when the message expires, as indicated by the
"expires at". So, for example, if the original message had to be delivered
within 1 hour, you might want to do a retry if the acknowledgement message
had not been received within 10 minutes, giving 6 attempts to deliver the
message before "giving up".
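
As a rough sketch of that calculation (Python; the names `retry_interval`
and `send_times` and the default of 6 attempts are only my illustration,
they are not defined anywhere in this thread):

```python
from datetime import datetime, timedelta


def retry_interval(sent_at: datetime, expires_at: datetime,
                   attempts: int = 6) -> timedelta:
    """Spread `attempts` sends evenly over the message's lifetime."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    lifetime = expires_at - sent_at
    if lifetime <= timedelta(0):
        raise ValueError("message has already expired")
    return lifetime / attempts


def send_times(sent_at: datetime, expires_at: datetime,
               attempts: int = 6) -> list:
    """Times at which the message is (re)sent if no end-to-end ack arrives."""
    interval = retry_interval(sent_at, expires_at, attempts)
    return [sent_at + i * interval for i in range(attempts)]
```

With a 1 hour expiry and 6 attempts this gives the 10 minute interval from
the example, with the last attempt going out 50 minutes after the first.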

If you think about it, the "expires at" HAS to be set to a value that allows
sufficient time for the original message to be sent and the acknowledgement
to be received for the message to be sent with any success at all. So this
approach is pretty safe.

EARLY DETECTION OF NODE FAILURE
-------------------------------
However, what this approach does not do is provide early warning that an
intermediate node has failed. For example, if you are sending the message by
HTTP, then you will not know, using the timeout value described earlier,
that transportation of the message was impossible until you had exhausted
all 6 attempts, i.e. until the message was about to expire.

IF (note the capitals) this is a problem then the solution is to use
intermediate acknowledgement messages in addition to the end-to-end ack
where the intermediate ack is targeted (e.g. using the SOAP Actor) at the
"Next" node in the network.

INTERMEDIATE ACKS
-----------------
Since the intermediate ack is targeted at the next node, you can usually use
a timeout value that is based on the speed of the transport protocol. For
example, if you are using HTTP, then you might want to retry after 1 minute.
If you are using SMTP, then maybe every 30 minutes. Then work out a
reasonable number of retries to do before "giving up", say 3 or 4.

If you do this then you would know, again given the timeout values used
above, that the message could not be sent after a much shorter time, e.g.
3-4 minutes for HTTP.
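
To make the arithmetic concrete, here is a small sketch (Python; the table
`HOP_RETRY_INTERVAL` and the function `time_to_detect_failure` are made-up
names, and the intervals are just the example figures above, not
recommendations):

```python
from datetime import timedelta

# Example per-transport retry intervals, taken from the figures above.
HOP_RETRY_INTERVAL = {
    "HTTP": timedelta(minutes=1),
    "SMTP": timedelta(minutes=30),
}


def time_to_detect_failure(transport: str, retries: int = 3) -> timedelta:
    """How long a node waits on one hop before "giving up" on it."""
    return HOP_RETRY_INTERVAL[transport] * retries
```

So `time_to_detect_failure("HTTP", 3)` is 3 minutes, compared with waiting
for the whole expiry period if only the end-to-end ack is used.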

However, the complexity does not end there. It's not much use if only the
intermediate node knows that it cannot send the original message any
further. The sender of the original message needs to know too. This means
that the intermediate node that detected the delivery failure HAS to send
another message back to the sender of the original message to notify them of
the failure.
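
A minimal sketch of what an intermediate node might do (Python; everything
here is hypothetical: `send_to_next` stands for "send on the next hop and
return True if the intermediate ack arrived within the hop timeout", and
`DeliveryFailure` is just a stand-in for whatever failure message would be
sent back to the original sender):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeliveryFailure:
    """Notification to be sent back to the sender of the original message."""
    message_id: str
    failed_at_node: str
    attempts: int


def forward_with_hop_ack(message_id: str, node_name: str,
                         send_to_next: Callable[[str], bool],
                         max_attempts: int = 3) -> Optional[DeliveryFailure]:
    """Try the next hop up to `max_attempts` times.

    Returns None if an intermediate ack was received, otherwise the
    failure notification that must go back to the original sender.
    """
    for _ in range(max_attempts):
        if send_to_next(message_id):
            return None
    return DeliveryFailure(message_id, node_name, max_attempts)
```

Note that this only builds the failure notification. Getting it back to the
original sender is a separate delivery problem in its own right.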

So what intermediate acks really provide is an optimisation that gives
much earlier warning of delivery failure - albeit at a fairly large increase
in complexity.

Also remember that if end-to-end acks are used as well as the intermediate
acks then, if Level 3 - Reliable Messaging with Recovery - is used, it is
quite possible that this could cause another retry at sending the message
some time later.

OTHER FACTORS TO CONSIDER
-------------------------
However, there are some issues with using intermediate acks which you need
to think about:
1. You can get two acks, not one - the intermediate ack and the end-to-end
ack - which adds to the complexity.
2. The Intermediate Ack *should* also be targeted at the next node in the
network, otherwise the sender of the original message would get an ack back
from each and every intermediate node in the path.
3. What do you do if the delivery failure message does not get through?
Should you send the delivery failure message reliably ... and what do you do
if that doesn't work either?
4. If the nodes in the network that are being targeted are SOAP nodes, then
there might be other, non-SOAP systems between the SOAP nodes that use
different transport protocols at different speeds to connect. This makes
calculating the timeout value for intermediate acks harder to do reliably.
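
On point 1, the sender ends up tracking two separate things per message.
Something like this sketch (Python; `AckKind` and `SentMessage` are names I
have invented for illustration only):

```python
from enum import Enum


class AckKind(Enum):
    INTERMEDIATE = "intermediate"   # from the "Next" node only
    END_TO_END = "end-to-end"       # from the ultimate receiver


class SentMessage:
    """Sender-side state for one message that can receive both kinds of ack."""

    def __init__(self, message_id: str) -> None:
        self.message_id = message_id
        self.reached_next_node = False
        self.delivered = False

    def on_ack(self, kind: AckKind) -> None:
        if kind is AckKind.INTERMEDIATE:
            self.reached_next_node = True
        elif kind is AckKind.END_TO_END:
            self.delivered = True

    def needs_end_to_end_retry(self) -> bool:
        # Only the end-to-end ack proves delivery; an intermediate ack
        # just says the first hop worked.
        return not self.delivered
```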

ULTIMATE RECEIVER
-----------------
The other issue with end-to-end acks is knowing what the end actually is.
If, for example, the last SOAP node in the path actually consumes the
payload/body of the SOAP message then there isn't much of a problem. But if
it does not, then the message could be lost after the SOAP node has passed
on the message to the application that actually processes it. This is why
the Acknowledgement Message described in the original email in this thread
can optionally provide additional information on the validation and passing
off of the message to the application.

BOTTOM LINE
-----------
I think that doing end-to-end acks where the timeout is based on the expiry
time of the message is the best approach to use. Doing intermediate acks
only provides early warning of a problem; it isn't a substitute for
end-to-end acks. So I would recommend that intermediate acks are left out of
scope ... if this ever becomes an activity.

Thoughts?

David




-----Original Message-----
From: Ugo Corda [mailto:UCorda@SeeBeyond.com]
Sent: Thursday, December 12, 2002 5:56 PM
To: Burdett, David; www-ws-arch@w3.org
Subject: RE: Different Levels of Reliable Messaging


David,

I am glad you mention the case of intermediaries, because I have been
thinking for a while about how that affects Reliable Messaging.

In the case of security, what is usually said in order to justify the need
to address it at the SOAP headers level, instead of just using HTTPS, is
that you need that level if you want to do end-to-end security. Otherwise
intermediate nodes would need to have access to security sensitive
information related to the message in order to relay information from hop to
hop, and those nodes might not be authorized to do that.

But in the case of reliable messaging, it seems that you should be able to
use, for example, SOAP over HTTPR on one hop, and SOAP over JMS on the next
hop, and still be able to support reliable messaging end-to-end. (The
message goes reliably from A to C iff it goes reliably from A to B and from
B to C - for example, B waits until it gets the transport-level ack from C
before sending its transport-level ack to A). 

In fact, I think this was the rationale when IBM designed HTTPR, so that you
could go from Internet to intranet (and vice versa), using SOAP over HTTPR
on the Internet, and then switching to SOAP over MQSeries (or other MOM)
once inside the intranet. 

Your previous message seems to imply that this approach would not be
sufficient for end-to-end reliable messaging. Could you please elaborate?

Thank you,
Ugo

Received on Friday, 13 December 2002 00:46:53 UTC