RE: Different Levels of Reliable Messaging from Burdett, David on 2002-12-13 (www-ws-arch@w3.org from December 2002)

From: Burdett, David <david.burdett@commerceone.com>
Date: Thu, 12 Dec 2002 21:47:10 -0800
To: Ricky Ho <riho@cisco.com>, "Burdett, David" <david.burdett@commerceone.com>, www-ws-arch@w3.org
Message-ID: <C1E0143CD365A445A4417083BF6F42CC06F432E3@C1plenaexm07.commerceone.com>
Ricky

See comments below.

David

-----Original Message-----
From: Ricky Ho [mailto:riho@cisco.com]
Sent: Thursday, December 12, 2002 6:11 PM
To: Burdett, David; www-ws-arch@w3.org
Subject: Re: Different Levels of Reliable Messaging


Great summary David, some comments !

The "ack" doesn't need to be per-message based.  I can send an ack for a 
bunch of messages (of course, sequence numbers are used).
<DB>Agreed, but now you are adding in an extra level of complexity (sequence
number) that often won't be needed. What I would suggest is that you split
this into another two levels:
1. Sequencing Support. This is a protocol, built on top of reliable
messaging, that ensures that messages arrive in the sequence they were sent.
2. Reduced Frequency Acknowledgement Messages. You could then vary the
reliable messaging protocol so that an acknowledgement is requested only
every so many messages and, if it is not received, corrective action is
taken.
</DB>
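
As a rough illustration of how these two extra levels could fit together,
here is a minimal sketch of a receiver that delivers messages in sequence
and only returns a cumulative acknowledgement every few messages. The class
name, the `ack_every` parameter and the return convention are assumptions
made for this example, not part of any proposal in this thread.

```python
class CumulativeAckReceiver:
    """Hypothetical receiver: in-order delivery plus reduced-frequency acks."""

    def __init__(self, ack_every=3):
        self.ack_every = ack_every   # assumed policy: ack once per N delivered messages
        self.next_expected = 1       # next sequence number to hand to the application
        self.pending = {}            # messages that arrived ahead of sequence
        self.delivered = []          # payloads delivered so far, in sequence order
        self.last_acked = 0          # highest sequence number acknowledged so far

    def receive(self, seq, payload):
        """Accept one message; return a cumulative ack number, or None."""
        if seq < self.next_expected or seq in self.pending:
            # Duplicate: do not deliver again, just repeat the cumulative ack.
            return self.next_expected - 1
        self.pending[seq] = payload
        # Deliver every message that is now contiguous with what was delivered.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        highest = self.next_expected - 1
        if highest - self.last_acked >= self.ack_every:
            self.last_acked = highest
            return highest           # "everything up to and including this number"
        return None
```

A sender that sees no cumulative ack for a batch would then take corrective
action, for example by resending the unacknowledged messages.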

The "time expiry" is unreliable because clocks may be out of sync.
<DB>Absolutely right. 

The "cheap" but, as you say, inaccurate way to do this is to set and compare
"expires at" using a local system clock. The fact that it is only an
approximation to the true time is often not a big issue, especially if you
are doing end-to-end acks where the time between sending a message and its
expiry (e.g. a day) is long compared to the clock error. Even so, it is
probably good practice for Reliable Messaging solutions to take this
uncertainty into account and extend the "expires at" to some time beyond
the nominal expiry time.

If time accuracy is *fairly* critical, then the sender and receiver of a
message SHOULD agree to keep their clocks accurate using, for example,
protocols such as the Network Time Protocol. If accuracy is *really*
critical then you can include in the message the accuracy to which the
system at the destination MUST keep its clocks. If the system does not keep
its clocks accurate or cannot keep them accurate enough, then the
destination should reject the message and not process it.</DB>
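
The two ideas above (extending "expires at" by a tolerance, and rejecting a
message when the local clock cannot meet a stated accuracy) could be
sketched roughly as follows. The function names, the five-minute default
tolerance and the string results are assumptions chosen for illustration
only.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at, now, tolerance=timedelta(minutes=5)):
    """True only once `now` is later than the nominal expiry plus the tolerance."""
    return now > expires_at + tolerance


def check_message(expires_at, now, required_accuracy=None, local_accuracy=None,
                  tolerance=timedelta(minutes=5)):
    """Decide whether the destination should accept a message (hypothetical policy)."""
    # If the message states how accurate the destination clock must be,
    # reject when the local clock is not known to be at least that accurate.
    if required_accuracy is not None:
        if local_accuracy is None or local_accuracy > required_accuracy:
            return "reject: clock accuracy"
    if is_expired(expires_at, now, tolerance):
        return "reject: expired"
    return "accept"


# Example: a message that nominally expired three minutes ago is still accepted.
expires = datetime(2002, 12, 13, 0, 0, tzinfo=timezone.utc)
result = check_message(expires, expires + timedelta(minutes=3))
```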

I don't think there should be a step 4 in LEVEL 3.  Step 3 should say "Have 
you received the message?  If not, forget the message afterwards"
<DB>I don't think you can always say this. For example if you want to place
an order and there is only one supplier, then even if your message failed,
you might want to resend it if the connection became available later. In
this case, the content/payload/body of the message might be identical but in
other ways it would be a completely new message.</DB>

I think LEVEL 5 should be done at the transaction layer, below 
choreography but above reliable messaging, using some 2-phase-interaction 
style like BTP.
<DB>Quite possibly. The problem with two phase commit is the action you take
when you get a failure (i.e. a rollback) may not always be the right one and
often it can be impossible to do. For example, if you want to roll back a
payment, but the payment has already gone to the bank, then it's too late. You
have to do a reversal or refund instead. Both of these would leave a trace
in the records of what happened.</DB>
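
To make the difference between a rollback and a reversal concrete, here is
a small sketch of an append-only ledger in which a payment is never erased
but is offset by a later reversal entry. The `Ledger` class and its entry
layout are invented for this example.

```python
class Ledger:
    """Hypothetical append-only record of payments and reversals."""

    def __init__(self):
        self.entries = []   # list of (kind, reference, amount); never edited in place

    def pay(self, reference, amount):
        self.entries.append(("payment", reference, amount))

    def reverse(self, reference):
        """Offset whatever is outstanding for `reference` with a new entry."""
        outstanding = sum(a for _, ref, a in self.entries if ref == reference)
        if outstanding == 0:
            raise ValueError("nothing to reverse for " + reference)
        self.entries.append(("reversal", reference, -outstanding))

    def balance(self):
        return sum(a for _, _, a in self.entries)


ledger = Ledger()
ledger.pay("inv-1", 100)
ledger.reverse("inv-1")
# The balance is back to zero, but both entries remain in the records.
```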

Best regards,
Ricky

At 02:48 PM 12/12/2002 -0800, Burdett, David wrote:

>I've been following the Reliable Messaging thread with interest and offer,
>for the purposes of discussion, the following five levels of Reliable
>Messaging starting from a simple "Acknowledgement Only" and ending with
>"Reliable Processing" where each level offers gradually increasing "degrees
>of reliability" ...
>
>LEVEL 0 - Acknowledgment only
>-----------------------------
>Upon request, an acknowledgment message is returned as a response to the
>sending of a message. The minimum semantic of the acknowledgement message
>is that the original message has been received and persisted and
>therefore, barring catastrophes, it should not be lost and therefore will
>be processed.
>The acknowledgement message can *optionally* return the following
>additional status information:
>a) The message structure is valid (in a SOAP context this could be split
>into validation of the envelope, header, body and/or any attachments)
>b) All the checks in a) plus checking that the content of the message is
>valid, e.g. data, codes and identifiers in the message have been checked
>for validity against their datatypes and/or reference information - e.g.
>databases
>c) Either a) or b) above and the fact that the message has been passed on
>for processing - e.g. to the application
>
>LEVEL 1 - Simple Reliable Messaging
>-----------------------------------
>This is based on Level 0 (Acknowledgment Only) with the following
>extensions:
>1. Each "original" message that is sent contains an "expires at" time which
>indicates to the destination that, if they receive the message after this
>point in time, they MUST NOT process it.
>2. If the acknowledgement message is not received by the sender after some
>period of time, then the original message is resent
>3. Step 2 is repeated as required until an acknowledgement has been
>received or the "expires at" time has passed. If no acknowledgment was
>received, the sender gives up and *presumes* that the message was not
>delivered
>4. The receiver of the message looks for duplicate messages and, if one is
>found, does not "process" it but, instead, resends the acknowledgement
>message
>5. If the destination receives a message they have not seen before where
>the "expires at" time has passed, then they reject the message with an
>error.
>
>Note that this does not solve the "two army" problem that has been
>discussed earlier in this thread - but see Level 3 (Reliable Messaging
>with Recovery) below.
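
The Level 1 steps above can be sketched roughly as follows. This is a toy
model under stated assumptions: time is a plain integer tick held in a
hypothetical `Clock` object, the `transport` callable returns the reply or
None when the message or its acknowledgement is lost, and all names are
invented for the example.

```python
class Clock:
    """Hypothetical shared clock; time is an integer tick."""

    def __init__(self, now=0):
        self.now = now


class Receiver:
    def __init__(self):
        self.seen = set()      # message ids already accepted (stands in for persistence)
        self.processed = []    # payloads passed on for processing

    def handle(self, msg_id, payload, expires_at, now):
        if msg_id in self.seen:
            return "ack"                 # step 4: duplicate, re-ack without reprocessing
        if now > expires_at:
            return "error: expired"      # step 5: unseen message that has expired
        self.seen.add(msg_id)
        self.processed.append(payload)
        return "ack"


def send_reliably(transport, msg_id, payload, expires_at, clock, retry_interval):
    """Resend until a reply arrives or "expires at" passes (steps 2 and 3)."""
    while clock.now <= expires_at:
        reply = transport(msg_id, payload, expires_at)
        if reply is not None:
            return reply
        clock.now += retry_interval      # simulate waiting before the next resend
    return None                          # gave up: presume the message was not delivered
```

Note that a None result only means the sender gave up; as with the "two
army" problem, the receiver may still have processed the message.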
>
>LEVEL 2 - Connection based Reliable Messaging
>---------------------------------------------
>This is based on either Level 0 (Acknowledgement Only) or 1 (Simple
>Reliable Messaging) and involves the sending of an inquiry to the
>destination of the message **before sending the actual message** to
>determine the availability of the service that is accepting messages at
>the destination, i.e. whether it is running and able to accept messages.
>
>The idea is that if you do a successful inquiry and immediately follow it
>by sending the actual message then effectively you have "set up a
>connection" and so you are very likely to realize success. This could also
>be very useful if you are sending a "large" message.
>
>LEVEL 3 - Reliable Messaging with Recovery
>------------------------------------------
>This is based on Level 1 (Simple Reliable Messaging), reuses the service
>availability inquiry from Level 2 (Connection Based Reliable Messaging) and
>adds an inquiry on Message Status. It works as follows.
>1. Firstly the sender of the original must have "given up" (see level 1),
>then, some time later,
>2. The sender optionally uses the service availability inquiry from Level 2
>to inquire on the current status of the service that was the destination of
>the original message
>3. The sender then determines the status of the original message that was
>sent by doing a Message Status Inquiry targeted at the destination. In
>return the destination sends another message that indicates:
>   a) There was no record of the original message, or
>   b) The original message was received and so resends the acknowledgement,
>together with a status that indicates that processing is either: not
>started, in progress, complete or not known
>4. Depending on the response, the sender can take one of the following
>actions:
>   a) Resend the original message (or perhaps a new version of it as the
>original might have expired),
>   b) Cancel the original message - i.e. do not process it, or
>   c) Wait for the response to the message to arrive (see also Level 5
>below)
>
>Note that a status on the inquiry response of "not known" is valid since
>the solution at the destination that is providing reliable messaging
>support may have no way of determining the status of the processing of the
>message as the processing is being carried out by another piece of
>software that cannot provide that information.
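
One way a sender might turn the Message Status Inquiry response into one of
the three recovery actions is sketched below. The description above lists
the possible statuses and actions but does not say which status leads to
which action, so the mapping here (and the `cancel_if_not_started` flag) is
purely an assumed policy for illustration.

```python
from enum import Enum


class Status(Enum):
    NO_RECORD = "no record"
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"
    NOT_KNOWN = "not known"


class Action(Enum):
    RESEND = "resend"
    CANCEL = "cancel"
    WAIT = "wait"


def choose_recovery_action(status, cancel_if_not_started=False):
    """Assumed policy mapping an inquiry status to a recovery action."""
    if status is Status.NO_RECORD:
        # The destination never saw the message, so send it again
        # (possibly as a new version, since the original may have expired).
        return Action.RESEND
    if status is Status.NOT_STARTED and cancel_if_not_started:
        # Processing has not begun, so cancelling is still safe.
        return Action.CANCEL
    # Not started (and still wanted), in progress, complete or not known:
    # the message was received, so wait for the response to arrive.
    return Action.WAIT
```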
>
>LEVEL 4 - Connection based Reliable Messaging with Recovery
>-----------------------------------------------------------
>This is a simple combination of levels 2 and 3 where a query on the
>availability of the service is done first as in Level 2, but, if the
>acknowledgement was not received and the sender "gave up", then a Level 3
>Recovery is attempted as well.
>
>LEVEL 5 - Reliable Processing
>-----------------------------
>Personally I don't think that this level should be part of Reliable
>Messaging and should be part of Choreography instead. I am including it in
>this email for completeness and so that we can determine that it is out of
>scope. Anyway, here's the description ...
>
>All the previous "reliable messaging" approaches are concerned with the
>delivery of a single message. However, often a message is sent as part of a
>larger (and longer) sequence of exchanges (i.e. a choreography). An example
>use case could be where a buyer sends an Order to a Seller. Later, the
>Seller should return an Order Response which indicates the extent to which
>the seller can (or can not) satisfy the order.
>
>Now the Order could be sent reliably using one of the levels described
>above. Similarly the Order Response could be sent reliably. But suppose the
>Order Response does not come when expected - none of the earlier "reliable
>messaging levels" help. This will most likely be due to some processing
>error at the Seller where the original message was lost.
>
>To handle this you could go the further, final step to support "Reliable
>Processing" which includes the following additional steps:
>1. The sender of the original message determines when the response message
>(e.g. the Order Response and NOT the RM Acknowledgement) should be
>received.
>2. If the response message is not received by the anticipated time:
>   a) The sender inquires on the status of the *processing* of the original
>message (i.e. not just its delivery). The response to the inquiry will
>indicate either:
>     - the message was never received (even though it might have been sent
>reliably!)
>     - the message was received and its processing is either: not started,
>in progress, or complete
>   b) Depending on the response the sender can either:
>     - cancel the original message (e.g. if processing had not been
>started)
>     - wait for processing to complete, or
>     - request the response to the message to be resent.
>
>Thoughts?
>
>David
>--
>Director, Product Management, Web Services
>Commerce One
>4440 Rosewood Drive, Pleasanton, CA 94588, USA
>Tel/VMail: +1 (925) 520 4422; Cell: +1 (925) 216 7704
>mailto:david.burdett@commerceone.com; Web: http://www.commerceone.com
Received on Friday, 13 December 2002 00:47:02 UTC