RE: Different Levels of Reliable Messaging from Assaf Arkin on 2002-12-13 (www-ws-arch@w3.org from December 2002)

From: Assaf Arkin <arkin@intalio.com>
Date: Thu, 12 Dec 2002 20:44:22 -0800
To: "Ricky Ho" <riho@cisco.com>, "Burdett, David" <david.burdett@commerceone.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNOEMPCOAA.arkin@intalio.com>
Ricky,

I don't think it makes sense to use asynchronous messaging between two
services unless these services perform operations that are independent in
time. If I absolutely need to know you got the message now, or get an
immediate response, I would use a synchronous operation.

On the other hand, if I need to send you a message but don't care exactly
when you process it (within a limit, of course), I would use asynchronous
messaging. It could speed things up on my side by queuing outgoing messages
and sending them later on when I stop overloading the network with so much
synchronous messaging. It could speed things up on your side because you can
process messages from different services over a longer period of time.

I would say that if a message has to absolutely get there within five
minutes or less, or has to be processed within five minutes or less, then
either use synchronous operations, or use a faster transport protocol and
transport layer that takes care of all the details, most probably by using
TCP/IP or UDP.

If you do allow for some latency, then you only need to care that the clocks
are synchronized to some extent, worst case they are off by a minute or two.
Such a level of synchronization is quite practical and we can expect
services to be able to conform.
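
As a rough sketch of what tolerating that drift could look like on the
receiving side: the two-minute bound and the names MAX_SKEW and may_process
are assumptions made for illustration, not part of any specification.

```python
from datetime import datetime, timedelta, timezone

# Assumed worst-case drift between services ("a minute or two").
MAX_SKEW = timedelta(minutes=2)


def may_process(expires_at: datetime, receiver_now: datetime,
                max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True only if the message is still unexpired even when the
    receiver's clock is running slow by up to max_skew."""
    return receiver_now + max_skew < expires_at


# Hypothetical message: sent at 20:00 UTC with a ten-minute lifetime.
sent = datetime(2002, 12, 12, 20, 0, tzinfo=timezone.utc)
expires_at = sent + timedelta(minutes=10)

early = may_process(expires_at, sent + timedelta(minutes=7))
late = may_process(expires_at, sent + timedelta(minutes=8, seconds=30))
print(early, late)
```

The check is deliberately conservative: it gives up the last two minutes of
the message lifetime rather than risk processing an expired message.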

In fact, if your clock is way off from everybody else's clock there are many
things that would go wrong. You could try to buy stock hours after the
market closes, or you could be doing Tuesday's transactions on Wednesday. If
your clock is way off then reliable messaging is the least of your worries.

Reliable messaging assumes message loss (whether lost or just delayed) but
does not assume significant message loss. If a significant portion of the
messages are lost then you're wasting a lot of time chasing down messages
that never arrive and no level of reliability is going to make your service
meet your expectation.

You probably expect 95%-99.999% of the messages to get there on time, and
you want to use reliable messaging to make sure 100% of the messages get
processed and not be in limbo even for 0.001% of the cases.
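
A back-of-the-envelope way to look at those figures, assuming (and this is
an assumption, not something established in this thread) that each send
attempt is lost or delayed independently with the same probability. The
function name residual_loss is made up for the example.

```python
def residual_loss(p_on_time: float, attempts: int) -> float:
    """Probability that every one of `attempts` sends is lost or late,
    assuming the attempts fail independently of each other."""
    if not 0.0 <= p_on_time <= 1.0:
        raise ValueError("p_on_time must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return (1.0 - p_on_time) ** attempts


# The two ends of the 95%-99.999% range, with up to three attempts each.
for p in (0.95, 0.99999):
    for n in (1, 2, 3):
        print(f"p={p} attempts={n} residual={residual_loss(p, n):.3e}")
```

Resending shrinks the residual quickly but never takes it to zero, which is
why the sender still needs a point at which it gives up and asks what
happened to the message.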

When things do go wrong, and that should be the exception not the norm, you
should resort to synchronous messaging to resolve that situation.

If you send me a message that expires in ten minutes (your clock), I receive
it eight minutes later and still believe I have four minutes to process it
(my clock is slightly off), or I'm just too busy to process it, then you
could use a synchronous operation to resolve that situation.

You could send me a message inquiring about the message I'm supposed to have
received, you could also ask me to ignore it if I did not already start
processing it, and I could give you a prompt reply.
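
A minimal sketch of that exchange, seen from the receiving side. The class,
method and status names are hypothetical and only illustrate the idea of a
synchronous inquiry plus a cancel that succeeds only if processing has not
started.

```python
from enum import Enum


class Status(Enum):
    NOT_RECEIVED = "not received"
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"
    CANCELLED = "cancelled"


class Receiver:
    def __init__(self) -> None:
        self._status = {}  # message id -> Status

    def receive(self, message_id: str) -> None:
        # setdefault leaves an already-known message untouched, so a
        # duplicate delivery does not reset its status.
        self._status.setdefault(message_id, Status.NOT_STARTED)

    def start_processing(self, message_id: str) -> bool:
        if self._status.get(message_id) is Status.NOT_STARTED:
            self._status[message_id] = Status.IN_PROGRESS
            return True
        return False

    def inquire(self, message_id: str) -> Status:
        """Synchronous answer to 'did you get my message?'."""
        return self._status.get(message_id, Status.NOT_RECEIVED)

    def cancel_if_not_started(self, message_id: str) -> Status:
        """Ignore the message unless processing has already begun;
        either way, report the resulting status to the sender."""
        if self._status.get(message_id) is Status.NOT_STARTED:
            self._status[message_id] = Status.CANCELLED
        return self.inquire(message_id)


receiver = Receiver()
receiver.receive("order-1")
print(receiver.cancel_if_not_started("order-1"))
```

Because the cancel returns the resulting status, the sender learns in a
single round trip whether the message was dropped or is already being
worked on.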

If you can't communicate with me because I'm offline, you could try again
two minutes later. If I'm still offline then, even if my clock is way off
from yours, the message will have expired by the time I come back online
and I will discard it.
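
The sender's side of that could be sketched as a retry loop bounded by the
expiry time. The function name, the two-minute default and the injected
clock and sleep callables are assumptions chosen so the example can run
without real waiting.

```python
from typing import Callable


def resolve_or_expire(try_contact: Callable[[], bool],
                      now: Callable[[], float],
                      sleep: Callable[[float], None],
                      expires_at: float,
                      retry_interval: float = 120.0) -> str:
    """Keep trying a synchronous inquiry until it succeeds or the
    message has expired on the sender's clock."""
    while now() < expires_at:
        if try_contact():
            return "resolved"
        sleep(retry_interval)
    # Past expiry: stop retrying. The receiver is expected to discard
    # the message when it comes back online.
    return "expired"


# Simulated clock so the example finishes immediately.
clock = [0.0]


def fake_now() -> float:
    return clock[0]


def fake_sleep(seconds: float) -> None:
    clock[0] += seconds


# The receiver stays offline for the whole ten-minute lifetime.
outcome = resolve_or_expire(lambda: False, fake_now, fake_sleep,
                            expires_at=600.0)
print(outcome, clock[0])
```

In real use `now` and `sleep` would be `time.time` and `time.sleep`, and
`try_contact` would wrap whatever synchronous inquiry the two services
agree on.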

arkin


> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> Behalf Of Ricky Ho
> Sent: Thursday, December 12, 2002 6:11 PM
> To: Burdett, David; www-ws-arch@w3.org
> Subject: Re: Different Levels of Reliable Messaging
>
>
>
> Great summary David, some comments !
>
> The "ack" doesn't need to be per-message based.  I can send an ack for a
> bunch of messages (of course, sequence numbers are used).
>
> The "time expiry" is unreliable because clocks may be out of sync.
>
> I don't think there should be a step 4 in LEVEL 3.  Step 3 should
> say "Have you received the message?  If not, forget the message
> afterwards"
>
> I think LEVEL 5 should be done at the transaction layer, below
> choreography, but above reliable messaging, using some
> 2-phase-interaction style like BTP.
>
> Best regards,
> Ricky
>
> At 02:48 PM 12/12/2002 -0800, Burdett, David wrote:
>
> >I've been following the Reliable Messaging thread with interest
> and offer,
> >for the purposes of discussion, the following five levels of Reliable
> >Messaging starting from a simple "Acknowledgement Only" and ending with
> >"Reliable Processing" where each level offers gradually
> increasing "degrees
> >of reliability" ...
> >
> >LEVEL 0 - Acknowledgment only
> >-----------------------------
> >Upon request, an acknowledgment message is returned as a response to the
> >sending of a message. The minimum semantic of the
> acknowledgement message is
> >that the original message has been received and persisted and therefore,
> >barring catastrophes, it should not be lost and therefore will
> be processed.
> >The acknowledgement message can *optionally* return the
> following additional
> >status information:
> >a) The message structure is valid (in a SOAP context this could be split
> >into validation of the envelope, header, body and/or any attachments)
> >b) All the checks in a) plus checking that the content of the message is
> >valid, e.g. data, codes and identifiers in the message have been
> checked for
> >validity against their datatypes and/or reference information - e.g.
> >databases
> >c) Either a) or b) above and the fact that the message has
> been passed on
> >for processing - e.g. to the application
> >
> >LEVEL 1 - Simple Reliable Messaging
> >-----------------------------------
> >This is based on Level 0 (Acknowledgment Only) with the following
> >extensions:
> >1. Each "original" message that is sent contains an "expires at"
> time which
> >indicates to the destination that, if they receive the message after this
> >point in time, they MUST NOT process it.
> >2. If the acknowledgement message is not received by the sender,
> after some
> >period of time then the original message is resent
> >3. Step 2 is repeated as required until an acknowledgement has
> been received
> >or the "expires at" time has passed. If no acknowledgment was
> received, the
> >sender gives up and *presumes* that the message was not delivered
> >4. The receiver of the message looks for duplicate messages and,
> if one is
> >found, does not "process" it but, instead, resends the acknowledgement
> >message
> >5. If the destination receives a message they have not seen
> before where the
> >"expires at" time has passed, then they reject the message with an error.
> >
> >Note that this does not solve the "two army" problem that has
> been discussed
> >earlier in this thread - but see Level 3 (Reliable Messaging
> with Recovery)
> >below.
> >
> >LEVEL 2 - Connection based Reliable Messaging
> >---------------------------------------------
> >This is based on either Level 0 (Acknowledgement Only) or 1
> (Simple Reliable
> >Messaging) and involves the sending of an inquiry to the
> destination of the
> >message **before sending the actual message** to determine the
> availability
> >of the service that is accepting messages at the destination, i.e.  is
> >running or not and is it able to accept messages.
> >
> >The idea is that if you do a successful inquiry and immediately
> follow it by
> >sending the actual message then effectively you have "set up a
> connection"
> >and so you are very likely to realize success. This could also be very
> >useful if you are sending a "large" message.
> >
> >LEVEL 3 - Reliable Messaging with Recovery
> >------------------------------------------
> >This is based on Level 1 (Simple Reliable Messaging), reuses the service
> >availability inquiry from Level 2 (Connection Based Reliable
> Messaging) and
> >adds an inquiry on Message Status. It works as follows.
> >1. Firstly the sender of the original must have "given up" (see level 1),
> >then, some time later,
> >2. The sender optionally uses the service availability inquiry
> from Level 2
> >to inquire on the current status of the service that was the
> destination of
> >the original message
> >3. The sender then determines the status of the original message that was
> >sent by doing a Message Status Inquiry targeted at the destination. In
> >return the destination sends another message that indicates:
> >   a) There was no record of the original message, or
> >   b) The original message was received and so resends the
> acknowledgement,
> >together with a status that indicates that processing is either: not
> >started, in progress, complete or not known
> >4. Depending on the response, the sender can take one of the following
> >actions:
> >   a) Resend the original message (or perhaps a new version of it as the
> >original might have expired),
> >   b) Cancel the original message - i.e. do not process it, or
> >   c) Wait for the response to the message to arrive (see also
> Level 5 below)
> >
> >Note that a status on the inquiry response of "not known" is
> valid since the
> >solution at the destination that is providing reliable messaging
> support may
> >have no way of determining the status of the processing of the message as
> >the processing is being carried out by another piece of software
> that cannot
> >provide that information.
> >
> >LEVEL 4 - Connection based Reliable Messaging with Recovery
> >-----------------------------------------------------------
> >This is a simple combination of levels 2 and 3 where a query on the
> >availability of the service is done first as in Level 2, but, if the
> >acknowledgement was not received and the sender "gave up", then a Level 3
> >Recovery is attempted as well.
> >
> >LEVEL 5 - Reliable Processing
> >-----------------------------
> >Personally I don't think that this level should be part of Reliable
> >Messaging and should be part of Choreography instead. I am
> including it in
> >this email for completeness and so that we can determine that it
> is out of
> >scope. Anyway, here's the description ...
> >
> >All the previous "reliable messaging" approaches are concerned with the
> >delivery of a single message. However, often a message is sent
> as part of a
> >larger (and longer) sequence of exchanges (i.e. a choreography).
> An example
> >use case could be where a buyer sends an Order to a Seller. Later, the
> >Seller should return an Order Response which indicates the
> extent to which
> >the seller can (or can not) satisfy the order.
> >
> >Now the Order could be sent reliably using one of the levels described
> >above. Similarly the Order Response could be sent reliably. But
> suppose the
> >Order Response does not come when expected - none of the earlier
> "reliable
> >messaging levels" help. This will most likely be due to some processing
> >error at the Seller where the original message was lost.
> >
> >To handle this you could go the further, final step to support "Reliable
> >Processing" which includes the following additional steps:
> >1. The sender of the original message determines when the
> response message
> >(e.g. the Order Response and NOT the RM Acknowledgement) should
> be received.
> >2. If the response message is not received by the anticipated time:
> >   a) The sender inquires on the status of the *processing* of
> the original
> >message (i.e. not just its delivery). The response to the inquiry will
> >indicate either:
> >     - the message was never received (even though it might have
> been sent
> >reliably!)
> >     - the message was received and its processing is either:
> not started, in
> >progress, or complete
> >   b) Depending on the response the sender can either:
> >     - cancel the original message (e.g. if processing had not
> been started)
> >     - wait for processing to complete, or
> >     - request the response to the message to be resent.
> >
> >Thoughts?
> >
> >David
> >--
> >Director, Product Management, Web Services
> >Commerce One
> >4440 Rosewood Drive, Pleasanton, CA 94588, USA
> >Tel/VMail: +1 (925) 520 4422; Cell: +1 (925) 216 7704
> >mailto:david.burdett@commerceone.com; Web: http://www.commerceone.com
>
Received on Thursday, 12 December 2002 23:45:12 UTC