RE: Reliability is really two-phase (was RE: Reliable Web Services) from Assaf Arkin on 2002-12-24 (www-ws-arch@w3.org from December 2002)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 23 Dec 2002 18:10:32 -0800
To: "Walden Mathews" <waldenm@optonline.net>, "Mark Potts" <mark.potts@talkingblocks.com>, "Peter Furniss" <peter.furniss@choreology.com>, "Patil, Sanjaykumar" <sanjay.patil@iona.com>
Cc: "Www-Ws-Arch" <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNCEHKCPAA.arkin@intalio.com>

> > There's an important principle here: you reduce the cost of development
> > and improve reliability if you simplify the application by putting the
> > complexity in separate layers. So for an RM (RM=MOM) you would
> > automatically do the resend without bothering the application.
>
> How many times would you resend?  How long would you wait
> between resends?  What would you tell your application while this
> process is going on?  Would you pend your application during this
> process?  What if your application can't afford to be pended awaiting
> "reliability"?

Good questions.

If you need an instantaneous response, then you would use a service that
provides an instantaneous response. You would typically use a synchronous
communication protocol to expedite back-and-forth communication. If you can
tolerate waiting for a response for an amount of time that is longer than
the latency of the protocol, then you would consider using asynchronous
messaging. If you use asynchronous messaging, then you may want to use an
RM.

What you have here are two different timeouts. Let's say that X is the
amount of time within which you want to receive a response from the other
service, and Y is the maximum latency for getting a request to the service
(and an ack back to the sender). You set Y to be significantly smaller than
X, and that allows the RM to speed things up if you need a fast response, or
take it easy if you can accept a slower response time.

For example, the maximum time to respond to a purchase order request (X)
could be 24 hours, and the maximum time to acknowledge a purchase order
request (Y) would be 4 hours. Let's say that we deem 3 sends sufficient
to give us 99.9% reliability. Then the RM would schedule up to three sends
within a 4-hour time frame and give up after 4 hours. The application gives
up after 24 hours (if it gets an ack but no response), so it's never waiting
for the RM to resend.

Everything is settable. You can determine what the resend policy is, how
often to try, at what interval, how to escalate, etc. These are all
implementation details; they depend on the RM you use.
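
To make the two timeouts concrete, here is a rough sketch of the purchase
order numbers above (X = 24 hours, Y = 4 hours, 3 sends). It is not any
particular RM's API: the names ResendPolicy, send_times and check_timeouts
are made up for illustration, and spacing the sends evenly inside Y is just
one assumed policy among many an RM might offer.

```python
from dataclasses import dataclass


@dataclass
class ResendPolicy:
    """Hypothetical RM resend policy; every name here is illustrative."""
    ack_window_hours: float  # Y: how long the RM may take to get an ack
    max_sends: int           # how many sends the RM attempts within Y

    def send_times(self):
        """Hours after the first send at which the RM sends.

        Assumption: sends are spaced evenly across the ack window.
        """
        interval = self.ack_window_hours / self.max_sends
        return [i * interval for i in range(self.max_sends)]


def check_timeouts(response_deadline_hours, policy):
    """Verify Y < X and return the time left to the application after Y."""
    if policy.ack_window_hours >= response_deadline_hours:
        raise ValueError("ack window Y must be smaller than deadline X")
    return response_deadline_hours - policy.ack_window_hours


# Purchase order example: X = 24 hours, Y = 4 hours, up to 3 sends.
policy = ResendPolicy(ack_window_hours=4, max_sends=3)
schedule = policy.send_times()      # three send times, all inside Y
slack = check_timeouts(24, policy)  # hours remaining after the RM gives up
```

With these numbers the RM has finished trying (or given up) 20 hours before
the application's own deadline, which is why the application never ends up
waiting on a resend.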


> What happens when an application uses an RM framework in order
> to reduce its complexity, observes and interprets the signs from the
> RM indicating that messaging was reliable, then discovers that the peer
> application at the other end is in some unexpected state despite
> assertions of "reliability"?  Is this impossible?

This could happen even if you use a synchronous protocol with a 100% delivery
guarantee and you know without a doubt that the message was received, but
the receiver has some software glitch (issue? feature?) that causes it to
enter this unexpected state.

Since the same solution applies in either case, it is better if we solve it
in a separate layer. You can join the discussion about coordination
protocols going on in a separate thread and debate these points.

arkin

>
> Thanks,
>
> Walden Mathews
>
>

Received on Monday, 23 December 2002 21:12:06 UTC