RE: Reliability is really two-phase (was RE: Reliable Web Services) from Assaf Arkin on 2002-12-24 (www-ws-arch@w3.org from December 2002)

From: Assaf Arkin <arkin@intalio.com>
Date: Tue, 24 Dec 2002 12:21:08 -0800
To: "Walden Mathews" <waldenm@optonline.net>, "Mark Potts" <mark.potts@talkingblocks.com>, "Peter Furniss" <peter.furniss@choreology.com>, "Patil, Sanjaykumar" <sanjay.patil@iona.com>
Cc: "Www-Ws-Arch" <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNKEIACPAA.arkin@intalio.com>
Your software vendor gives you two types of support services.

If you have a question that was anticipated, or that someone has asked
before, you can find the answer in the documentation, an FAQ, or a tech
note. You can get an instantaneous response by using a search engine.

Instantaneous means immediate, without pause or delay. It doesn't mean zero
response time; you can't even get a message to the network card without some
delay. If you need zero response time, you will need to be able to predict
the response. Quantum computing may let you operate faster than the speed of
light, but the technology is not yet available for large-scale deployment.
So for now let's assume an HTTP request/response is as fast as you can get.

If you have a question that is specific to your installation, you
encountered a bug, or the response is not documented yet, someone has to
cater for it. That means someone has to receive the request, do some
thinking, and give you back a response. Even if the expert is sitting there
in front of the computer not doing anything else, it's probably going to
take them a few minutes to try and figure out what's going wrong. Maybe they
have no clue and need to go and talk to someone before they can come back
to you.

As a vendor, I want my support staff to give a response time that is faster
than the speed of light, but even though I'm busy working on that solution,
it's going to take a few years before we roll it out to the market. Right
now we have a support cycle that is faster than the competition's, but even
though the staff are fast, they still take time to figure out exactly what
the problem is and how to respond to it.

Let's say that looking at a problem and coming back with a response takes 4
hours (it's a very tricky problem and so the solution is not evident). You
can call a support person and wait on the phone 4 hours for a response. I
assume you're a busy person and don't want to sit there and wait for 4
hours until they figure out how to solve it. A better option is for you to
call with the request, make sure it got logged, then go and do something
else.

You can keep calling every 30 minutes (polling) or you can wait for the
support person to call you back when they have figured out the response
(an interrupt). You will agree with me that being interrupted is a more
efficient use of your time than polling. Similarly, when you build software
you look for infrastructure solutions (like MOM) that let you do that.
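
To put rough numbers on the polling/interrupt difference, here is a small
Python sketch. It only simulates the phone calls in minutes; the names
(poll, Ticket, on_answer, resolve) are made up for illustration and do not
come from any MOM product.

```python
def poll(ready_at, interval):
    """Customer calls every `interval` minutes until the answer is ready.

    Returns (calls_made, minute_the_answer_was_received).
    """
    calls, now = 0, 0
    while True:
        now += interval
        calls += 1
        if now >= ready_at:
            return calls, now


class Ticket:
    """A logged request; support 'interrupts' the customer once resolved."""

    def __init__(self):
        self._listeners = []

    def on_answer(self, listener):
        # Customer leaves a callback number and goes off to do other work.
        self._listeners.append(listener)

    def resolve(self, minute, answer):
        # Support calls back everyone who registered interest.
        for listener in self._listeners:
            listener(minute, answer)


# Assume the answer takes 250 minutes (a bit over 4 hours) to work out.
print(poll(ready_at=250, interval=30))      # (9, 270)

received = []
ticket = Ticket()
ticket.on_answer(lambda minute, answer: received.append((minute, answer)))
ticket.resolve(250, "apply patch")
print(received)                             # [(250, 'apply patch')]
```

Under these assumptions polling costs nine calls and still delivers the
answer 20 minutes late, while the callback costs one call and arrives the
moment the answer exists.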

Let's say they call you back but you can't pick up the phone (you just
entered the Lincoln Tunnel). They will try again five minutes later, and
again after that. So by retrying at frequent intervals they increase the
likelihood that you will get the response in a timely manner.
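
As a rough Python sketch of that retry behaviour (function names are
hypothetical, and the independent 10% loss per attempt is purely an
assumption chosen so that three sends land on the 99.9% figure from the
purchase order example quoted further down):

```python
def deliver_with_retry(attempt, max_attempts, interval, sleep=lambda s: None):
    """Call `attempt()` until it returns True or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    `sleep` is a no-op by default so the sketch runs instantly; a real
    sender would pass time.sleep or schedule a timer instead.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            sleep(interval)      # wait before the next try
    return None


def success_probability(p_loss, attempts):
    """Chance that at least one of `attempts` independent sends gets through."""
    return 1 - p_loss ** attempts


# The callee is in the tunnel for the first two calls, reachable on the third.
outcomes = iter([False, False, True])
print(deliver_with_retry(lambda: next(outcomes), max_attempts=5, interval=300))  # 3

# Assumed 10% loss per send, three sends: roughly 0.999.
print(round(success_probability(0.1, 3), 6))
```

The interval and the attempt limit are the knobs; how they are set is up to
whatever RM you use.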

Does that give you a good picture of where asynchronous/RM gets to be used?


> The same solution doesn't apply in either case.  A soft-realtime
> application may decide, after waiting X milliseconds for an
> acknowledgement, that the business value of that ack has reached zero,
> based on new information received by the same application.  The
> "separate layer" has no knowledge of that, and so cannot participate,
> let alone "solve" the problem.

The ack has no business value at all, nor is it delivered to the application.
It is part of a positive-ack protocol between the RMs that is there to
expedite the delivery of messages when message loss occurs. Redundant acks
have no effect on the behavior of the application.
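
A minimal Python sketch of what I mean, not taken from any particular RM
specification (the Sender and Receiver classes and their methods are
invented for illustration): the acks travel between the two RMs only, the
application sees the payload once, and a redundant ack is a no-op.

```python
class Sender:
    """Sending-side RM: remembers messages until the peer RM acks them."""

    def __init__(self):
        self.pending = {}          # message id -> payload awaiting an ack

    def send(self, msg_id, payload):
        self.pending[msg_id] = payload

    def on_ack(self, msg_id):
        # A first ack clears the message; a redundant ack finds nothing
        # to clear and is silently ignored. The application is never told.
        self.pending.pop(msg_id, None)

    def to_resend(self):
        return sorted(self.pending)


class Receiver:
    """Receiving-side RM: hands each message to the application once."""

    def __init__(self, deliver):
        self.seen = set()
        self.deliver = deliver     # application callback, payload only

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.deliver(payload)
        return msg_id              # ack every copy, including resends


inbox = []
sender, receiver = Sender(), Receiver(inbox.append)

sender.send(1, "purchase order")
ack = receiver.on_message(1, "purchase order")   # first copy
dup = receiver.on_message(1, "purchase order")   # resend after a lost ack
sender.on_ack(ack)
sender.on_ack(dup)                               # redundant ack

print(inbox)               # ['purchase order']
print(sender.to_resend())  # []
```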

The separate thread has the same subject as this one; it's just branching
off into a discussion of coordination protocols.

arkin


> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> Behalf Of Walden Mathews
> Sent: Tuesday, December 24, 2002 10:19 AM
> To: Assaf Arkin; Mark Potts; Peter Furniss; Patil, Sanjaykumar
> Cc: Www-Ws-Arch
> Subject: Re: Reliability is really two-phase (was RE: Reliable Web
> Services)
>
>
>
> > Good questions.
> >
> > If you need instantaneous response then you would use a service that
> > provides instantaneous response.
>
> Do you mean a service or a messaging system here?  If you mean service,
> there's no such thing as a service that provides instantaneous response,
> and even if one purported to, reality stacks the deck against it.  Maybe
> I misunderstand.
>
> > You would typically use a synchronous
> > communication protocol to expedite back & forth communication.
>
> How does waiting for a response expedite it?  Usually when I wait for
> things, it doesn't make them happen faster.  Do I understand you?
>
> > If you can
> > tolerate waiting for a response for an amount of time that is longer
> > than the latency of the protocol, then you would consider using
> > asynchronous messaging.
>
> Is this a real "if"?  If I *can't* tolerate the propagation times of the
> medium, should I be using its bandwidth at all?
>
> If I use "asynchronous messaging", then I either need an application that
> polls, or I need an application designed to be interruptable.  Either of
> these infects the application with networking complexities.  I thought the
> goal of this RM layer was to keep that out of the app.  Maybe I don't
> understand...
>
> > If you use asynchronous messaging, then you may want to use an
> > RM.
>
> That's what I thought we were talking about.  Using an RM.
>
> >
> > What you have here are two different timeouts. Let's say that X is the
> > amount of time within which you want to receive a response from the
> > other service, and Y is the maximum latency for getting a request to
> > the service (and an ack back to the sender). You set Y to be
> > significantly smaller than X, and that allows the RM to speed things
> > up or take it easy, depending on whether you need a fast response or
> > can accept a slower response time.
>
> This doesn't fit my experience very well.  Usually X is some limit
> approaching zero.  Who wants to wait longer than absolutely necessary?
> Why would I arbitrarily say "you can get back to me four hours from
> now"?  It's not competitive.
>
> As for Y being maximum latency, maximum latency is always infinity as
> far as I know.  Did you mean minimum latency?  It sounds overall as if
> you're designing features for patient users.  I wonder if they'd get
> any use.  But I'm not sure I understand.
>
> >
> > For example the maximum time to respond to a purchase order request (X)
> > could be 24 hours, and the maximum time to acknowledge a purchase order
> > request (Y) would be 4 hours. Let's say that we deem 3 sends as
> > sufficient to give us 99.9% reliability. Then the RM would schedule up
> > to three sends within a 4-hour time frame and give up after 4 hours.
> > The application gives up after 24 hours (if it gets an ack but no
> > response), so it's never waiting for the RM to resend.
>
> I can't think of an application that would be interested in using a
> service like that.  Maybe my NYC perspective gives itself away, but this
> is I-age, Information, Impatience.  I want service now.
>
> >
> > Everything is settable. You can determine what the resend policy is, how
> > often to try, what interval, how to escalate, etc. These are all
> > implementation details, they depend on the RM you use.
>
> Oh, declarative reliability instead of procedural.  Okay, maybe that
> flies.  Sometimes, though, I change my mind about my real-time
> requirements mid-transaction...
>
> I don't think these things are *implementation detail* at all!  If I
> have to configure them as a client, then they're part of my interface.
> If I have to program to accept an interrupt from an asynchronous RM
> framework, then once again the constraint is right in the interface.
>
> > > What happens when an application uses an RM framework in order
> > > to reduce its complexity, observes and interprets the signs from the
> > > RM indicating that messaging was reliable, then discovers that the
> > > peer application at the other end is in some unexpected state despite
> > > assertions of "reliability"?  Is this impossible?
> >
> > This could happen even if you use a synchronous protocol with 100%
> > delivery guarantee and you know without a doubt that the message was
> > received, but the receiver has some software glitch (issue? feature?)
> > that causes it to enter this unexpected state.
>
> Precisely.  Restated, "reliability" at the application level is
> comprehensive, inclusive and overarching to "reliability" at the
> messaging level.  But since an application-level result is the only
> thing my application cares about, and since my application cannot avoid
> caring about that and responding to it, how does it help to "solve
> 99.9%" of it at a lower level?  In other words, the emperor was 99.9%
> clothed, but he was still naked.
>
> >
> > Since the same solution applies in either case, it is better if we
> > solve it in a separate layer. You can join the discussion about
> > coordination protocols going on in a separate thread and debate these
> > points.
>
> The same solution doesn't apply in either case.  A soft-realtime
> application may decide, after waiting X milliseconds for an
> acknowledgement, that the business value of that ack has reached zero,
> based on new information received by the same application.  The
> "separate layer" has no knowledge of that, and so cannot participate,
> let alone "solve" the problem.
>
> Where is this other thread?
>
> Thanks,
> Walden
>
Received on Tuesday, 24 December 2002 15:23:10 UTC