Re: Proposed text on reliability in the web services architecture from Walden Mathews on 2003-01-16 (www-ws-arch@w3.org from January 2003)

From: Walden Mathews <waldenm@optonline.net>
Date: Thu, 16 Jan 2003 11:29:21 -0500
To: Assaf Arkin <arkin@intalio.com>, Peter Furniss <peter.furniss@choreology.com>, "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, www-ws-arch@w3.org
Message-id: <003401c2bd7c$6d8321e0$1702a8c0@WorkGroup>
> > Deposit is, by definition, a non-idempotent operation.  Now, where
> > do you want to go with that?
>
> Pointing out that forcing non-idempotent operations to become idempotent is
> not always practical. Did I make that point even if I was totally redundant
> in repeating everything that was said?

It's clear now, and we agree on it.

>
>
> > If the deposit itself is a resource of interest, then you can use
> > idempotence to set the amount of the deposit.  This is one way
> > to reconcile the account.
>
> That's just shifting the problem away. In fact that's more precisely how it
> works. I've talked about it before with the check example. You cash a check
> and that's idempotent because you can create that record any number of
> times. But it's just a trigger that starts a non-idempotent operation. So A
> is doing an idempotent operation causing B to do a non-idempotent operation.
> In other words, if it's non-idempotent, it's non-idempotent; you can mask it
> but you can't make it disappear.

It's not shifting away if the client cares about the identity of the deposit.
That's my point.  The deposit identity is a meaningful thing; the identity
of some message carrying the deposit is not.  If anything, this is "shifting"
the concern back where it belongs.  Shifting it away from RM, I
suppose, to a place where it is more tractable.
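A minimal sketch of that idea, assuming a client-chosen deposit id (the `Account` class, `put_deposit`, and the id format are hypothetical, not anything proposed in the thread): recording the deposit is idempotent, and the balance adjustment it triggers happens only when the record's state actually changes. This is also the shape of the check example quoted above, where an idempotent record triggers a non-idempotent effect.

```python
class Account:
    """Hypothetical account whose deposits are resources named by an id."""

    def __init__(self) -> None:
        self.balance = 0
        self.journal: dict[str, int] = {}  # deposit id -> amount

    def put_deposit(self, deposit_id: str, amount: int) -> None:
        """Set the deposit named deposit_id to amount.

        Repeating the call with the same id and amount changes nothing,
        so a client may retry it freely after a lost reply.
        """
        previous = self.journal.get(deposit_id)
        if previous == amount:
            return  # duplicate delivery: already recorded
        # The non-idempotent side effect (adjusting the balance) is driven
        # by the change in the deposit record, not by message arrival.
        self.balance += amount - (previous or 0)
        self.journal[deposit_id] = amount


acct = Account()
acct.put_deposit("paycheck-2003-01-15", 1200)
acct.put_deposit("paycheck-2003-01-15", 1200)  # retried after a timeout
print(acct.balance)  # 1200
```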

>
> > If the deposit has no lasting value to you, and all you care about
> > is setting the account balance to the "right" figure (Quicken has a
> > feature like this for people who like to "try" to balance their accounts,
> > but don't want to put too much into the effort), then you could
> > idempotently set the balance, overriding deposit-based calculations.
>
> I hope Quicken 2003 would let me do that to my bank account ;-)

I hope not, as it might lead to your conviction.
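To make the contrast concrete, a toy sketch (function names and amounts are made up for illustration) of what happens when a retrying sender delivers the same message twice: an increment is applied twice, while an idempotent "set the balance" lands on the same value both times.

```python
def apply_increment(balance: int, amount: int) -> int:
    return balance + amount  # non-idempotent: every copy adds again


def apply_set(balance: int, new_balance: int) -> int:
    return new_balance  # idempotent: every copy yields the same result


balance_a = 500
for _ in range(2):  # the same message delivered twice
    balance_a = apply_increment(balance_a, 100)

balance_b = 500
for _ in range(2):
    balance_b = apply_set(balance_b, 600)

print(balance_a, balance_b)  # 700 600
```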

>
> > In either case, an alternate but unattractive strategy would be to
> > issue increments against the amount in question until the right
> > value was converged upon.  This is the problem RM is focused
> > on, and that's the reason RM is the hard way to do what the
> > application really needs.
>
> I don't really see why RM would want to do that or tries to do that. I only
> perceive RM as trying to address message loss, although you can do other
> things with it. If it's intended to solve one specific problem I feel that
> it should just be used to solve that specific problem.

True that RM plays no part in selecting an incremental strategy for
setting end-state.  What I'm saying is that RM can bolster such an
approach toward one definition of reliability, and that a different approach
in the application can attack reliability end-to-end, more effectively.

Do we agree that HTTP over TCP/IP already has RM incorporated?

>
> > > So the end-state is a composition of the new balance and the journal
> > > which now contains a new record. It's not practical to send the full
> > > end-state.
> >
> > No, and it shouldn't be necessary. The point is that if you care about the
> > entries in the journal, then you give them identity and acknowledge the
> > sovereignty of their state, which gives you a platform from which to use
> > an idempotent operation in that scope.
>
> The server cares about the entries in the journal. The client doesn't. Two
> servers may have two different ways to use the journal. You want to decouple
> the client from the server so you could use either server equally as well.
> Think of it like checks, banks and the tapes on which they record all
> transactions. You want to use a check and you want to not care how they
> manage to record the transaction when the check is cashed.

I disagree about this case.  Clients recognize audit as part of their
scope, and the journal is a tool in that, hence its visibility.  Moreover,
the client maintains its own version of the journal.  You can decouple
implementation details of client and server, but in this case, the journal
is not an implementation detail.

>
>
> > Version numbers are like RM.  They're not what you really care about,
> > and they carry the potential to screw you despite your good intentions.
> > What happens to your versioned account when a deposit that's been lost
> > in the network for a week finally comes home?
>
> Nothing. Go back to the previous e-mails and see how the algorithm is
> described. Version numbers have to be increasing but not consecutive; if you
> missed it, you simply retry the operation with a new version number. This
> algorithm is used to build fault-tolerant solutions, so I think it would do
> a good job of addressing many "what if" scenarios.

Maybe I misunderstood you.  I thought you were versioning the
repository (the account), but actually you're versioning the messages?
If you're trying the same operation but with a higher version number,
where is "sameness" recorded?  Versioning seems irrelevant.  Maybe
I still don't understand.
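One possible reading of the version-number scheme, written as a hypothetical sketch (`VersionedAccount`, `apply`, and the acceptance rule are assumptions, not the algorithm from the earlier e-mails): the server accepts an update only if its version is higher than the last one accepted, so a resent duplicate and a week-late message are both rejected. Note that the sketch records ordering only, not which operation a version belongs to; that is where the question of "sameness" comes in.

```python
class VersionedAccount:
    """Accept an update only if its version exceeds the last accepted one.

    Versions must increase but need not be consecutive.
    """

    def __init__(self) -> None:
        self.balance = 0
        self.last_version = 0

    def apply(self, version: int, amount: int) -> bool:
        if version <= self.last_version:
            return False  # stale or duplicate: rejected
        self.last_version = version
        self.balance += amount
        return True


acct = VersionedAccount()
print(acct.apply(10, 100))  # True
print(acct.apply(10, 100))  # False: a resent duplicate is rejected
print(acct.apply(25, 50))   # True: a gap in version numbers is fine
print(acct.apply(12, 100))  # False: a week-late message arrives after 25
print(acct.balance)         # 150
```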


> Ages ago someone made a point in favor of idempotent/query only operations.
> I tried to prove that it's not always in the best interest of the
> application. I'm glad both of us agree on that.

Ditto.

> > Maybe yes; maybe no.  If that was the deposit of my paycheck and it got
> > lost, then I'm not fine.  Not all problems are modeled well by statistics.
>
> Let's say that banks start admitting their mistakes and pay the customer $50
> every time they miss a deposit. They'll probably be looking at decreasing
> their error rate. But they won't be looking at achieving 0% failure because
> the cost would be higher than what it costs to pay a penalty and lose a
> customer. In the end it's all about cost/benefit, which does take statistics
> into account.

This depends on the user's (bank or customer) requirement, which will
vary from user to user, and cannot always be covered by a statistical
approach to reliability.  Hence my statement above that "not all problems
are modeled well by statistics".  Some aren't, and still have to be
accounted for.  RM alone is not a good fit for those, would you agree?

> > > If you are sending a message and do not immediately know whether the
> > > message has been sent, and the success ratio is 90% (e.g. when you use
> > > IP multicast or UDP or SMTP) then you need to resolve 10% of the time.
> >
> > Yes, but WHICH 10%?  Which messages were received and which
> > were not?  How do we find out?
>
> There are one or two of maybe five strategies I could think of.

My point is that you need to do something 100% of the time, even if
only 10% of messages are lost.  Job 1 is identifying the lost messages, which
is O(n), where n = number of messages.  No?
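A rough sketch of the O(n) point, assuming the sender numbers messages consecutively from 0 (`find_lost` and the numbering scheme are hypothetical): even when only 10% of messages go missing, finding which ones means checking every sequence number that was sent.

```python
def find_lost(sent_count: int, received: set[int]) -> list[int]:
    """Return the sequence numbers 0..sent_count-1 that never arrived.

    Every sequence number is examined, so the work is O(n) in the number
    of messages sent, however small the fraction that was lost.
    """
    return [seq for seq in range(sent_count) if seq not in received]


received = {seq for seq in range(20) if seq % 10 != 3}  # 10% loss
print(find_lost(20, received))  # [3, 13]
```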

>
>
> > > You can reduce that to 0.01% (or something like that, my numbers are
> > > all made up) if you have some messaging layer that makes the messages
> > > idempotent and keeps resending until it gets close to a 100% success rate.
> >
> > You're dreaming.
>
> Perhaps.
>
> Here I am running a test sending one million messages over UDP in a tight
> loop. I oversaturate the connection so I get 90% packet loss at the UDP
> level. I have a 5 second timeout so messages not received within 5 seconds
> are considered lost. I have reliable messaging with resend. All one million
> messages arrive at their destination.
>
> But it must be a dream. Let me pinch myself. Ouch!

I'm sure your experiment was real.  The dream I'm talking about is the
one in which this somehow relieves the application of its burden.  Perhaps
you don't think it does.
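A small simulation in the spirit of the experiment above, not the actual test harness: each message is resent until an acknowledgment survives a link that drops 90% of datagrams in each direction. `deliver_all`, `lossy_send`, and the loss model are assumptions, and each retry loop iteration stands in for one timeout. The receiver-side set is what discards duplicate copies, and it works only because each message carries an identifier.

```python
import random

LOSS_RATE = 0.9  # assumed, matching the 90% loss figure in the thread


def lossy_send(rng: random.Random) -> bool:
    """Return True if a datagram survives the simulated lossy link."""
    return rng.random() >= LOSS_RATE


def deliver_all(count: int, seed: int = 1) -> tuple[set[int], int]:
    """Resend each message until it is acknowledged; return (delivered, attempts)."""
    rng = random.Random(seed)
    delivered: set[int] = set()  # receiver side: ids seen, so copies collapse
    attempts = 0
    for msg_id in range(count):
        acked = False
        while not acked:  # one iteration per (simulated) timeout
            attempts += 1
            if lossy_send(rng):          # the message reached the receiver
                delivered.add(msg_id)    # a duplicate copy changes nothing
                acked = lossy_send(rng)  # the ack may be lost as well
    return delivered, attempts


delivered, attempts = deliver_all(1000)
print(len(delivered), attempts)  # len(delivered) is always 1000
```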

Walden
Received on Thursday, 16 January 2003 11:29:25 UTC