Reliability is really two-phase (was RE: Reliable Web Services)

The reliability requirement really means that you need 
the sort of mechanisms and exchanges of two-phase outcome
(as in OASIS BTP).  "Reliable messaging", depending on the
details of its mechanisms, either gives less than it
seems, or is just as complicated (and, in some cases, both).


To expand that assertion a bit:

a) I'm assuming reliability can be defined as two parties needing to have
a consistent view as to whether some work has or has not been done
by one of them at the request of the other
  [ this is the 0 or 1 case, and is the centre of state alignment -
  where I change my view of the shared state because I know you have
  changed, or will change, yours ]
  

b) the critical feature is that one side accepts
that the other side will make the definitive determination as
to whether the work is to be done; the deferring side
agrees to accept/apply/follow that determination once it knows of it

  [ which is the essence of the solution to the two armies problem - their
  problem was that neither side would make an unconditional decision; each
  wanted the other side to make an irrevocable decision as a condition of
  its own ]

c) once the determination has been made, the repetition and recovery
rules of the transaction protocol make sure the other side will
know eventually 

d) you normally want to know that the application has really done
the work. In some cases, it may be sufficient to know that 
the work will eventually be done (e.g. it's been dropped on a
reliable queue) - but that means that either there is no
comeback or any comeback is a whole new activity.

e) the "simple" ack approach actually requires some extra
messages to avoid one or both sides having to remember the
request (or some identification of it) indefinitely, or having
a complicated set of timeout rules as to when they can forget
things. (And that's before we worry about surviving crashes.)

f) reliable messaging (including things like HTTPR) is 
distinguished from two-phase outcome only by what is counted
as the "decision" - it's "message received", not "work is/will be done".
The systems have to store similar information/identifiers
and follow similar rules as to when to persist and 
delete this information. [ in other words, it's not really simpler
to just use reliable messaging ]

g) some of the scenarios differ from the classic
two-phase commit exchanges in that the sender of the first
message is the one that defers to the other side's decision.
(in classic two-phase, the client asks the server to defer to the 
client's decision). This has some impact on how the 
relationship gets established, but doesn't significantly 
affect what happens later (in terms of retries, persistence,
recovery sequences).

h) expel from your mind any assumptions about how the party
that is waiting on the other's determination/decision is 
holding itself able to obey. (two-phase commit does *not*
imply two-phase locking). It may hold the information in 
a distinguished interim state (outbound buffer, uncleared funds,
marked as reserved). It may completely perform its work and
retain a means of un-performing it. It may just check it could
perform its work and remember what it must do.

i) the transaction mechanisms actually allow for more complex
arrangements - the coordination role can be distinguished from
the resource-holding parties on each side, and there can be
more than two such parties. But for comparison with reliable
messaging, we can consider all the roles to be on one side or 
the other, and consider only a single bilateral relationship.

j) using a loosely-coupled transaction mechanism like BTP means
the application code doesn't have to get tangled up in the recovery,
repeats etc. Setting of timeouts and the like becomes a 
configuration question (possibly even a dynamic configuration
question if you really want to).

k) a two-phase outcome exchange doesn't really seem to count as
"orchestration" or "choreography" as I understand those. It's 
just a matter of "please do this", "I can do this", "I can't do this" etc.
Any compensation/counter-operation/reversal is delegated to the
party that has to do the reversal, rather than having to be 
explicitly exposed as a counter-operation distinctly accessed
by the other side.
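
A minimal sketch of points (b), (c) and (e), in Python, may make the
exchange more concrete. Every name in it (Deferrer, Decider, lossy,
the "confirm" and "ack" strings) is invented for illustration and is
not taken from BTP or from any reliable-messaging specification, and
persistence is only indicated in comments. It shows one side holding
work in an interim state, the other side making the determination and
repeating it until acknowledged, and both sides forgetting the
identifier once that is safe.

```python
class Deferrer:
    """The side that agrees to follow the other side's determination."""

    def __init__(self):
        self.pending = {}  # work_id -> work held in an interim state
        self.done = []     # work that has actually been applied

    def prepare(self, work_id, work):
        # Hold the work as "reserved" until the outcome is known.
        # (A crash-surviving version would persist this first.)
        self.pending[work_id] = work
        return "prepared"

    def outcome(self, work_id, decision):
        # Idempotent: a repeated outcome for work we have already
        # dealt with (and forgotten) is simply acknowledged again.
        work = self.pending.pop(work_id, None)
        if work is not None and decision == "confirm":
            self.done.append(work)
        return "ack"


class Decider:
    """The side that makes the definitive determination."""

    def __init__(self, send):
        self.send = send       # delivers an outcome; may raise on loss
        self.undelivered = {}  # work_id -> decision, kept until acked

    def decide(self, work_id, decision):
        # Remember the decision so it can be repeated after a failure.
        self.undelivered[work_id] = decision

    def redeliver(self):
        # One pass of the repetition rule: resend everything not yet acked.
        for work_id, decision in list(self.undelivered.items()):
            try:
                if self.send(work_id, decision) == "ack":
                    del self.undelivered[work_id]  # now safe to forget
            except ConnectionError:
                pass  # leave it for the next pass


def lossy(deliver, failures):
    """Wrap deliver so the first `failures` calls arrive but lose the reply."""
    remaining = [failures]

    def send(work_id, decision):
        reply = deliver(work_id, decision)  # the outcome gets through ...
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("reply lost")  # ... but the ack does not
        return reply

    return send


deferrer = Deferrer()
decider = Decider(lossy(deferrer.outcome, failures=2))

deferrer.prepare("order-1", "ship 10 widgets")
decider.decide("order-1", "confirm")

passes = 0
while decider.undelivered:
    decider.redeliver()
    passes += 1

print(passes, deferrer.done, deferrer.pending)
```

Replacing "confirm" with "received" as the thing being decided gives
the reliable-messaging case: the identifiers kept and the rules for
when to drop them stay the same, which is the point of (f).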


That's enough for now - I'm probably still obscure through
brevity, but the message is long enough already. 

Peter

------------------------------------------
Peter Furniss
Chief Scientist, Choreology Ltd

   Cohesions 1.0 (TM)
   Business transaction management software for application coordination

web: http://www.choreology.com
email:  peter.furniss@choreology.com
phone:  +44 20 7670 1679
direct: +44 20 7670 1783
mobile: +44 7951 536168
13 Austin Friars, London EC2N 2JX 

Received on Tuesday, 17 December 2002 21:21:45 UTC