RE: Reliability is really two-phase (was RE: Reliable Web Services) from Mark Potts on 2002-12-23 (www-ws-arch@w3.org from December 2002)

From: Mark Potts <mark.potts@talkingblocks.com>
Date: Mon, 23 Dec 2002 14:47:42 -0800
To: "Assaf Arkin" <arkin@intalio.com>, "Peter Furniss" <peter.furniss@choreology.com>, "Patil, Sanjaykumar" <sanjay.patil@iona.com>
Cc: "Www-Ws-Arch" <www-ws-arch@w3.org>
Message-ID: <6A852758A8437041A389D1AE74D8760A0FE0D3@mach5.talkingblocks.com>
I think this is getting away from the topic somewhat, but let's
continue.

> > I understand this, and agree, and the same applies to BTP and PM being
> > outside the application or functionality layer - the layers in my
> > mind get closer to the application layer as you get higher in the
> > stack (transport->protocol->context->process->application).
> 
> There's an important principle here: you reduce the cost of 
> development and improve reliability if you simplify the 
> application by putting the complexity in separate layers. So 
> for an RM (RM=MOM) you would automatically do the resend 
> without bothering the application. And for BTP you would have 
> a generic layer that does the coordination. That way, through 
> simple extensions to my applications that require little 
> logic, I can get to benefit from coordinated transactions and 
> improve reliability.

I think we are in violent agreement!

> > Today the simplest case involves the receiver agreeing to participate
> > in the transaction (the transaction context, as part of the
> > application message is sort of an invite to provider to enlist parties
> > under its control in the transaction context provided). The receiver
> > can do so in conjunction with the second message passed from receiver
> > back to sender, the message can also optionally have state about the
> > work requested - i.e. prepared statement if the work is completed
> > (with whatever caveats it wants to impose). When done this way you
> > still have two messages, outside the termination phase of the message
> > exchange.
> 
> So one flow would be:
> 
> 1. RM receives message sends ack
> 2. RM delivers one copy of message to application
> 3. application starts processing message
> 4. application completes processing message and sends back 
> response enlisting in transaction

In BTP this is the participant: it is the participant (Inferior) that
enrolls / enlists with the transaction coordinator (Superior). These are
roles within the protocol, so the participant can itself be a Superior
(in your example, a local coordinator) with Inferiors of its own, which
lets transactions be hierarchical structures that are still managed
consistently. The processing here need not establish the capability to
do the work; it may simply be "I trust the sender and will agree to
participate". No commitment or guarantee is made yet. The protocol
allows many interaction patterns.

> 5. application continues doing further work in transaction context
> 6. RM does delivery on behalf of application with resend
> 
> In this flow the application is doing step 3 outside the 
> transaction context and has to explicitly enlist in the 
> transaction when it does step 4. Step 3 might have different 
> results (can't deliver, can deliver, need more time, can 
> deliver but not in that date), so there are actually multiple 
> pieces of code that do step 4 and each has to enlist in the 
> transaction.

Or, as I said, the Inferior can be an individual participant or a
coordinator of localized participants.
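
A minimal sketch of that hierarchy, under my own assumptions: the Node
class, its method names and its state strings are hypothetical and are
not taken from the BTP specification. It only illustrates a party that
is an Inferior to whoever enrolled it and a Superior to the parties it
enrolled, with an all-or-nothing local decision.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A party that is Inferior to its parent and Superior to its children."""
    name: str
    able: bool = True                      # hypothetical: can it do its local work?
    inferiors: List["Node"] = field(default_factory=list)
    state: str = "enrolled"

    def enrol(self, inferior: "Node") -> "Node":
        # Acting as Superior: accept an Inferior into this node's scope.
        self.inferiors.append(inferior)
        return inferior

    def prepare(self) -> bool:
        # Ask every Inferior first, then decide for the whole subtree.
        votes = [inf.prepare() for inf in self.inferiors]
        if self.able and all(votes):
            self.state = "prepared"
            return True
        self.cancel()                      # one refusal cancels the subtree
        return False

    def confirm(self) -> None:
        self.state = "confirmed"
        for inf in self.inferiors:
            inf.confirm()

    def cancel(self) -> None:
        self.state = "cancelled"
        for inf in self.inferiors:
            inf.cancel()


def states(node: Node) -> dict:
    """Collect the state of every party in the tree, keyed by name."""
    out = {node.name: node.state}
    for inf in node.inferiors:
        out.update(states(inf))
    return out


top = Node("requestor-coordinator")
local = top.enrol(Node("supplier-local-coordinator"))
local.enrol(Node("inventory"))
local.enrol(Node("credit-check"))

if top.prepare():
    top.confirm()
print(states(top))
```

The requestor's coordinator only ever talks to the supplier's local
coordinator; what sits underneath it is the supplier's business.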
> 
> Another flow would be:
> 
> 1. RM receives message sends ack
> 2. TM picks up message, recreates transaction context,
>    delivers to application
> 3. application starts processing message and sends back response
> 4. TM sends back response using the proper transaction context
> 5. RM does delivery on behalf of TM with resend

This is also the localized-coordinator-as-Inferior pattern.
> 
> In this flow the TM takes care of enlisting the application 
> so all the steps are done in a transaction context and the TM 
> can supervise the application, i.e. communicate internal 
> failures as rollbacks, or communicate rollbacks to the 
> application. So the application doesn't have to be aware it's 
> part of a transaction aside from having synchronization entry 
> points and allowing the TM manage all the workload.

Correct - this is a true interposition model, but there are other models
where some portions of the requested work can go ahead and others not,
or where portions of the transaction are spread across multiple
providers. Say, for example, the steps in the process include inventory
check, manufacturing, shipping and insurance. In this particular example
I may want to enroll all four participants in the transaction and not
shield them from the requestor: if the insurance is not available, the
requestor could continue with the order but arrange insurance through a
third party (open-top coordination).
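
The order example can be sketched roughly as follows. This is an
illustration only: OpenTopCoordinator, Participant, Vote and their
methods are names I made up for the sketch, not the BTP API. The point
it shows is that the requestor sees each participant's vote and chooses
which prepared participants to confirm.

```python
from enum import Enum
from typing import Dict, Iterable


class Vote(Enum):
    PREPARED = "prepared"
    CANCELLED = "cancelled"


class Participant:
    """Hypothetical participant; 'available' stands in for its real checks."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.outcome = "enrolled"

    def prepare(self) -> Vote:
        vote = Vote.PREPARED if self.available else Vote.CANCELLED
        self.outcome = vote.value
        return vote

    def confirm(self) -> None:
        self.outcome = "confirmed"

    def cancel(self) -> None:
        self.outcome = "cancelled"


class OpenTopCoordinator:
    """Exposes every vote to the requestor instead of hiding it."""

    def __init__(self) -> None:
        self.participants: Dict[str, Participant] = {}
        self.votes: Dict[str, Vote] = {}

    def enrol(self, participant: Participant) -> None:
        self.participants[participant.name] = participant

    def prepare_all(self) -> Dict[str, Vote]:
        self.votes = {n: p.prepare() for n, p in self.participants.items()}
        return dict(self.votes)

    def confirm_only(self, names: Iterable[str]) -> None:
        chosen = set(names)
        for name in chosen:
            if self.votes.get(name) is not Vote.PREPARED:
                raise ValueError(f"{name} is not prepared")
        # Confirm the requestor's selection and cancel everyone else.
        for name, p in self.participants.items():
            if name in chosen:
                p.confirm()
            else:
                p.cancel()


coord = OpenTopCoordinator()
for step in ("inventory", "manufacturing", "shipping"):
    coord.enrol(Participant(step))
coord.enrol(Participant("insurance", available=False))

votes = coord.prepare_all()
# Requestor's decision: go ahead without insurance, arrange it elsewhere.
coord.confirm_only(n for n, v in votes.items() if v is Vote.PREPARED)
outcomes = {n: p.outcome for n, p in coord.participants.items()}
print(outcomes)
```

With an interposed (closed) coordinator the insurance refusal would
simply have cancelled the whole order.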
> 
> Since we're using X/Open XA, OTS, DTC and similar transaction 
> protocols we already have the infrastructure to do that, we 
> just use legacy protocols that are not supported over 
> HTTP/SOAP, but it's given that we can easily extend it to do BTP.

BTP is the protocol for negotiating commitment to work requested and
undertaken by cooperating parties - it does not define how that
commitment is managed (X/Open XA, etc.) and can equally employ
compensation mechanisms to meet the commitments made between parties
using the BTP protocol. As Peter remarked, the coordination protocol
does not imply resource locking or dependency on any transaction model.
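
To make that concrete, here is a rough sketch of two ways a participant
might honour the same prepare/confirm/cancel exchange. The class names,
the stock example and the run_exchange helper are all hypothetical; BTP
itself does not prescribe any of this.

```python
from abc import ABC, abstractmethod


class Participant(ABC):
    """The coordinator only sees these three operations."""

    @abstractmethod
    def prepare(self) -> bool: ...

    @abstractmethod
    def confirm(self) -> None: ...

    @abstractmethod
    def cancel(self) -> None: ...


class ReservingStock(Participant):
    """Holds the goods in an interim 'reserved' state until the outcome."""

    def __init__(self, on_hand: int, qty: int) -> None:
        self.on_hand, self.qty, self.reserved = on_hand, qty, 0

    def prepare(self) -> bool:
        if self.qty > self.on_hand:
            return False
        self.reserved = self.qty
        return True

    def confirm(self) -> None:
        self.on_hand -= self.reserved
        self.reserved = 0

    def cancel(self) -> None:
        self.reserved = 0


class CompensatingStock(Participant):
    """Does the work at prepare time and keeps a way to undo it."""

    def __init__(self, on_hand: int, qty: int) -> None:
        self.on_hand, self.qty, self.applied = on_hand, qty, False

    def prepare(self) -> bool:
        if self.qty > self.on_hand:
            return False
        self.on_hand -= self.qty          # work is performed immediately
        self.applied = True
        return True

    def confirm(self) -> None:
        self.applied = False              # nothing left to undo

    def cancel(self) -> None:
        if self.applied:
            self.on_hand += self.qty      # compensation
            self.applied = False


def run_exchange(participant: Participant, requestor_confirms: bool) -> bool:
    """Same negotiation regardless of how the participant keeps its promise."""
    if not participant.prepare():
        participant.cancel()
        return False
    if requestor_confirms:
        participant.confirm()
    else:
        participant.cancel()
    return requestor_confirms


for cls in (ReservingStock, CompensatingStock):
    kept, dropped = cls(10, 3), cls(10, 3)
    run_exchange(kept, True)
    run_exchange(dropped, False)
    print(cls.__name__, kept.on_hand, dropped.on_hand)
```

Both variants end in the same place after confirm or cancel, which is
all the other party needs to rely on.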

> In this model the application is always enlisted in the 
> transaction when it receives a message as part of the 
> transaction, which doesn't commit the application to anything 
> more than using a TM for processing all its messages. The 
> application is enlisted in the transaction whether or not it 
> communicates that to the sender, so this approach also works 
> with the flow you suggested.

True - I'm still of the opinion that you want to know who is willing to
"play" if this is dynamic and agreement to participate in these types of
transactions has not been negotiated out of band.
> 
> But since the sender knows whether the receiver supports the 
> transaction protocol or not, the sender can now unilaterally 
> enlist the receiver in the transaction when it first sends a 
> message to the receiver in the transaction context. This will 
> work because you have two TMs talking to each other and 
> doesn't complicate either application.

That's the whole point of BTP: it allows a two-pipe model, or an
optimization over a single-pipe model, and neither is intrusive to the
application functionality. You are assuming some a priori knowledge here
between the sender and receiver, which again goes to the point made
above.
> 
> If the receiver application needs to use additional 
> resources, since the TM supervises its work it can elect to 
> either explicitly enlist them with the sender transaction, or 
> do transaction interposing. In the case of asynchronous 
> messaging we are looking at reducing message exchange, so we 
> would always do interposing. I can't think of a case where 
> you would want to do asynchronous messaging without interposing.

All this makes sense, and these are patterns for implementing services
that are transactional and need guarantees of some sort - either
messaging, transactional commitment or process. BTP is simply a way to
do transactional commitment in a negotiated manner; it does not imply
any particular pattern for implementation, but simply defines the
protocol for negotiating the commitment. I understand the need for RM on
the provider side for complex interactions, and between the requestor
and service provider for asynch communication, and I think that BTP adds
value to this in managing the commitment of the work requested and being
undertaken. Again, the combination of both is very powerful.

> 
> arkin
> 
> >
> > >
> > > arkin
> > >
> > >
> > >
> > > >
> > > > This is not making a case for not needing RM - simply there are 
> > > > many ways to "skin the cat" and the layers you require will be 
> > > > determined by the value of the interactions and their 
> criticality 
> > > > to the business.
> > > >
> > > > Mark
> > > >
> > > >
> > > > 	-----Original Message-----
> > > > 	From: Assaf Arkin [mailto:arkin@intalio.com]
> > > > 	Sent: Sat 12/21/2002 9:40 PM
> > > > 	To: Peter Furniss; Patil, Sanjaykumar
> > > > 	Cc: Www-Ws-Arch
> > > > 	Subject: RE: Reliability is really two-phase 
> (was RE: Reliable 
> > > > Web Services)
> > > >
> > > >
> > > >
> > > >
> > > > 	Let's assume a typical and quite common scenario.
> > > >
> > > > 	A buyer sends a message to a supplier asking to 
> buy a product. 
> > > > The buyer
> > > > 	expects that it may take 8 hours before the 
> supplier can indicate 
> > > > whether
> > > > 	the purchase can be processed. The seller needs 
> to check 
> > > > inventory levels to
> > > > 	determine when the product can be shipped, 
> update its production 
> > > > plan,
> > > > 	validate the buyer's credit, etc.
> > > >
> > > > 	This is usually a fast process when the product 
> is in inventory, 
> > > > or the
> > > > 	inventory is constantly replenished, and the 
> process is entirely 
> > > > automatic.
> > > > 	It takes longer when the product has to be produced,
> > > demand exceeds
> > > > 	inventory, or the process is not entirely 
> automatic (which as we 
> > > > all know is
> > > > 	quite common in the business world).
> > > >
> > > > 	The request got lost in transit, and after 
> eight hours the buyer 
> > > > does not
> > > > 	hear back from the supplier. Since the buyer 
> and supplier use a 
> > > > coordination
> > > > 	protocol, they can both agree whether the 
> product would be 
> > > > delivered. They
> > > > 	need coordination to deal with a variety of 
> cases, such as the 
> > > > buyer not
> > > > 	agreeing to the delivery date provided by the 
> supplier, or 
> > > > agreeing to the
> > > > 	delivery date for some items but deciding to 
> remove other items 
> > > > from the
> > > > 	purchase order.
> > > >
> > > > 	The buyer decides to still procure the product 
> from the seller, 
> > > > and issues
> > > > 	another purchase order. Since the buyer and 
> supplier coordinate 
> > > > we get
> > > > 	exactly one order (the buyer has determined 
> that the previous 
> > > > order did not
> > > > 	get through), no loss of consistency.
> > > >
> > > > 	All we lost are eight hours.
> > > >
> > > > 	Now, let's equip the buyer and supplier with an 
> RM solution. The 
> > > > RM solution
> > > > 	does not attempt to determine whether the order will be 
> > > > processed, when
> > > > 	delivery will occur, etc. All it cares about is 
> getting the 
> > > > request to the
> > > > 	supplier.
> > > >
> > > > 	The RM expects to hear an ack after 30 minutes. 
> Since no ack has 
> > > > been
> > > > 	received, the RM tries to resend the message 
> and the second 
> > > > message makes it
> > > > 	to the supplier and the ack is received by the 
> buyer's RM.
> > > >
> > > > 	Due to the unreliability of the transport the 
> supplier has lost 
> > > > 30 minutes
> > > > 	for processing the request, but is still able 
> to respond to the 
> > > > buyer before
> > > > 	eight hours have passed. The buyer does not 
> have to attempt and 
> > > > resolve the
> > > > 	situation after eight hours has passed, saving 
> resources for both 
> > > > buyer and
> > > > 	supplier and expediting the delivery of the 
> product and any other 
> > > > process
> > > > 	that depends on the delivery date being known. 
> (For example, 
> > > > because the
> > > > 	buyer is also a supplier and has to report back 
> to its buyers)
> > > >
> > > > 	Would you say that RM has some added value?
> > > >
> > > > 	arkin
> > > >
> > > >
> > > > 	> -----Original Message-----
> > > > 	> From: www-ws-arch-request@w3.org 
> > > > [mailto:www-ws-arch-request@w3.org]On
> > > > 	> Behalf Of Peter Furniss
> > > > 	> Sent: Saturday, December 21, 2002 5:39 PM
> > > > 	> To: Patil, Sanjaykumar
> > > > 	> Cc: Www-Ws-Arch
> > > > 	> Subject: RE: Reliability is really two-phase (was RE:
> > > Reliable Web
> > > > 	> Services)
> > > > 	>
> > > > 	>
> > > > 	>
> > > > 	> Sanjay replied directly to me, but his 
> comments are worth
> > > > 	> stirring into the
> > > > 	> public
> > > > 	> pot (and he's ok with that). My comments interspersed:
> > > > 	>
> > > > 	> > -----Original Message-----
> > > > 	> > From: Patil, Sanjaykumar 
> [mailto:sanjay.patil@iona.com]
> > > > 	> > Sent: 21 December 2002 02:32
> > > > 	> > To: Peter Furniss
> > > > 	> > Subject: RE: Reliability is really 
> two-phase (was RE: 
> > > > Reliable Web
> > > > 	> > Services)
> > > > 	> >
> > > > 	> >
> > > > 	> >
> > > > 	> > Peter, would it be correct to say that - If
> > > somebody wanted to
> > > > 	> > deploy BTP entirely for achieving RM today, 
> it should be
> > > > 	> > possible. Perhaps, this may not be the best 
> use of BTP, since 
> > > > the
> > > > 	> > state alignment problems solved by RM is more of 
> > > > infrastructural
> > > > 	> > in nature, where as BTP, AFAIK, is 
> primarily intended for
> > > > 	> > business state alignment. Therefore could I 
> say that -
> > > > 	> > a> The use of BTP for business state alignment
> > > makes low level
> > > > 	> > state alignment and therefore RM unnecessary
> > > > 	> > b> BTP technology is neutral to the nature 
> of state alignment 
> > > > and
> > > > 	> > therefore could be deployed for achieving purely
> > > the goals of RM
> > > > 	> > c> The BTP machinery is similar (superset!) 
> to a typical RM
> > > > 	> > solution and therefore does not introduce 
> huge overheads for
> > > > 	> > maintaining its flexibility 
> (extensibility!) in supporting
> > > > 	> > additional coordination functionalities.
> > > > 	>
> > > > 	> yes, that is exactly what I meant. BTP does 
> not directly know
> > > > 	> what "prepared" means, so it could just mean "it is
> > > safely here".
> > > > 	>
> > > > 	> > I guess, many of us think that solving RM  
> is practically a
> > > > 	> > "must", where as solving business level 
> coordination in an
> > > > 	> > efficient manner is still perceived as "future" (in
> > > spite of the
> > > > 	> > smart work you guys did in BTP :-). Therefore, the
> > > argument of
> > > > 	> > "BTP making RM unnecessary" to me is like 
> selling cake when 
> > > > bread
> > > > 	> > is in high demand. However, if my 
> understanding as above is
> > > > 	> > correct (i.e. BTP can solve RM today and if 
> needed other
> > > > 	> > coordination problems tomorrow), perhaps RM 
> is the best 
> > > > launching
> > > > 	> > pad for BTP.
> > > > 	>
> > > > 	> but if cake is as cheap as bread ...   :-)
> > > > 	>
> > > > 	> (cheapness might not be price exactly - 
> manageability, 
> > > > availability
> > > > 	> might be more significant)
> > > > 	>
> > > > 	> > Just a thought. May be I got the whole 
> thing completely 
> > > > wrong, in
> > > > 	> > which case please pardon me for taking your 
> precious time.
> > > > 	>
> > > > 	>
> > > > 	> >
> > > > 	> > Have a good weekend.
> > > > 	> >
> > > > 	> > thanks,
> > > > 	> > sanjay
> > > > 	> >
> > > > 	> >
> > > > 	> > -----Original Message-----
> > > > 	> > From: Peter Furniss 
> [mailto:peter.furniss@choreology.com]
> > > > 	> > Sent: Friday, December 20, 2002 3:56 AM
> > > > 	> > To: Ricky Ho; www-ws-arch@w3.org
> > > > 	> > Subject: RE: Reliability is really 
> two-phase (was RE: 
> > > > Reliable Web
> > > > 	> > Services)
> > > > 	> >
> > > > 	> >
> > > > 	> >
> > > > 	> > Ricky Ho replied to me:
> > > > 	> >
> > > > 	> > > Are you implying at point (j) that by 
> using BTP, reliable
> > > > 	> > > messaging is not
> > > > 	> > > necessary ?  I think they are solving orthogonal
> > > problem.  In
> > > > 	> fact, BTP
> > > > 	> > > without reliable messaging is not 
> sufficient for conducting 
> > > > high money
> > > > 	> > > value transaction in a reliable manner.
> > > > 	> >
> > > > 	> > Yes, I don't think RM is necessary with 
> BTP. The BTP exchange 
> > > > means that
> > > > 	> > the application work (e.g. money transfer) won't
> > > happen unless
> > > > 	> both sides
> > > > 	> > agree that they understand and want to do 
> it. If the pattern 
> > > > follows the
> > > > 	> > typical sequence:
> > > > 	> >
> > > > 	> >     client requests transfer
> > > > 	> >     server says it can do it, iff the 
> client confirms
> > > > 	> >     client confirms
> > > > 	> >     server applies confirmation, and tells the
> > > client it is done
> > > > 	> >
> > > > 	> > then you have a stronger mechanism than RM, 
> which is 
> > > > concerned only
> > > > 	> > with being a reliable postman.  
> (admittedly, if you map 
> > > > things in a
> > > > 	> > particular way, the two end up becoming 
> fairly close - 
> > > > certainly if the
> > > > 	> > detailed application behaviour is fixed 
> assuming an RM 
> > > > pattern, BTP
> > > > 	> > can carry the identical semantics - though 
> it has some extra
> > > > 	> > flexibilities that RM would have difficulty with).
> > > > 	> >
> > > > 	> > Peter
> > > > 	> >
> > > > 	> >
> > > > 	> > >
> > > > 	> > > Rgds, Ricky
> > > > 	> > >
> > > > 	> > >
> > > > 	> > > At 02:16 AM 12/18/2002 +0000, Peter Furniss wrote:
> > > > 	> > >
> > > > 	> > >
> > > > 	> > > >The reliability requirement really means 
> that you need
> > > > 	> > > >the sort of mechanisms and exchanges of 
> two-phase outcome
> > > > 	> > > >(as in OASIS BTP).  "reliable messaging",
> > > depending on the
> > > > 	> > > >details of its mechanisms, is variously giving
> > > less that it
> > > > 	> > > >seems, or is just as complicated (and, in some
> > > cases, both).
> > > > 	> > > >
> > > > 	> > > >
> > > > 	> > > >To expand that assertion a bit:
> > > > 	> > > >
> > > > 	> > > >a) i'm assuming reliability can be 
> defined as two parties
> > > > 	> > needing to have
> > > > 	> > > >a consistent view as to whether some 
> work has or has not 
> > > > been done
> > > > 	> > > >by one of them at the request of the other
> > > > 	> > > >   [ this is the 0 or 1 case, and is the 
> centre of
> > > > state alignment -
> > > > 	> > > >   where I change my view of the shared state
> > > > because I know you
> > > > 	> > > have/will]
> > > > 	> > > >
> > > > 	> > > >
> > > > 	> > > >b) the critical feature is that one side accepts
> > > > 	> > > >that the other side will make the definitive
> > > determination as
> > > > 	> > > >to whether the work is to be done; the 
> deferring side
> > > > 	> > > >agrees to accept/apply/follow that 
> determination once it 
> > > > knows of it
> > > > 	> > > >
> > > > 	> > > >  [ which is the essence of the solution to the
> > > two armies
> > > > 	> > > problem - their
> > > > 	> > > >problem was that neither side will make 
> an unconditional
> > > > 	> decision, but
> > > > 	> > > >wants the other side to make an 
> irrevocable decision as a
> > > > 	> condition of
> > > > 	> > > >its own]
> > > > 	> > > >
> > > > 	> > > >c) once the determination has been made, 
> the repetition 
> > > > and recovery
> > > > 	> > > >rules of the transaction protocol make 
> sure the other side 
> > > > will
> > > > 	> > > >know eventually
> > > > 	> > > >
> > > > 	> > > >d) you normally want to know that the 
> application has 
> > > > really done
> > > > 	> > > >the work. In some cases, it may be sufficient to
> > > know that
> > > > 	> > > >the work will eventually be done (e.g. it's been
> > > dropped on a
> > > > 	> > > >reliable queue) - but that means that 
> either there is no
> > > > 	> > > >comeback or any comeback is a whole new activity.
> > > > 	> > > >
> > > > 	> > > >e) the "simple" ack approach actually 
> requires some extra
> > > > 	> > > >messages to avoid one or both sides having to
> > > remember the
> > > > 	> > > >request (or some identification on it)
> > > indefinitely or have
> > > > 	> > > >a complicated set of timeout rules as to when
> > > they can forget
> > > > 	> > > >things. (and that's before we worry about
> > > surviving crashes)
> > > > 	> > > >
> > > > 	> > > >f) reliable messaging (including things 
> like HTTPR) are
> > > > 	> > > >distinguished from two-phase outcome only by
> > > what is counted
> > > > 	> > > >as the "decision" - it's "message 
> received", not "work 
> > > > is/will
> > > > 	> > be done".
> > > > 	> > > >The systems have to store similar 
> information/identifiers
> > > > 	> > > >and follow similar rules as to when to 
> persist and
> > > > 	> > > >delete this information. [ in other 
> words, it's not really 
> > > > simpler
> > > > 	> > > >to just use reliable messaging ]
> > > > 	> > > >
> > > > 	> > > >g) some of the scenarios differ from the classic
> > > > 	> > > >two-phase commit exchanges in that the sender of
> > > the first
> > > > 	> > > >message is the one that defers to the other
> > > side's decision.
> > > > 	> > > >(classic two-phase is client asks server 
> to defer to the
> > > > 	> > > >client's decision). This has some impact 
> on how the
> > > > 	> > > >relationship gets established, but 
> doesn't significantly
> > > > 	> > > >affect what happens later (in terms of retries,
> > > persistence,
> > > > 	> > > >recovery sequences).
> > > > 	> > > >
> > > > 	> > > >h) expel from your mind any assumptions about
> > > how the party
> > > > 	> > > >that is waiting on the other's 
> determination/decision is
> > > > 	> > > >holding itself able to obey. (two-phase 
> commit does *not*
> > > > 	> > > >imply two-phase locking). It may hold 
> the information in
> > > > 	> > > >a distinguished interim state (outbound 
> buffer, uncleared 
> > > > funds,
> > > > 	> > > >marked as reserved). It may completely perform
> > > its work and
> > > > 	> > > >retain a means of un-performing it. It 
> may just check it 
> > > > could
> > > > 	> > > >perform its work and remember what it must do.
> > > > 	> > > >
> > > > 	> > > >i) the transaction mechanisms actually allow for
> > > more complex
> > > > 	> > > >arrangements - the coordination role can 
> be distinguished 
> > > > from
> > > > 	> > > >the resource-holding parties on each side, and
> > > there can be
> > > > 	> > > >more than two such parties. But for comparison
> > > with reliable
> > > > 	> > > >messaging, we can consider all the roles to be
> > > on one side or
> > > > 	> > > >the other, and consider only a single bilateral
> > > relationship.
> > > > 	> > > >
> > > > 	> > > >j) using a loosely-coupled transaction 
> mechanism like BTP 
> > > > means
> > > > 	> > > >the application code doesn't have to get 
> tangled up in the 
> > > > recovery,
> > > > 	> > > >repeats etc. Setting of timeouts and the 
> like becomes a
> > > > 	> > > >configuration question (possibly even a dynamic
> > > configuration
> > > > 	> > > >question if you really want to).
> > > > 	> > > >
> > > > 	> > > >k) a two-phase outcome exchange doesn't 
> really seem to 
> > > > count as
> > > > 	> > > >"orchestration" or "choreography" as I
> > > understand those. It's
> > > > 	> > > >just a matter "please do this", "I can do this",
> > > "I can't do
> > > > 	> this" etc.
> > > > 	> > > >Any 
> compensation/counter-operation/reversal is delegated 
> > > > to the
> > > > 	> > > >party that has to do the reversal, rather than
> > > having to be
> > > > 	> > > >explicitly exposed as a counter-operation
> > > distinctly accessed
> > > > 	> > > >by the other side.
> > > > 	> > > >
> > > > 	> > > >
> > > > 	> > > >That's enough for now - I'm probably still
> > > obscure through
> > > > 	> > > >brevity, but the message is long enough already.
> > > > 	> > > >
> > > > 	> > > >Peter
> > > > 	> > > >
> > > > 	> > > >------------------------------------------
> > > > 	> > > >Peter Furniss
> > > > 	> > > >Chief Scientist, Choreology Ltd
> > > > 	> > > >
> > > > 	> > > >    Cohesions 1.0 (TM)
> > > > 	> > > >    Business transaction management software for
> > > application
> > > > 	> > coordination
> > > > 	> > > >
> > > > 	> > > >web: http://www.choreology.com
> > > > 	> > > >email:  peter.furniss@choreology.com
> > > > 	> > > >phone:  +44 20 7670 1679
> > > > 	> > > >direct: +44 20 7670 1783
> > > > 	> > > >mobile: +44 7951 536168
> > > > 	> > > >13 Austin Friars, London EC2N 2JX
> > > > 	> > >
> > > > 	> >
> > > > 	>
> > > >
> > > >
> > > >
> > >
> > >
> >
> 
>
Received on Monday, 23 December 2002 17:49:51 UTC