RE: Reliability is really two-phase (was RE: Reliable Web Services) from Mark Potts on 2002-12-23 (www-ws-arch@w3.org from December 2002)

From: Mark Potts <mark.potts@talkingblocks.com>
Date: Mon, 23 Dec 2002 09:45:44 -0800
To: "Assaf Arkin" <arkin@intalio.com>, "Peter Furniss" <peter.furniss@choreology.com>, "Patil, Sanjaykumar" <sanjay.patil@iona.com>
Cc: "Www-Ws-Arch" <www-ws-arch@w3.org>
Message-ID: <6A852758A8437041A389D1AE74D8760A0FE0D1@mach5.talkingblocks.com>
> > Arkin
> >  
> > It seems to me that we are trying to roll many things into one
> > here. There are three layers, IMO: 1) the state of an individual
> > message (sent, received, acknowledged), where RM will help in
> > many cases; 2) the commitment or synchronization of state
> > changes that result from a single message or a set of messages
> > (enlisted, prepared, committed), where BTP can help; and 3) the
> > state of the conversation or process (commitment to defined
> > constraints within a process) instantiated by message exchanges.
> >  
> > The last 2 together can answer many of the problems that RM has
> > been applied to, but that does not mean RM is never needed. There
> > will be a need for one or a combination of the three levels to
> > answer many business problems/interactions.
> 
> I agree. These are different but complementary solutions, and
> the combination in which they are used depends on the problem
> you are trying to solve. If you use synchronous messaging you
> will not be concerned with an RM. If you do not need
> coordination, an RM would be sufficient. If you do
> asynchronous messaging and need a high level of coordination,
> you would have to look at all three solutions.
> 
> > For example, what happens in the example scenario if the ack is
> > expected back from the supplier within 30 minutes and does not
> > arrive, so the buyer resends, only to then get back the ack for
> > the original request? Unless the supplier can recognize the
> > resent message as precisely the same message, it may create a
> > second, duplicate order. In the case of BTP this is handled
> > (there is a consistent context for both messages), and BTP could
> > be used to handle the 30-minute rule, where participants have to
> > be enrolled in the transaction (in this case, ack the request and
> > agree to participate in the transaction) within that time frame.
> 
> When you resend a message you resend an exact copy of the 
> same message, and the RM only delivers one instance of the 
> message to the application. So the whole process of acking 
> and ignoring duplicates is totally transparent to the application.
> 
> If the application were trying to deal with this situation, it
> would complicate the application. The application would generate
> two different messages, and the receiving end would have to
> identify that they relate to the same purchase order, so you
> would require application code to deal with duplicates.

I understand this, and agree, and the same applies to BTP and PM being
outside the application or functionality layer. In my mind the layers
get closer to the application layer as you get higher in the stack
(transport->protocol->context->process->application).
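A minimal Python sketch of Arkin's point above that duplicate elimination belongs in the RM layer rather than the application. It assumes, hypothetically, that the sending RM stamps each message with a `message_id` and reuses it on every resend; the receiving side acks every copy but hands each id to the application only once. A new message built by the application for the same order gets a new id and is delivered again, which is the duplicate-handling burden Arkin describes. `Message` and `ReceivingRM` are illustrative names, not part of any RM specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical: assigned once by the sending RM layer, reused on resend
    body: str


@dataclass
class ReceivingRM:
    """Hypothetical receiver-side RM layer: acks every copy, delivers each id once."""
    delivered: list = field(default_factory=list)  # what the application actually sees
    seen_ids: set = field(default_factory=set)     # ids already handed to the application

    def on_receive(self, msg: Message) -> str:
        if msg.message_id not in self.seen_ids:
            self.seen_ids.add(msg.message_id)
            self.delivered.append(msg.body)
        # Every copy, original or resend, is acked so the sender can stop retrying.
        return f"ack:{msg.message_id}"


if __name__ == "__main__":
    rm = ReceivingRM()
    po = Message("po-1", "purchase order #1")
    rm.on_receive(po)
    rm.on_receive(po)  # RM-level resend: the exact same copy, same id
    print(rm.delivered)  # ['purchase order #1']

    # Application-level resend: a new message with a new id for the same order.
    rm.on_receive(Message("po-2", "purchase order #1"))
    print(rm.delivered)  # ['purchase order #1', 'purchase order #1']
```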
> 
> Transactions would definitely help. However, since the sender 
> is not able to communicate with the receiver, the sender 
> would cancel the transaction and send a new message in a 
> different transaction. When the receiver comes back online it 
> has to cancel the first transaction on its side before it can 
> process the new message in a new transaction. An RM solution 
> would be slightly more efficient.

BTP has this type of resilience, but in the context of what I call a
commitment, and what BTP describes as a business transaction.
> 
> This raises a different question. Applications that use 
> synchronous protocols or use transactions to synchronize 
> every message delivery do not suffer from message loss. These 
> applications do not require an RM.

> 
> For some applications it is beneficial if you can reduce the 
> amount of synchronous message exchange. You would want to use 
> asynchronous protocols, and want to reduce traffic to the 
> minimum. In this case the RM helps by reducing the traffic to 
> two messages in the general case. (The idea is that you want 
> to optimize for the general case when everything works 
> smoothly, even if you have to pay for it in performance when
> things go bad.)

Agreed

> 
> I would think that an efficient protocol for asynchronous
> messaging would piggyback transaction states on top of
> application messages.

This is the optimized BTP case, and in my mind the general case.

> In this case the receiver is enlisted 
> in the transaction by the sender when the first message in 
> the transaction is sent to the receiver. This does not commit 
> the receiver to anything, but allows them to determine the 
> transaction outcome or change it without having to send a 
> separate enlistment message.
> 
> What's your opinion?

This is another possible optimization, but it may not be flexible enough
for some of the cases considered for BTP (the sender does not give the
receiver the opportunity to decide who/what needs to enlist).

Today the simplest case involves the receiver agreeing to participate in
the transaction (the transaction context, carried as part of the
application message, is a sort of invitation to the provider to enlist
parties under its control in the transaction context provided). The
receiver can do so in the second message, passed from receiver back to
sender; that message can also optionally carry state about the work
requested, i.e. a "prepared" indication if the work is completed (with
whatever caveats it wants to impose). Done this way, you still have only
two messages outside the termination phase of the message exchange.
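The two-message pattern Mark describes can be sketched as follows. All names here (`TxContext`, `Request`, `Reply`, `supplier`, `Sender`) and the `coord://buyer` address are hypothetical, and the structures are not BTP's actual message formats. The sketch only shows the context travelling with the application message as an invitation, the receiver choosing which of its own parties to enrol, and the reply optionally carrying a prepared indication, so that two messages are exchanged before the termination phase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TxContext:
    transaction_id: str  # hypothetical identifier for the business transaction
    coordinator: str     # hypothetical address where participants enrol


@dataclass
class Request:
    body: str
    context: TxContext   # rides on the application message: an invitation, not an enrolment


@dataclass
class Reply:
    body: str
    enrolled: list       # parties the *receiver* chose to enrol
    prepared: bool       # optional extra: the work is done, pending the outcome


def supplier(req: Request) -> Reply:
    """Hypothetical receiver: decides for itself which of its parties to enrol."""
    can_do = "widgets" in req.body
    parties = ["inventory", "billing"] if can_do else []
    return Reply("accepted" if can_do else "declined", parties, prepared=can_do)


@dataclass
class Sender:
    context: TxContext
    participants: dict = field(default_factory=dict)  # party -> "enrolled" or "prepared"
    messages: int = 0                                 # messages exchanged before termination

    def send(self, body: str, receiver) -> Reply:
        self.messages += 1                 # message 1: request carrying the context
        reply = receiver(Request(body, self.context))
        self.messages += 1                 # message 2: reply carrying enrolment (+ prepared)
        for party in reply.enrolled:
            self.participants[party] = "prepared" if reply.prepared else "enrolled"
        return reply


if __name__ == "__main__":
    buyer = Sender(TxContext("tx-42", "coord://buyer"))
    buyer.send("order 10 widgets", supplier)
    print(buyer.participants)  # {'inventory': 'prepared', 'billing': 'prepared'}
    print(buyer.messages)      # 2
```

Letting the receiver populate `enrolled` is the flexibility Mark contrasts with the sender enlisting the receiver directly.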

> 
> arkin
> 
> 
> 
> >  
> > This is not making a case for not needing RM - simply there are 
> > many ways to "skin the cat" and the layers you require will be 
> > determined by the value of the interactions and their criticality 
> > to the business.
> >  
> > Mark
> >  
> > 
> > 	-----Original Message----- 
> > 	From: Assaf Arkin [mailto:arkin@intalio.com] 
> > 	Sent: Sat 12/21/2002 9:40 PM 
> > 	To: Peter Furniss; Patil, Sanjaykumar 
> > 	Cc: Www-Ws-Arch 
> > 	Subject: RE: Reliability is really two-phase (was RE: Reliable Web Services)
> > 	
> > 	
> > 
> > 
> > 	Let's assume a typical and quite common scenario.
> > 	
> > 	A buyer sends a message to a supplier asking to buy a product.
> > 	The buyer expects that it may take 8 hours before the supplier
> > 	can indicate whether the purchase can be processed. The supplier
> > 	needs to check inventory levels to determine when the product
> > 	can be shipped, update its production plan, validate the buyer's
> > 	credit, etc.
> > 	
> > 	This is usually a fast process when the product is in inventory,
> > 	or the inventory is constantly replenished, and the process is
> > 	entirely automatic. It takes longer when the product has to be
> > 	produced, demand exceeds inventory, or the process is not
> > 	entirely automatic (which, as we all know, is quite common in
> > 	the business world).
> > 	
> > 	The request got lost in transit, and after eight hours the buyer
> > 	has not heard back from the supplier. Since the buyer and
> > 	supplier use a coordination protocol, they can both agree on
> > 	whether the product will be delivered. They need coordination to
> > 	deal with a variety of cases, such as the buyer not agreeing to
> > 	the delivery date provided by the supplier, or agreeing to the
> > 	delivery date for some items but deciding to remove other items
> > 	from the purchase order.
> > 	
> > 	The buyer decides to still procure the product from the
> > 	supplier, and issues another purchase order. Since the buyer and
> > 	supplier coordinate, we get exactly one order (the buyer has
> > 	determined that the previous order did not get through), with
> > 	no loss of consistency.
> > 	
> > 	All we lost are eight hours.
> > 	
> > 	Now, let's equip the buyer and supplier with an RM solution. The
> > 	RM solution does not attempt to determine whether the order will
> > 	be processed, when delivery will occur, etc. All it cares about
> > 	is getting the request to the supplier.
> > 	
> > 	The RM expects to hear an ack within 30 minutes. Since no ack
> > 	has been received, the RM resends the message; the second copy
> > 	makes it to the supplier, and the ack is received by the
> > 	buyer's RM.
> > 	
> > 	Due to the unreliability of the transport, the supplier has lost
> > 	30 minutes for processing the request, but is still able to
> > 	respond to the buyer before eight hours have passed. The buyer
> > 	does not have to attempt to resolve the situation after eight
> > 	hours have passed, saving resources for both buyer and supplier
> > 	and expediting the delivery of the product and any other process
> > 	that depends on the delivery date being known (for example,
> > 	because the buyer is also a supplier and has to report back to
> > 	its buyers).
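Arkin's RM scenario, where the first copy is lost and the resend after a 30-minute ack timeout gets through, can be simulated in a few lines. This is a minimal Python sketch under stated assumptions: time is simulated rather than actually waited for, the channel (`make_lossy_channel`, hypothetical) simply drops its first N transmissions, and `send_reliably` (also hypothetical) resends the same message id until it sees an ack or gives up.

```python
def make_lossy_channel(drop_first: int):
    """Hypothetical transport that loses the first `drop_first` transmissions."""
    delivered = []  # (message_id, body) pairs the supplier has received
    sent = 0

    def channel(message_id: str, body: str) -> bool:
        nonlocal sent
        sent += 1
        if sent <= drop_first:
            return False  # lost in transit, so no ack ever comes back
        if all(mid != message_id for mid, _ in delivered):
            delivered.append((message_id, body))  # a duplicate copy is not re-delivered
        return True  # delivered, and the ack reaches the buyer's RM

    return channel, delivered


def send_reliably(message_id, body, channel, ack_timeout_min=30, max_attempts=5):
    """Resend the same copy until acked; return (attempts, simulated minutes waited)."""
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if channel(message_id, body):
            return attempt, waited
        waited += ack_timeout_min  # no ack within the window: wait it out, then resend
    raise TimeoutError(f"{message_id} not acked after {max_attempts} attempts")


if __name__ == "__main__":
    channel, delivered = make_lossy_channel(drop_first=1)
    print(send_reliably("po-1", "buy product X", channel))  # (2, 30)
    print(delivered)  # [('po-1', 'buy product X')]
```

The `(2, 30)` result corresponds to the 30 minutes the supplier loses in the scenario, rather than the eight hours lost without RM.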
> > 	
> > 	Would you say that RM has some added value?
> > 	
> > 	arkin
> > 	
> > 	
> > 	> -----Original Message-----
> > 	> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> > 	> Behalf Of Peter Furniss
> > 	> Sent: Saturday, December 21, 2002 5:39 PM
> > 	> To: Patil, Sanjaykumar
> > 	> Cc: Www-Ws-Arch
> > 	> Subject: RE: Reliability is really two-phase (was RE: Reliable Web Services)
> > 	>
> > 	>
> > 	>
> > 	> Sanjay replied directly to me, but his comments are worth
> > 	> stirring into the public pot (and he's OK with that). My
> > 	> comments are interspersed:
> > 	>
> > 	> > -----Original Message-----
> > 	> > From: Patil, Sanjaykumar [mailto:sanjay.patil@iona.com]
> > 	> > Sent: 21 December 2002 02:32
> > 	> > To: Peter Furniss
> > 	> > Subject: RE: Reliability is really two-phase (was RE: Reliable Web Services)
> > 	> >
> > 	> >
> > 	> >
> > 	> > Peter, would it be correct to say that if somebody wanted to
> > 	> > deploy BTP entirely for achieving RM today, it should be
> > 	> > possible? Perhaps this may not be the best use of BTP, since the
> > 	> > state alignment problems solved by RM are more infrastructural
> > 	> > in nature, whereas BTP, AFAIK, is primarily intended for
> > 	> > business state alignment. Therefore, could I say that:
> > 	> > a> The use of BTP for business state alignment makes low-level
> > 	> > state alignment, and therefore RM, unnecessary
> > 	> > b> BTP technology is neutral to the nature of state alignment and
> > 	> > therefore could be deployed for achieving purely the goals of RM
> > 	> > c> The BTP machinery is similar (a superset!) to a typical RM
> > 	> > solution and therefore does not introduce huge overheads for
> > 	> > maintaining its flexibility (extensibility!) in supporting
> > 	> > additional coordination functionality.
> > 	>
> > 	> yes, that is exactly what I meant. BTP does not directly know
> > 	> what "prepared" means, so it could just mean "it is safely here".
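Peter's remark that BTP's "prepared" could simply mean "it is safely here" (Sanjay's option b) can be illustrated with a participant whose prepare step does nothing but store the message. The class and method names below are hypothetical, loosely modelled on a prepare/confirm/cancel vocabulary rather than on any BTP implementation's API, and a plain dict stands in for durable storage.

```python
class StoreAndForwardParticipant:
    """Hypothetical participant for which "prepared" means only "it is safely here"."""

    def __init__(self):
        self.stored = {}   # txn_id -> message; a dict standing in for durable storage
        self.applied = []  # messages actually handed to the application

    def prepare(self, txn_id: str, message: str) -> str:
        self.stored[txn_id] = message  # the only promise made: the message is stored
        return "prepared"

    def confirm(self, txn_id: str) -> str:
        message = self.stored.pop(txn_id, None)
        if message is not None:  # a repeated confirm finds nothing and applies nothing
            self.applied.append(message)
        return "confirmed"

    def cancel(self, txn_id: str) -> str:
        self.stored.pop(txn_id, None)  # discard the stored copy; nothing was applied
        return "cancelled"


if __name__ == "__main__":
    p = StoreAndForwardParticipant()
    p.prepare("tx-1", "purchase order #1")
    p.confirm("tx-1")
    p.confirm("tx-1")  # outcome repeated during recovery
    print(p.applied)   # ['purchase order #1']
```

Used this way the exchange carries RM-like semantics, while the same shape could later hold richer prepare logic.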
> > 	>
> > 	> > I guess many of us think that solving RM is practically a
> > 	> > "must", whereas solving business-level coordination in an
> > 	> > efficient manner is still perceived as "future" (in spite of the
> > 	> > smart work you guys did in BTP :-). Therefore, the argument of
> > 	> > "BTP making RM unnecessary" to me is like selling cake when bread
> > 	> > is in high demand. However, if my understanding as above is
> > 	> > correct (i.e. BTP can solve RM today and, if needed, other
> > 	> > coordination problems tomorrow), perhaps RM is the best launching
> > 	> > pad for BTP.
> > 	>
> > 	> but if cake is as cheap as bread ...   :-)
> > 	>
> > 	> (cheapness might not be price exactly; manageability and
> > 	> availability might be more significant)
> > 	>
> > 	> > Just a thought. Maybe I got the whole thing completely wrong, in
> > 	> > which case please pardon me for taking your precious time.
> > 	>
> > 	>
> > 	> >
> > 	> > Have a good weekend.
> > 	> >
> > 	> > thanks,
> > 	> > sanjay
> > 	> >
> > 	> >
> > 	> > -----Original Message-----
> > 	> > From: Peter Furniss [mailto:peter.furniss@choreology.com]
> > 	> > Sent: Friday, December 20, 2002 3:56 AM
> > 	> > To: Ricky Ho; www-ws-arch@w3.org
> > 	> > Subject: RE: Reliability is really two-phase (was RE: Reliable Web Services)
> > 	> >
> > 	> >
> > 	> >
> > 	> > Ricky Ho replied to me:
> > 	> >
> > 	> > > Are you implying at point (j) that by using BTP, reliable
> > 	> > > messaging is not necessary? I think they are solving orthogonal
> > 	> > > problems. In fact, BTP without reliable messaging is not
> > 	> > > sufficient for conducting high-value monetary transactions in a
> > 	> > > reliable manner.
> > 	> >
> > 	> > Yes, I don't think RM is necessary with BTP. The BTP exchange
> > 	> > means that the application work (e.g. a money transfer) won't
> > 	> > happen unless both sides agree that they understand and want to
> > 	> > do it. If the pattern follows the typical sequence:
> > 	> >
> > 	> >     client requests transfer
> > 	> >     server says it can do it, iff the client confirms
> > 	> >     client confirms
> > 	> >     server applies confirmation, and tells the client it is done
> > 	> >
> > 	> > then you have a stronger mechanism than RM, which is concerned
> > 	> > only with being a reliable postman. (Admittedly, if you map
> > 	> > things in a particular way, the two end up fairly close:
> > 	> > certainly, if the detailed application behaviour is fixed
> > 	> > assuming an RM pattern, BTP can carry identical semantics,
> > 	> > though it has some extra flexibility that RM would have
> > 	> > difficulty with.)
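Peter's four-step sequence can be sketched as a small state machine on the server side. `TransferServer` and its states are hypothetical names chosen for illustration; the sketch assumes the interim state is simply a pending amount, and shows that the transfer is applied only on confirmation, that a repeated confirm is harmless, and that nothing happens if the client never confirms.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # "I can do it, iff you confirm"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


class TransferServer:
    """Hypothetical server for the four-step money-transfer sequence."""

    def __init__(self, balance: int):
        self.balance = balance
        self.state = State.IDLE
        self.pending = 0  # amount held in an interim state, not yet applied

    def request(self, amount: int) -> str:  # step 1 in, step 2 out
        if amount > self.balance:
            self.state = State.CANCELLED
            return "cannot do it"
        self.pending = amount
        self.state = State.PREPARED
        return "can do it, iff you confirm"

    def confirm(self) -> str:  # step 3 in, step 4 out
        if self.state is State.PREPARED:
            self.balance -= self.pending  # the work happens only now
            self.state = State.CONFIRMED
        return "done" if self.state is State.CONFIRMED else "nothing to confirm"

    def cancel(self) -> str:
        if self.state is State.CONFIRMED:
            return "too late to cancel"
        self.pending = 0  # drop the interim hold; the balance was never touched
        self.state = State.CANCELLED
        return "cancelled"


if __name__ == "__main__":
    server = TransferServer(balance=100)
    print(server.request(40))  # can do it, iff you confirm
    print(server.balance)      # 100: nothing applied yet
    print(server.confirm())    # done
    print(server.confirm())    # done (a repeated confirm does not debit twice)
    print(server.balance)      # 60
```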
> > 	> >
> > 	> > Peter
> > 	> >
> > 	> >
> > 	> > >
> > 	> > > Rgds, Ricky
> > 	> > >
> > 	> > >
> > 	> > > At 02:16 AM 12/18/2002 +0000, Peter Furniss wrote:
> > 	> > >
> > 	> > >
> > 	> > > >The reliability requirement really means that you need
> > 	> > > >the sort of mechanisms and exchanges of two-phase outcome
> > 	> > > >(as in OASIS BTP). "Reliable messaging", depending on the
> > 	> > > >details of its mechanisms, either gives less than it seems
> > 	> > > >to, or is just as complicated (and, in some cases, both).
> > 	> > > >
> > 	> > > >
> > 	> > > >To expand that assertion a bit:
> > 	> > > >
> > 	> > > >a) I'm assuming reliability can be defined as two parties
> > 	> > > >needing to have a consistent view as to whether some work
> > 	> > > >has or has not been done by one of them at the request of
> > 	> > > >the other
> > 	> > > >   [ this is the 0-or-1 case, and is the centre of state
> > 	> > > >   alignment, where I change my view of the shared state
> > 	> > > >   because I know you have/will ]
> > 	> > > >
> > 	> > > >
> > 	> > > >b) the critical feature is that one side accepts that the
> > 	> > > >other side will make the definitive determination as to
> > 	> > > >whether the work is to be done; the deferring side agrees to
> > 	> > > >accept/apply/follow that determination once it knows of it
> > 	> > > >
> > 	> > > >  [ which is the essence of the solution to the two armies
> > 	> > > >  problem: their problem was that neither side will make an
> > 	> > > >  unconditional decision, but each wants the other side to
> > 	> > > >  make an irrevocable decision as a condition of its own ]
> > 	> > > >
> > 	> > > >c) once the determination has been made, the repetition and
> > 	> > > >recovery rules of the transaction protocol make sure the
> > 	> > > >other side will eventually know
> > 	> > > >
> > 	> > > >d) you normally want to know that the application has really
> > 	> > > >done the work. In some cases, it may be sufficient to know
> > 	> > > >that the work will eventually be done (e.g. it's been dropped
> > 	> > > >on a reliable queue), but that means that either there is no
> > 	> > > >comeback or any comeback is a whole new activity.
> > 	> > > >
> > 	> > > >e) the "simple" ack approach actually requires some extra
> > 	> > > >messages to avoid one or both sides having either to remember
> > 	> > > >the request (or some identification of it) indefinitely or to
> > 	> > > >have a complicated set of timeout rules as to when they can
> > 	> > > >forget things (and that's before we worry about surviving
> > 	> > > >crashes).
> > 	> > > >
> > 	> > > >f) reliable messaging (including things like HTTPR) is
> > 	> > > >distinguished from two-phase outcome only by what is counted
> > 	> > > >as the "decision": it's "message received", not "work is/will
> > 	> > > >be done". The systems have to store similar information and
> > 	> > > >identifiers and follow similar rules as to when to persist and
> > 	> > > >delete this information. [ in other words, it's not really
> > 	> > > >simpler to just use reliable messaging ]
> > 	> > > >
> > 	> > > >g) some of the scenarios differ from the classic two-phase
> > 	> > > >commit exchanges in that the sender of the first message is
> > 	> > > >the one that defers to the other side's decision. (In classic
> > 	> > > >two-phase, the client asks the server to defer to the client's
> > 	> > > >decision.) This has some impact on how the relationship gets
> > 	> > > >established, but doesn't significantly affect what happens
> > 	> > > >later (in terms of retries, persistence, recovery sequences).
> > 	> > > >
> > 	> > > >h) expel from your mind any assumptions about how the party
> > 	> > > >that is waiting on the other's determination/decision holds
> > 	> > > >itself able to obey (two-phase commit does *not* imply
> > 	> > > >two-phase locking). It may hold the information in a
> > 	> > > >distinguished interim state (outbound buffer, uncleared funds,
> > 	> > > >marked as reserved). It may completely perform its work and
> > 	> > > >retain a means of un-performing it. It may just check that it
> > 	> > > >could perform its work and remember what it must do.
> > 	> > > >
> > 	> > > >i) the transaction mechanisms actually allow for more complex
> > 	> > > >arrangements: the coordination role can be distinguished from
> > 	> > > >the resource-holding parties on each side, and there can be
> > 	> > > >more than two such parties. But for comparison with reliable
> > 	> > > >messaging, we can consider all the roles to be on one side or
> > 	> > > >the other, and consider only a single bilateral relationship.
> > 	> > > >
> > 	> > > >j) using a loosely-coupled transaction mechanism like BTP
> > 	> > > >means the application code doesn't have to get tangled up in
> > 	> > > >the recovery, repeats, etc. Setting of timeouts and the like
> > 	> > > >becomes a configuration question (possibly even a dynamic
> > 	> > > >configuration question if you really want it to be).
> > 	> > > >
> > 	> > > >k) a two-phase outcome exchange doesn't really seem to count
> > 	> > > >as "orchestration" or "choreography" as I understand those.
> > 	> > > >It's just a matter of "please do this", "I can do this", "I
> > 	> > > >can't do this", etc. Any compensation/counter-operation/reversal
> > 	> > > >is delegated to the party that has to do the reversal, rather
> > 	> > > >than having to be explicitly exposed as a counter-operation
> > 	> > > >distinctly accessed by the other side.
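Points (b), (c) and (e) can be sketched together: one side makes the definitive determination, keeps repeating it until it is acknowledged, and only then forgets it, while the deferring side keeps its interim record until it has applied the outcome. `Participant`, `Coordinator` and the `link_up` list (which simulates whether the other side is reachable on each retry) are hypothetical; persistence and crash recovery are left out, and a real protocol such as BTP defines its own messages and recovery rules.

```python
from typing import Iterable, Optional


class Participant:
    """Hypothetical deferring side: holds a prepared record until told the outcome."""

    def __init__(self):
        self.prepared = set()  # txn ids awaiting the other side's determination (b)
        self.outcomes = {}     # txn id -> decision actually applied

    def prepare(self, txn_id: str) -> None:
        self.prepared.add(txn_id)

    def on_outcome(self, txn_id: str, decision: str) -> str:
        if txn_id in self.prepared:
            self.outcomes[txn_id] = decision  # accept/apply/follow the determination
            self.prepared.discard(txn_id)     # the interim record can now be forgotten (e)
        return "ack"                          # a repeated outcome is simply re-acked


class Coordinator:
    """Hypothetical deciding side: repeats its decision until acknowledged (c)."""

    def __init__(self):
        self.unacked = {}  # txn id -> decision still to be delivered

    def decide(self, txn_id: str, decision: str) -> None:
        self.unacked[txn_id] = decision  # the definitive determination (b)

    def drive(self, txn_id: str, participant: Participant,
              link_up: Iterable[bool]) -> Optional[int]:
        """Resend the outcome once per entry in `link_up` (True = reachable)."""
        for attempt, up in enumerate(link_up, start=1):
            if up and participant.on_outcome(txn_id, self.unacked[txn_id]) == "ack":
                del self.unacked[txn_id]  # acknowledged, so forget it (e)
                return attempt
        return None  # still unacknowledged: the decision must be remembered


if __name__ == "__main__":
    supplier, coord = Participant(), Coordinator()
    supplier.prepare("tx-1")
    coord.decide("tx-1", "confirm")
    print(coord.drive("tx-1", supplier, [False, False, True]))  # 3
    print(supplier.outcomes, supplier.prepared, coord.unacked)  # {'tx-1': 'confirm'} set() {}
```

As point (f) notes, an RM layer keeps essentially the same kind of record; only what counts as the "decision" differs.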
> > 	> > > >
> > 	> > > >
> > 	> > > >That's enough for now. I'm probably still obscure through
> > 	> > > >brevity, but the message is long enough already.
> > 	> > > >
> > 	> > > >Peter
> > 	> > > >
> > 	> > > >------------------------------------------
> > 	> > > >Peter Furniss
> > 	> > > >Chief Scientist, Choreology Ltd
> > 	> > > >
> > 	> > > >    Cohesions 1.0 (TM)
> > 	> > > >    Business transaction management software for application coordination
> > 	> > > >
> > 	> > > >web: http://www.choreology.com
> > 	> > > >email:  peter.furniss@choreology.com
> > 	> > > >phone:  +44 20 7670 1679
> > 	> > > >direct: +44 20 7670 1783
> > 	> > > >mobile: +44 7951 536168
> > 	> > > >13 Austin Friars, London EC2N 2JX
Received on Monday, 23 December 2002 12:46:54 UTC