RE: Reliability is really two-phase (was RE: Reliable Web Services) from Assaf Arkin on 2002-12-23 (www-ws-arch@w3.org from December 2002)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 23 Dec 2002 10:38:59 -0800
To: "Mark Potts" <mark.potts@talkingblocks.com>, "Peter Furniss" <peter.furniss@choreology.com>, "Patil, Sanjaykumar" <sanjay.patil@iona.com>
Cc: "Www-Ws-Arch" <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNOEHDCPAA.arkin@intalio.com>
> > When you resend a message you resend an exact copy of the
> > same message, and the RM only delivers one instance of the
> > message to the application. So the whole process of acking
> > and ignoring duplicates is totally transparent to the application.
> >
> > If the application was trying to deal with this situation, it
> > would complicate the application. It would generate two
> > different messages and at the receiving end would have to
> > identify that they relate to the same purchase order, so you
> > would require application code to deal with duplicates.
>
> I understand this, and agree, and the same applies to BTP and PM being
> outside the application or functionality  layer - the layers in my mind
> get closer to the application layer as you get higher in the stack
> (transport->protocol->context->process->application).

There's an important principle here: you reduce the cost of development and
improve reliability if you simplify the application by putting the
complexity in separate layers. So for an RM (RM=MOM) you would automatically
do the resend without bothering the application. And for BTP you would have
a generic layer that does the coordination. That way, through simple
extensions to my applications that require little logic, I can benefit
from coordinated transactions and improve reliability.
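
To make the layering concrete, here is a minimal sketch of an RM layer that
resends an exact copy and suppresses duplicates, so the application only ever
sees one delivery. The names (ReliableReceiver, send_reliably, transmit) are
made up for illustration and do not come from any RM specification; a real RM
would also persist the set of seen IDs to survive crashes.

```python
class ReliableReceiver:
    """Hypothetical RM endpoint: acks every copy, delivers each message ID once."""

    def __init__(self, deliver):
        self._deliver = deliver   # application callback, unaware of resends
        self._seen = set()        # IDs already handed to the application

    def on_message(self, msg_id, payload):
        if msg_id not in self._seen:
            self._seen.add(msg_id)
            self._deliver(payload)
        # Duplicates are acked too, so the sender stops resending.
        return ("ack", msg_id)


def send_reliably(transmit, msg_id, payload, max_attempts=5):
    """Resend the same message until acked; transmit returns the ack or None."""
    for attempt in range(1, max_attempts + 1):
        if transmit(msg_id, payload) == ("ack", msg_id):
            return attempt
    raise TimeoutError(f"no ack for {msg_id} after {max_attempts} attempts")
```

If the first ack is lost, the sender resends, the receiver acks again, and the
application still gets exactly one purchase order without any code of its own.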


> > In this case the receiver is enlisted
> > in the transaction by the sender when the first message in
> > the transaction is sent to the receiver. This does not commit
> > the receiver to anything, but allows them to determine the
> > transaction outcome or change it without having to send a
> > separate enlistment message.
> >
> > What's your opinion?
>
> This is another possible optimization  - but may not be flexible enough
> in some cases considered for BTP (the sender does not give the receiver
> the opportunity to decide who/what needs to enlist ).
>
> Today the simplest case involves the receiver agreeing to participate in
> the transaction (the transaction context, as part of the application
> message is sort of an invite to provider to enlist parties under its
> control in the transaction context provided). The receiver can do so in
> conjunction with the second message passed from receiver back to sender,
> the message can also optionally have state about the work requested -
> i.e. prepared statement if the work is completed (with whatever caveats
> it wants to impose). When done this way you still have two messages,
> outside the termination phase of the message exchange.

So one flow would be:

1. RM receives message, sends ack
2. RM delivers one copy of message to application
3. application starts processing message
4. application completes processing message and sends back response
enlisting in transaction
5. application continues doing further work in transaction context
6. RM does delivery on behalf of application with resend

In this flow the application is doing step 3 outside the transaction context
and has to explicitly enlist in the transaction when it does step 4. Step 3
might have different results (can't deliver, can deliver, need more time,
can deliver but not on that date), so there are actually multiple pieces of
code that do step 4 and each has to enlist in the transaction.

Another flow would be:

1. RM receives message, sends ack
2. TM picks up message, recreates transaction context, delivers to
application
3. application starts processing message and sends back response
4. TM sends back response using the proper transaction context
5. RM does delivery on behalf of TM with resend

In this flow the TM takes care of enlisting the application so all the steps
are done in a transaction context and the TM can supervise the application,
i.e. communicate internal failures as rollbacks, or communicate rollbacks to
the application. So the application doesn't have to be aware it's part of a
transaction aside from having synchronization entry points and allowing the
TM to manage all the workload.
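
The second flow could be sketched roughly as follows. This is only an
illustration under my own assumptions: the TransactionManager class, the
message dictionaries and the "tx_id" field are hypothetical and are not the
BTP, XA or OTS interfaces. The point is that the handler contains no
transaction code at all; the TM recreates the context, enlists the
application, and turns an internal failure into a rollback.

```python
class Transaction:
    """Hypothetical local view of a transaction context."""

    def __init__(self, tx_id):
        self.tx_id = tx_id
        self.participants = []
        self.rollback_only = False


class TransactionManager:
    """Sits between the RM and the application handler."""

    def __init__(self, handler):
        self._handler = handler    # plain application code
        self.transactions = {}     # tx_id -> Transaction

    def on_message(self, message):
        # Step 2: recreate the transaction context carried by the message.
        tx_id = message["tx_id"]
        tx = self.transactions.setdefault(tx_id, Transaction(tx_id))
        # The application is enlisted on receipt, before it does any work.
        if "application" not in tx.participants:
            tx.participants.append("application")
        try:
            # Step 3: the application processes the message.
            body = self._handler(message["body"])
        except Exception as exc:
            # An internal failure is communicated as a rollback.
            tx.rollback_only = True
            return {"tx_id": tx_id, "status": "rollback", "reason": str(exc)}
        # Step 4: the response goes back in the proper transaction context.
        return {"tx_id": tx_id, "status": "ok", "body": body}
```

Whatever result the handler produces (can deliver, can't deliver, needs more
time), there is a single place that deals with enlistment.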

Since we're using X/Open XA, OTS, DTC and similar transaction protocols, we
already have the infrastructure to do that. We just use legacy protocols
that are not supported over HTTP/SOAP, but it's a given that we can easily
extend that infrastructure to do BTP.

In this model the application is always enlisted in the transaction when it
receives a message as part of the transaction, which doesn't commit the
application to anything more than using a TM for processing all its
messages. The application is enlisted in the transaction whether or not it
communicates that to the sender, so this approach also works with the flow
you suggested.

But since the sender knows whether the receiver supports the transaction
protocol or not, the sender can now unilaterally enlist the receiver in the
transaction when it first sends a message to the receiver in the transaction
context. This will work because you have two TMs talking to each other, and
it doesn't complicate either application.

If the receiver application needs to use additional resources, since the TM
supervises its work it can elect to either explicitly enlist them with the
sender transaction, or do transaction interposing. In the case of
asynchronous messaging we are looking at reducing message exchange, so we
would always do interposing. I can't think of a case where you would want to
do asynchronous messaging without interposing.
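
A rough sketch of what I mean by interposing, again with hypothetical class
names rather than any real TM API: the receiver's TM enlists once with the
sender's transaction and acts as a local coordinator for whatever resources
the receiver uses, so adding resources on the receiver side adds no messages
between the two parties.

```python
class Resource:
    """Hypothetical local resource (database, queue, ...)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


class Coordinator:
    """Coordinates its participants and can itself be enlisted as one."""

    def __init__(self):
        self.participants = []

    def enlist(self, participant):
        self.participants.append(participant)

    def prepare(self):
        # Vote yes only if every participant votes yes.
        return all(p.prepare() for p in self.participants)

    def commit(self):
        for p in self.participants:
            p.commit()

    def rollback(self):
        for p in self.participants:
            p.rollback()

    def complete(self):
        if self.prepare():
            self.commit()
            return "committed"
        self.rollback()
        return "rolled back"
```

The sender's Coordinator would enlist the receiver's Coordinator as its only
participant, and the receiver's resources would enlist with the receiver's
Coordinator, which relays prepare, commit and rollback to them.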

arkin

>
> >
> > arkin
> >
> >
> >
> > >
> > > This is not making a case for not needing RM - simply there are
> > > many ways to "skin the cat" and the layers you require will be
> > > determined by the value of the interactions and their criticality
> > > to the business.
> > >
> > > Mark
> > >
> > >
> > > 	-----Original Message-----
> > > 	From: Assaf Arkin [mailto:arkin@intalio.com]
> > > 	Sent: Sat 12/21/2002 9:40 PM
> > > 	To: Peter Furniss; Patil, Sanjaykumar
> > > 	Cc: Www-Ws-Arch
> > > 	Subject: RE: Reliability is really two-phase (was RE:
> > > Reliable Web Services)
> > >
> > >
> > >
> > >
> > > 	Let's assume a typical and quite common scenario.
> > >
> > > 	A buyer sends a message to a supplier asking to buy a
> > > product. The buyer
> > > 	expects that it may take 8 hours before the supplier can
> > > indicate whether
> > > 	the purchase can be processed. The seller needs to check
> > > inventory levels to
> > > 	determine when the product can be shipped, update its
> > > production plan,
> > > 	validate the buyer's credit, etc.
> > >
> > > 	This is usually a fast process when the product is in
> > > inventory, or the
> > > 	inventory is constantly replenished, and the process is
> > > entirely automatic.
> > > 	It takes longer when the product has to be produced,
> > demand exceeds
> > > 	inventory, or the process is not entirely automatic (which
> > > as we all know is
> > > 	quite common in the business world).
> > >
> > > 	The request got lost in transit, and after eight hours the
> > > buyer does not
> > > 	hear back from the supplier. Since the buyer and supplier
> > > use a coordination
> > > 	protocol, they can both agree whether the product would be
> > > delivered. They
> > > 	need coordination to deal with a variety of cases, such as
> > > the buyer not
> > > 	agreeing to the delivery date provided by the supplier, or
> > > agreeing to the
> > > 	delivery date for some items but deciding to remove other
> > > items from the
> > > 	purchase order.
> > >
> > > 	The buyer decides to still procure the product from the
> > > seller, and issues
> > > 	another purchase order. Since the buyer and supplier
> > > coordinate we get
> > > 	exactly one order (the buyer has determined that the
> > > previous order did not
> > > 	get through), no loss of consistency.
> > >
> > > 	All we lost are eight hours.
> > >
> > > 	Now, let's equip the buyer and supplier with an RM
> > > solution. The RM solution
> > > 	does not attempt to determine whether the order will be
> > > processed, when
> > > 	delivery will occur, etc. All it cares about is getting the
> > > request to the
> > > 	supplier.
> > >
> > > 	The RM expects to hear an ack after 30 minutes. Since no
> > > ack has been
> > > 	received, the RM tries to resend the message and the second
> > > message makes it
> > > 	to the supplier and the ack is received by the buyer's RM.
> > >
> > > 	Due to the unreliability of the transport the supplier has
> > > lost 30 minutes
> > > 	for processing the request, but is still able to respond to
> > > the buyer before
> > > 	eight hours have passed. The buyer does not have to attempt
> > > and resolve the
> > > 	situation after eight hours has passed, saving resources
> > > for both buyer and
> > > 	supplier and expediting the delivery of the product and any
> > > other process
> > > 	that depends on the delivery date being known. (For
> > > example, because the
> > > 	buyer is also a supplier and has to report back to its buyers)
> > >
> > > 	Would you say that RM has some added value?
> > >
> > > 	arkin
> > >
> > >
> > > 	> -----Original Message-----
> > > 	> From: www-ws-arch-request@w3.org
> > > [mailto:www-ws-arch-request@w3.org]On
> > > 	> Behalf Of Peter Furniss
> > > 	> Sent: Saturday, December 21, 2002 5:39 PM
> > > 	> To: Patil, Sanjaykumar
> > > 	> Cc: Www-Ws-Arch
> > > 	> Subject: RE: Reliability is really two-phase (was RE:
> > Reliable Web
> > > 	> Services)
> > > 	>
> > > 	>
> > > 	>
> > > 	> Sanjay replied directly to me, but his comments are worth
> > > 	> stirring into the
> > > 	> public
> > > 	> pot (and he's ok with that). My comments interspersed:
> > > 	>
> > > 	> > -----Original Message-----
> > > 	> > From: Patil, Sanjaykumar [mailto:sanjay.patil@iona.com]
> > > 	> > Sent: 21 December 2002 02:32
> > > 	> > To: Peter Furniss
> > > 	> > Subject: RE: Reliability is really two-phase (was RE:
> > > Reliable Web
> > > 	> > Services)
> > > 	> >
> > > 	> >
> > > 	> >
> > > 	> > Peter, would it be correct to say that - If
> > somebody wanted to
> > > 	> > deploy BTP entirely for achieving RM today, it should be
> > > 	> > possible. Perhaps, this may not be the best use of BTP,
> > > since the
> > > 	> > state alignment problems solved by RM is more of
> > > infrastructural
> > > 	> > in nature, where as BTP, AFAIK, is primarily intended for
> > > 	> > business state alignment. Therefore could I say that -
> > > 	> > a> The use of BTP for business state alignment
> > makes low level
> > > 	> > state alignment and therefore RM unnecessary
> > > 	> > b> BTP technology is neutral to the nature of state
> > > alignment and
> > > 	> > therefore could be deployed for achieving purely
> > the goals of RM
> > > 	> > c> The BTP machinery is similar (superset!) to a typical RM
> > > 	> > solution and therefore does not introduce huge overheads for
> > > 	> > maintaining its flexibility (extensibility!) in supporting
> > > 	> > additional coordination functionalities.
> > > 	>
> > > 	> yes, that is exactly what I meant. BTP does not directly know
> > > 	> what "prepared" means, so it could just mean "it is
> > safely here".
> > > 	>
> > > 	> > I guess, many of us think that solving RM  is practically a
> > > 	> > "must", where as solving business level coordination in an
> > > 	> > efficient manner is still perceived as "future" (in
> > spite of the
> > > 	> > smart work you guys did in BTP :-). Therefore, the
> > argument of
> > > 	> > "BTP making RM unnecessary" to me is like selling cake
> > > when bread
> > > 	> > is in high demand. However, if my understanding as above is
> > > 	> > correct (i.e. BTP can solve RM today and if needed other
> > > 	> > coordination problems tomorrow), perhaps RM is the best
> > > launching
> > > 	> > pad for BTP.
> > > 	>
> > > 	> but if cake is as cheap as bread ...   :-)
> > > 	>
> > > 	> (cheapness might not be price exactly - manageability,
> > > availability
> > > 	> might be more significant)
> > > 	>
> > > 	> > Just a thought. May be I got the whole thing completely
> > > wrong, in
> > > 	> > which case please pardon me for taking your precious time.
> > > 	>
> > > 	>
> > > 	> >
> > > 	> > Have a good weekend.
> > > 	> >
> > > 	> > thanks,
> > > 	> > sanjay
> > > 	> >
> > > 	> >
> > > 	> > -----Original Message-----
> > > 	> > From: Peter Furniss [mailto:peter.furniss@choreology.com]
> > > 	> > Sent: Friday, December 20, 2002 3:56 AM
> > > 	> > To: Ricky Ho; www-ws-arch@w3.org
> > > 	> > Subject: RE: Reliability is really two-phase (was RE:
> > > Reliable Web
> > > 	> > Services)
> > > 	> >
> > > 	> >
> > > 	> >
> > > 	> > Ricky Ho replied to me:
> > > 	> >
> > > 	> > > Are you implying at point (j) that by using BTP, reliable
> > > 	> > > messaging is not
> > > 	> > > necessary ?  I think they are solving orthogonal
> > problem.  In
> > > 	> fact, BTP
> > > 	> > > without reliable messaging is not sufficient for
> > > conducting high money
> > > 	> > > value transaction in a reliable manner.
> > > 	> >
> > > 	> > Yes, I don't think RM is necessary with BTP. The BTP
> > > exchange means that
> > > 	> > the application work (e.g. money transfer) won't
> > happen unless
> > > 	> both sides
> > > 	> > agree that they understand and want to do it. If the
> > > pattern follows the
> > > 	> > typical sequence:
> > > 	> >
> > > 	> >     client requests transfer
> > > 	> >     server says it can do it, iff the client confirms
> > > 	> >     client confirms
> > > 	> >     server applies confirmation, and tells the
> > client it is done
> > > 	> >
> > > 	> > then you have a stronger mechanism than RM, which is
> > > concerned only
> > > 	> > with being a reliable postman.  (admittedly, if you map
> > > things in a
> > > 	> > particular way, the two end up becoming fairly close -
> > > certainly if the
> > > 	> > detailed application behaviour is fixed assuming an RM
> > > pattern, BTP
> > > 	> > can carry the identical semantics - though it has some extra
> > > 	> > flexibilities that RM would have difficulty with).
> > > 	> >
> > > 	> > Peter
> > > 	> >
> > > 	> >
> > > 	> > >
> > > 	> > > Rgds, Ricky
> > > 	> > >
> > > 	> > >
> > > 	> > > At 02:16 AM 12/18/2002 +0000, Peter Furniss wrote:
> > > 	> > >
> > > 	> > >
> > > 	> > > >The reliability requirement really means that you need
> > > 	> > > >the sort of mechanisms and exchanges of two-phase outcome
> > > 	> > > >(as in OASIS BTP).  "reliable messaging",
> > depending on the
> > > 	> > > >details of its mechanisms, is variously giving
> > less than it
> > > 	> > > >seems, or is just as complicated (and, in some
> > cases, both).
> > > 	> > > >
> > > 	> > > >
> > > 	> > > >To expand that assertion a bit:
> > > 	> > > >
> > > 	> > > >a) i'm assuming reliability can be defined as two parties
> > > 	> > needing to have
> > > 	> > > >a consistent view as to whether some work has or has
> > > not been done
> > > 	> > > >by one of them at the request of the other
> > > 	> > > >   [ this is the 0 or 1 case, and is the centre of
> > > state alignment -
> > > 	> > > >   where I change my view of the shared state
> > > because I know you
> > > 	> > > have/will]
> > > 	> > > >
> > > 	> > > >
> > > 	> > > >b) the critical feature is that one side accepts
> > > 	> > > >that the other side will make the definitive
> > determination as
> > > 	> > > >to whether the work is to be done; the deferring side
> > > 	> > > >agrees to accept/apply/follow that determination
> > > once it knows of it
> > > 	> > > >
> > > 	> > > >  [ which is the essence of the solution to the
> > two armies
> > > 	> > > problem - their
> > > 	> > > >problem was that neither side will make an unconditional
> > > 	> decision, but
> > > 	> > > >wants the other side to make an irrevocable decision as a
> > > 	> condition of
> > > 	> > > >its own]
> > > 	> > > >
> > > 	> > > >c) once the determination has been made, the
> > > repetition and recovery
> > > 	> > > >rules of the transaction protocol make sure the
> > > other side will
> > > 	> > > >know eventually
> > > 	> > > >
> > > 	> > > >d) you normally want to know that the application
> > > has really done
> > > 	> > > >the work. In some cases, it may be sufficient to
> > know that
> > > 	> > > >the work will eventually be done (e.g. it's been
> > dropped on a
> > > 	> > > >reliable queue) - but that means that either there is no
> > > 	> > > >comeback or any comeback is a whole new activity.
> > > 	> > > >
> > > 	> > > >e) the "simple" ack approach actually requires some extra
> > > 	> > > >messages to avoid one or both sides having to
> > remember the
> > > 	> > > >request (or some identification on it)
> > indefinitely or have
> > > 	> > > >a complicated set of timeout rules as to when
> > they can forget
> > > 	> > > >things. (and that's before we worry about
> > surviving crashes)
> > > 	> > > >
> > > 	> > > >f) reliable messaging (including things like HTTPR) are
> > > 	> > > >distinguished from two-phase outcome only by
> > what is counted
> > > 	> > > >as the "decision" - it's "message received", not
> > > "work is/will
> > > 	> > be done".
> > > 	> > > >The systems have to store similar information/identifiers
> > > 	> > > >and follow similar rules as to when to persist and
> > > 	> > > >delete this information. [ in other words, it's not
> > > really simpler
> > > 	> > > >to just use reliable messaging ]
> > > 	> > > >
> > > 	> > > >g) some of the scenarios differ from the classic
> > > 	> > > >two-phase commit exchanges in that the sender of
> > the first
> > > 	> > > >message is the one that defers to the other
> > side's decision.
> > > 	> > > >(classic two-phase is client asks server to defer to the
> > > 	> > > >client's decision). This has some impact on how the
> > > 	> > > >relationship gets established, but doesn't significantly
> > > 	> > > >affect what happens later (in terms of retries,
> > persistence,
> > > 	> > > >recovery sequences).
> > > 	> > > >
> > > 	> > > >h) expel from your mind any assumptions about
> > how the party
> > > 	> > > >that is waiting on the other's determination/decision is
> > > 	> > > >holding itself able to obey. (two-phase commit does *not*
> > > 	> > > >imply two-phase locking). It may hold the information in
> > > 	> > > >a distinguished interim state (outbound buffer,
> > > uncleared funds,
> > > 	> > > >marked as reserved). It may completely perform
> > its work and
> > > 	> > > >retain a means of un-performing it. It may just
> > > check it could
> > > 	> > > >perform its work and remember what it must do.
> > > 	> > > >
> > > 	> > > >i) the transaction mechanisms actually allow for
> > more complex
> > > 	> > > >arrangements - the coordination role can be
> > > distinguished from
> > > 	> > > >the resource-holding parties on each side, and
> > there can be
> > > 	> > > >more than two such parties. But for comparison
> > with reliable
> > > 	> > > >messaging, we can consider all the roles to be
> > on one side or
> > > 	> > > >the other, and consider only a single bilateral
> > relationship.
> > > 	> > > >
> > > 	> > > >j) using a loosely-coupled transaction mechanism
> > > like BTP means
> > > 	> > > >the application code doesn't have to get tangled up
> > > in the recovery,
> > > 	> > > >repeats etc. Setting of timeouts and the like becomes a
> > > 	> > > >configuration question (possibly even a dynamic
> > configuration
> > > 	> > > >question if you really want to).
> > > 	> > > >
> > > 	> > > >k) a two-phase outcome exchange doesn't really seem
> > > to count as
> > > 	> > > >"orchestration" or "choreography" as I
> > understand those. It's
> > > 	> > > >just a matter "please do this", "I can do this",
> > "I can't do
> > > 	> this" etc.
> > > 	> > > >Any compensation/counter-operation/reversal is
> > > delegated to the
> > > 	> > > >party that has to do the reversal, rather than
> > having to be
> > > 	> > > >explicitly exposed as a counter-operation
> > distinctly accessed
> > > 	> > > >by the other side.
> > > 	> > > >
> > > 	> > > >
> > > 	> > > >That's enough for now - I'm probably still
> > obscure through
> > > 	> > > >brevity, but the message is long enough already.
> > > 	> > > >
> > > 	> > > >Peter
> > > 	> > > >
> > > 	> > > >------------------------------------------
> > > 	> > > >Peter Furniss
> > > 	> > > >Chief Scientist, Choreology Ltd
> > > 	> > > >
> > > 	> > > >    Cohesions 1.0 (TM)
> > > 	> > > >    Business transaction management software for
> > application
> > > 	> > coordination
> > > 	> > > >
> > > 	> > > >web: http://www.choreology.com
> > > 	> > > >email:  peter.furniss@choreology.com
> > > 	> > > >phone:  +44 20 7670 1679
> > > 	> > > >direct: +44 20 7670 1783
> > > 	> > > >mobile: +44 7951 536168
> > > 	> > > >13 Austin Friars, London EC2N 2JX
> > > 	> > >
> > > 	> >
> > > 	>
> > >
> > >
> > >
> >
> >
>
Received on Monday, 23 December 2002 13:40:52 UTC