- From: Burdett, David <david.burdett@commerceone.com>
- Date: Sat, 14 Dec 2002 11:17:53 -0800
- To: "'Ricky Ho'" <riho@cisco.com>, "Burdett, David" <david.burdett@commerceone.com>, "Burdett, David" <david.burdett@commerceone.com>, www-ws-arch@w3.org
- Message-ID: <C1E0143CD365A445A4417083BF6F42CC053D152E@C1plenaexm07.commerceone.com>
Ricky

See comments in line below marked with <DB2></DB2>

David

-----Original Message-----
From: Ricky Ho [mailto:riho@cisco.com]
Sent: Friday, December 13, 2002 6:13 PM
To: Burdett, David; Burdett, David; www-ws-arch@w3.org
Subject: RE: Different Levels of Reliable Messaging

Thanks David, see my followup questions (embedded)

>The "ack" doesn't need to be per-message based. I can send an ack for a
>bunch of messages (of course, sequence numbers are used).
><DB>Agreed, but now you are adding in an extra level of complexity (sequence
>number) that often won't be needed. What I would suggest is that you split
>this into another two levels:
>1. Sequencing Support. This is a protocol, built on top of reliable
>messaging that ensures that messages arrive in the sequence they were sent.
>2. Reduced Frequency Acknowledgement Messages. You could then vary the
>reliable messaging protocol so that a request for an acknowledgement is
>every so many messages and if it is not received, then corrective action is
>taken.
></DB>

<Ricky>
I was presuming sequence ordering to be part of reliable messaging.
Seems like you consider this as a separate layer.
</Ricky>

>The "time expiry" is unreliable because clocks may be out of sync.
><DB>Absolutely right.
>
>The "cheap", but as you say inaccurate way to do this is to set and compare
>"expires at" using a local system clock. The fact that it is an
>approximation to the true time is often not a big issue especially if you
>are doing end-to-end acks where the time between sending a message and when
>it expires is long compared to the clock accuracy (e.g. a day). Even so, it
>is probably good practice that Reliable Messaging solutions take this
>uncertainty in the accuracy of the time into account and extend the "expires
>at" to some time beyond the nominal expiry time.
>
>If time accuracy is *fairly* critical, then the sender and receiver of a
>message SHOULD agree to keep their clocks accurate using, for example,
>protocols such as the Network Time Protocol. If accuracy is *really*
>critical then you can include in the message the accuracy to which the
>system at the destination MUST keep its clocks. If the system does not keep
>its clocks accurate or cannot keep them accurate enough, then the
>destination should reject the message and not process it.</DB>

<Ricky>
Maybe I misunderstood the purpose of expiration time. I guess the purpose
of your time expiry mechanism is to reduce the "in-doubt" condition. So
suppose A sends a message to B which is valid for T minutes, and A doesn't
receive an ACK from B. A keeps resending but still doesn't get back the
ACK after (T+10) minutes. Can A at this point simply give up and conclude
that the message is undelivered? All I am trying to say is that "A cannot
draw that conclusion". Sorry, I agree this is irrelevant to the clock sync
problem.

So I see the expiration time as purely an application level semantic (e.g.
you send a bid response which is valid for one day). I don't see what role
the expiration time plays at the RM level. I must be missing something
here.
</Ricky>

<DB2>I think the expiration time is important for RM, and here's why. All
RM is based on waiting for an ack and repeatedly resending the original
message if you don't get one. The problem is when do you stop resending and
give up. At some point you have to stop, but when? There are two ways of
doing this.
1. Stop after a fixed number of retries (which is what ebXML MS does), or
2. Stop only after the message can no longer validly be processed.

The problem with the first approach is that if you say stop sending after
3 retries (i.e.
4 sends of the message in total), then it is still quite possible that, if
the destination system was down, and then came back up it could pick up the
message and process it - this is quite normal behavior. You could then get
into the situation where:
1. The sender sent the message but gave up resending after, say, 20 minutes
as no reply was received, then
2. The sender reports to the application that delivery failed
3. The destination restarts, finds the message, sends the ack and starts
processing it.
4. The sender receives the ack and has to tell the application that the
message for which it had just reported a delivery failure had actually been
received and was processed - not a desirable outcome

Alternatively by specifying a time out and basing the retries on that you
know, with a high degree of certainty, that even if the message is picked
up, it won't be processed and therefore you are much less likely to have to
report a delivery failure and then have to reverse it.

The question is how do you decide what the timeout value should be. There
are again two ways of doing this:
1. Use a value that is driven by the application - i.e. it is a business
value, or
2. Use a value that is determined based on speed of the transport protocol
and therefore how long you expect it to take to get the ack.

I think that it is always a good idea to use a business driven value if one
is available, but it really is an implementation decision.
</DB2>

>I don't think there should be a step 4 in LEVEL 3. Step 3 should say "Have
>you received the message ? If not, forget the message afterwards"
><DB>I don't think you can always say this. For example if you want to place
>an order and there is only one supplier, then even if your message failed,
>you might want to resend it if the connection became available later.
>In this case, the content/payload/body of the message might be identical
>but in other ways it was a completely new message.</DB>

<Ricky>
What I'm trying to prevent is the situation where the request message
arrives at the receiver after the query (so the receiver responds: "I
haven't got it"), but before the "forget message" gets there. In this
case, the message has been delivered, but the sender thinks it hasn't.

Going back to your example, you should send a query to the supplier "Have
you received my purchase order with message id=12345 ? If you haven't,
ignore that message if it arrives later". If you get back an answer "NO",
resend your same purchase order with a new message id=98765.

However, if you send a separate "forget" message after you receive a "NO",
then it is possible that the receiver gets 2 purchase orders (one with
message id = 12345 and the other with id = 98765).
</Ricky>

<DB2>There is actually a little mistake in Level 3 as I described it;
correcting it avoids the problem you describe. Basically you only attempt a
recovery *after* you have given up, and you only give up when the message
has expired. In this case, even if the message arrived after the query, it
should be rejected as it arrived too late.</DB2>

>I think LEVEL 5 should be done at the transaction layer, below
>choreography, but above reliable messaging. Using some 2-phase-interaction
>style like BTP.
><DB>Quite possibly. The problem with two phase commit is the action you take
>when you get a failure (i.e. a rollback) may not always be the right one and
>often it can be impossible to do. For example, if you want to roll back a
>payment, but the payment has already gone to the bank, then it's too late. You
>have to do a reversal, or refund instead. Both of these would leave a trace
>in the records of what happened.</DB>

<Ricky>
Of course, you can always handle exceptions at the application level, which
can recover from a partial failure situation in a very application
specific manner.
However, this can complicate the application flow because it mixes the
normal flow with exception handling logic under different failure
scenarios.

The beauty of transaction processing is that the application can
encapsulate multiple activities within a transaction block and safely
assume everything will automatically be undone. In other words, the
application doesn't need to worry about all failure situations.

Let's look at a simple case where A is sending a "money transfer request"
to B, which sends a "money deposit request" to C as well as another "money
withdrawal request" to D. Let me illustrate the flow based on a 2-phase
handshaking.

1) A sends "transfer" to B, and waits for "Prepared-ACK-transfer" from B
2) B sends "deposit" to C, and waits for "Prepared-ACK-deposit" from C
3) B sends "withdrawal" to D and waits for "Prepared-ACK-withdrawal" from D
4) After B gets back all the "Prepared-ACK" from C and D, it sends back the
"Prepared-ACK-transfer" to A
5) A sends "commit" to B, and waits for "Committed-ACK-transfer" from B
6) B sends "commit" to C, and waits for "Committed-ACK-deposit" from C
7) B sends "commit" to D and waits for "Committed-ACK-withdrawal" from D
8) After B gets back all the "Committed-ACK" from C and D, it sends back the
"Committed-ACK-transfer" to A
</Ricky>

<DB2>What you describe in this example is a Business Process. It is NOT, in
my opinion, reliable messaging as you make the return of one ack dependent
on the receipt of two other acks. The bottom line is that you can only do
transaction processing if you KNOW that complete rollback of the state at
the sender and receiver is possible. Sometimes it is, and sometimes it
isn't, which is why you have to determine how you do the recovery at the
application level.</DB2>

By the way, you have raised some very good points David !

Best regards,
Ricky
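
For illustration only, here is a minimal Python sketch of the expiry-driven retry rule argued for in the <DB2> comments above (stop resending only once the message can no longer validly be processed, and have the destination reject anything that arrives too late). It is not taken from ebXML MS or any other specification. The names Message, send_until_expired, accept and skew_margin are hypothetical, and the send callback is assumed to return True when an ack comes back. The skew_margin parameter is one possible reading of the suggestion to extend "expires at" beyond the nominal expiry time to allow for clock inaccuracy.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    message_id: str
    expires_at: float  # nominal expiry, in seconds on the local clock


def send_until_expired(
    msg: Message,
    send: Callable[[Message], bool],   # assumed: returns True if an ack arrived
    now: Callable[[], float] = time.time,
    wait: Callable[[float], None] = time.sleep,
    retry_interval: float = 60.0,
    skew_margin: float = 0.0,          # extra time to allow for clock inaccuracy
) -> bool:
    """Resend msg until it is acked or it can no longer validly be processed.

    Returns True if an ack was received, False if the sender gave up.
    """
    deadline = msg.expires_at + skew_margin
    while now() <= deadline:
        if send(msg):
            return True
        wait(retry_interval)
    # Past the deadline: a destination that enforces expires_at will reject
    # the message even if it turns up late, so reporting failure is safe.
    return False


def accept(msg: Message, now: Callable[[], float] = time.time) -> bool:
    """Destination side: process a message only if it has not yet expired."""
    return now() <= msg.expires_at
```

With a fixed retry count the sender can report a failure and later receive an ack for the same message. In this sketch the sender gives up only after the deadline, by which point accept() at the destination would refuse the message, so a reported failure is much less likely to need reversing.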
Received on Saturday, 14 December 2002 14:17:37 UTC