- From: Walden Mathews <waldenm@optonline.net>
- Date: Sun, 26 Jan 2003 11:47:44 -0500
- To: Assaf Arkin <arkin@intalio.com>, Miles Sabin <miles@milessabin.com>, www-ws-arch@w3.org
----- Original Message -----
From: "Assaf Arkin" <arkin@intalio.com>
To: "Miles Sabin" <miles@milessabin.com>; <www-ws-arch@w3.org>
Sent: Wednesday, January 22, 2003 5:52 PM
Subject: RE: Proposed text on reliability in the web services architecture

> Here's my view of things (abridged version).
>
> In a perfect world where messages are never lost, one-way flows do not
> assure any level of reliability. If I send a request to buy products X/Y,
> which I expect to be shipped within 2~3 weeks, and the message gets processed
> but the products are not available (book out of print), I have to wait 3
> weeks to determine that I will not receive my product.

You're mixing apples and oranges here. In a perfect world where messages are never lost, message reliability is axiomatic and assumed. But in your example above, where you'd be waiting 3 weeks because the supply chain wasn't perfect even though the messaging was, the thing that's missing is app-level coordination, not reliability of messaging.

Apples and oranges. Applications and orangutans. Whatever.

> That's a bad proposition. I would like to receive at some point (say 8 hours
> later) a message confirming whether the delivery would be made or not.
> That's how I achieve reliability of the application, and I cannot think of
> any other way.

I'd rather interact with an application that can tell me immediately (i.e., fast enough for a synchronous exchange) what the state of supply is, or a piece of meta-supply-state that means "not sure if we can fulfill". Remember, this RM stuff is supposed to be in service of real and robust business applications. That being the case, it does no good to assume badly designed applications as the basis (requirements) for RM features.

Note that your paragraph above can be summarized by saying that reliability of the application is a matter of State Transfer. True?

> In a non-perfect world messages may be lost. The fact that a message has
> been lost means I will have to wait 8 hours to determine that.
> This is a lousy failure detection algorithm. Why wait 8 hours?

Note that this problem has been "swept" by the more important problem above, and its solution. Let's optimize and solve this problem only once, at the application level. Do you favor optimization?

> Let's say I do synchronous delivery using TCP. I start sending the message
> and near the end the TCP connection drops. I can say "fine, I think the
> message got there", or I can say "oh, oh, message loss". In the first case I
> would wait 8 hours to determine whether the message was delivered/processed.
> In the second case I am more responsive: I can react immediately by opening
> another connection and sending it again. I am doing RM.

Looking back, the 8 hours of your use case above is the time allowed for the service application to asynchronously advise of product supply state. This, therefore, is apples and oranges again, and even your 8-hour assumption above fails, I think, because the supply application may not know about you at all.

> Now let's say I use queues. I put the message in a queue and I wait 8 hours
> for a response. The MOM picks the message from the queue, sends it, the TCP
> connection drops, and it says "oh, well, life goes on". I wait 8 hours and
> get no response. What if the MOM simply retried sending the message? The
> queue is fulfilling the RM responsibility.

Yes it is.

> Now let's say the receiver decides not to process messages as they come;
> instead it queues them for later processing. The queue is not persisted. If
> the TCP connection drops, the message never gets to the queue. It will not be
> delivered, so there's no ack. The sender needs to retry. If the
> message gets into the queue it's acked. It will possibly be delivered.

I think by "receiver" here you mean the supply application, not some server side of the RM machine. But you're talking about messages being delivered and (I think) products being delivered, and it's hard to tell which is which.
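A minimal sketch of the sender-side retry described in the quoted TCP and queue scenarios, paired with a receiver that acks only once the message is stored. All names here (`Receiver`, `send_reliably`, `ConnectionDropped`) are hypothetical; a scripted list of failures stands in for real TCP drops, a dict stands in for a persisted queue, and deduplication by message id is an added assumption so that a resend after a lost ack does not produce a second order.

```python
class ConnectionDropped(Exception):
    """Stands in for a TCP connection that drops mid-exchange (hypothetical)."""


class Receiver:
    """Hypothetical receiver that acks a message only after storing it."""

    def __init__(self):
        self.stored = {}     # message_id -> payload; stands in for a persisted queue
        self.deliveries = 0  # how many copies actually reached the receiver

    def accept(self, message_id, payload):
        self.deliveries += 1
        # setdefault keeps the first copy, so a resent duplicate is harmless.
        self.stored.setdefault(message_id, payload)
        return message_id  # the ack names the message it acknowledges


def send_reliably(receiver, message_id, payload, failures, max_attempts=5):
    """Resend until acked. `failures` scripts the fate of each attempt:
    'before' = connection drops before the receiver gets the message,
    'after'  = receiver stores the message but the ack is lost,
    anything else, or running past the script, = clean exchange."""
    for attempt in range(1, max_attempts + 1):
        fate = failures[attempt - 1] if attempt <= len(failures) else "ok"
        try:
            if fate == "before":
                raise ConnectionDropped()
            receiver.accept(message_id, payload)
            if fate == "after":
                raise ConnectionDropped()
            return attempt  # ack received: stop resending
        except ConnectionDropped:
            continue  # "oh, oh, message loss": open a new connection and resend
    raise TimeoutError(f"no ack for {message_id} after {max_attempts} attempts")


r = Receiver()
attempts = send_reliably(r, "order-42", "buy X/Y", ["before", "after"])
print(attempts, r.deliveries, r.stored)
```

Here the first attempt is lost before delivery and the second loses only its ack, so the third attempt succeeds and the receiver holds a single copy of the order. The ack only says the message was stored, not that any goods will ship, which is the message-versus-product distinction raised in the reply above.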
If a supply application receives my order for goods but does not store it safely, then we are once again talking about a brain-dead application, and no amount of RM is going to fix that. Let's avoid talking about brain-dead applications, okay? If the application receives my order and makes its state retrievable, then I can retrieve that state at any time. This constitutes application-level reliability. Any time I can't retrieve state because of underlying comms breakage, I can distinguish that from bad application state because it looks like a time-out, not a missing resource or a resource in the wrong state. While this doesn't mean goods will be delivered, it means we know what's broken -- the best that can be accomplished in the name of distributed application reliability.

> The sender cannot distinguish between a message that was not delivered and a
> message that was not processed. So for the sender the fact that the message
> has arrived at its destination fully intact warrants an ack.

I think you're saying that the sending (requesting) application wants an acknowledgment that a message was delivered. I think you're wrong about what it wants. It wants a state transfer. An end-to-end thingy.

> The receiver takes two hours before it can start processing the message.
> During the two hours it may crash, and the message is lost. This is
> equivalent to the message not being processed for any other reason. But it
> takes six more hours to find this out. So the receiver has a lousy QoS. I
> will elect not to do business with this supplier.

Would you please elect not to design RM systems for it also? It's a waste of brain cycles. We're supposed to be fostering best practices. (Okay, I already lectured on this above, so no more.)

I want to point out that it's quite feasible for applications -- clients and servers -- to conduct their business asynchronously while at the same time communicating synchronously. Or else what are telephones all about?
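The application-level alternative argued for in this reply (the service persists the order, answers synchronously even when fulfilment is undecided, and lets the client retrieve state later) might look roughly like the sketch below. `SupplyService`, `check_order`, `unreachable` and the `Outcome` names are hypothetical; a dict stands in for durable storage, and a raised `TimeoutError` stands in for underlying comms breakage.

```python
from enum import Enum


class Outcome(Enum):
    COMMS_FAILURE = "time-out"          # underlying comms breakage
    NO_SUCH_ORDER = "missing resource"  # the application never stored the order
    STATE = "application state"         # retrievable state, possibly the wrong one


class SupplyService:
    """Hypothetical supply application that persists orders and exposes their state."""

    def __init__(self):
        self.orders = {}  # order_id -> state; stands in for durable storage

    def place_order(self, order_id, in_stock):
        # Answered synchronously even though fulfilment may take weeks:
        # "pending" plays the role of "not sure if we can fulfill".
        self.orders[order_id] = "accepted" if in_stock else "pending"
        return self.orders[order_id]

    def get_state(self, order_id):
        return self.orders.get(order_id)  # None means no such resource


def unreachable(order_id):
    """Simulates a retrieval that never gets an answer."""
    raise TimeoutError(f"no response for {order_id}")


def check_order(fetch, order_id):
    """Retrieve order state, telling comms failure apart from application state."""
    try:
        state = fetch(order_id)
    except TimeoutError:
        return Outcome.COMMS_FAILURE, None
    if state is None:
        return Outcome.NO_SUCH_ORDER, None
    return Outcome.STATE, state


svc = SupplyService()
svc.place_order("order-42", in_stock=False)
print(check_order(svc.get_state, "order-42"))
print(check_order(svc.get_state, "order-99"))
print(check_order(unreachable, "order-42"))
```

The three calls show the three cases the reply distinguishes: retrievable state ("pending"), a missing resource, and a time-out that points at the comms layer rather than at the application.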
"I'll get back to you on that" is a synchronous reply signalling a business decision to postpone part of the business process. Has there been an assumption that asynchrony in the business process implies asynchrony in the communication protocol? Maybe we need to decouple the two.

> The receiver can employ two strategies to improve its QoS. The receiver can
> either make sure it never fails, or it can persist messages. Which strategy
> it uses is up to the receiver. But statistically the one that chooses
> persistence is going to give a higher QoS, and those that do remain in
> business longer. Queuing is optional just like friendly customer support is
> optional.

In other words, to be reliable, a service must preserve state so that it can later transfer it, state transfer being the equivalent of end-to-end communication.

If by "queueing" you mean persistence of state, then I find your last sentence above curious. It seems to say that application reliability is optional. In the context of this discussion, it shouldn't be. Application reliability (reasonably designated above) is the real requirement. As a developer of web services, I'd rather find that subject* treated directly in the architecture document than find a section on "RM", because the latter is not a full substitute for the former, and because its depth, complexity and challenge are a distraction from my real goal.

Summary: I think the focus on RM will diminish application reliability instead of fostering it, because developers will be reluctant to believe that such a complex undertaking is not a full solution.

* Web Service Reliability

Walden
Received on Sunday, 26 January 2003 11:47:54 UTC