- From: Assaf Arkin <arkin@intalio.com>
- Date: Sun, 26 Jan 2003 13:45:52 -0800
- To: "Walden Mathews" <waldenm@optonline.net>, "Miles Sabin" <miles@milessabin.com>, <www-ws-arch@w3.org>
> > That's a bad proposition. I would like to receive at some point (say 8 hours later) a message confirming whether the delivery would be made or not. That's how I achieve reliablity of the application, and I cannot think of any other way.
>
> I'd rather interact with an application that can tell me immediately (i.e., fast enough for synchronous exchange) what the state of supply is, or a piece of meta-supply-state that means "not sure if we can fulfill".

I would rather interact with an application that delivers what I want within a specified time frame.

When I take my car in for service I always ask for pricey, high-quality parts. My mechanic is not a warehouse; they don't stock everything. Sometimes they have the part, sometimes the parts shop next door has it, and sometimes they have to order it from one of their many suppliers. When I set an appointment I give them time to call their suppliers and decide whether they can service me tomorrow with that part, or service me tomorrow with some other part and the next day with the part I want. I know they can always service me next month with that part. Because I give them a few hours to determine when they can get the part, I get both the part and the best service. But as they say, your mileage may vary.

I think the big divide here is that I have worked for companies that had outstanding suppliers: fast delivery, quality products, no botched shipments and no overcharging. These suppliers use asynchronous messaging, so even if they go home at 5pm, you can send a request at 4am and have it addressed the next business day. That worked well for both parties. I understand your impatience, but for most people, waiting a few hours for a reply about when delivery will happen seems acceptable.

> Remember this RM stuff is supposed to be in service of real and robust business applications. That being the case, it does no good to assume badly designed applications as the basis (requirements) for RM features.

On the contrary. I routinely point out that reliable applications use a variety of coordination protocols and that RM plays an important part in many of these protocols. If you have a badly designed application, you have a badly designed application.

> Note that your paragraph above can be summarized by saying that reliability of the application is a matter of State Transfer. True?

Definitely not. I would rather think of reliability as the likelihood of something (hardware, software, a process) continuing to function for a given period of time under specified conditions. How you achieve reliability is a different issue, and state transfer is one of the concepts you could use to achieve it. But state transfer is not application reliability, just as RM is not application reliability.

> > In a non-perfect world messages may be lost. The fact that a message has been lost means I will have to wait 8 hours to determine that. This is a lousy failure detection algorithm. Why wait 8 hours?
>
> Note that this problem has been "swept" by the more important problem above, and its solution. Let's optimize and solve this problem only once, at the application level.

Do you favor optimization? Let's put it another way. You send an e-mail to this mailing list. That e-mail goes through three hops to get here. One of the nodes is offline. Your e-mail gets discarded. I assume you are fine with that. I would much prefer that, if one node is down, my e-mail simply be routed through a different node. I don't care which path it takes as long as it gets there. I am asking my e-mail server to do RM. Note that I haven't said anything about resends, timeouts, etc. I just ask that it deliver. Which approach you use is up to you, but I would rather use SMTP than UDP to send my e-mails. Which one would you choose?
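A minimal Python sketch of what is being asked of the e-mail server here: forward through whichever next hop is up, and hold the message rather than discard it when none is. The `Node` and `Relay` classes, the `forward` helper and the `online` flag are hypothetical names for illustration. The sketch deliberately says nothing about resends or timeouts on the wire, and it is not how SMTP itself is specified.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical mail relay; `online` simulates the node being up or down."""
    name: str
    online: bool = True
    inbox: List[str] = field(default_factory=list)


def forward(message: str, next_hops: List[Node]) -> Optional[Node]:
    """Hand the message to the first next hop that is online.

    Returns the node that accepted it, or None if every candidate is down.
    """
    for node in next_hops:
        if node.online:
            node.inbox.append(message)
            return node
    return None


class Relay:
    """Store-and-forward relay: a message is kept until some next hop accepts
    it, instead of being discarded because one node is offline."""

    def __init__(self, next_hops: List[Node]):
        self.next_hops = next_hops
        self.pending: deque = deque()

    def submit(self, message: str) -> None:
        self.pending.append(message)
        self.flush()

    def flush(self) -> None:
        # Retry every pending message; anything no hop accepted stays queued
        # for a later flush (e.g. one triggered periodically by a timer).
        still_pending: deque = deque()
        while self.pending:
            message = self.pending.popleft()
            if forward(message, self.next_hops) is None:
                still_pending.append(message)
        self.pending = still_pending


if __name__ == "__main__":
    down, up = Node("hop-a", online=False), Node("hop-b")
    relay = Relay([down, up])
    relay.submit("Re: RM discussion")
    print(up.inbox)  # ['Re: RM discussion']: routed around the offline hop
```

The caller only hands the message over; which path it takes, and when, is the relay's concern, which is the division of labour argued for above.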
> > Let's say I do synchronous delivery using TCP. I start sending the message and near the end the TCP connection drops. I can say "fine, I think the message got there", or I can say "oh, oh, message loss". In the first case I would wait 8 hours to determine whether the message was delivered/processed. In the second case I am more responsive, I can react immediately by openning another connection and sending it again. I am doing RM.
>
> Looking back, the 8 hours of your use case above is the time allowed for the service application to asynchronously advise of product supply state. This, therefore, is apples and oranges again, and even your 8 hours assumption above fails, I think, because the supply application may not know about you at all.

I am assuming some common sense here. Either the supplier doesn't need to know about me, or it does need to and does know about me. Either way, I can tell the supplier who I am exactly once and then routinely send purchase orders. And I assume the supplier could use the return address to tell me the status of the order: "I don't know who you are", or "I know who you are but prefer not to sell you anything, thank you very much, please don't come back". I don't think I'm inventing anything here, just reflecting on how I've seen businesses work.

> > Now let's say the receiver decides not to process messages as they come, instead it queues them for later processing. The queue is not persisted. If the TCP connection drops the message never gets to the queue. It will not be delivered, so there's no ack. The sender needs to retry again. If the message gets into the queue it's acked. It will possibly delivered.
>
> I think by "receiver" here you mean the supply application, not some server side of the RM machine. But you're talking about messages being delivered and (I think) products being delivered, and it's hard to tell which is which.
By receiver I mean the server side of the RM machine. In RM we distinguish between:

1. Sending a message (the act of creating and firing a message, as opposed to sending it over the wire)
2. Receiving a message (the act of getting a sent message, as opposed to receiving it over the wire)
3. Delivering a message (the act of forwarding the received message to the application)

A sender sends each message exactly once. The message may go over the wire multiple times, e.g. when it is resent, but from the perspective of RM it is sent exactly once. How you resend is protocol specific: some implementations resend on demand, some use timeouts, some just keep resending all the time.

A receiver may receive a message multiple times and in any order. That allows any medium to be used, including media that duplicate messages. The medium stays simple because it knows nothing about the message and can't detect duplicates, but the RM does (since it does the sending and receiving) and can remove them.

A receiver delivers the message exactly once, so the application can be built on the assumption that each message will be delivered exactly once.

RM is conceptual, so if you build that logic into your own software, your software combines application and RM responsibilities.

RM simply says that if a message is sent, then any "correct process" will eventually deliver it. To qualify, "eventually" doesn't mean "an indefinite period of time", though it may sound like it. If the message expires in 5 minutes, then "eventually" means within 5 minutes; if the message cannot be delivered within 5 minutes, the process is no longer correct. If it doesn't deliver at all, the process is not correct either. In other words, if I send a message for delivery within 5 minutes and don't get an ack, I assume the process is incorrect and did not deliver. Any coordination protocol takes that into account when building a reliable application solution. Timeouts are thus the primary means of fault detection.
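The send/receive/deliver distinction, and timeouts as the means of fault detection, can be illustrated with a small Python sketch. It is a minimal model under stated assumptions, not any particular RM specification: the per-sender integer `msg_id`, the `ttl` deadline, the fixed `retry_interval` and the scripted wire (`scripted_wire`) are all hypothetical, and a simulated clock stands in for real time. The sender creates the message once but may put it on the wire several times; the receiver may get several copies and filters duplicates by ID; the application sees each payload exactly once; and a missing ack by the deadline is reported as a failure for the layer above to handle.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Message:
    msg_id: int        # assigned once by the sender: the message is "sent" once
    payload: str
    expires_at: float  # deadline; "eventually" means no later than this


class Receiver:
    """Receives copies of a message any number of times, delivers each once."""

    def __init__(self, deliver: Callable[[str], None]):
        self.deliver = deliver
        self.seen: Set[int] = set()  # IDs already delivered (never pruned here)

    def receive(self, msg: Message, now: float) -> bool:
        """Handle one copy arriving over the wire; return True to send an ack."""
        if now > msg.expires_at:
            return False  # too late to deliver: no ack
        if msg.msg_id not in self.seen:
            self.seen.add(msg.msg_id)
            self.deliver(msg.payload)  # hand to the application exactly once
        return True  # re-ack duplicates, since an earlier ack may have been lost


class FakeClock:
    """Simulated time so the example runs instantly and deterministically."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, dt: float) -> None:
        self.t += dt


class Sender:
    def __init__(self) -> None:
        self.next_id = 0

    def send(self, payload: str, ttl: float, transmit: Callable[[Message], bool],
             clock, retry_interval: float) -> bool:
        """Send once; retransmit over the wire until acked or expired.

        Returns False on timeout: the receiving process is then treated as not
        correct, and the coordination protocol above RM has to react.
        """
        self.next_id += 1
        msg = Message(self.next_id, payload, clock.now() + ttl)
        while clock.now() <= msg.expires_at:
            if transmit(msg):  # True means an ack came back
                return True
            clock.sleep(retry_interval)
        return False


def scripted_wire(receiver: Receiver, clock, script: List[str]):
    """Hypothetical wire whose n-th transmission follows script[n]:
    'drop' loses the message, 'ack_lost' delivers it but loses the ack,
    'ok' delivers it and returns the ack. Past the script, everything drops."""
    outcomes = iter(script)

    def transmit(msg: Message) -> bool:
        outcome = next(outcomes, "drop")
        if outcome == "drop":
            return False
        acked = receiver.receive(msg, clock.now())
        return acked and outcome == "ok"

    return transmit


if __name__ == "__main__":
    clock, delivered = FakeClock(), []
    rx = Receiver(delivered.append)
    wire = scripted_wire(rx, clock, ["drop", "ack_lost", "ok"])
    ok = Sender().send("PO-1", ttl=300, transmit=wire, clock=clock, retry_interval=60)
    print(ok, delivered)  # True ['PO-1']: three transmissions, one delivery
```

With real time, `clock` could be any object whose `now` and `sleep` are `time.monotonic` and `time.sleep`. Delivery order is not enforced here, and the `seen` set grows without bound; a practical receiver would discard IDs once their messages have expired.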
> If a supply application receives my order for goods but does not store it safely, then we are once again talking about a brain dead application, and no amount of RM is going to fix that. Let's avoid talking about brain dead applications, okay?

Agreed.

> > The sender cannot distinguish between a message that was not delivered and a message that was not processed. So for the sender the fact that the message has arrived at its destination fully intact warrants an ack.
>
> I think you're saying that the sending (requesting) application wants an acknowledgment that a message was delivered. I think you're wrong about what it wants. It wants a state transfer. An end-to-end thingy.

I am saying that acknowledgment of delivery can occur well before state transfer. You probably sent me this e-mail and expect a reply within 5 minutes. I would assume the reply is the "state transfer". What if I just went to see a movie?

Now let's say you had two ways of sending me a message. You could do an HTTP operation, but then I'd have to be online. So you would need to keep trying until I come back online, which could be the middle of the night in NYC, or you could just give up. You could also use SMTP: send and forget, knowing it will get to me and I will read it when I go online and reply to you. What if the message gets lost? You could wait five days, look at all the people who never replied to you, and resend. Or you can just let the SMTP server handle that, since it has node-to-node nacks (what is not nacked in a given time frame is by default acked; not entirely reliable, but better than nothing). Which option would you choose to continue this conversation?

> I want to point out that it's quite feasible for applications -- clients and servers -- to conduct their business asychronously while at the same time communicating synchronously. Or else what are telephones all about? "I'll get back to you on that" is a synchronous reply signalling a business decision to postpone part of the business process. Has there been an assumption that asynchrony in business process implies asynchrony in communication protocol? Maybe we need to decouple there.

Ever heard of voice mail? Faxes? Pagers? BlackBerry? If synchronous communication works so well, why bother with voice mail? Maybe it's the "I don't want to keep calling you every five minutes until you get out of a meeting I don't know about, so I'll just leave you a voice mail" factor. Most large-scale enterprise systems (and all the ones I know of) use asynchronous communication at various points, though not exclusively. It would be great if we had a solution that works the way people do business. I am not saying people should always use voice mail; I am just saying voice mail should be an option. I would hate to think how you could run a business without voice mail.

> > The receiver can employ two strategies to improve its QoS. The receiver can either make sure it never fails, or it can persist messages. Which strategy it uses is up to the received. But statistically the one that chooses persistence is going to give a higher QoS and those remain in business longer. Queuing is optional just like friendly customer support is optional.
>
> In other words, to be reliable, a service must preserve state so that it can later transfer it, state transfer being the equivalent to end-to-end communication. If by "queueing" you mean persistence of state, then I find your last sentence above curious. It seems to say that application reliability is optional. In the context of this discussion, it shouldn't be.

By queuing I mean persistence of messages. If you can't process a message immediately, you can either hold everything in memory (assuming the software never crashes) or store it in a queue. You stand in line at the bank to make a non-ATM transaction. The teller in front of you suddenly needs to leave. They don't just say "everyone in this line, please go out and come back in". They route the line to the next available teller. That's queuing 101. Queuing helps you build fault-tolerant systems because eventually all messages get delivered (an RM property).

> Application reliability (reasonably designated above) is the real requirement. As a developer of web services, I'd rather find that subject* treated directly in the architecture document than find a section on "RM", because the latter is not a full substitute for the former, and because its depth, complexity and challenge are a distraction from my real goal. Summary: I think the focus on RM will diminish application reliability instead of fostering it because developers will tend not to believe that such a complex undertaking is not a full solution.

I agree. I think it is very important for the WS arch document to discuss application reliability and, separately, messaging. I do not claim that RM per se solves any application reliability problem. But from the perspective of WS, the solution involves messaging. The WS architecture doesn't talk about database integrity, database logging, exception catching, or the million other things you need to get reliability. It talks about the messages that services exchange as part of a coordinated message exchange that strives for reliability. Granted, these coordinations would often use RM as a way to build a better coordination protocol, with (let me repeat that again) RM doing its part to address delivery of messages.

So the WS arch document needs to first identify that such coordination is a requirement that should be addressed, and then point out that such coordination may elect to use RM to address delivery issues, which would put RM in the right context.

arkin

>
> * Web Service Reliability
>
> Walden
Received on Sunday, 26 January 2003 16:47:14 UTC