RE: Proposed text on reliability in the web services architecture from Assaf Arkin on 2003-01-21 (www-ws-arch@w3.org from January 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 20 Jan 2003 17:08:09 -0800
To: "Walden Mathews" <waldenm@optonline.net>, "Peter Furniss" <peter.furniss@choreology.com>, "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNAEBLDBAA.arkin@intalio.com>
> > I think we've already established that RM tries to solve a very specific
> > problem. I don't want to keep discussing RM in every possible scenario,
> > because I don't have any evidence you will need to use it in every
> > possible scenario. I haven't used RM in every single application, only in
> > those applications that need it. I don't want it to sound like I'm
> > preaching what I don't practice.
>
> That certainly makes sense. I'd be interested in hearing about some
> web services applications that used RM and how that worked.

Every time you use a MOM to send a message and expect it to be delivered to
the other side, you are using RM. That is something I see quite a lot in
enterprise applications. These are basically WS that do what we did before
with EDI, FTP or JMS, but the interface is made to use SOAP or TRP. I haven't
done HTTP straight through yet, but that's also possible (HTTPR).


> > And since we both agree RM doesn't make all your problems go away, I
> > don't want to keep discussing it in the context of "the one and only
> > solution". I know some people would like to think they have "the one and
> > only solution". Experience has taught me that for every problem there is
> > a solution, and even if you can find the one and only solution for a
> > specific problem, it's still not "the one and only solution".
>
> Agreed.  Certainly not "the only one".  As to whether it can be "the
> one" (meaning something sufficiently general to be widely applicable),
> or even "the few" ... this is where I still have reservations. Just so
> you know where I'm coming from.

Don't confuse "reliable ..." with "reliable messaging". Reliable messaging
solves a very specific problem: getting the message through. It doesn't
make the application reliable; it makes the messaging reliable. Which of
course is redundant in many situations. And if you are interested you can
always run a search on Google to see the formal definition of reliable
messaging.


> > So I want to refocus the discussion with the following premise:
> >
> > - Consider the use of RM for applications that need RM and ignore
> > applications that do not need RM
> > - Consider the use of RM to solve the reliable messaging problem and don't
> > discuss the use of RM to solve any other kind of problem
>
> I'm interested in analysis of WS applications that "need RM".

I'll let others point to other examples. But ours is based on the time
independence between the sending of a message (by the sender) and its
delivery (at the receiver). Basically whenever you put a queue in between
sender and receiver (in either place) you need RM.

Now observe that in some situations you never lose messages in transit (in
the last place we used it that was true 100% of the time, but that's not
always the case). That's because it works like HTTP over TCP. But nothing
prevents the queue from just ignoring messages, losing messages or simply
delivering them in any arbitrary order. Except that the queue (in this case
one of the leading MOMs in the market) is an RM, so messages placed in it are
never lost or delivered out of order.


> I'm truly sorry to keep quibbling over these examples, but this seems
> confused.  Above you're saying TCP doesn't correct sequencing, and
> here you're saying it does.  If I send two messages on the same
> tcp connection, they will come out the other end in the same order as
> sent, or not at all.  Notwithstanding the fact that _message_ boundaries
> and _segment_ boundaries can be different.  Right?

I'll try to rephrase it.

TCP does sequencing for packets. TCP does not by itself do sequencing for
messages.

If you happen to send all messages over the same connection in sequence then
... but what if you send messages two minutes apart and the connection drops
and you open a new one? Do you start sending all the messages from the
beginning? Or is that never going to happen so it's not a problem?
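
One way to picture the dropped-connection case is the minimal Python sketch
below. It is an illustration under stated assumptions, not the API of any
MOM or RM product: the class name ResendingSender, its methods, and the use
of a plain list to stand in for a TCP connection are all hypothetical. The
sender numbers messages itself, independently of any connection, and keeps
each one until it is acknowledged, so after a reconnect it resends only
what is still outstanding.

```python
class ResendingSender:
    """Hypothetical sender: numbers each message and keeps it until acked."""

    def __init__(self):
        self.next_seq = 1
        # seq -> payload; dicts preserve insertion order, so this is send order.
        self.unacked = {}

    def send(self, payload, connection):
        """Assign the next message-level sequence number and transmit."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        connection.append((seq, payload))  # a list stands in for a connection
        return seq

    def ack(self, seq):
        """The receiver confirmed this message; it no longer needs resending."""
        self.unacked.pop(seq, None)

    def resend_unacked(self, new_connection):
        """After a drop, replay only the unacknowledged messages, in order."""
        for seq, payload in self.unacked.items():
            new_connection.append((seq, payload))


sender = ResendingSender()
first_connection = []
for payload in ["a", "b", "c"]:
    sender.send(payload, first_connection)
sender.ack(1)  # only message 1 was confirmed before the connection dropped

second_connection = []
sender.resend_unacked(second_connection)
```

Here second_connection ends up holding messages 2 and 3 with their original
numbers, so the sender neither starts again from the beginning nor loses
track of what the other side has seen.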

And what if you send all the messages in sequence over a single TCP
connection, they all get to the other side, get queued and then the queue
decides to deliver them in reverse order, or just drop every second message?

You can't blame the queue unless you expect the queue to support the RM
semantics for delivery. But if there are no RM semantics for delivery, the
queue is meeting its requirements.


> The reason I quibble about this is that if you need RM atop a TCP
> connection, and RM and TCP are congruent* as processes dealing
> with ordered segments on a network, then by a sort of crude induction,
> you can justify an arbitrary number of "needed" layers of this sort of
> thing.  Do you see my concern?  And an architecture's job is to sort
> out this kind of tangle.

I definitely see your concern. If I thought the problem could be solved by
letting TCP deal with it, then I would agree that RM is just another way of
doing things and I would also say that it's redundant.

Let's say you send ten messages over the course of five hours, and they all
have to be delivered in order. After the first five messages the connection
drops. At this point you have two separate independent sequences: messages
1,2,3,4,5 in the first connection, and messages 1,2,3,4,5 in the second
connection. And the other application may process the first set before it
processes the second one.

RM says that all ten messages will be delivered in the same original
sequence. How you do that is immaterial.

Maybe you send all ten in one burst of TCP messages, so if the connection
drops the receiver discards all the messages and the sender simply sends all
ten again.

Maybe you send them in two separate connections, but the receiver notices
the sequence and orders them accordingly.

Maybe you send them in any number of connections in any order, but the queue
on the other side delivers them in order.
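
The second and third options can be sketched on the receiving side as
follows. This is a minimal Python illustration, not a definitive
implementation: the Resequencer class is hypothetical, and it assumes every
message carries a message-level sequence number (1 to 10 here) that does
not restart when a new connection is opened.

```python
class Resequencer:
    """Hypothetical receiver-side buffer that delivers messages in order."""

    def __init__(self, first_seq=1):
        self.expected = first_seq  # next sequence number to hand to the app
        self.pending = {}          # out-of-order arrivals, keyed by seq
        self.delivered = []        # what the application actually sees

    def receive(self, seq, payload):
        """Accept a message from any connection, in any arrival order."""
        if seq < self.expected or seq in self.pending:
            return  # duplicate, e.g. a resend after a dropped connection
        self.pending[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


receiver = Resequencer()
# Messages 6-10 (second connection) happen to arrive before 1-5 (first one).
for seq in [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]:
    receiver.receive(seq, f"m{seq}")
```

After the loop, receiver.delivered lists m1 through m10 in the original
order, even though the second connection's messages arrived first. Nothing
is released to the application until message 1 shows up.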

What RM defines is:

1. Semantics that express these ordering constraints

What an RM implementation does is:

1. Make sure it works one way or another; it could do so in five different
ways

What an RM implementation gives you is:

1. Semantics assured regardless of how the implementation decides to provide
them

What WS RM gives you is:

1. A way to express these requirements so an intermediary can process them,
decoupling the app from the implementation

Anything else (coordination, is-alive checks, etc.) is also very useful but
it's a whole different problem.

> * meaning that structurally and operationally they are doing
>    the same job, perhaps under different names


> You know, you just made me realize that "one size" products, if
> that's really what they were, could only work if the client was willing
> to change to fit them. ;-)

Of course if you chop your body off, then a hat is all you need ;-) But if
you have a wardrobe then a hat is also something you need. You can put a
t-shirt on your head, but I think a hat would look better.


> I'm assuming you have a collection of problem cases you're using
> for this evaluation?

I have a collection of problem cases and they all require extremely reliable
applications. And so I also have a collection of solutions. RM is not
enough: if I convinced myself that RM solved the reliability problem then I
would be selling a "sometimes works" solution. On the other hand, without RM
I would also be selling a "sometimes works" solution. So it's an essential
part of the solution, but it's only part of the solution.


> > There are three ways to prove that a solution is worthless:
> >
> > 1. Look at the problem and discover that the solution does not solve it
> > 2. Look at a different problem and discover that the solution does not
> > solve it
> > 3. Look at a bigger problem and discover that the solution does not solve
> > all of it
>
> Point taken.  What about the flip side of the coin?  What are some
> of the ways of asserting, incorrectly, that X is "a solution"?  In terms
> of an architecture, what should be the criteria for disallowing these?

First you need to define the problem.

For example, the problem could be: given that m is sent before m', the
application should process m before m'. If the solution can guarantee this,
it could solve this problem.
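
Stated this way, the problem is something you can check mechanically. The
Python sketch below is only an illustration: the function name
preserves_send_order is hypothetical, and it assumes message identifiers
are unique and that everything processed was actually sent. It tests
ordering only, not whether anything was lost.

```python
def preserves_send_order(sent, processed):
    """True if messages were processed in the same relative order as sent."""
    position = {message: index for index, message in enumerate(sent)}
    indices = [position[message] for message in processed]
    # Each processed message must have been sent after the one before it.
    return all(a < b for a, b in zip(indices, indices[1:]))


in_order = preserves_send_order(["m", "m'"], ["m", "m'"])
reversed_order = preserves_send_order(["m", "m'"], ["m'", "m"])
```

Here in_order is True and reversed_order is False, which is exactly the
m-before-m' condition above.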

Second you need to decide whether the solution is generic.

Ordering m and m' is generic. Constructing m is not. So constructing m (as
opposed to m') is an application issue, but ordering is something you can
componentize.

Third you need to determine if the solution prevents you from doing other
things.

RM works well for asynchronous messaging. But if RM prevents me from using
synchronous messaging when I want to, that's not a valid option.

Fourth you need to determine if the solution can work well with other
solutions.

Even a perfect RM is not going to solve my coordination problems or my
security problems. Does it interfere with coordination protocols? security
protocols? choreography? arbitrary message schemas? routing? pub/sub?

arkin

>
> Thanks,
> Walden
Received on Monday, 20 January 2003 20:09:54 UTC