RE: Proposed text on reliability in the web services architecture from Assaf Arkin on 2003-01-23 (www-ws-arch@w3.org from January 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Thu, 23 Jan 2003 13:06:46 -0800
To: "Jean-Jacques Dubray" <jjd@eigner.com>, "'Miles Sabin'" <miles@milessabin.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNCEFKDBAA.arkin@intalio.com>
> -----Original Message-----
> From: Jean-Jacques Dubray [mailto:jjd@eigner.com]
> Sent: Thursday, January 23, 2003 8:10 AM
> To: 'Assaf Arkin'; 'Miles Sabin'; www-ws-arch@w3.org
> Subject: RE: Proposed text on reliability in the web services
> architecture
>
>
> >>That's a bad proposition. I would like to receive at some point (say 8
> >>hours
> >>later) a message confirming whether the delivery would be made or not.
> >>That's how I achieve reliability of the application, and I cannot think
> of
> >>any other way.
> [JJ] So let's say that your message gets there and that you forgot to
> upgrade your system on your supplier's notice and the message format is
> now invalid. The application in charge of the business logic will never

Ideally, if you detect that the message is invalid, or maybe you just
undeployed the application that processes it, you can always send a nack
(an ack indicating delivery but no processing). But you cannot always
determine that at the time the ack is sent, so sometimes delivery is
guaranteed and processing is not. Which brings us back to the discussion of
coordination.
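
To make the delivery/processing split concrete, here is a minimal sketch. It
is only an illustration under stated assumptions: the ReliableMessenger
class, its method names, and the "empty body is invalid" check are all
hypothetical and do not come from any RM specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ReliableMessenger:
    """Hypothetical receiving-side RM: it, not the application, sends ack/nack."""
    handlers: Dict[str, Callable[[str], None]] = field(default_factory=dict)
    queue: List[Tuple[str, str]] = field(default_factory=list)

    def receive(self, msg_type: str, body: str) -> str:
        # A nack only covers failures the RM can detect at delivery time.
        if msg_type not in self.handlers:
            return "nack"  # delivered, but nothing is deployed to process it
        if not body:
            return "nack"  # delivered, but detectably invalid (assumed check)
        self.queue.append((msg_type, body))
        return "ack"       # delivery is guaranteed; processing is not

    def process_all(self) -> List[str]:
        # Runs after the ack was sent, so a failure here is invisible to the
        # sender unless a higher-level (business) protocol reports it.
        results = []
        for msg_type, body in self.queue:
            try:
                self.handlers[msg_type](body)
                results.append("processed")
            except ValueError:
                results.append("failed")
        self.queue.clear()
        return results


def handle_po(body: str) -> None:
    # Hypothetical business rule: an order for zero items is rejected.
    if "qty=0" in body:
        raise ValueError("business rule violated")


rm = ReliableMessenger(handlers={"PO": handle_po})
acks = [
    rm.receive("PO", "qty=5"),
    rm.receive("PO", "qty=0"),     # acked, yet it will fail in processing
    rm.receive("Invoice", "x"),    # no handler deployed, so nacked
]
outcomes = rm.process_all()
```

The second message is the interesting one: the sender holds an ack for it,
but the application later rejects it.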

> get the message and will never be able to send an ack PO. At this point

Just a reminder: the application does not send the ack; the RM does.

> it is hard to avoid having a confirmation that your message was valid
> and that it passed all the business rules of the application in charge
> of processing it. Unless, ah yes, you could retry n times, and if it
> failed after n times, you could conclude that the message had something
> wrong, oh, unless it is the partner system which is down. What you need
> is called guaranteed message delivery at the business application level
> as opposed to the transport level. Transactional web services (as
> opposed to data rich or computational web services) require a business
> protocol on top of SOAP, there is simply no way out of it.

You definitely want to be notified that the message can be processed if the
message is valid, and that it can be processed if the payment is acceptable,
and that it can be processed if the products are available, and that it can
be processed if the delivery time can be met, and that it can be processed
because the product is already shipping ...

Essentially what you want to do is progress from one state to another where
each state progression (aka round) proves the complete agreement about the
results of the previous progression (previous round). You can perform any
rounds you want until at some point you don't care anymore (you got the
products, that's what you ordered, time to make the payment).

That's how you build a reliable system.
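
As a rough sketch of that round-by-round progression (an illustration only:
the state names, the Party class, and run_rounds are hypothetical, and real
parties would exchange messages rather than share memory):

```python
from typing import List, Optional

# Hypothetical states for a purchase order; the names are illustrative only.
ROUNDS = ["validated", "payment_accepted", "in_stock",
          "delivery_scheduled", "shipped"]


class Party:
    def __init__(self, name: str) -> None:
        self.name = name
        self.agreed: List[str] = []  # states this party has confirmed so far

    def last_state(self) -> Optional[str]:
        return self.agreed[-1] if self.agreed else None


def run_rounds(buyer: Party, seller: Party, stop_after: str) -> List[str]:
    """Advance one state per round, but only while both sides agree on the
    outcome of the previous round."""
    previous: Optional[str] = None
    completed: List[str] = []
    for state in ROUNDS:
        if buyer.last_state() != previous or seller.last_state() != previous:
            break  # disagreement: neither side may assume progress
        buyer.agreed.append(state)
        seller.agreed.append(state)
        completed.append(state)
        previous = state
        if state == stop_after:
            break  # the point where you don't care anymore
    return completed


agreed_path = run_rounds(Party("buyer"), Party("seller"), "in_stock")

# A seller that believes a round already happened blocks all progress.
confused_seller = Party("seller")
confused_seller.agreed.append("validated")
blocked_path = run_rounds(Party("buyer"), confused_seller, "in_stock")
```

Each completed round doubles as proof that both sides agreed on the round
before it, which is the property described above.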


> >>- My proposal is only to allow this layer to exist through an abstract
> >>interface which allows the application to exert some control (e.g. try
> >>once/do your best, only deliver within 8 hours) and allows the layer
> to
> >>elect whichever strategy works best (depending on protocol) and allows
> two
> >>RMs to exchange acks to *improve* overall reliability
> [JJ] In ebXML this layer is called the BSI. There are clearly two levels
> of reliability:
> - the message got there
> - the receiver was able to process it
>
> Overall it is fairly sad that all these discussions are going on as if
> nobody had worked on it in the past 3-4 years. The exact same
> discussions took place more than 2 years ago and
> solutions -that could always be improved- have been provided (RN, UMM,
> ebXML). It is of course always easier to re-invent the wheel. In case
> you are not aware, you can find the ebXML specifications at
> www.ebxml.org. You might want to take a look at the ebXML BPSS
> specification...

There has been a lot of research, mostly in the past decade, on reliable
communication within a group of n>=2 processes that can tolerate failure,
handle failure, achieve consensus, and meet basically every requirement you
can think of. The beauty of all these models is that, aside from working,
they come with correctness results: they are mathematically proven to do X,
and Y is mathematically proven to be improbable. Which means you know
exactly what you're getting.

In one of the research documents I read, the writer pointed out that many
systems are designed by making some assumptions about reliability, cutting a
few corners, and taking a simpler, easier approach. I definitely agree with
that point of view.

I am really hoping here, and I am fully aware that it's probably not going
to happen. But I would like to see the WS architecture built on the results
of extensive research into the subject of reliable group communication, and
offering an adequate solution.

I would point out that TRP cannot address reliable communication within a
group of n>2 parties since it does not support causal ordering of messages
and its total ordering cannot support more than 2 processes. Of course, any
group can be split into 2-party interactions, in which case ordering is no
longer provided by the RM, but by some layer that is fed with a choreography
definition.
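
For readers unfamiliar with causal ordering, here is a minimal sketch of what
it means for three parties. It uses vector timestamps, a standard technique
from the group communication literature that is not part of TRP; the
CausalReceiver class and the message layout are assumptions made for this
illustration.

```python
from typing import Dict, List, Tuple

# (sender, vector timestamp, payload); this layout is hypothetical.
Message = Tuple[str, Dict[str, int], str]


class CausalReceiver:
    def __init__(self, processes: List[str]) -> None:
        # Number of messages delivered so far from each sender.
        self.delivered_count = {p: 0 for p in processes}
        self.pending: List[Message] = []
        self.delivered: List[str] = []

    def _deliverable(self, sender: str, stamp: Dict[str, int]) -> bool:
        # Must be the next message from this sender ...
        if stamp[sender] != self.delivered_count[sender] + 1:
            return False
        # ... and everything it causally depends on must already be delivered.
        return all(stamp[p] <= self.delivered_count[p]
                   for p in stamp if p != sender)

    def receive(self, msg: Message) -> None:
        self.pending.append(msg)
        progress = True
        while progress:  # delivering one message may unblock others
            progress = False
            for m in list(self.pending):
                sender, stamp, payload = m
                if self._deliverable(sender, stamp):
                    self.delivered_count[sender] += 1
                    self.delivered.append(payload)
                    self.pending.remove(m)
                    progress = True


# A sends "order"; B sees it and then sends "confirm". C happens to receive
# B's message first and has to hold it until A's message arrives.
c = CausalReceiver(["A", "B", "C"])
c.receive(("B", {"A": 1, "B": 1, "C": 0}, "confirm"))
held_back = list(c.delivered)
c.receive(("A", {"A": 1, "B": 0, "C": 0}, "order"))
final_order = list(c.delivered)
```

A 2-party total order cannot express this dependency, because the message
that "confirm" depends on came from a third process.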

I know not everyone here is interested in supporting group communication.
But I believe the majority opinion (mine included) is that a choreography
language should be used to "describe" the choreography, not as a way to
enforce ordering of messages, and that services could interact properly even
if they happen to have lost the choreography definition (or never had one).
That doesn't necessarily complicate the RM that is used for 2-party
interactions, but it does suggest that exploring existing research in this
area would be helpful.

arkin

>
> JJ-
> >>
> >>arkin
> >>
> >>
> >>> -----Original Message-----
> >>> From: www-ws-arch-request@w3.org
> [mailto:www-ws-arch-request@w3.org]On
> >>> Behalf Of Miles Sabin
> >>> Sent: Wednesday, January 22, 2003 5:54 AM
> >>> To: www-ws-arch@w3.org
> >>> Subject: Re: Proposed text on reliability in the web services
> >>> architecture
> >>>
> >>>
> >>>
> >>> Assaf Arkin wrote,
> >>> > Miles Sabin wrote,
> >>> > > So there's a gap between the parties which are making the
> visible
> >>> > > commitments (the WS adapters) and the parties which are
> ultimately
> >>> > > responsible for meeting them (the endpoints). Whether that gap
> is
> >>> > > narrow and/or easily bridged, or an all consuming abyss is
> likely
> >>> > > to vary on a case-by-case basis. I'm sure many of us on this
> list
> >>> > > have experienced both.
> >>> >
> >>> > You have to decide what is the service and what is the
> application.
> >>> > If you have a message handler there that allows your application
> to
> >>> > receive messages over HTTP, the message handler is not the
> service.
> >>> > It's a proxy that takes care of the HTTP/SOAP/etc details on
> behalf
> >>> > of the actual service.
> >>>
> >>> That's the ideal, certainly.
> >>>
> >>> But the reality is that this is often very hard to do. In a not
> >>> completely implausible scenario we might have, say, seven largely
> >>> independent organizations involved: the legacy system vendor, the
> two
> >>> sites which deploy that system, two consultancies providing the WS
> >>> gateways (one at each site), each using a WS toolkit from a
> different
> >>> WS tool vendor.
> >>>
> >>> In such circumstances clarity on the boundary between service and
> >>> application is going to take a lot of work. If differences of
> opinion
> >>> or outlook, or miscommunication, show through in the protocol or the
> >>> way the protocol is used, then RM is likely to be the least of
> anyone's
> >>> worries.
> >>>
> >>>
> >>>
> >>> Cheers,
> >>>
> >>>
> >>> Miles
>
Received on Thursday, 23 January 2003 16:08:42 UTC