Message Reliability Text

Here is some candidate text for section 3.12, "Message Reliability".  I
understand that this will require extensive reworking to integrate it
with the existing document, both in terms of objective content and style
-- but maybe it's a start.

   *********

Goal AC017 of the requirements says that the architecture must satisfy
the requirements of enterprises wishing to transition from traditional
EDI, and more specifically requires that the architecture support
reliable messaging.  Subsequent discussions have revealed that there is
a hierarchy of concepts involved in reliability, and that considerable
confusion can result from mixing considerations appropriate to the
various levels.

The highest level at which one can view reliability is the system
level, that is, the reliability of the Web service as a whole.  This
general view of reliability focuses on the predictability of the
quality of service.  In other words, does the system predictably do what
it is supposed to do?  This consideration is generally separate from
concerns of fault tolerance, availability or security, but there may of
course be overlapping issues.

There are a number of simple things that can be observed about system
reliability.  First, reliability may have various aspects and
considerations that depend on context.  For example, in some contexts a
car may be considered reliable on the basis of metrics such as the
requirement that turning the key will, with high probability of success,
start the engine in a few seconds.  But if the context is an icy
mountain road the metrics for reliability may be of an entirely
different nature.  In all cases, however, understanding reliability
depends crucially on defining metrics for the quality of service.

Reliability may not be related simply to the end result of the operation
but also to factors like visibility.  For example, two applications may
have the same probability of producing a particular successful result,
but if one of them displays the inner workings of the process in such a
way that makes it possible to anticipate and avoid developing problems
or to analyze and correct errors after the fact, that application may
reasonably be considered more reliable than one which acts as a black
box.

In practice, many of the aspects of system reliability are handled under
the umbrella of the management function.  However, there is a subset of
system reliability that relates to messaging that is commonly considered
separately and referred to as reliable messaging.  In general, this
refers to a predictable quality of service related to the delivery of
the messages involved with the Web service.  In more detail, the sender
of the message would like to be able to determine whether a given
message has been received by its intended receiver so that it is
possible to take compensating action in the event the message has not
been received.  On the other hand, the intended receiver of the message
may wish to be assured that it has received and processed a message once
and only once.  The general goal of reliable messaging is to define
mechanisms that make it possible to achieve these objectives with a high
probability of success in the face of inevitable but unpredictable
network, system and software failures.
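
As an illustration of the receiver's "once and only once" concern, here
is a minimal sketch in Python.  It is not taken from any specification:
the Receiver class, the message_id field and the shape of the
acknowledgement are all hypothetical, and the in-memory set stands in
for whatever durable store a real system would need to survive a
restart.

```python
class Receiver:
    """Processes each message ID at most once, acknowledging every delivery."""

    def __init__(self):
        self.seen_ids = set()   # IDs already processed (assumed in-memory only)
        self.processed = []     # payloads handed to the application

    def deliver(self, message_id, payload):
        """Handle one delivery attempt and return an acknowledgement."""
        if message_id not in self.seen_ids:
            self.seen_ids.add(message_id)
            self.processed.append(payload)
        # Duplicates are acknowledged too, so a sender whose earlier
        # acknowledgement was lost can stop retransmitting.
        return {"ack_for": message_id}


receiver = Receiver()
receiver.deliver("msg-1", "purchase order 42")
receiver.deliver("msg-1", "purchase order 42")  # retransmission of the same message
print(receiver.processed)
```

The second delivery is acknowledged but not processed again, which is
the behaviour the receiver is asking for in the paragraph above.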

The goals of reliable messaging can be made more explicit by considering
the issues related to multiple receptions of a message and message
intermediaries.  If there is an intermediary, does the sender want to
know whether the message got to the intermediary or to the intended end
recipient?  Does the receiver care whether it receives a message more
than once?  The following classification of reliable messaging
expectations is taken from the ebXML Messaging Services Specification,
Version 2.0 (http://www.ebxml.org/specs/ebMS2.pdf).

[Figure not reproduced: classification of reliable messaging
expectations from the ebXML specification.]

The goals of reliable messaging may also be examined with respect to
whether one wishes to confirm only the receipt of a message, or perhaps
also to confirm the validity of that message.  Three questions may be
asked about message validity:

1 - Was the message received the same as the one sent?  This may be
determined by such techniques as byte counts, checksums, or digital
signatures.
2 - Does the message conform to the formats specified by the agreed-upon
protocol for the message?  This is typically determined by automatic
systems using syntax constraints (e.g. XML well-formedness) and
structural constraints (validation against one or more XML schemas or
WSDL message definitions).
3 - Does the message conform to the business rules expected by the
receiver?  For this purpose, additional constraints and validity checks
related to the business process are typically applied by application
logic and/or human process managers.
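
The first two questions lend themselves to automatic checks.  The
following Python sketch shows one possible form, using only the standard
library.  The function names are hypothetical, and SHA-256 is used here
merely as a stand-in for the "checksum" technique; a bare digest detects
accidental corruption but, unlike a digital signature, does not protect
against deliberate tampering.  Schema validation is not shown because it
needs a library outside the standard distribution.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(body: bytes) -> str:
    """Return a hex digest that the sender can transmit with the message."""
    return hashlib.sha256(body).hexdigest()


def is_unaltered(body: bytes, expected_digest: str) -> bool:
    """Question 1: does the received body match the sender's digest?"""
    return digest(body) == expected_digest


def is_well_formed(body: bytes) -> bool:
    """Question 2 (syntax part only): is the body well-formed XML?"""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


sent = b"<order><id>42</id></order>"
print(is_unaltered(sent, digest(sent)))          # body arrived intact
print(is_well_formed(b"<order><id>42</order>"))  # mismatched tags
```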

One may ask which of these message validity issues belong in the
technical infrastructure represented by the Web services architecture.
The usual answer is that the first is definitely in-scope, the last is
definitely not, and the second may or may not be depending on the system
being considered.  [Note: it seems to me that the WG may wish to decide
that 2 is definitely in or definitely out of scope].

The final layer of this conceptual hierarchy is the actual mechanism
used to implement the reliable messaging goals.  This is most often
achieved via an acknowledgement infrastructure, which is a set of rules
defining how the parties to a message should communicate with each other
concerning the receipt of that message and its validity.  WS-Reliability
(www.oasis-open.org/committees/download.php/1461/WS-ReliabilityV1.0Public.zip)
is an example of a specification for an acknowledgement
infrastructure, as is WS-ReliableMessaging
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnglobspec/html/ws-reliablemessaging.asp).
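
To make the idea of an acknowledgement infrastructure concrete, here is
a rough Python sketch of the sender's side of one very simple rule:
resend until an acknowledgement arrives, and give up after a fixed
number of attempts so that compensating action can be taken.  This is
not the protocol defined by WS-Reliability or WS-ReliableMessaging.  The
names send_reliably and make_lossy_transport, the acknowledgement
format, and the use of None to model a lost message or lost
acknowledgement are all assumptions made for the illustration; a real
implementation would also need timeouts and persistent message IDs.

```python
def send_reliably(message_id, payload, transport, max_attempts=3):
    """Resend until an acknowledgement for message_id arrives.

    Returns the number of attempts used, or None if the message was
    never acknowledged and compensating action is needed.
    """
    for attempt in range(1, max_attempts + 1):
        ack = transport(message_id, payload)  # None models a lost message or ack
        if ack is not None and ack.get("ack_for") == message_id:
            return attempt
    return None


def make_lossy_transport(drop_first):
    """Hypothetical transport that loses the first `drop_first` sends."""
    state = {"calls": 0}

    def transport(message_id, payload):
        state["calls"] += 1
        if state["calls"] <= drop_first:
            return None
        return {"ack_for": message_id}

    return transport


# Two losses, then success on the third attempt.
print(send_reliably("msg-1", "purchase order 42", make_lossy_transport(2)))
```

Because the sender may retransmit a message that did in fact arrive
(only the acknowledgement was lost), a rule of this kind has to be
paired with duplicate detection at the receiver if the message is to be
processed only once.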

Received on Friday, 25 April 2003 12:22:16 UTC