Steve Ross-Talbot, Enigmatec Corporation Ltd.
Date: 3rd March 2004
In any interaction between processes (and by processes I mean something
more generic than web services; since processes are a superset, what follows
applies equally to web services), if the observable scope of a process is the
point at which interaction is observed, then we say that we can observe the
external behaviour of that process. For example, suppose a process A sends an
order to another process, B, and B enacts some logic to determine what to do
next. It could determine that the order is for a premium customer, in which
case it sends the order to a process called C, or it could determine that the
order is for an ordinary customer, in which case it sends the order to a
process called D. All we can observe is the interaction of passing the order
from one process to another. Thus the valid sequences in the overall grammar
that represents the choreography are (I'm going to use a pseudo pi-calculus to
avoid ambiguity):
a.A.x' | x.B.y' | y.C
a.A.x' | x.B.z' | z.D
Where "|" represents the parallel composition of the processes that are
interacting.
Where an unprimed channel name such as x represents the receipt of a message
and a primed name such as x' the sending of a message. In this case x and x',
and y and y', represent typed channels over which directional communication
takes place.
Where "." represents sequence; that is, an order is received on a, then A
does its stuff and sends the order on x', which B receives on x; B then sends
the order on y', which C receives on y (or, of course, sends it on z' and D
receives it on z).
We can label these as:
PremiumCustomer ::= a.A.x' | x.B.y' | y.C
OrdinaryCustomer ::= a.A.x' | x.B.z' | z.D
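To make the idea of an observable sequence concrete, here is a minimal Python
sketch of the two labelled choreographies. The data layout and the function
name observable_trace are my own assumptions for illustration and are not part
of any choreography specification. The internal steps A, B, C and D are left
out because they are not observable; only the receives (x) and sends (x') are
recorded.

```python
# Illustrative sketch only: the representation below is an assumption,
# not part of the pseudo pi-calculus notation itself.

# Each process is its ordered list of observable actions:
# ("recv", "x") stands for x and ("send", "x") stands for x'.
PREMIUM_CUSTOMER = {
    "A": [("recv", "a"), ("send", "x")],
    "B": [("recv", "x"), ("send", "y")],
    "C": [("recv", "y")],
}
ORDINARY_CUSTOMER = {
    "A": [("recv", "a"), ("send", "x")],
    "B": [("recv", "x"), ("send", "z")],
    "D": [("recv", "z")],
}


def observable_trace(processes, entry):
    """Return the channels on which an interaction is observed, in order.

    `entry` is the channel on which the first message arrives from outside.
    """
    position = {name: 0 for name in processes}  # next step of each process
    pending = [entry]  # channels carrying a message that is not yet received
    trace = []
    while pending:
        channel = pending.pop(0)
        for name, steps in processes.items():
            i = position[name]
            if i < len(steps) and steps[i] == ("recv", channel):
                trace.append(channel)  # the interaction is observed here
                i += 1
                # After receiving, the process performs its sends in sequence.
                while i < len(steps) and steps[i][0] == "send":
                    pending.append(steps[i][1])
                    i += 1
                position[name] = i
                break
        else:
            raise ValueError(f"no process is ready to receive on {channel!r}")
    return trace
```

Calling observable_trace(PREMIUM_CUSTOMER, "a") walks the order from A to B to
C, and the same call on ORDINARY_CUSTOMER ends at D instead.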
Now let's move on to error handling. There are two levels that we
need to consider. The first is dealing with exceptional circumstances arising
from a failure in A, B, C or D, and the second arises from out-of-bounds
message exchanges; these are messages for which there is no definition in the
choreography description that is able to handle them in the current context,
where a context is a collaboration group.
As far as A, B, C and D are concerned, should an error occur at A, B,
C or D at any time it may result in either a different message being sent (on
another channel created for the purpose) or no message being sent. So on the
one hand we have the presence of a message and on the other we have the absence of a message.
The presence of a message can be modeled analogously. The absence
of a message could be viewed as a message from a timeout; for example, we could
change the definition of PremiumCustomer as follows:
PremiumCustomer ::= (a.A.x' + t.A) | (x.B.y' + t.B.t') | y.C
The "+" operator is a choice.
In this example, if B fails then it receives a timeout on t and
then sends a timeout on t', which A receives on t. C is unaware and waits for
an order on y, which is okay because it hasn't progressed at all at this stage.
The net result is that an alternative path is taken in order to deal with the
observable message that resulted from the timeout.
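The choice in the timeout example can be sketched in Python as follows. This
is only an illustration under my own assumptions: each process is written as a
list of alternative branches, the branch taken is decided by the channel on
which the process first receives, and the names select_branch and
sent_channels are hypothetical. Internal steps are again omitted.

```python
# Illustrative sketch only: encoding and helper names are assumptions.

# PremiumCustomer ::= (a.A.x' + t.A) | (x.B.y' + t.B.t') | y.C
# Each process is a list of alternatives ("+"); each alternative is the
# ordered list of its observable actions, with a trailing ' marking a send.
PREMIUM_CUSTOMER = {
    "A": [["a", "x'"], ["t"]],
    "B": [["x", "y'"], ["t", "t'"]],
    "C": [["y"]],
}


def select_branch(alternatives, received_on):
    """Resolve the choice: pick the alternative that starts by receiving
    on `received_on`, or None if no alternative accepts that channel."""
    for branch in alternatives:
        if branch[0] == received_on:
            return branch
    return None


def sent_channels(branch):
    """Channels on which the chosen branch sends (the primed actions)."""
    return [action[:-1] for action in branch if action.endswith("'")]
```

If B receives on t rather than x, select_branch picks the t.B.t' alternative,
sent_channels shows that it sends on t, and A's t.A alternative is the one that
accepts it. C has no alternative starting with t, so it simply keeps waiting
on y.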
The other way of dealing with this is for process B, at least in
this case, to send a different message (not a timeout message) which requires C
and possibly A to take action of some sort. The only thing that needs to be
modeled is the observable interaction that ensues from the point of view of the
receivers A and C.
The OrdinaryCustomer is analogous.
Handling out-of-bounds messages in a choreography is really no
more than a parsing error across the allowable messages at any point. If such
an out-of-bounds message occurs then the choreography need do nothing about it.
Why is this? Simply because the runtime environment (which is vendor specific)
would be required to do something about it. This is dependent on the role that
the runtime environment plays and so is ultimately down to a vendor to specify.
If there is an issue then it is how to propagate such an error to more than one
participant process. The technical solutions for modeling this are much less of
a problem than the political issue of a participant agreeing to allow observers
(in this case the other participant processes) to observe that it has generated
an out of bounds message. So the solution lies in providing the choreography
designer with the necessary tools to be able to model it in a global and a
local way. In the former the other participants may need to be informed in
which case this could be viewed as an alternative path and in the latter a
local matter to be resolved by the owner of the process.
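The view of an out-of-bounds message as a parsing error can be illustrated
with a small Python sketch. It assumes the choreography is given simply as its
set of valid channel sequences (here the two sequences for PremiumCustomer and
OrdinaryCustomer without timeouts); the names allowed_next and
first_out_of_bounds are hypothetical. What a runtime does once such a message
is detected is deliberately not shown, since that is vendor specific.

```python
# Illustrative sketch only: a choreography reduced to its valid sequences.
VALID_SEQUENCES = [
    ("a", "x", "y"),  # PremiumCustomer
    ("a", "x", "z"),  # OrdinaryCustomer
]


def allowed_next(prefix):
    """Channels on which a message is allowable after observing `prefix`."""
    prefix = tuple(prefix)
    n = len(prefix)
    return {
        seq[n]
        for seq in VALID_SEQUENCES
        if len(seq) > n and seq[:n] == prefix
    }


def first_out_of_bounds(observed):
    """Index of the first message that no valid sequence allows at that
    point, or None if every observed message is in bounds."""
    for i, channel in enumerate(observed):
        if channel not in allowed_next(observed[:i]):
            return i
    return None
```

A message on an unknown channel q after a and x is reported at index 2, while
the sequence a, x, y parses cleanly.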
From a technical point of view, modeling time in a consistent manner is easy
to say and much harder to do. However, the processes that the time relates to
are generally in one location, and so their notion of a timeout, and of when to
start the clock, is from their own perspective.
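As a rough illustration of a timeout measured from the receiver's own
perspective, the following Python sketch uses the standard library
queue.Queue as a stand-in for a channel. The function name receive_or_timeout
and the tuple it returns are assumptions made for this example. The clock
starts when the receiver begins waiting, and the absence of a message is
turned into an observable "timeout" result.

```python
import queue


def receive_or_timeout(channel, seconds):
    """Wait for a message on `channel` (a queue.Queue).

    The clock starts here, at the receiver, not at the sender. If nothing
    arrives within `seconds`, the absence of a message is reported as a
    timeout, which the caller can treat like a message on a channel t.
    """
    try:
        return ("message", channel.get(timeout=seconds))
    except queue.Empty:
        return ("timeout", None)
```

A caller would branch on the first element of the result to decide whether to
follow the normal path or the alternative timeout path.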
The other issue is how to model the sends/receives as interacts. Since an
interact collapses the notion of send/receive into one conceptual entity we
need to be able to do the same thing but with an interact. This is really a
question for the spec editors.
Propagation we shall leave for group discussion as part and parcel of the
wider debate.
From a semantic perspective there is a need to differentiate between normal behaviour and behaviour that deviates from the norm. So a timeout might be seen as a marked (i.e. well named) channel, or as a marked message sent on such a channel for the purpose of clarity, where clarity is based on the context in which a choreography is created (e.g. fixprotocol or SWIFT). How a timeout is seen could depend on the context of the choreography and may equally imply a different choreography. It is also the case that timeouts need to be distinguishable from each other, less in terms of duration and more in terms of the impact they have: which path they choose, for example.
The same can be said of errors.