Correlation Issue: Pre-Proposal from Gary Brown on 2004-11-11 (public-ws-chor@w3.org from November 2004)

From: Gary Brown <gary@enigmatec.net>
Date: Thu, 11 Nov 2004 10:04:13 -0000
To: <public-ws-chor@w3.org>
Message-ID: <008301c4c7d5$d4bdb360$0200a8c0@LATTITUDEGary>

NOTE: This will probably not be a full proposal before the end of the F2F, but we (Enigmatec) consider it a high priority matter that must be resolved during last call. At the F2F we would seek to discuss the issue and proposed solution. When reading this proposal, it is important to consider the concept from the perspective of a third-party monitoring tool that has no control or visibility of the web service implementation that is executing the choreography, and therefore all relevant information must be inferred from the messages that this tool would observe.

Issue:

Currently it is only possible to correlate a set of observed messages to a particular channel instance. This relies either on using the 'identity' token to extract a channel instance 'key' from each message (which then becomes the logical conversation identity) or, if the 'identity' token is not defined, on the assumption that the messages will be correlated to a channel instance based on private information (e.g. an id from a message protocol header).

However, if the choreography has more than one channel instance, then we have the problem that although we can correlate messages to their respective channel instances, there is nothing to indicate how the channel instances are bound to a session (choreography instance). Therefore, from a third-party monitoring perspective, if a particular endpoint (web service) is executing many instances of a particular choreography, and that choreography has two channel instances (for example), then it would not be possible for the monitoring tool to determine which pair of channel instances are associated with each other in a particular choreography instance.
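To make the gap concrete, here is a minimal sketch of what a monitoring tool can do today. The channel type names, keys and payloads are invented for illustration; the only assumption is that each observed message already yields a channel type and a channel instance key.

```python
from collections import defaultdict

# Hypothetical observed messages: (channel type, channel-instance key, payload).
# Two choreography instances are running, each using one CT_A and one CT_B channel.
observed = [
    ("CT_A", "a-1", "request"),
    ("CT_B", "b-7", "query"),
    ("CT_A", "a-2", "request"),
    ("CT_B", "b-9", "query"),
    ("CT_A", "a-1", "response"),
]

def group_by_channel_instance(messages):
    """Correlate messages to channel instances using the channel-instance key."""
    groups = defaultdict(list)
    for channel_type, key, payload in messages:
        groups[(channel_type, key)].append(payload)
    return dict(groups)

channels = group_by_channel_instance(observed)
```

The grouping succeeds, but nothing in `channels` says whether `a-1` belongs with `b-7` or with `b-9`; that pairing is the missing session information.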

There is currently text in the CDL specification that requires the identity tokens to be of the same types if two or more channels refer to the same participant.

"If two or more Channel Types SHOULD point to Role Types that MUST be implemented by the same entity or organization, then the specified Role Types MUST belong to the same Participant Type. In addition the identity elements within the Channel Types MUST have the same number of Tokens with the same informationTypes specified in the same order"

Although this would enable the correlation of channels within that participant (i.e. the server side of the conversations), based upon the identity generated by those tokens, it would not help the 'client' side that has to manage the identity associated with different channel instances referring to different participants.

Discussion:

Therefore, at present we have the ability to correlate messages to a channel instance, based either on explicit information in each of the messages (extracted using the token locator mechanism) or on implicit, private protocol information.

The next step is to think about how this channel information may be associated with the concept of a session (or choreography instance). In some cases, each channel instance will have conversations that are based around a common identity, e.g. an order id. However, in some situations, a common key may not span all of the channel instances.

QUESTION: Is it reasonable to restrict the mechanism so that a common key is shared across all channel instances used within a particular choreography instance?

If it is reasonable, then maybe all that is required is for each channel type in the choreography description to identify a particular token (or tokens) that provide the session identity, and must be provided within all channel types.

However, if this restriction is not considered reasonable, we need to start thinking about a more complex mechanism that will enable different channel types to be bound to a session identity. The remainder of this document will focus on this concept, as the simpler approach above does not require as much thought.

Identifying a Session:

The first point to make is that establishing the linkage between a channel instance and the session is something that only needs to occur when the channel is created, which from an observable perspective is when the first message is sent. Once this link has been established, there is no further requirement to extract any session level identity from subsequent messages on the channel instance.

This means that we would be expecting the first message on a channel instance to contain the relevant session identity field. However, this field does not necessarily have to be one of the fields that forms the identity tokens for the channel instance. This removes the constraint that each channel instance must share a common key field(s).

More notably, this concept means that it is possible to link two or more channel instances to a single choreography session instance, even where those channel instances are themselves identified by private (i.e. protocol specific) information.

Thus, the proposal would be to have an extension to the ChannelType definition to include an optional session identity element, in addition to the current channel instance 'identity' element. If this field is not specified, then it implies that the session identity will be provided implicitly based on private information (i.e. similar semantics to the channel instance identity).
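As a rough sketch of how a monitor might use such an element, the code below binds a channel instance to a session on its first observed message only. Everything here is hypothetical: the `ChannelType` dataclass, its `session_identity` field, the `Monitor` class and the use of plain dictionaries in place of the token locator mechanism are illustrative assumptions, not CDL syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelType:
    """Hypothetical model of a channel type (not CDL syntax)."""
    name: str
    identity: tuple               # tokens giving the channel-instance key
    session_identity: tuple = ()  # proposed optional session identity tokens

class Monitor:
    """Third-party observer that sees messages only."""

    def __init__(self):
        self.channel_to_session = {}

    def observe(self, ctype, message):
        # A dict lookup stands in for the token locator mechanism.
        channel_key = (ctype.name,) + tuple(message[t] for t in ctype.identity)
        if channel_key not in self.channel_to_session:
            # First message on this channel instance: establish the session link.
            if ctype.session_identity:
                session = tuple(message[t] for t in ctype.session_identity)
            else:
                # Element omitted: session identity is private, so not observable.
                session = None
            self.channel_to_session[channel_key] = session
        # Later messages need not carry the session identity at all.
        return self.channel_to_session[channel_key]

ct1 = ChannelType("CT1", identity=("convId",), session_identity=("orderId",))
monitor = Monitor()
first = monitor.observe(ct1, {"convId": "c1", "orderId": "O-1"})
later = monitor.observe(ct1, {"convId": "c1"})  # no orderId in this message
```

Note that the session identity token (`orderId`) is not one of the channel's own identity tokens, which is the relaxation described above.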

Handling Multiple Session Identity Keys:

In the previous section, we introduced the idea of representing the session identity within a channel type. This could be defined using the same definition of one or more tokens that is used in the <identity> element. If multiple tokens are defined, then the combination of fields will uniquely provide the session identity.

However, it may be that a session (choreography instance) can only be identified using more than one unique key - and that different channel types need to use different keys to associate with the same session.

In this situation, we could define multiple 'identity sets' for a session key - however validation would be required to ensure that the particular session key used by a channel would have been resolved as part of a previous channel being created. For example, when the first channel is established, it may result in keys K1 and K2 being initialised with the relevant token values. When the second channel is established, its first message contains the relevant information to map to K2, linking it with the session, and similarly a third channel may have a first message that has relevant information to map to K1.

We could extend this example (and concept) by imagining that another key, K3, has also been defined, associated with the channel type used for the third channel. Following the successful correlation of that third channel to the session, based on K1, it is now possible to associate the value extracted for this new key K3 with the same session - which means that it is now possible for a fourth channel to be associated with the session based on K3.

This is a complex key resolution approach, but would avoid placing restrictions on the nature of information passed on different (distinct) channels - example 2 will hopefully make this approach clear.
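The K1/K2/K3 walk-through above can be sketched as a small registry. This is only one possible reading of the key resolution approach; the `SessionRegistry` class, its method names and the key values are assumptions made for illustration.

```python
class SessionRegistry:
    """Maps (key name, value) pairs to a session; a hypothetical sketch."""

    def __init__(self):
        self._by_key = {}   # (key name, value) -> session id
        self._next_id = 1

    def start_session(self, keys):
        """First channel established: create a session and initialise its keys."""
        session = self._next_id
        self._next_id += 1
        self.learn(session, keys)
        return session

    def learn(self, session, keys):
        """Associate further key values with an existing session."""
        for name, value in keys.items():
            self._by_key[(name, value)] = session

    def bind(self, key_name, value, new_keys=None):
        """Bind a new channel via a key that an earlier channel must have resolved."""
        session = self._by_key.get((key_name, value))
        if session is None:
            # The validation mentioned above: the key has not been resolved yet.
            raise LookupError(f"{key_name}={value!r} is not bound to any session")
        self.learn(session, new_keys or {})
        return session

registry = SessionRegistry()
s1 = registry.start_session({"K1": "x", "K2": "y"})  # first channel sets K1, K2
s2 = registry.bind("K2", "y")                        # second channel maps to K2
s3 = registry.bind("K1", "x", {"K3": "z"})           # third maps to K1, adds K3
s4 = registry.bind("K3", "z")                        # fourth maps to K3
```

All four channels resolve to the same session, and a channel that presents an unknown key value is rejected rather than silently starting a new session.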

Example 1:

This example shows the situation where a 'parent' choreography results in multiple 'child' choreography instances being performed against different participants.

In this situation, the channel type CT1 would be used by the parent choreography to establish a correlation between an instance of channel type CT1 and the choreography instance using the OrderId field in the first message that is sent on the channel instance. Therefore at this point, the session would be identified by a unique order id.

When a sub-choreography is performed to represent the interaction with a particular participant using channel type CT2, then the first message on this channel will identify the sub-choreography's identity as being unique based on the order id AND vendor name - however it will also serve the purpose of binding the sub-choreography session to its parent choreography (with the same OrderId), on the basis that a subset of the sub-choreography's identity is the identity of the parent choreography.

We could also add a rule that states that the session identity of a sub-choreography is only relevant while that sub-choreography scope is active (i.e. until it is finalized).
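A minimal sketch of Example 1 follows, assuming that the parent's identity tokens form a prefix of the sub-choreography's identity tokens. The `ChoreographyTracker` class, the token name `VendorName` and the sample values are hypothetical; only the OrderId and vendor name fields come from the example.

```python
PARENT_TOKENS = ("OrderId",)
CHILD_TOKENS = ("OrderId", "VendorName")  # parent tokens first, by assumption

def identity(message, tokens):
    """Extract an identity tuple; a dict lookup stands in for token locators."""
    return tuple(message[t] for t in tokens)

class ChoreographyTracker:
    def __init__(self):
        self.children = {}  # parent identity -> set of active child identities

    def first_message_ct1(self, message):
        """First message on a CT1 instance identifies the parent session."""
        parent = identity(message, PARENT_TOKENS)
        self.children.setdefault(parent, set())
        return parent

    def first_message_ct2(self, message):
        """First message on a CT2 instance identifies the child and its parent."""
        parent = identity(message, PARENT_TOKENS)
        if parent not in self.children:
            raise LookupError(f"no parent choreography for {parent!r}")
        child = identity(message, CHILD_TOKENS)
        self.children[parent].add(child)
        return child

    def finalize_child(self, child):
        """The child's session identity is only relevant while it is active."""
        parent = child[:len(PARENT_TOKENS)]
        self.children[parent].discard(child)

tracker = ChoreographyTracker()
parent = tracker.first_message_ct1({"OrderId": "O-1"})
child = tracker.first_message_ct2({"OrderId": "O-1", "VendorName": "Acme"})
```

Calling `tracker.finalize_child(child)` drops the sub-choreography identity, matching the suggested scoping rule.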

Example 2:

This example shows a situation where a second channel type does not share a common identity field with the first channel type. In this situation, the conversation over a channel instance of type CT3 would be expected to derive the identity field token that is associated with the second channel type CT4, prior to that second channel instance being used.

In this situation, the first message on an instance of CT3 would identify the unique OrderId associated with the session. A subsequent message on this channel instance would then result in the 'AuthorizationId' also being associated with the session. This means that there are now two pieces of independent information that can be used to identify this one choreography instance, which also means that any other channel type used by the choreography instance from that point onwards can be bound to the session by either the 'OrderId' or the 'AuthorizationId' value (as in the case of channel type CT4).
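Example 2 differs from the general scheme mainly in that the second key is learned from a later message on CT3 rather than from its first message. A small sketch of that case, with a hypothetical `Example2Monitor` class, invented channel keys and dictionaries standing in for messages:

```python
class Example2Monitor:
    def __init__(self):
        self.sessions = {}      # (token name, value) -> session label
        self.ct3_channels = {}  # CT3 channel-instance key -> session label

    def on_ct3(self, channel_key, message):
        if channel_key not in self.ct3_channels:
            # First message on the CT3 instance: OrderId identifies the session.
            session = "session:" + message["OrderId"]
            self.sessions[("OrderId", message["OrderId"])] = session
            self.ct3_channels[channel_key] = session
        session = self.ct3_channels[channel_key]
        if "AuthorizationId" in message:
            # A later message adds a second, independent key for the same session.
            self.sessions[("AuthorizationId", message["AuthorizationId"])] = session
        return session

    def on_ct4_first(self, message):
        """Bind a CT4 instance using whichever known key its first message carries."""
        for token in ("OrderId", "AuthorizationId"):
            if token in message:
                session = self.sessions.get((token, message[token]))
                if session is not None:
                    return session
        raise LookupError("first CT4 message carries no known session key")

mon = Example2Monitor()
session = mon.on_ct3("c3-1", {"OrderId": "O-42"})
mon.on_ct3("c3-1", {"AuthorizationId": "A-9"})
ct4_session = mon.on_ct4_first({"AuthorizationId": "A-9"})
```

Here the CT4 instance is bound to the session without ever carrying an OrderId.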

Received on Thursday, 11 November 2004 10:04:38 UTC