Feature: Message Correlation

Status

This proposal is intended to meet the requirements for a SOAP feature, as described in sections 3.1 and 3.1.1 of the SOAP 1.2 Specification, with the exception that the description is not necessarily limited to "between two adjacent SOAP nodes," merely "between adjacent SOAP nodes." The proposal includes a recommendation that the feature be regarded as standard, that is, that it be a required feature for every binding. A discussion of how this may be implemented in a fashion compatible with existing implementations (which rely upon implicit correlation resulting from a synchronous messaging model) is included. The document has no formal status whatsoever, within W3C and its working groups, or within TIBCO.

Motivation

Current specifications for SOAP have been developed in an environment in which the primary transport is HTTP. HTTP is a strongly synchronous protocol, which has led to the almost exclusive use of the request/response message exchange pattern, with implicit synchronous semantics, including a point-to-point model, and an expectation that the node required to respond already has an open connection to the requesting node.

The publish/subscribe model of service provision, widely used in enterprise architectures, violates most of these implicit assumptions. Messages in such a service may not be part of a request/response pattern, and may be received, or require a response, asynchronously. There is no guarantee that the node will have external, protocol-provided metadata adequate to generate the addressing information for despatch of a response or of a fault.

Other specifications have already addressed this issue, including WS-Routing and WSCI. Those efforts have been reworked here in an attempt to isolate the message correlation feature, thereby reducing its implementation cost. A discussion of integration with existing processor semantics is included to encourage the consideration of this proposal as a standard feature, by defining earlier (synchronous and implicit) behaviors as the default (but no longer the only possible behavior).

Contents

Status

Motivation

Definitions

Description

Interactions with Other Features

Implementation as a Module

Security Considerations

Additional Notes

References

Definitions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

Table 1: Message Correlation namespace and prefix
Prefix  Namespace
corr    http://www.tibco.com/xmlns/soap/message-correlation/

The URL for this feature (not required, as it is with MEPs, but certainly useful) is http://www.tibco.com/xmlns/soap/message-correlation/. Properties are defined within this namespace. The "common" namespace prefix (used throughout this document) is corr. This feature establishes one required property and one optional property, both of which processors must handle.

Table 2: Properties for the Message Correlation feature
Name             Type                Constraints
corr:message-id  xs:string           Required; unique within a conversation
corr:references  array of xs:string
corr:message-id
The message correlation feature requires, of any MEP, binding, or application that implements it, that a message-id property be established. The minimum requirements for this property are that it be a string containing no whitespace, and that it be unique within the conversation. The term conversation is left ambiguous by this specification; a MEP, binding, or application making use of the message correlation feature SHOULD refine its meaning more precisely for the relevant context. A MEP, binding, or application MAY have more stringent requirements, but it MUST NOT loosen these requirements. A MEP, binding, or application may supply several candidate message ids, including composites (for instance, a module implementing the feature may place the message id in a single header field, whereas an email binding might offer the option of using the Message-Id header or a concatenation of Sender + Subject + Sent; a service description offered several possible candidates should specify which one it is using). The message-id property is established at the time of transmission from the original sender, and must not be changed afterwards. If a candidate message-id contains whitespace, an escaping mechanism must be defined in the same document that describes the message-id mapping. The message-id property is carried with the message (implicitly or explicitly). Synchronous protocols may establish a message-id property based on a currently-opened connection.
corr:references
The message correlation feature permits any MEP, binding, or application to also define a property for references. References are defined as an array of message-ids, in most-recent-first order. Each member therefore must correspond to a message-id. More than one message-id in the array indicates a conversational thread. Synchronous protocols may establish an implicit reference corresponding to the implicit message-id. MEPs, bindings, and applications may define multiple means of identifying the references, and again the service description, if given multiple possibilities, must select one. References are carried (implicitly or explicitly) with the message, are created by the message sender (implicitly or explicitly), and may not be changed in transmission.
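
The minimum constraints on the two properties can be sketched in Python as follows. This is an illustration only, not part of the proposal: the function names are hypothetical, the rejection of empty strings is an assumption, and percent-encoding stands in for the escaping mechanism that a real mapping document would have to define for an email-style composite id.

```python
import re
from typing import Sequence
from urllib.parse import quote

_WHITESPACE = re.compile(r"\s")


def is_valid_message_id(candidate: str) -> bool:
    """Minimum constraint from the text: a string containing no whitespace.

    Rejecting the empty string is an assumption made for this sketch.
    """
    return bool(candidate) and _WHITESPACE.search(candidate) is None


def is_valid_references(references: Sequence[str]) -> bool:
    """Each member of the references array must itself be a message-id."""
    return all(is_valid_message_id(ref) for ref in references)


def composite_message_id(sender: str, subject: str, sent: str) -> str:
    """Hypothetical composite id built from Sender + Subject + Sent.

    Percent-encoding each part (an assumed escaping mechanism) removes
    whitespace and keeps the "+" separator unambiguous.
    """
    return "+".join(quote(part, safe="") for part in (sender, subject, sent))
```

Uniqueness within a conversation cannot be checked from a single value, so it is deliberately left out of these helpers.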

Description

The message correlation feature provides a means for identifying messages and for placing them within the context of a conversation. A conversation is defined as any collection of related messages, which may be as small as a single message exchange pattern, or extended over tens or hundreds of such exchanges (though the latter is less likely; conversations do pause and are restarted).

The message correlation feature assigns a message id, unique within the conversation, to each message. Later messages in the conversation may refer to it (and responses generally must do so) by placing it into the references property. The references property contains a list, most-recent-first, of messages which this message is in reply to (or more generally, is related to).

In order to support the semantics of the existing request/response message exchange pattern, a standard interpretation of the absence of these properties (that is, null values for message-id and for references) is that the message is part of a synchronous exchange. In general, it is more interoperable to specify message ids and references, but in their absence, a synchronous, request/response pattern may be assumed, in which a response is expected over a currently open socket, which is presumably already available to the processor in some fashion. A response containing null references is presumed to have a single entry in its references list, referring to the synchronous request that called it forth.
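
One possible reading of this default is sketched below in Python. The Correlation type, the function names, and the SYNC_REQUEST sentinel are all hypothetical; the sentinel merely stands in for "the synchronous request on the currently open connection," which has no string identity of its own.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical opaque marker for the request that arrived on the open
# connection; it is deliberately not a string, so it cannot collide with
# an explicit message-id.
SYNC_REQUEST = object()


@dataclass(frozen=True)
class Correlation:
    """Correlation properties of one message; None models a null value."""
    message_id: Optional[str] = None
    references: Optional[Tuple[str, ...]] = None


def is_implicitly_synchronous(props: Correlation) -> bool:
    """Both properties absent: treat the message as part of a synchronous exchange."""
    return props.message_id is None and props.references is None


def effective_references(response: Correlation) -> tuple:
    """A response with null references is presumed to reference the
    synchronous request that called it forth."""
    if response.references is None:
        return (SYNC_REQUEST,)
    return response.references
```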

[[An alternate way to do this: describe a property with two value domains, one the above-mentioned xs:string, the other a black box connected with a socket. Messages with no message-id or references are evaluated to implicitly decorate them with the black box. This is all a horrible description, and badly needs cleanup, but the principle should be clear: anything that doesn't specify the feature's semantics is treated as part of a synchronous request/response exchange with no correlation semantics.]]

The value space of the message ids may be specified by the binding, by a module implementing this feature, or by applications in a manner outside the scope of this specification. For instance, some protocols already have a concept of message id, so they may simply require that the feature be bound to this existing functionality. Other protocols, or message exchanges across multiple protocol domains, may have additional requirements or restrictions. It is not useful to describe the message id in any more detail; the binding, module, or application defining the implementation should also enforce the semantics, that the message id must be unique within a conversation. Note that this requirement (unique within a conversation) is significantly less than the requirements likely to be placed on a message id if it uses existing functionality defined by a protocol or application (these tend to require uniqueness within a larger scope).
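
Whichever party defines the value space, enforcing the uniqueness requirement amounts to remembering the ids already seen in each conversation. The Python sketch below assumes that conversations can be named by a string key; that key, the class name, and the method name are hypothetical, since the proposal leaves the meaning of "conversation" to the binding, module, or application.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ConversationRegistry:
    """Tracks message ids per conversation (illustrative, in-memory only)."""

    def __init__(self) -> None:
        self._seen: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, conversation: str, message_id: str) -> bool:
        """Record message_id; return False if this conversation already used it."""
        seen = self._seen[conversation]
        if message_id in seen:
            return False
        seen.add(message_id)
        return True
```

The same id is accepted in two different conversations, which reflects the point that the scope required here is narrower than the global uniqueness many protocols demand.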

corr:message-id must be bound at message creation, must not be changed during transmission, and must be available to all participants. corr:references, when implemented, is set at message creation, and must not be changed during transmission. If a triggering message contains no corr:references property, then the property in the response message should contain a single value, the value of the triggering message's corr:message-id property. If a triggering message does contain a corr:references property, then it is copied to the response message, and the value of the triggering message's corr:message-id property is prepended.
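
The rule for deriving a response's references can be written as a short Python sketch. The function and parameter names are hypothetical, and treating an empty references array the same as an absent one is an assumption.

```python
from typing import List, Optional, Sequence


def response_references(
    trigger_message_id: str,
    trigger_references: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build corr:references for a response, most-recent-first.

    The triggering message's id always comes first, followed by a copy of
    the triggering message's own references, if it had any.
    """
    return [trigger_message_id, *(trigger_references or ())]
```

For example, a reply to message m3, which itself referenced m2 and m1, would carry the references m3, m2, m1, while a reply to a message with no references would carry only that message's id.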

Interactions with Other Features

The message correlation feature does not rely upon any other feature. A number of other proposed features rely upon message correlation, however. Message correlation can thus be regarded as a "fundamental" feature, in that it lays the foundations for a number of other useful technologies, and is itself structurally important for a number of use cases considered by the authors to be significant (especially the publish/subscribe model of service provision).

It is recommended that the XML Protocol working group give serious consideration to making message correlation a standard feature, with appropriate default semantics to preserve the implicit message identification and correlation currently implemented for request/response over synchronous protocols.

Implementation as a Module

It is strongly recommended that the correlation feature be required for all applications/bindings. It may be implemented as a module, using the pattern described below. It is recommended that, when the feature is implemented as a module, the mustUnderstand attribute be set to true.

  <soap-env:Header>
    <corr:message-correlation>
      <corr:message-id>xs:string</corr:message-id>
      <corr:references>?
        <corr:message-id>xs:string</corr:message-id>+
      </corr:references>
    </corr:message-correlation>
  </soap-env:Header>
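
As an illustration of the pattern above, the following Python sketch builds a concrete header block with the standard library. Several details are assumptions rather than part of the proposal: the envelope namespace is the one from the final SOAP 1.2 Recommendation (drafts current when this document was written used a different URI), mustUnderstand is placed on the corr:message-correlation header block, and the function name build_header is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

# Assumption: namespace of the final SOAP 1.2 Recommendation.
SOAP_ENV_NS = "http://www.w3.org/2003/05/soap-envelope"
# Namespace given in Table 1 of this document.
CORR_NS = "http://www.tibco.com/xmlns/soap/message-correlation/"

# Use the document's prefixes when serializing.
ET.register_namespace("soap-env", SOAP_ENV_NS)
ET.register_namespace("corr", CORR_NS)


def build_header(message_id: str, references: Sequence[str] = ()) -> ET.Element:
    """Return a soap-env:Header element carrying the correlation block."""
    header = ET.Element(f"{{{SOAP_ENV_NS}}}Header")
    block = ET.SubElement(header, f"{{{CORR_NS}}}message-correlation")
    # Recommended by the text: processors must understand this block.
    block.set(f"{{{SOAP_ENV_NS}}}mustUnderstand", "true")
    ET.SubElement(block, f"{{{CORR_NS}}}message-id").text = message_id
    if references:  # corr:references is optional ("?") in the pattern
        refs = ET.SubElement(block, f"{{{CORR_NS}}}references")
        for ref in references:  # one or more ("+"), most-recent-first
            ET.SubElement(refs, f"{{{CORR_NS}}}message-id").text = ref
    return header
```

Calling ET.tostring(build_header("m3", ["m2", "m1"]), encoding="unicode") serializes the header for a message m3 that references m2 and m1.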

Security Considerations

Without authentication of source identity, message correlation can provide a means of spoofing or disrupting services, typically via forgery of responses, but also via forgery of ids (in order to disrupt the identification function of the property). These issues are generally not visible when message identification and references are bound implicitly to open sockets. Bindings, MEPs, and modules making use of the feature may wish to specify protections against message-id forging and reference forging.

Additional Notes

Widespread implementation of message correlation carries with it some interesting extensions to the possible practice of XML message exchange via SOAP. These are interesting possibilities, but do not represent the motivation for the definition of the feature, and are not central to its functionality. The most obvious use case here is "protocol-hopping," in which a message exchange pattern is bound to multiple protocols (HTTP request and email response, for example). This use case immediately suggests transparent protocol hopping via the use of intermediaries. Other possibilities include archiving with reference via id, a REST-style implementation of messaging semantics heavily reliant upon identification, and long-duration asynchronous activities (where long duration is measured in years, perhaps).

These possibilities are mentioned here in order to attempt to exclude them from the feature definition. The core of the feature is simple: identification and correlation. A great deal can be built on top of it, but these additional functionalities should not be mistakenly placed into the core definition, increasing implementation complexity.

References


Amelia A Lewis
Last modified: Fri Oct 11 12:51:02 EDT 2002