This document has no official standing within the XMLP, WSD, or WSA working groups at W3C. It is not an official document of Tibco Software Inc., but is submitted by the Tibco Software, Inc., representative to these working groups. The document reflects the experiences and concerns of developers involved in enterprise messaging, and attempts to characterize such messaging as web services.
[Author's note: as a proposal without formal status, the URIs assigned throughout this document are under the administrative control of the author's employer, except where existing features, specified in other documents, are referenced (some external references are also in local administration). All such namespaces can be identified by the initial pattern: "http://www.tibco.com/xmlns/"... It is expected that these URIs will be updated to point into spaces under the administrative control of appropriate W3C committees.]
At least two previous proposals have been put forward for binding SOAP to internet email. This proposal differs primarily in focusing on list-oriented, publish/subscribe models. Insofar as the request-response exchange pattern is treated, it is significantly less prominent than in earlier proposals. That pattern requires only minor modifications for use within a binding which is, by default, asynchronous.
The primary purpose of this binding is to illustrate SOAP bound to a different paradigm, specifically the publish/subscribe model, with asynchronous delivery. If current discussion of SOAP transport bindings may be said to be too focussed on the problems of the HTTP binding, to the detriment of SOAP, of HTTP, and of XML as a data exchange format, then this proposal seeks to widen the scope of application of SOAP solutions.
The current model of service description cannot handle publish/subscribe models, except at the cost of promoting clients into published services, a price too high for many participants to pay. Description can only be fleshed out with an adequate underlying set of abstractions.
This binding composes several proposed features and message exchange patterns into a single, complete description of the publish/subscribe model as implemented via internet email. The design of the proposal, by composition rather than monolithically, is intended to allow and even promote similar compositions for other bindings. Internet email was chosen for illustration not because it is the best example of the publish/subscribe model, but because it is by far the most familiar and accessible, which (it is hoped) will promote both understanding and implementation.
Supported Message Exchange Patterns
Message Exchange Operation (informal)
This SOAP binding specification adheres to the SOAP 1.2 binding framework, and as such uses abstract properties as a descriptive tool to define the functionality of certain features. This binding imports properties from separately-defined features (including message exchange pattern features), which it then specifies in greater detail, and on occasion further constrains.
The SOAP binding to internet email is not a "protocol" binding, as it effectively binds to SMTP (RFC 2821), the Message Format Standard (RFC 2822), MIME, IMAP, POP, and others. That is, it is expected that messages transmitted using this binding will travel over multiple protocols, and a significant portion of the binding is concerned with defining the interaction of data formats (Message Format Standard, Multipurpose Internet Message Extensions, encoding issues, and the like). However, the concept of "internet email" is relatively well understood as the transmission of messages conforming to the MFS, at least partially via SMTP, and characterized by the use of the mailto URI scheme.
The binding described here is identified with the URI
http://www.tibco.com/xmlns/soap/bindings/distEmail/
This will be referred to hereafter as email or email binding.
The binding described here is provided as an alternative to two currently existing proposals for binding SOAP over internet email.
Due to its design via composition, this binding makes use of a fairly large number of namespaces. In general, each namespace supplies properties which are used within the binding.
Prefix | Namespace |
---|---|
context | http://www.w3.org/2002/06/soap/bindingFramework/ExchangeContext/ |
http://tibco.com/2002/soap/bindings/distEmail/ | |
mep | http://www.w3.org/2002/06/soap/mep/ |
fail | http://www.w3.org/2002/06/soap/mep/FailureReasons/ |
arr | http://www.tibco.com/xmlns/soap/mep/async-request-response/ |
confirm | http://www.tibco.com/xmlns/soap/mep/confirmation/ |
solicit | http://www.tibco.com/xmlns/soap/mep/solicit-response/ |
notify | http://www.tibco.com/xmlns/soap/mep/notification/ |
address | http://www.tibco.com/xmlns/soap/message-address/ |
corr | http://www.tibco.com/xmlns/soap/message-correlation/ |
faildest | http://www.tibco.com/xmlns/soap/failure-destination/ |
mimecont | http://www.tibco.com/xmlns/soap/mime-content/ |
mimecomp | http://www.tibco.com/xmlns/soap/mime-composite/ |
This specification makes use of the properties listed in table 2.
Name | Type | Constraints |
---|---|---|
context:State | enum | |
context:ExchangePatternName | anyURI | |
context:FailureReason | enum | defined in the fail namespace |
context:CurrentMessage | message | |
context:TriggerMessage | message | |
context:ImmediateSource | anyURI | |
context:ImmediateDestination | anyURI | |
arr:Role | enum | |
confirm:Role | enum | |
solicit:Role | enum | |
solicit:TermCondition | enum | |
solicit:Synchronous | boolean | |
solicit:NumRespondents | int | |
solicit:Deadline | date | |
notify:Role | enum | |
address:original-source | anyURI | |
address:final-destination | anyURI | |
address:response-address | anyURI | |
corr:message-id | string | |
corr:references | array of string | |
faildest:failure-destination | anyURI | |
mimecont:content-type | string | |
mimecont:transfer-encoding | string | |
mimecont:* | any | |
mimecomp:content-type | string | |
mimecomp:content-id | string | |
mimecomp:content-location | anyURI | |
mimecomp:current-part | message part | |
[[Commentary: notice that MEPs tend to always define Role. Role should probably be considered a candidate for generalization, or for inclusion in context with per-MEP specialization. Note that although correlation and addressing properties are required by several different MEPs, they appear here (and are bound and constrained) only once.]]
This section discusses how each of the properties present in this binding is bound into the environment, and how each property is transmitted (if necessary) from node to node. In the case of the email binding, most properties are bound into headers compliant with RFC 2822 (note encoding issues, however).
An instance of a transport binding to internet email and conforming to this specification MUST support the following transport message exchange patterns:
Several known features are directly supported by the email binding; three are required, two optional. Most properties defined by these features are bound to headers in the internet message format.
An instance of a transport binding to internet email and conforming to this specification MUST support the following features. Note that several of these features are required indirectly, through message exchange patterns that depend on them.
An instance of a transport binding to internet email and conforming to this specification MAY or SHOULD support the following features:
The Transport Binding Framework, Message Exchange Pattern Specifications, and Feature Specifications each describe the properties they expect to be present in a message exchange context when control of that context is passed between a local SOAP Node and a local Binding instance, and vice versa. This specification adds constraints to some of the supported features and MEPs, but leaves some options available to the particular service (or, if the description language supports it, allows communication of certain properties on a per-exchange basis).
The email binding typically uses the confirmation exchange pattern for operations related to the administration of a mailing list. Services which are not distribution oriented are unlikely to need this operation type.
Confirmation-pattern operations may use addresses which are different from the addresses which they administer (indeed, this is the common case). A description or flow language will generally associate the administrative addresses and operations with the addresses and operations which are administered.
The email binding implements the solicit/response exchange pattern in its asynchronous mode only. The solicit:Synchronous property may not be set to true. A form of operation similar to synchronous operation in most respects may be achieved by restricting the solicitation to a single subscriber, and setting solicit:TermCondition to First (or to Number, with solicit:NumRespondents set to 1 or less).
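The constraints in the preceding paragraph can be sketched as a small validation routine. This is an illustration only, not part of the binding: the class and function names are hypothetical, the property names follow the property table, and the rule that a "Number" termination condition needs a respondent count is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolicitProperties:
    """Subset of the solicit:* properties (hypothetical container)."""
    term_condition: str                     # "First" and "Number" appear in the text
    synchronous: bool = False               # solicit:Synchronous
    num_respondents: Optional[int] = None   # solicit:NumRespondents


def validate_for_email(props: SolicitProperties) -> None:
    """Reject property combinations the email binding does not allow."""
    if props.synchronous:
        raise ValueError("solicit:Synchronous may not be true in the email binding")
    # Assumption: a "Number" termination condition is meaningless without a count.
    if props.term_condition == "Number" and props.num_respondents is None:
        raise ValueError("TermCondition 'Number' needs solicit:NumRespondents")


def approximates_synchronous(props: SolicitProperties, subscriber_count: int) -> bool:
    """True when the exchange behaves much like a synchronous one:
    a single subscriber, terminated by the first (or only) response."""
    if subscriber_count != 1:
        return False
    if props.term_condition == "First":
        return True
    return (props.term_condition == "Number"
            and props.num_respondents is not None
            and props.num_respondents <= 1)
```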
It is anticipated that the more common use of the solicit/response pattern is to deliver the solicitation message to multiple subscribers, receiving zero or more responses. Subscribers (clients) may wish to vary their behavior based on the termination condition. A separate operation may be defined for solicit/response patterns terminated via notification.
The distribution list (subscriber list) for a solicitation message is not defined in the service description, but may be hard coded. More often, it is established and dynamically updated using an additional, or related operation (typically using the confirmation message exchange pattern). The subscriber list is always represented by a single address, which should be specified in the service description.
The email binding is highly suitable for the notification exchange pattern. It is anticipated that the common use of the pattern will deliver notifications to multiple subscribers, but it may also be used to deliver notifications to single recipients. Since the notification message is fire and forget, the service is agnostic.
The distribution list (subscriber list) for a notification message is not defined in the service description, but may be hard coded. More often, it is established and dynamically updated using an additional, or related operation (typically using the confirmation message exchange pattern). The subscriber list is always represented by a single address in the email binding, which should be specified in the service description.
The email binding also supports the common request/response exchange pattern. In request/response, it is expected that the requesting node will be a single node; the pattern does not involve multiple responses. The request/response pattern defined in part two of the XML Protocol specification is inadequate for support of the email binding; this binding requires at least the message correlation feature, and requires the ability to determine an appropriate return address.
The service description for an asynchronous request/response exchange pattern specifies the incoming mail address for the service, as well as the format of request and response messages.
The SOAP binding to internet email is complex (not only by virtue of binding loosely to multiple protocol and format specifications), and differs strongly from existing binding specifications in a number of ways. It is worthwhile, therefore, to highlight those elements which this binding brings to the fore, in order to suggest where issues that have arisen here might need consideration for other new protocol bindings, or even for existing protocol bindings.
Binding of address-related properties for email is problematic. The scope of the problem can be sensed by listing the headers related to target addressing, to source addressing, and to route recording.
Target addressing headers and fields
Source addressing headers and fields
Route recording and response headers and fields
From the above, it is clear that a sophisticated binding could make extensive use of various different headers, apart from the specification's invitation to applications to use application-specific extension headers (with the X- prefix) for application-specific usages. It is not entirely clear what the best binding of the various address related properties should be.
To add to the complexity, current bindings do not suggest that addressing is a generic kind of property, except via the ImmediateSource and ImmediateDestination properties, which seem deliberately designed to promote synchronous protocols over asynchronous. It is easy enough to know the ImmediateSource if it is a currently-open socket; if that isn't the case, then the message must be made to contain that information. This creates a greater burden for binding of other properties (such as OriginalSource and FinalDestination), because in a world without intermediaries, these properties directly correspond to ImmediateSource and ImmediateDestination. In order for the distinction to have meaning, the binding effectively requires an originator to know whether the message is going to pass through intermediaries or not, which is an unnecessary burden.
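To give a sense of how many candidate headers a binding must choose among, the following sketch groups the addressing-related headers of a received message using Python's standard email package. The grouping into target, source, and route headers reflects the usual RFC 2822 header names and is an assumption for illustration; it is not the mapping of address properties that this binding prescribes, and the sample message is invented.

```python
from email import message_from_string

# Invented sample message; all addresses are placeholders.
RAW = (
    "Return-Path: <bounces@example.org>\n"
    "Received: from relay.example.net by mx.example.org; "
    "Mon, 1 Jul 2002 10:00:00 +0000\n"
    "From: service@example.com\n"
    "Sender: daemon@example.com\n"
    "Reply-To: replies@example.com\n"
    "To: list@example.org\n"
    "Cc: audit@example.org\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

# Assumed grouping of RFC 2822 headers, for illustration only.
GROUPS = {
    "target": ["To", "Cc", "Bcc"],
    "source": ["From", "Sender", "Reply-To"],
    "route": ["Return-Path", "Received"],
}


def addressing_view(raw: str) -> dict:
    """Collect every addressing-related header value, grouped by purpose."""
    msg = message_from_string(raw)
    return {
        group: {name: msg.get_all(name, []) for name in names}
        for group, names in GROUPS.items()
    }
```

Even this reduced view leaves open which header should carry which property when intermediaries are involved.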
The email protocol specifications contain three headers which appear, on first glance, to be completely ideal for use in message correlation: Message-Id:, In-Reply-To:, and References:. Unfortunately, security considerations lead to a very strong recommendation that SOAP nodes not attempt to implement an SMTP server (see section 7 introduction, below). This makes the use of these header fields problematic, at best.
Specifically, clients are encouraged not to attempt to set a Message-Id header. Instead, the algorithm (which has good characteristics for creating unique identifiers, if properly implemented) to generate a message id is supposed to be applied by the first server to receive a message that does not have a message id already set. The consequence of this is that a client, as a rule, does not know the message id of any messages sent (unless the client sends itself a copy), and therefore cannot use this useful and unique identifier for correlation. The server does not supply this information to the client (in part because the client supplying the message could, in theory, be a server (a badly written server, mind)).
This means that some other means of identifying messages becomes necessary, and that there is a greater chance that the chosen identifiers will collide. Therefore, this binding recommends a concatenation of informational fields that should lead to IDs that are unique in practice, and also permits the application to define other sorts of identifications.
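A minimal sketch of such a concatenation follows. The choice of fields (sender address, millisecond timestamp, per-process sequence number, random suffix) and the "/" separator are assumptions made for illustration; the binding does not fix them here, and the function name is hypothetical.

```python
import itertools
import secrets
import time
from typing import Optional

_sequence = itertools.count(1)  # per-process counter; resets on restart


def make_correlation_id(sender: str, now: Optional[float] = None) -> str:
    """Concatenate informational fields into an identifier the sender
    knows before the message leaves its hands."""
    millis = int((time.time() if now is None else now) * 1000)
    fields = [
        sender,                  # who generated the id
        str(millis),             # when it was generated
        str(next(_sequence)),    # distinguishes ids within one millisecond
        secrets.token_hex(4),    # guards against collisions across processes
    ]
    return "/".join(fields)
```

Because the sender builds the identifier itself, it can record it for later correlation without depending on a server-assigned Message-Id.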
Among the problems illuminated by the internet email binding, the issues of addressing and correlation are probably the most significant. Addressing issues have already been alluded to, and are further developed in the discussion of fault routing, below. The question of correlation very strongly arises in the context of multiple recipients.
Internet email is commonly delivered to aliases, which may be individuals or groups of individuals. Mailing list software may establish a mapping from a list address to a set of mailboxes, or the mail system may do so itself (in which case it may be accessible via the SMTP EXPN command). When such messages are delivered, they should be identifiable, and when messages related to them are generated, there should be some way to establish this.
The obvious means of establishing such identity and correlation is to use the headers built into the internet message format for that purpose: Message-Id, In-Reply-To, and References. Unfortunately, as noted in the discussion of the corr:message-id property binding, the requirements of identification for SOAP are not the same as the requirements for SMTP. Put simply, SMTP identifies messages so that servers can avoid mail loops. Internet message format defines additional headers (In-Reply-To and References) that build on this very restricted message identification functionality to provide message associations. Unless a developer is willing to implement in a client functionality that is normally delegated to a server, however, internet message format identifiers are poorly suited to support the requirements of SOAP correlation.
SOAP correlation requires that the sender, as well as the receiver, be able to identify a message, and that both endpoints be able to identify messages in the same way. Because the Message-Id header is expected to be constructed by an SMTP server, after it has passed out of the view of the client, it does not meet this requirement. The sending client is not aware of the id attached to the sent message. Since In-Reply-To and References build on Message-Id, they share this deficiency. One solution is to always copy the sender, on any message, but this is less than ideal (loss of a message is a particular problem, and email is not known for its delivery guarantees).
As a result, this specification suggests using other identifiers for identification, and building on these alternate identifiers for correlation. Applications should examine the issue before specifying a binding.
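The approach can be illustrated with a sender-assigned identifier carried in extension headers. The header names below are hypothetical (they are not defined by this binding), as are the function and class names; the sketch only shows that both endpoints can identify a message in the same way when the sender chooses the identifier.

```python
from email.message import EmailMessage
from typing import Dict, Optional

ID_HEADER = "X-Example-Correlation-Id"      # hypothetical header name
REFS_HEADER = "X-Example-Correlation-Refs"  # hypothetical header name


def new_message(corr_id: str, sender: str, recipient: str, body: str) -> EmailMessage:
    """Build a message whose identifier is known to the sender."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg[ID_HEADER] = corr_id
    msg.set_content(body)
    return msg


def new_reply(request: EmailMessage, corr_id: str, body: str) -> EmailMessage:
    """Build a reply that references the request's identifier."""
    reply = new_message(corr_id, str(request["To"]), str(request["From"]), body)
    reply[REFS_HEADER] = str(request[ID_HEADER])
    return reply


class Correlator:
    """Remembers sent messages and matches incoming ones against them."""

    def __init__(self) -> None:
        self._pending: Dict[str, EmailMessage] = {}

    def record_sent(self, msg: EmailMessage) -> None:
        self._pending[str(msg[ID_HEADER])] = msg

    def match(self, incoming: EmailMessage) -> Optional[EmailMessage]:
        # The references header may list several space-separated ids.
        for ref in str(incoming.get(REFS_HEADER, "")).split():
            if ref in self._pending:
                return self._pending[ref]
        return None
```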
The fact of asynchronicity in the email binding presents some interesting issues, some of which simply expose possible weaknesses in current definitions. For instance, asynchronous delivery to potentially many (but possibly zero) subscribers leads to a much more complex interpretation of the completion state, which is elsewhere described simply as "success" or "failure." For exchanges with multiple recipients, and especially exchanges with multiple respondents, "completion" may be successful in some sense, unsuccessful in some sense, and both successful and unsuccessful in some sense.
A further issue in this regard is that email is a store-and-forward technology. This means, in effect, that delivery is not equal to receipt, although this is usually only relevant when the delivery status notification extension is available (when the sender gets notification of a delivery). This provides a strong contrast with synchronous protocols, in which one can generate an error if receipt is not equivalent to delivery.
A number of features have been proposed to support asynchronous communication and the complexities that arise therefrom. Some of the problems lead to a different description of the state machine (a terminal state called "completion" rather than two terminal states, "success" and "failure").
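One way to picture a single "completion" state is as a tally of per-subscriber outcomes rather than a boolean. The sketch below is an assumption for illustration: the outcome labels ("responded", "failed", "silent") and the class are hypothetical and are not defined by any of the referenced features.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Completion:
    """Terminal state of a multi-recipient exchange (hypothetical)."""
    responded: int
    failed: int
    silent: int

    @property
    def fully_successful(self) -> bool:
        return self.failed == 0 and self.silent == 0

    @property
    def fully_failed(self) -> bool:
        return self.responded == 0


def complete(outcomes: Dict[str, str]) -> Completion:
    """outcomes maps subscriber address to 'responded', 'failed' or 'silent'."""
    tally = Counter(outcomes.values())
    return Completion(tally["responded"], tally["failed"], tally["silent"])
```

With an empty subscriber list both `fully_successful` and `fully_failed` are true, which is the kind of ambiguity a two-state success/failure model cannot express.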
The issue of encoding presents severe compatibility problems when services are defined to use the internet email binding. The basic problem is that XML, the data exchange format, is defined in terms of Unicode. Internet email is defined in terms of the Network Virtual Terminal, which requires no more than 7bit ASCII (and even then does not guarantee transmission of control characters).
It is possible, in a specification, to require that services use extensions that permit cleaner support of 8bit character sets. Unfortunately, it is not possible, in the current state of the TCP/IP network, to actually implement this restriction unless all of the SMTP servers are known and controlled (this would involve requiring the 8BITMIME extension, which remains incompletely supported). It is far easier to require MIME support for simple types (which this specification does), for instance.
Further problems arise in the presence of the XML declaration, which may supply an encoding (which corresponds to a charset in MIME). The interaction of potentially unsynchronized properties must be a concern of service and client developers. The issue of character sets is further complicated because of the possible transfer encodings (7bit, 8bit, binary, quoted-printable, base64), which interact with the character set definition--in a rather unpleasant fashion--if the character set is anything other than ASCII. HTTP, developed ten years and more after SMTP and able to rely upon the advances of those years, specifies an eight bit clean message path. SMTP and supporting protocols emphatically do not, which means that the problem surfaces at the application level.
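A receiving application might detect unsynchronized encoding information along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the regular expression only handles ASCII-compatible encodings (it will not find the declaration in UTF-16 content or behind a byte order mark), and what to do on a mismatch is left to the application.

```python
import re
from email.message import EmailMessage
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the
# start of the payload. ASCII-compatible encodings only.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(payload: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, lowercased."""
    match = _DECL.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def charset_mismatch(part: EmailMessage) -> bool:
    """True when the MIME charset and the XML declaration disagree."""
    charset = part.get_content_charset()           # lowercased, or None
    payload = part.get_payload(decode=True) or b""  # transfer encoding undone
    declared = declared_encoding(payload)
    return charset is not None and declared is not None and charset != declared
```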
This specification attempts to provide the tools to solve the problem, by requiring the MIME content feature and recommending the MIME composite feature. Applications should give careful consideration to the potential problems. The binding specification cannot provide full resolution of the consequences of this issue.
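As an illustration of how MIME lets a Unicode payload survive a 7bit path, the sketch below wraps an envelope as UTF-8 bytes with a base64 transfer encoding, using Python's standard email package. The media type, subject line, and function name are assumptions for the example; the binding's own MIME content feature governs the actual property values.

```python
from email.message import EmailMessage


def wrap_envelope(envelope_xml: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap an XML envelope so that the wire form is pure 7bit ASCII."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SOAP message"  # placeholder subject
    msg.set_content(
        envelope_xml.encode("utf-8"),
        maintype="application",      # assumed media type for illustration
        subtype="xml",
        cte="base64",                # safe on paths without 8BITMIME
        params={"charset": "utf-8"},
    )
    return msg
```

Base64 costs roughly a third more bytes than the raw payload; quoted-printable may be a better trade when the content is mostly ASCII.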
The problem of addressing in email has already been discussed. The issue of return paths is a part of that problem. However, the issue of alternate routing of error messages is also worth raising. It is very common, in mailing lists, that errors are reported to an address other than the distribution (and often other than the administration) address.
This sort of requirement, together with problems of reporting errors at all in one-way and notification message exchange patterns, strongly motivates the development of a separate feature to identify the preferred destination of error messages (at the SOAP level in this case). The failure destination feature is therefore recommended.
Services making use of the email binding may wish to consider additional issues of routing, and possibly adopt features that support extended routing properties.
Principle: do not create an SMTP server as part of a SOAP application, unless you fully understand how to protect it from abuse.
The more general principle: do not create a messaging server or router as part of a SOAP application. It is perfectly possible to create a SOAP service which interacts with messaging routers as a client, rather than replicating the functionality of a router. This principle needs to be stated up front, as creation of a server for HTTP is a fairly common design for SOAP applications. The fundamental difference is that an HTTP server only responds to the initiator. SMTP, and messaging routers in general, provide little response to the initiator, but then produce potentially significant amounts of material to send to target addresses. Failure to take into account any of hundreds of nuances in server design can easily lead to mail loops, painfully prolonged and repetitive attempts to deliver the undeliverable, and network floods. It is therefore strongly recommended that SOAP applications binding to internet email never expose a public SMTP server to the internet. The logs of the sendmail, postfix, and qmail MTA mailing lists should provide adequate support for this position.
If this principle is followed, then the primary security considerations for a SOAP application bound to internet email must take cognizance of issues in SMTP, POP, IMAP, etc., but need not revisit the entire corpus of security alerts related to internet email. Instead, the focus in the points that follow is on security issues that are of particular applicability to SOAP applications. Developers who wish to implement general SMTP servers as part of a SOAP application are directed to the raft of security alerts and issues associated with SMTP as well.
SMTP does not provide an authentication model. As a rule, SMTP servers will always accept mail for a "local" address. Some may have authentication routines to permit local addresses to send mail out, but this is by no means universal. As a rule, SMTP servers are perfectly willing to accept forged "From" headers (which are typically associated with the origination address).
Virtually every service delivered via internet email must address this issue to some degree. The service may resolve the issue as simply as stating that it is an application responsibility, or it may require certain SMTP features to be implemented by all servers used in mail transmission (an unlikely requirement, unless the system is controlled end to end). Origination addresses in the SMTP/Internet Message Format headers cannot be relied upon. Envelope addresses in SMTP cannot be relied upon. Email forgery is trivial, and the protocols do not supply broadly deployed solutions. The service or application must address the issue, if the questions of authentication and repudiation are significant in service context.
Internet email traffic is vulnerable to traffic analysis. There are a few methods for disguising the fact that two nodes are communicating, but they are not very effective. Services should not be created which rely upon resistance to traffic analysis, because internet email hasn't any. Services may be built which can compensate for this, but such enhancements are well out of scope for discussion here.
Mail retrieval agents (POP, IMAP, mbox, Maildir, mh, and proprietary clients) have a long and inglorious history of abysmal security. Most authentication protocols are clear-text (username and password passed in clear text over the network). Many existing servers (POP, IMAP, or machine login) do not implement strong authentication protocols, or when they attempt to do so, fail.
Confidentiality may then be breached at the point of retrieval. Moreover, with some protocols, bogus messages may be inserted at this point. As with message origination address forgery, the greater part of the responsibility for verification of received/retrieved messages is left to the SOAP application.
Depending upon implementations, messages transiting a service implemented over internet email may be vulnerable to snooping, tampering, destruction, replacement, or injection. This may happen, as previously noted, at the origination point and at the point at which the mail retrieval agent acquires the message from its delivery point, but as SMTP is a store and forward technology, it may also occur at multiple additional points along the route. This is not, from the attacker's point of view, an ideal opportunity, but if no others exist, it may nonetheless be attractive.
At any point at which a message is stored along the route from sender (or malicious injector) to receiver, it may be vulnerable to modification, and is almost certainly vulnerable to snooping. Certain forms of snooping (target address, for instance) are nearly impossible to defend against (the address has to be available in order for delivery to take place). For the remaining issues, the internet email system provides very little help to the application. It is up to the service to utilize external features (such as SAML, WS-Security, and the like) in order to achieve the level of security, authentication, and non-repudiation required by the service.
Unsolicited Commercial Email (more generally, Unsolicited Bulk Email (UBE), or colloquially "spam") is a relatively significant problem in the current internet email environment. SOAP services that implement the publish/subscribe model thereby create lists of valid email addresses (albeit addresses that may not be monitored by live humans). Exposure of such a list is likely to result in a flood of spam to the subscribers.
This presents a number of security issues for consideration. First, a SOAP service or subscribed client MUST be able to handle (by discard, if nothing else) messages which do not conform to the SOAP specification. Second, such services and clients MUST be able to extract administrative control messages, and route them to an appropriate authority for resolution (recognizing root, MAILER-DAEMON, and Postmaster addresses is a minimum requirement). A service SHOULD NOT remail blindly (see the first principle of binding to internet email, above). Finally, services and subscribed clients SHOULD have recovery plans to deal with what amount to denial-of-service attacks performed by clueless bozos determined to MAKE MONEY FAST.
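The first two requirements can be sketched as a triage step applied to each retrieved message. The sketch makes several assumptions: the function name and the three labels are hypothetical, the envelope namespace is that of the SOAP 1.2 Recommendation (drafts used other namespaces), and the From header is used only as a routing hint, since it is trivially forged. Python's standard XML parser is used for brevity; a hardened parser is advisable for untrusted input.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.utils import parseaddr

ADMIN_LOCAL_PARTS = {"root", "mailer-daemon", "postmaster"}
# Assumption: the SOAP 1.2 Recommendation envelope namespace.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"


def classify(raw: str) -> str:
    """Return 'admin', 'soap', or 'discard' for a raw retrieved message."""
    msg = message_from_string(raw)

    # Administrative control messages go to a human or other authority.
    _, address = parseaddr(msg.get("From", ""))
    if address.split("@", 1)[0].lower() in ADMIN_LOCAL_PARTS:
        return "admin"

    # Anything that is not a well-formed SOAP envelope is discarded.
    body = msg.get_payload(decode=True) or b""  # None for multipart
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "discard"
    return "soap" if root.tag == "{%s}Envelope" % SOAP_NS else "discard"
```

Nothing here remails a message; a "discard" result simply drops it.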
Apart from the greed that drives marketers to use customers' money, time, and equipment to reduce their own costs, absolute malice seems to drive other abusers. This typically takes the form of worms, trojans, and viruses; email is possibly the most common distribution mechanism. While most SOAP clients are unlikely to be the security problems that some user-facing clients are, SOAP clients (and servers) are expected to execute programs based on received content. Developers should be careful that this does not create paths for delivery of any form of malware, either to themselves or to correspondents.
Put a discursive description here. Service sends out RFPs. Has associated administrative service to acquire and remove subscribers. Each RFP is solicit-response; termination of bidding is indicated by a notification. Fulfillment is out of scope (it's gonna be added later, RSN).