RE: REST, Conversations and Reliability from David Orchard on 2002-08-06 (www-ws-arch@w3.org from August 2002)

From: David Orchard <dorchard@bea.com>
Date: Tue, 6 Aug 2002 11:09:56 -0700
To: "'Paul Prescod'" <paul@prescod.net>, <www-ws-arch@w3.org>
Message-ID: <009401c23d86$13508030$140ba8c0@beasys.com>
I finally got around to this important proposal; apologies for the delay.

In general, now we're cookin' with gas.  We have a REST proposal on how to
do reliable messaging and conversation.  Cool, it's always better to debate
actual proposals rather than theory.

First, let's dispatch the low-hanging fruit, though these aren't related to
REST.  1) Receiver-generated conversation IDs are useful in addition to
sender-generated ones.  HTTP cookies are receiver-generated; our first cut
is sender-generated.  With clustered servers on both sides, you probably
want dual cids.  It gets a bit tricky with more than 2 nodes, but let's
leave that for now.  You still need client-side ids for asynch requests.
2) Explicit deletes might be a good thing as well.  We'd hoped to avoid
them, and we certainly are avoiding distributed garbage collection/cache
expiry, though there probably should be a TTL as well.
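
To make the dual-cid, explicit-delete and TTL ideas concrete, here is a
minimal sketch in Python.  It is not taken from either proposal: the class
name, the "r1"-style receiver ids and the injected clock are all assumptions
made for illustration.

```python
import itertools


class ConversationTable:
    """Hypothetical receiver-side table of open conversations.

    Each conversation is keyed by a (sender_cid, receiver_cid) pair and
    expires if it is not touched within ttl_seconds.
    """

    def __init__(self, ttl_seconds, clock):
        self._ttl = ttl_seconds
        self._clock = clock              # callable returning the current time
        self._next = itertools.count(1)  # source of receiver-side ids
        self._expiry = {}                # (sender_cid, receiver_cid) -> expiry

    def open(self, sender_cid):
        """Pair the sender's id with a freshly generated receiver id."""
        key = (sender_cid, f"r{next(self._next)}")
        self._expiry[key] = self._clock() + self._ttl
        return key

    def is_open(self, key):
        """True if the conversation exists and its TTL has not run out."""
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]        # lazy expiry, no background sweeper
            return False
        return True

    def touch(self, key):
        """Restart the TTL on activity; returns False if already gone."""
        if not self.is_open(key):
            return False
        self._expiry[key] = self._clock() + self._ttl
        return True

    def close(self, key):
        """Explicit delete; closing an unknown conversation is a no-op."""
        self._expiry.pop(key, None)
```

Passing the clock in keeps the expiry rule testable without sleeping; a real
deployment would more likely pass time.monotonic.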

On another easy one that's web architecture related, the conversation ID
should be a URI, and probably an http: uri as per the work in the TAG.
There is always the danger - just like namespace names - that people will
try to do something like a GET on an HTTP URI (as opposed to an HTTP URL),
because you can't tell from an HTTP URI whether it's an identifier or a
locator.  I think that's an issue that the TAG should be thinking about, but
there's enough discussion going on about the range of http: currently :-)

Now the tricky bits.

1. I was really surprised that you didn't propose that the receiver should
suggest a new "Content-Location" that incorporates the conv-id into the URL
in the receiver-assigned case, e.g. foo.com/bar?cid=5532.  Sender-assigned
callbacks could then have been bar.com/foo?cid=2238.  I have reasons for why
we didn't do that, but I thought you were going to suggest it...  That seems
like the most RESTful solution: everything needed to address the
conversational resource is in a URI.
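
As a rough illustration of carrying the conv-id in the URL, the sketch below
adds a "cid" query parameter to a URL and reads it back.  The function names
are hypothetical and the http:// scheme is assumed; only the cid parameter
and the example hosts come from the text above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_conversation_id(base_url: str, cid: str) -> str:
    """Return base_url with its cid query parameter set to cid."""
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)
    query["cid"] = [cid]  # replaces any cid that was already present
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def conversation_id(url: str) -> Optional[str]:
    """Return the cid carried in url, or None if there is none."""
    values = parse_qs(urlsplit(url).query).get("cid")
    return values[0] if values else None


receiver_url = with_conversation_id("http://foo.com/bar", "5532")
callback_url = with_conversation_id("http://bar.com/foo", "2238")
```

The receiver would hand back receiver_url (for instance in Content-Location),
and the sender would advertise callback_url for asynchronous replies.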

2. In general, the problem with your proposal is that it combines the logic
of the application with the logic of the reliability (message ordering) and
conversations.  In my view, we want a separation of concerns, specifically
separating the application from the reliability protocol.  In the REST
model, HTTP is an application transfer protocol, and therefore the
reliability characteristics get built into the application protocol.  That's
fine when people could reload web pages because they "knew" they could.
Another way of looking at this problem is that your proposal makes
reliability an integral part of the application protocol; that is, the app
has to know about reliability for every message exchange.  We'd like
reliability to be a characteristic of the message exchange, separate from
the application protocol.  Again, this loose coupling is a major feature of
doing reliability through headers (or even reliable http) compared to
GET/POST.

3. If you decouple the reliability logic from the application logic, you can
get another aspect of loose coupling: most messaging systems use queues to
do reliable asynchronous messaging.  So the client can simply
hand a message to a sender with a queue.  The client can then go away.  On
the server, the component gets invoked by a handler - which usually has a
queue.  This is the basis of incredibly scalable systems, like MQSeries,
JMS, etc.  The principle here is that the act of delivering/receiving the
message can be done by underlying infrastructure, and the app doesn't
know/care how the message gets from a to b.   And the underlying
infrastructure doesn't have to know about the application protocol.  Your
proposal means that the application now has to deal with all the aspects of
reliability, and reliability software has to know about the application.
Tightly coupled.
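
A toy version of that queue-based split might look like the sketch below.
It is only an illustration under stated assumptions: ReliableSender, the
transport callable and the fixed retry count are invented here, and real
systems such as MQSeries or JMS providers add persistence, which this does
not.

```python
import queue
import threading


class ReliableSender:
    """Hypothetical sender: the app enqueues, a worker thread delivers."""

    def __init__(self, transport, max_attempts=3):
        self._outbox = queue.Queue()
        self._transport = transport      # callable(message) -> bool (True = delivered)
        self._max_attempts = max_attempts
        self.failed = []                 # messages given up on after all attempts
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def send(self, message):
        """Called by the application; returns as soon as the message is queued."""
        self._outbox.put(message)

    def flush(self):
        """Block until every queued message was delivered or given up on."""
        self._outbox.join()

    def _drain(self):
        # The retry policy lives here, out of sight of the application.
        while True:
            message = self._outbox.get()
            try:
                for _ in range(self._max_attempts):
                    if self._transport(message):
                        break
                else:
                    self.failed.append(message)
            finally:
                self._outbox.task_done()
```

The application only ever calls send(); whether the transport underneath is
HTTP, a socket or something else is invisible to it, and the sender never
looks inside the message.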

4. There's a huge gap in your reliability proposal: it's missing all of the
information about how to deal with messages that are transferred, or not
transferred.  What's the result of the "GET" against the
conversation ID?   There would need to be a format to specify states, (I've
received msgs with ids x and z, in that order).   I see a whole outcomes
exchange protocol appearing.  So the graph of state transitions/messages
needs to be defined, on both sides (I've sent msgs with ids x, y, z).  Your
reliability suggestion means that the components in a conversation would have
to track all the messages they had ever received or sent.  Imagine A sends
x,y,z to B.  Let's say the y gets lost, but A doesn't know it (asynch
request).  A might, or might not, get an error from B as a result of z.  In
the case of an error, A asks B, what did you get? And then B says it got x.
And then A can go "ah, I understand".  In the case of A not getting an error
(say y and z are 0..* cardinality messages), then it's impossible for A to
know that the y got dropped.  It would have to poll to find out, or find
some inconsistent state further down.  And what if A finds out that y got
dropped, but doesn't want to send y as it's already sent z?  It seems like
they are actually doing a distributed transaction, in that A and B are trying
to synchronize the states of the messages that they got.  A and B have to know
everything about the allowable and the actual message exchanges, right in
the application protocol.  This doesn't seem simpler than defining an
acknowledgement protocol with varying qualities of service and configurable
parameters.
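
For comparison, here is a minimal sketch of an acknowledgement scheme that
would catch the lost y in the x, y, z example.  It assumes per-conversation
sequence numbers starting at 1, which neither proposal defines; the class,
the function and the ack format are hypothetical.

```python
class Receiver:
    """Hypothetical receiver (B) that records messages by sequence number."""

    def __init__(self):
        self.received = {}  # seq -> payload

    def accept(self, seq, payload):
        self.received[seq] = payload

    def ack(self):
        """Report the highest sequence number seen and any gaps below it."""
        if not self.received:
            return {"highest": 0, "missing": []}
        highest = max(self.received)
        missing = [s for s in range(1, highest + 1) if s not in self.received]
        return {"highest": highest, "missing": missing}


def to_resend(sent, ack):
    """Sender side (A): what was sent but, per the ack, never arrived."""
    return [
        (seq, sent[seq])
        for seq in sorted(sent)
        if seq in ack["missing"] or seq > ack["highest"]
    ]


# A sends x, y, z as messages 1, 2, 3; message 2 (y) is lost in transit.
sent = {1: "x", 2: "y", 3: "z"}
b = Receiver()
b.accept(1, "x")
b.accept(3, "z")
pending = to_resend(sent, b.ack())
```

Here pending is [(2, "y")], and neither side had to understand what x, y and
z mean to the application.  Whether y is resent, skipped or reported to the
app is then a quality-of-service setting rather than application logic.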

5. One takeaway point is to think about loose coupling.  A reliable
messaging protocol allows for looser coupling between the application and
the underlying protocols.  I think it's really important to realize that
most reasonable architecture choices optimize for one trade-off over
another.  The argument is often made that REST allows for loosely coupled
systems.  But in some cases, and I think reliability is one case, it
actually makes for a more tightly coupled system.  What's important for us to
talk about going forward is which aspects we wish to optimize for.

Cheers,
Dave

> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org] On
> Behalf Of Paul Prescod
> Sent: Wednesday, July 17, 2002 5:33 PM
> To: David Orchard; www-ws-arch@w3.org
> Subject: REST, Conversations and Reliability
>
>
>
> David offers the following URI:
>
> http://dev2dev.bea.com/techtrack/SOAPConversation.jsp
>
> In my mind, it is a perfect example of a protocol that can be enhanced
> by applying some REST discipline.
>
> The BEA proposal introduces a concept of "ConversationID" which
> represents a conversation. It also introduces a state machine that
> allows the participants to move through the stages from "no
> conversation" to "talking" to "finished conversing". It defines ways
> that headers are used to move through those stages. It also defines how
> a callback URI can be presented. It has quite a resemblance to the ideas
> in the HTTPEvents draft.
>
> Now let me apply a combination of REST discipline and my own thoughts
> about networking.
>
> Let's call the recipient of the first message the "server" and the
> sender of the first message the "client" although at an HTTP level they
> may switch roles if the exchange is asynchronous.
>
> The server needs to deal with N incoming conversations and needs to keep
> them all straight. Also, the server by definition has the capability to
> host URIs but the client may or may not. For this and other reasons, I
> feel that the conversation ID should be generated by the recipient, not
> the sender. Most important: the recipient can trivially generate IDs
> unique to them. The sender can at best use UUIDs to reduce the chances
> of collision.
>
> Second, the conversation ID should be a (surprise!) http URI. It should
> point to a conversation resource. Obviously if the conversation is
> necessary to the successful completion of the discussion then it is an
> important resource and deserves a URI. This isn't just theoretically
> clean it is extremely important in practice as will become clear in a
> moment.
>
> Let's think about reliability.
>
> What happens if the conversation-constructing message is lost? That's
> okay. The client can just send it again.
>
> What happens if the conversation-constructing response is lost? That's
> okay. The client can just set up a new conversation resource and the
> server can dispose of the unused one after a timeout.
>
> Now both partners are in the "conversing" state. But the big difference
> between the original proposal and the REST proposal is that the REST
> proposal makes this state explicit in terms of universally addressable
> resources.
>
> According to the original proposal, callbacks refer to the conversation
> ID. In my proposal, callbacks would also refer to the conversation
> resource. But the conversation resource would be a real data-containing
> resource. For instance in an instant-messaging application, the
> conversation resource would list which users are involved with the
> discussion. In an order negotiation application, the conversation
> resource could point to the good being bought or sold. Note that the
> server by definition has access to this information so it is just a case
> of giving the information a URI so that it may be looked up at runtime
> by the client or third parties.
>
> This is important for a variety of reasons. First, it means that clients
> can be stateless and thus simpler. It means that the client-end of a
> conversation can migrate from one machine to another merely by passing
> the conversation ID URI (and authentication information). It means that
> an (authorized) third-party application like a logger, auditor or
> security filter can apprise itself of the full state of the conversation
> just by following the URI in the message.
>
> A conversation resource is not in any way tied to any particular
> nodes/endpoints. Once it is set up, dozens or hundreds of participants
> can be involved without any major architectural shift. The third,
> fourth, etc. participants are brought in merely by forwarding them the
> URI. There are no hard-coded roles of "client" and "server" after the
> conversation is set up. There is "the server maintaining the
> conversation" and "everybody else".
>
> Also, stateless presentation tools like XSLT stylesheets can extract
> information for rendering the transmitted message. Assertions can be
> made about conversation resources using RDF. An HTML representation of
> resources can be used for technical support and debugging.
>
> Most important: if the client or other participant misses a message,
> gets state corrupted or otherwise gets confused about the state of the
> conversation, it can refresh itself with a simple GET. That's a scalable
> approach to reliability. Under the original protocol, there is no way
> for a confused client that has missed a message to check whether the
> conversation is still ongoing and thus it should expect more messages.
> For instance if the client is momentarily offline, there is no way for
> it to check whether the server timed-out in the meantime.
>
> The original proposal says:
>
> "The ContinueHeader
>        MUST be sent on any messages to operations that are marked in the
>        WSDL as requesting a ContinueHeader."
>
> I feel that this is too large grained of a constraint. In some cases,
> conversations will need to nest. For instance there is the conversation
> that sets up a shopper/seller relationship and then within that there
> are conversations on the price of individual items. In the REST model
> these would be just different kinds of conversation resources. Some
> operations would expect a reference to a "shopping conversation" and
> some operations would expect a reference to a "product price
> negotiation" conversation. Of course each resource would have a link to
> the other so that it is possible to easily go from one to another.
>
> The original proposal says that the conversation is ended in an
> unspecified manner. I do not understand why it would specify some things
> and leave that unspecified. Therefore I would say rather that the
> conversation is ended when either party DELETEs the conversation
> resource. There should be some standardized way for the server to
> indicate that it is doing so to a callback-capable client. Alternately,
> conversation resources could be immortal (for archival purposes) but
> could have a flag that says whether they are ongoing or historical.
>
> I hope this demonstrates that a REST approach is not at all at odds with
> a "named conversation" approach but a REST approach would say:
>
>  1. Conversations should be named as everything else on the Web is
> named, with URIs.
>
>  2. Conversations should be inspectable and introspectable as everything
> else on the Web is, through HTTP GET.
>
>  3. Any authorized party (especially confused clients) should be able to
> bring itself up to a full understanding of the state of the conversation
> by looking at the conversation URI (or things linked to the conversation
> URI).
>
>  4. Conversations will almost always have important associated data (the
> stuff being talked about) and the resource storing that information can
> easily serve as the conversation resource.
> --
> Come discuss XML and REST web services at:
>   Open Source Conference: July 22-26, 2002, conferences.oreillynet.com
>   Extreme Markup: Aug 4-9, 2002,  www.extrememarkup.com/extreme/
>
>
Received on Tuesday, 6 August 2002 16:17:30 UTC