RE: Introducing the Service Oriented Architectural style, and it's constraints and properties. from Assaf Arkin on 2003-02-18 (www-ws-arch@w3.org from February 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 17 Feb 2003 20:57:55 -0800
To: "Assaf Arkin" <arkin@intalio.com>, "David Orchard" <dorchard@bea.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNOEGCDDAA.arkin@intalio.com>
I'm going to comment on my comments. All I can say is, a whiteboard would
come in useful right now ;-)

In generic terms I would define a service-oriented architecture as one in
which services perform operations within an identified context given some
input data. In its output a service can provide the same or a derived
context. You can see that in CORBA, COM, SOAP + WS-Security, ebXML
messaging, etc.

The context could be something like a security token, a transaction, a
session, etc. Contexts can overlap at different points in time, e.g. two
operations in the same transaction, two transactions in the same session, a
transaction that involves two operations in different security contexts,
etc.

In abstract terms you would define a service, an operation of that service,
a context (Sanjiva's proposal, correlations) and some input (message body).
You would then define multiple protocol bindings that encode that
information in various ways over the wire.
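
As a rough illustration of this separation, the Python sketch below models
one abstract invocation and encodes it under two bindings. The class name,
the field names, the example.org URL and the X-Context-* header convention
are all hypothetical, invented for the illustration; none of them comes from
WSDL or from any of the specifications mentioned here.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Invocation:
    """Abstract request: service + operation + context + input (hypothetical names)."""
    service: str
    operation: str
    context: dict = field(default_factory=dict)  # e.g. session or transaction ids
    body: dict = field(default_factory=dict)     # the input message


def bind_http_get(inv: Invocation, base: str = "http://example.org") -> str:
    """GET binding: context and input both become query parameters of the URL."""
    params = {**inv.context, **inv.body}
    query = urlencode(sorted(params.items()))
    url = f"{base}/{inv.service}/{inv.operation}"
    return f"{url}?{query}" if query else url


def bind_http_post(inv: Invocation, base: str = "http://example.org") -> tuple:
    """POST binding: service + operation in the URL, context in headers, input in the body."""
    url = f"{base}/{inv.service}/{inv.operation}"
    headers = {f"X-Context-{k}": str(v) for k, v in inv.context.items()}
    return url, headers, json.dumps(inv.body, sort_keys=True)


# The same abstract invocation, encoded two different ways.
quote = Invocation("stockservice", "getStockQuote",
                   context={"session": "s1"}, body={"symbol": "BEAS"})
get_url = bind_http_get(quote)
post_url, post_headers, post_body = bind_http_post(quote)
```

The point of the sketch is only that the abstract definition stays the same
while the wire encoding varies; a real binding would be described in WSDL
rather than hand-coded like this.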

At the lowest level we can look at IP as being a protocol with ip:port being
the channel and everything else passed in the message. With HTTP we have
more capable channels encoded as URLs which are carried in the message. It
becomes more interesting to look at the URL as the channel rather than the
ip:port portion of it. With SOAP we can further extend the channel to
include additional context information (transaction, security, etc).

But with WSDL we shouldn't make that distinction. We should simply define
the channel as composition of service + operation + context and input/output
for the operation. We can then encode it as TCP, as HTTP over TCP, as SOAP
over HTTP, or have multiple encodings at the same time.

Deciding whether to use HTTP GET or POST is merely a matter of optimization.
If all the information can be captured in a URL and the output can be cached
it makes more sense to use GET than to "fix" caches/proxies to support POST.
If you use POST you simply lose caching capability (or you need to fix a lot
of caches); you are less efficient but still functional. On the other hand,
if the information cannot fit in a URL you can have some GET-based protocol
to get around it(*), or you can use POST more efficiently.
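
The rule of thumb in the paragraph above could be sketched roughly as
follows. The function name and the MAX_URL_LENGTH value are assumptions made
for the example; the message itself gives no concrete size limit.

```python
from urllib.parse import urlencode

# Hypothetical limit, chosen only as a placeholder for "fits in a URL".
MAX_URL_LENGTH = 2000


def choose_method(base_url: str, params: dict, cacheable: bool) -> str:
    """Return "GET" when the input fits in a URL and the output can be cached,
    otherwise fall back to "POST"."""
    url = f"{base_url}?{urlencode(params)}"
    if cacheable and len(url) <= MAX_URL_LENGTH:
        return "GET"
    return "POST"


# A small, cacheable lookup versus an input too large for a URL.
small = choose_method("http://example.org/quote", {"symbol": "BEAS"}, cacheable=True)
large = choose_method("http://example.org/order", {"doc": "x" * 5000}, cacheable=True)
```

This leaves out the GET-based workaround for oversized input described in
the footnote, since that requires additional message exchanges.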

A simple system can be constructed using a low-level model. In that model
channels denoted by service + operation are known a priori, but channels
denoted by service + operation + context must be communicated. Such a simple
system would look like REST if you opted to use HTTP GET in the protocol
bindings, or pi-calculus if you tried to describe it formally.

A more complex system will have to make a decision on where to place
complexity. It could use the low-level model, which keeps the model simple
but makes the process more complex, or a high-level model, which simplifies
the system but requires a more complex model. You can reduce processing and
message passing by electing to use the high-level model.

In a high-level model you would allow the sender to construct the channel
given a knowledge of the context, in effect communicating partial channels
(service and context separately) and combining them to form new channels.
After playing with low-level models for a while and realizing the inherent
complexity, this model was selected for use in WSCI and BPML, and if my
understanding is correct also used in BPEL, WS-Security and other related
specifications.
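
One way to picture the difference, as a sketch only: in the low-level model
the context-specific channel has to be communicated to the sender, while in
the high-level model the sender holds the two partial channels and combines
them itself. The function names, the URL layout and the "ctx" parameter
below are hypothetical and are not taken from WSCI, BPML or BPEL.

```python
from urllib.parse import urlencode


def open_channel_low_level(service_url: str, context_id: str, log: list) -> str:
    """Low-level model: an extra exchange is needed to learn the full channel."""
    log.append(("request-channel", service_url, context_id))  # message 1
    channel = f"{service_url}?{urlencode({'ctx': context_id})}"
    log.append(("channel-reply", channel))                    # message 2
    return channel


def open_channel_high_level(service_url: str, context_id: str) -> str:
    """High-level model: the sender combines service and context locally."""
    return f"{service_url}?{urlencode({'ctx': context_id})}"


messages = []
low = open_channel_low_level("http://example.org/orders/submit", "tx-42", messages)
high = open_channel_high_level("http://example.org/orders/submit", "tx-42")
```

Both functions arrive at the same channel; the low-level variant simply
records the two extra messages it would take to get there.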

The model doesn't preclude anyone from building a RESTful system. On the
contrary, if the system is simple to model it will naturally follow the REST
approach. If the system is not so simple then two options exist: use
RESTless interactions (transactions, cookies, POSTs) or rewrite the system
based on simpler forms of interactions but at the cost of increasing
complexity and message passing.

With some limitations we can try this out with WSDL 1.1.

1. Define a lookup operation with WSDL that returns access URLs (e.g.
getStockQuote). Provide two sets of protocol bindings, one using HTTP GET and
one using HTTP POST, and see which one is easier to access/cache.

2. Define a complex scenario using high-level non-idempotent
context-carrying operations (e.g. purchase order management) using the
minimal set of operations. Start by using HTTP POST so there are no
restrictions.

3. Reduce #2 to the constrained model used for #1 and see how many
new states/messages are required to get it working.

arkin


* While such a protocol is doable and even provable using process calculus
it increases the number of messages exchanged, the complexity of the system
and further requires both parties to be full-fledged Web services. There are
many cases where having simplicity and asymmetry (e.g. browser and server)
works best.


> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> Behalf Of Assaf Arkin
> Sent: Monday, February 17, 2003 3:03 PM
> To: David Orchard; www-ws-arch@w3.org
> Subject: RE: Introducing the Service Oriented Architectural style, and
> it's constraints and properties.
>
>
>
> > > I cannot imagine a cache engine that will look at the body of the POST
> > > message to determine what information to cache, so I would
> > > tend to agree
> > > that for caching purposes using GET is a better approach.
> > > However, if my
> > > input message only contains an identification of a resource
> > > (in addition to
> > > identifying service outside the message) and such
> > > identification can be
> > > encoded as a parameter in the URL of the HTTP request, I
> > > could allow a cache
> > > engine (and other technologies) to manage access to that resource.
> > >
> >
> > Are you suggesting a scenario where there's a URI and an
> effectively empty
> > POST request?  The problem is that how can you cache that?
> You'd have to
> > somehow mark the POST as being idempotent, so that the cache
> > would know that
> > it didn't have to get a response from the resource.  That's
> > exactly what GET
> > does.  I think the key part of the GET semantic that's explicit
> in 2616 is
> > that the GET is idempotent, therefore caches can do their mojo.
>
> I am actually making the case for using HTTP GET and not HTTP
> POST for this
> example, but from the perspective of selecting the best protocol bindings
> for a particular protocol. In other words, both POST and GET (and also FTP
> and SMTP) are possible, but given the definition of the operation I select
> the protocol binding that is most efficient to use and in this
> case HTTP GET
> rules.
>
> (Since GET already does what we need, I do not see much need to allow
> caching for POST. Sure, you can use HTTP POST in the protocol binding, it
> will simply be an inefficient choice.)
>
> The selection of which protocol to use is rather arbitrary. The
> service-oriented architecture simply describes a service for retrieving a
> message. When it comes time to select GET vs POST, the best practice for
> protocol bindings will direct me to use GET pointing out that POST is
> possible but less efficient (and we all agree on the why).
>
> The way I look at it, a request can identify the service and the
> particular
> (output) message I am interested in. The message may exist prior to my
> request, i.e. the input does not lead to a computation that generates the
> message. And there are several protocols that can retrieve such message
> efficiently.
>
> (Warning: partial analogy follows) JMS has an interesting feature called
> selectors. When you retrieve a particular message, e.g. stock quote for
> ticker XYZ, you can use a selector as the input to the request,
> identifying
> which message you use. In effect you are performing an operation that
> retrieves an existing message cached by the JMS engine, but in your action
> you are supplying an input to identify that message from all
> other messages
> available from that service.
>
> If I were to write a similar model using HTTP GET (with the Web server
> representing a multi-consumer queue) I would simply switch the
> SQL syntax of
> the JMS selector with a URL encoding that would give me the URL
> by which the
> message can be retrieved with a single HTTP GET.
>
>
> > "Direct manipulation of Resources" instead of "indirect
> manipulation".
> > That is, if you indirectly manipulate a resource, there's too
> > many variable
> > places for the "real" resource/service identifier to be in the
> > message for a
> > proxy to figure out whether it can cache, etc. the representations.
>
> The matrix I use is slightly different. I separate operations into those
> that involve computation and those that do not. I further separate
> computation operations to those that are idempotent and those
> that are not,
> those that are atomic and those that are not, etc.
>
> For now I would use the term 'lookup', but I'm not sure it's such a good
> name. A lookup is something you can easily cache, a computation
> is something
> you cannot easily cache (if you cache it, you cache it in the service not
> the intermediary).
>
> For 'lookup' the input has to be simple so you can encode it in a URL, and
> if you encode it in a URL and use HTTP GET then you can cache it,
> proxy it,
> etc. For computation the input may be simple so you can encode it
> in a URL,
> or more complex. In the latter case using HTTP POST would be the most
> efficient means for sending the input.
>
> So the matrix would look like:
>
> oper/
> method | lookup      | computation
> --------------------------------
> GET    | always      | sometimes
> POST   | non-optimal | optimal
>
> If the operation is a lookup I would select GET since it's more efficient,
> if the operation is a computation I would select POST since it's more
> generic. If the operation is a lookup I would not select SMTP,
> and if it's a
> computation I would not select FTP. So the decision which protocol binding
> to use depends on the operation and not vice versa.
>
>
> > > Another issue concerns firewalls. Practically speaking controlling
> > > individual access to URLs at the firewall level is
> > > impractical. Firewalls
> > > only work well when there is a coarse-grain identification of
> > > services, e.g.
> > > a partial path in the URL, or path without parameters.
> > >
> >
> > Fair enough.  I should probably say something like "Security
> > intermediary".
> > I was thinking of authentication servers as well.
>
> There are two ends to the security spectrum. You can control access to a
> service and you can control access to the entity (warning: this is a
> generalization).
>
> You can easily control access to the service at the front-end (the
> firewall), you can do so even before the message hits the Web service. For
> efficiency you will only look at the service identification not the entire
> URL. You can easily control access to the data at the back-end (the
> database). For efficiency you will look at the entity identifier ignoring
> the service used to access it.
>
> I don't believe you can easily do one with the other.
>
> An authentication service could perform authentication at the
> front-end, but
> that security token will be used at the front-end to limit access to the
> service and at the back-end to limit access to the entity. So the
> interesting point is how you combine two access methods with the service
> identification, resource identification and security token.
>
> I intentionally avoided using the term resource. As Francis just
> pointed out
> the word is overloaded, both the service and the entity are resources, and
> each can (and should) be identified by some URI. What I am
> looking for is a
> way to separate the service resource from the entity resource, without
> precluding a combination access resource to exist.
>
>
> > Aha...  So URLs are opaque to the consumer, that is don't make any
> > assumptions.  But URLs can certainly be non-opaque to the URL provider.
> > There are many reasons, particularly partitioning of the security
> > realms, and
> > simplicity in application development (like /en and /fr
> > subtrees), for doing
> > this.
>
> Definitely.
>
> However, I think opaque might be understood the wrong way in
> preventing the
> consumer from constructing opaque URLs.
>
> In the case of HTTP GET, the provider tells the consumer how to
> construct a
> request URL that contains the service end-point and the entity identifier.
> It essentially constructs an access resource identifier. The
> consumer has no
> other knowledge about anything inside the URLs. There might be additional
> information there interesting to the provider (security domain, language,
> etc). But the consumer can understand the relation between a service
> resource and an entity resource and that the lifetime of the
> access resource
> may be shorter than the two other resources.
>
>
> > I'm not quite following, I can't quite picture the URLs.  I'm
> working on a
> > few different scenarios to show these two styles, maybe that will
> > help with
> > these - not quite darned ready to post yet..
> >
> > How would the service identification be different than the resource
> > identification?  Taking your first and third scenarios, is the
> difference
> > between these that the service identifier (say the nefarious
> > getStockQuote)
> > is part of the URI in the first case, and not part of the URI in
> > the second
> > case?  The URLs might look something like:
> > /stockservice/getStockQuote?symbol=BEAS and /stockquote?symbol=BEAS.  If
> > this is right, I'm not sure that you can do a cache of the 3rd result,
> > because the service identifier (stockPriceServiceHighPayingCustomer or
> > stockPriceServiceFreebieMoocherCustomers) would have to be inside the
> > message somewhere, so the cache wouldn't know which representation to
> > return.  Or did I put the service identifier in the URI and not the
> > resource? :-)
>
> I would say tns:stockQuote is the service and
> /stockservice/getStockQuote is
> one of its end-points. So in my Web service stack I define a new service
> (tns:stockQuote) and then attach some security policy. The Web
> service stack
> knows all the end-points and protocol details so it can exert
> access control
> on the specific services (directly, using a firewall, allowing
> caching, etc)
> and individually for each protocol (e.g. ACL for HTTP and spam filter for
> SMTP).
>
> The entity identifier is {symbol,BEAS}. Given a service definition with a
> set of end-points and protocol bindings I can construct messages and send
> them to the service's access URL. I can have an HTTP POST binding
> where the
> input goes inside the SOAP message, I can have HTTP GET binding where the
> input goes inside the URL (as per your example) and I can have
> SMTP binding
> where the input goes in the SMTP subject header.
>
> I am making the separation between service and entity to allow multiple
> end-points and protocols to access the same entity. Let's say I have 1000
> entities (resources) that I store in my database so I can later operate on
> them. The service end-point changes to use HTTP + XML-Sig instead
> of HTTPS.
> If I encode the actual URL in the database I need a way to change all 1000
> URLs in the database. And since we agree these URLs are opaque, I have no
> way of doing that.
>
> On the other hand, if I encode service ID + entity ID, then I
> create the URL
> before accessing the actual resource over a specific end-point.
> There is one
> service definition to change and once I change it, all future
> access to the
> entity will use the new end-point/protocol.
>
> arkin
>
> >
> > Cheers,
> > Dave
Received on Tuesday, 18 February 2003 00:03:11 UTC