RE: Introducing the Service Oriented Architectural style, and it's constraints and properties. from Assaf Arkin on 2003-02-17 (www-ws-arch@w3.org from February 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Mon, 17 Feb 2003 15:03:13 -0800
To: "David Orchard" <dorchard@bea.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNKEFKDDAA.arkin@intalio.com>
> > I cannot imagine a cache engine that will look at the body of the POST
> > message to determine what information to cache, so I would
> > tend to agree
> > that for caching purposes using GET is a better approach.
> > However, if my
> > input message only contains an identification of a resource
> > (in addition to
> > identifying service outside the message) and such
> > identification can be
> > encoded as a parameter in the URL of the HTTP request, I
> > could allow a cache
> > engine (and other technologies) to manage access to that resource.
> >
>
> Are you suggesting a scenario where there's a URI and an effectively empty
> POST request?  The problem is that how can you cache that?  You'd have to
> somehow mark the POST as being idempotent, so that the cache
> would know that
> it didn't have to get a response from the resource.  That's
> exactly what GET
> does.  I think the key part of the GET semantic that's explicit in 2616 is
> that the GET is idempotent, therefore caches can do their mojo.

I am actually making the case for using HTTP GET and not HTTP POST for this
example, but from the perspective of selecting the best protocol binding
for a particular operation. In other words, both POST and GET (and also FTP
and SMTP) are possible, but given the definition of the operation I select
the protocol binding that is most efficient to use, and in this case HTTP
GET rules.

(Since GET already does what we need, I do not see much need to allow
caching for POST. Sure, you can use HTTP POST in the protocol binding, it
will simply be an inefficient choice.)

The selection of which protocol to use is rather arbitrary. The
service-oriented architecture simply describes a service for retrieving a
message. When it comes time to select GET vs POST, the best practice for
protocol bindings will direct me to use GET pointing out that POST is
possible but less efficient (and we all agree on the why).

The way I look at it, a request can identify the service and the particular
(output) message I am interested in. The message may exist prior to my
request, i.e. the input does not lead to a computation that generates the
message. And there are several protocols that can retrieve such a message
efficiently.

(Warning: partial analogy follows) JMS has an interesting feature called
selectors. When you retrieve a particular message, e.g. stock quote for
ticker XYZ, you can use a selector as the input to the request, identifying
which message you want. In effect you are performing an operation that
retrieves an existing message cached by the JMS engine, but in your action
you are supplying an input to identify that message from all other messages
available from that service.

If I were to write a similar model using HTTP GET (with the Web server
representing a multi-consumer queue) I would simply switch the SQL syntax of
the JMS selector with a URL encoding that would give me the URL by which the
message can be retrieved with a single HTTP GET.
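
A minimal sketch of that switch, assuming a hypothetical host
(quotes.example.com) and path (/stockquote) that are not taken from any real
service. The field/value pair a JMS selector would express as
symbol = 'XYZ' becomes a query parameter, so each message gets one stable
URL:

```python
from urllib.parse import urlencode

# Hypothetical base URL of the Web server acting as the multi-consumer queue.
BASE_URL = "http://quotes.example.com/stockquote"

def selector_to_url(criteria: dict) -> str:
    """Encode selector-style criteria (e.g. {"symbol": "XYZ"}) as a GET URL.

    Keys are sorted so the same criteria always yield the same URL, which is
    what lets an intermediary cache on the URL alone.
    """
    query = urlencode(sorted(criteria.items()))
    return f"{BASE_URL}?{query}"

# The selector "symbol = 'XYZ'" would correspond to this URL:
print(selector_to_url({"symbol": "XYZ"}))
```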


> "Direct manipulation of Resources" instead of "indirect manipulation".
> That is, if you indirectly manipulate a resource, there's too
> many variable
> places for the "real" resource/service identifier to be in the
> message for a
> proxy to figure out whether it can cache, etc. the representations.

The matrix I use is slightly different. I separate operations into those
that involve computation and those that do not. I further separate
computation operations into those that are idempotent and those that are not,
those that are atomic and those that are not, etc.

For now I would use the term 'lookup', but I'm not sure it's such a good
name. A lookup is something you can easily cache, a computation is something
you cannot easily cache (if you cache it, you cache it in the service not
the intermediary).

For 'lookup' the input has to be simple so you can encode it in a URL, and
if you encode it in a URL and use HTTP GET then you can cache it, proxy it,
etc. For computation the input may be simple so you can encode it in a URL,
or more complex. In the latter case using HTTP POST would be the most
efficient means for sending the input.

So the matrix would look like:

oper/
method | lookup      | computation
--------------------------------
GET    | always      | sometimes
POST   | non-optimal | optimal

If the operation is a lookup I would select GET since it's more efficient;
if the operation is a computation I would select POST since it's more
generic. If the operation is a lookup I would not select SMTP, and if it's a
computation I would not select FTP. So the decision of which protocol binding
to use depends on the operation and not vice versa.
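
As a rough sketch of "the operation decides the binding", the matrix can be
written down as data plus a selection function. The names MATRIX and
choose_binding are made up for illustration, and only the two HTTP methods
from the matrix are covered:

```python
# The matrix above, keyed by (method, operation kind).
MATRIX = {
    ("GET", "lookup"): "always",
    ("GET", "computation"): "sometimes",
    ("POST", "lookup"): "non-optimal",
    ("POST", "computation"): "optimal",
}

def choose_binding(operation: str) -> str:
    """Pick the HTTP method from the operation kind, not the other way round."""
    if operation == "lookup":
        # Lookup input is simple enough for a URL, so GET always works
        # and lets caches and proxies do their job.
        return "GET"
    if operation == "computation":
        # GET is only sometimes possible here; POST is the generic choice.
        return "POST"
    raise ValueError(f"unknown operation kind: {operation!r}")

for kind in ("lookup", "computation"):
    method = choose_binding(kind)
    print(kind, "->", method, f"({MATRIX[(method, kind)]})")
```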


> > Another issue concerns firewalls. Practically speaking controlling
> > individual access to URLs at the firewall level is
> > impractical. Firewalls
> > only work well when there is a coarse-grain identification of
> > services, e.g.
> > a partial path in the URL, or path without parameters.
> >
>
> Fair enough.  I should probably say something like "Security
> intermediary".
> I was thinking of authentication servers as well.

There are two ends to the security spectrum. You can control access to a
service and you can control access to the entity (warning: this is a
generalization).

You can easily control access to the service at the front-end (the
firewall), and you can do so even before the message hits the Web service. For
efficiency you will only look at the service identification not the entire
URL. You can easily control access to the data at the back-end (the
database). For efficiency you will look at the entity identifier ignoring
the service used to access it.

I don't believe you can easily do one with the other.

An authentication service could perform authentication at the front-end, but
that security token will be used at the front-end to limit access to the
service and at the back-end to limit access to the entity. So the
interesting point is how you combine two access methods with the service
identification, resource identification and security token.
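
To make the two ends concrete, here is a small sketch under assumed data: the
ACL tables, the token values ("alice", "bob") and the function names are all
hypothetical. The front-end check reads only the coarse service part of the
URL, the back-end check reads only the entity identifier, and the same token
is presented to both:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical policy tables keyed by the identity carried in the security token.
SERVICE_ACL = {"/stockservice": {"alice", "bob"}}  # front-end: who may reach the service
ENTITY_ACL = {"BEAS": {"alice"}}                   # back-end: who may read the entity

def front_end_allows(token: str, url: str) -> bool:
    """Firewall-style check: only the first path segment is examined."""
    first_segment = urlsplit(url).path.strip("/").split("/")[0]
    return token in SERVICE_ACL.get("/" + first_segment, set())

def back_end_allows(token: str, url: str) -> bool:
    """Database-style check: only the entity identifier is examined."""
    symbols = parse_qs(urlsplit(url).query).get("symbol", [])
    return bool(symbols) and all(token in ENTITY_ACL.get(s, set()) for s in symbols)

url = "/stockservice/getStockQuote?symbol=BEAS"
for token in ("alice", "bob"):
    print(token, front_end_allows(token, url), back_end_allows(token, url))
```

Here "bob" passes the front-end but is stopped at the back-end, which is the
combination of the two access methods with one token.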

I intentionally avoided using the term resource. As Francis just pointed out,
the word is overloaded: both the service and the entity are resources, and
each can (and should) be identified by some URI. What I am looking for is a
way to separate the service resource from the entity resource, without
precluding the existence of a combined access resource.


> Aha...  So URLs are opaque to the consumer, that is don't make any
> assumptions.  But URLs can certainly be non-opaque to the URL provider.
> There are many reasons, particularly partitioning of the security
> realms, and
> simplicity in application development (like /en and /fr
> subtrees), for doing
> this.

Definitely.

However, I think 'opaque' might be understood the wrong way, as preventing
the consumer from constructing such URLs.

In the case of HTTP GET, the provider tells the consumer how to construct a
request URL that contains the service end-point and the entity identifier.
It essentially constructs an access resource identifier. The consumer has no
other knowledge about anything inside the URLs. There might be additional
information there interesting to the provider (security domain, language,
etc). But the consumer can understand the relation between a service
resource and an entity resource and that the lifetime of the access resource
may be shorter than the two other resources.


> I'm not quite following, I can't quite picture the URLs.  I'm working on a
> few different scenarios to show these two styles, maybe that will
> help with
> these - not quite darned ready to post yet..
>
> How would the service identification be different than the resource
> identification?  Taking your first and third scenarios, is the difference
> between these that the service identifier (say the nefarious
> getStockQuote)
> is part of the URI in the first case, and not part of the URI in
> the second
> case?  The URLs might look something like:
> /stockservice/getStockQuote?symbol=BEAS and /stockquote?symbol=BEAS.  If
> this is right, I'm not sure that you can do a cache of the 3rd result,
> because the service identifier (stockPriceServiceHighPayingCustomer or
> stockPriceServiceFreebieMoocherCustomers) would have to be inside the
> message somewhere, so the cache wouldn't know which representation to
> return.  Or did I put the service identifier in the URI and not the
> resource? :-)

I would say tns:stockQuote is the service and /stockservice/getStockQuote is
one of its end-points. So in my Web service stack I define a new service
(tns:stockQuote) and then attach some security policy. The Web service stack
knows all the end-points and protocol details so it can exert access control
on the specific services (directly, using a firewall, allowing caching, etc)
and individually for each protocol (e.g. ACL for HTTP and spam filter for
SMTP).

The entity identifier is {symbol,BEAS}. Given a service definition with a
set of end-points and protocol bindings I can construct messages and send
them to the service's access URL. I can have an HTTP POST binding where the
input goes inside the SOAP message, I can have HTTP GET binding where the
input goes inside the URL (as per your example) and I can have SMTP binding
where the input goes in the SMTP subject header.
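
A rough sketch of the three bindings carrying the same entity identifier. The
function names, the getStockQuote element and the way the subject is encoded
are assumptions made for illustration; only the SOAP 1.1 envelope namespace is
a fixed value:

```python
from email.message import EmailMessage
from urllib.parse import urlencode
from xml.sax.saxutils import escape

def http_get_binding(endpoint: str, entity: dict) -> str:
    """HTTP GET binding: the input goes inside the URL."""
    return f"{endpoint}?{urlencode(entity)}"

def http_post_binding(entity: dict) -> str:
    """HTTP POST binding: the input goes inside the SOAP body.

    The getStockQuote wrapper element is illustrative, not a defined schema.
    """
    fields = "".join(f"<{k}>{escape(v)}</{k}>" for k, v in entity.items())
    return (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><getStockQuote>{fields}</getStockQuote></soap:Body>"
        "</soap:Envelope>"
    )

def smtp_binding(address: str, entity: dict) -> EmailMessage:
    """SMTP binding: the input goes in the Subject header."""
    msg = EmailMessage()
    msg["To"] = address
    msg["Subject"] = urlencode(entity)
    return msg

entity = {"symbol": "BEAS"}
print(http_get_binding("http://example.com/stockservice/getStockQuote", entity))
print(http_post_binding(entity))
print(smtp_binding("quotes@example.com", entity)["Subject"])
```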

I am making the separation between service and entity to allow multiple
end-points and protocols to access the same entity. Let's say I have 1000
entities (resources) that I store in my database so I can later operate on
them. The service end-point changes to use HTTP + XML-Sig instead of HTTPS.
If I encode the actual URL in the database I need a way to change all 1000
URLs in the database. And since we agree these URLs are opaque, I have no
way of doing that.

On the other hand, if I encode service ID + entity ID, then I create the URL
before accessing the actual resource over a specific end-point. There is one
service definition to change and once I change it, all future access to the
entity will use the new end-point/protocol.
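
A small sketch of that indirection, with a hypothetical in-memory registry
and example.com end-points standing in for the real service definition and
database. The stored records hold only service ID + entity ID; the URL is
built at access time, so one change to the service definition covers every
stored entity:

```python
from urllib.parse import urlencode

# Hypothetical service definitions: one entry per service with its current end-point.
SERVICES = {"tns:stockQuote": "https://example.com/stockservice/getStockQuote"}

# What the database stores: (service ID, entity ID) pairs, never full URLs.
STORED = [
    ("tns:stockQuote", {"symbol": "BEAS"}),
    ("tns:stockQuote", {"symbol": "XYZ"}),
]

def access_url(service_id: str, entity_id: dict) -> str:
    """Build the access URL just before use, from the current service definition."""
    return f"{SERVICES[service_id]}?{urlencode(entity_id)}"

before = [access_url(s, e) for s, e in STORED]

# The end-point moves from HTTPS to plain HTTP (with XML-Sig applied elsewhere):
# a single change, and no stored record is touched.
SERVICES["tns:stockQuote"] = "http://example.com/stockservice/getStockQuote"

after = [access_url(s, e) for s, e in STORED]
print(before)
print(after)
```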

arkin

>
> Cheers,
> Dave
Received on Monday, 17 February 2003 18:05:26 UTC