RE: Introducing the Service Oriented Architectural style, and it's constraints and properties. from David Orchard on 2003-02-18 (www-ws-arch@w3.org from February 2003)

From: David Orchard <dorchard@bea.com>
Date: Mon, 17 Feb 2003 21:18:37 -0800
To: "'Assaf Arkin'" <arkin@intalio.com>, <www-ws-arch@w3.org>
Message-ID: <000101c2d70d$314a1300$f10ba8c0@beasys.com>
Assaf,

I won't have a chance to respond in depth to this for a number of days - my
last email response to you took me almost 2 hours to write and I'm way
booked up this week.  As I remember from earlier WSCI days, you can pump out
the text :-) This looks like very dense and interesting stuff.  Perhaps
being a bit too blunt, what's the point?  Are we disagreeing?  Agreeing?
Are you proposing your 2nd paragraph etc. as the text for our ws-arch
document, instead of what I proposed?  If so, what are the properties that
it exhibits, and when is it better or worse than other architectures?  I
think you catch my drift.  It looks like very interesting stuff - I actually
quite like the notion of shared contexts, though I'm not sure that is
central to SOA vs anything else - but I'm just not sure what to do with it
wrt our working group's deliverables and what I've proposed.

Cheers,
Dave

> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> Behalf Of Assaf Arkin
> Sent: Monday, February 17, 2003 8:58 PM
> To: Assaf Arkin; David Orchard; www-ws-arch@w3.org
> Subject: RE: Introducing the Service Oriented Architectural style, and
> it's constraints and properties.
>
>
>
> I'm going to comment on my comments. All I can say is, a whiteboard would
> come in useful right now ;-)
>
> In generic terms I would define a service-oriented architecture as one in
> which services perform operations within an identified context given some
> input data. In its output a service can provide the same or a derived
> context. You can see that in CORBA, COM, SOAP + WS-Security, ebXML
> messaging, etc.
>
> The context could be something like a security token, a transaction, a
> session, etc. Contexts can overlap at different points in time, e.g. two
> operations in the same transaction, two transactions in the same session,
> a transaction that involves two operations in different security
> contexts, etc.
>
> In abstract terms you would define a service, an operation of that
> service, a context (Sanjiva's proposal, correlations) and some input
> (message body). You would then define multiple protocol bindings that
> encode that information in various ways over the wire.
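The abstract model described above (service, operation, context and input, with interchangeable protocol bindings) can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not anything defined by WSDL or by the authors: the `Invocation` class and both binding functions are hypothetical, and the JSON body merely stands in for a SOAP envelope.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Invocation:
    """Hypothetical abstract request: service + operation + context + input."""
    service: str                                 # e.g. "/stockservice"
    operation: str                               # e.g. "getStockQuote"
    context: dict = field(default_factory=dict)  # e.g. session/transaction ids
    body: dict = field(default_factory=dict)     # the input (message body)


def http_get_binding(inv):
    """Hypothetical GET binding: context and input both go into the URL."""
    query = urlencode({**inv.context, **inv.body})
    return ("GET", f"{inv.service}/{inv.operation}?{query}", None)


def http_post_binding(inv):
    """Hypothetical POST binding: the URL names the operation; context and
    input travel in the body (JSON here stands in for a SOAP envelope)."""
    payload = json.dumps({"context": inv.context, "input": inv.body}, sort_keys=True)
    return ("POST", f"{inv.service}/{inv.operation}", payload)


inv = Invocation("/stockservice", "getStockQuote", {"session": "s1"}, {"symbol": "BEAS"})
print(http_get_binding(inv))  # ('GET', '/stockservice/getStockQuote?session=s1&symbol=BEAS', None)
print(http_post_binding(inv))
```

The same `Invocation` value is encoded two ways over the wire; only the binding differs.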
>
> At the lowest level we can look at IP as being a protocol with ip:port
> being the channel and everything else passed in the message. With HTTP we
> have more capable channels encoded as URLs which are carried in the
> message. It becomes more interesting to look at the URL as the channel
> rather than the ip:port portion of it. With SOAP we can further extend the
> channel to include additional context information (transaction, security,
> etc).
>
> But with WSDL we shouldn't make that distinction. We should simply define
> the channel as a composition of service + operation + context and
> input/output for the operation. We can then encode it as TCP, as HTTP over
> TCP, as SOAP over HTTP, or have multiple encodings at the same time.
>
> Deciding whether to use HTTP GET or POST is merely a matter of
> optimization. If all the information can be captured in a URL and the
> output can be cached, it makes more sense to use GET than to "fix"
> caches/proxies to support POST. If you use HTTP POST you simply lose
> caching capability (or you need to fix a lot of caches): you are less
> efficient but still functional. On the other hand, if the information
> cannot fit in a URL you can have some GET-based protocol to get around
> it(*), or you can use POST more efficiently.
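The GET-versus-POST decision in the paragraph above can be sketched as a small helper. This is a hypothetical illustration, not part of any specification: `choose_method` is made up, and the 2048-character limit is an assumption (RFC 2616 places no a priori limit on URI length, though many servers and proxies enforce one in practice).

```python
from urllib.parse import urlencode

# Assumed limit, not a standard value: RFC 2616 sets no a priori maximum URI
# length, but many servers and proxies impose one in practice.
MAX_URL_LENGTH = 2048


def choose_method(base_url, params, cacheable, max_len=MAX_URL_LENGTH):
    """Use GET when the output is cacheable and all input fits in the URL;
    otherwise fall back to POST, which still works but is not cached."""
    if cacheable:
        url = f"{base_url}?{urlencode(params)}"
        if len(url) <= max_len:
            return ("GET", url)
    return ("POST", base_url)  # the input would be sent in the request body


print(choose_method("/stockquote", {"symbol": "BEAS"}, cacheable=True))
# ('GET', '/stockquote?symbol=BEAS')
print(choose_method("/purchaseOrder", {"doc": "x" * 5000}, cacheable=True))
# ('POST', '/purchaseOrder')
```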
>
> A simple system can be constructed using a low-level model. In that model
> channels denoted by service + operation are known a priori, but channels
> denoted by service + operation + context must be communicated. Such a
> simple system would look like REST if you opted to use HTTP GET in the
> protocol bindings, or like the pi-calculus if you tried to describe it
> formally.
>
> A more complex system will have to make a decision on where to place
> complexity. It could use the low-level model, resulting in complexity of
> the process, or a high-level model, resulting in simplification of the
> system but requiring a more complex model. You can reduce processing and
> message passing by electing to use the high-level model.
>
> In a high-level model you would allow the sender to construct the channel
> given knowledge of the context, in effect communicating partial channels
> (service and context separately) and combining them to form new channels.
> After playing with low-level models for a while and realizing the
> inherent complexity, this model was selected for use in WSCI and BPML,
> and if my understanding is correct also used in BPEL, WS-Security and
> other related specifications.
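The difference between the low-level and high-level models described above can be sketched in Python. This is an illustrative sketch only: the `Channel` class, the purchase-order operation names and the three helper functions are hypothetical and do not come from WSCI, BPML or BPEL. In the low-level model the provider must send one fully formed channel per operation for each new context; in the high-level model it sends only the context, and the sender combines it with the service and operations it already knows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    """A channel: service + operation, optionally narrowed by a context."""
    service: str
    operation: str
    context: Optional[str] = None  # e.g. a transaction or session id


# Hypothetical operations of a purchase-order service, known a priori.
OPERATIONS = ("addLine", "submit", "cancel")


def low_level_begin(service, context_id):
    """Low-level model: the provider must communicate every context-bearing
    channel explicitly, one per operation."""
    return [Channel(service, op, context_id) for op in OPERATIONS]


def high_level_begin(context_id):
    """High-level model: the provider communicates only the context."""
    return context_id


def combine(service, operation, context_id):
    """High-level model: the sender forms a new channel from partial channels."""
    return Channel(service, operation, context_id)


channels = low_level_begin("po-service", "order-42")  # three channels sent
ctx = high_level_begin("order-42")                     # one context id sent
assert combine("po-service", "submit", ctx) in channels
```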
>
> The model doesn't preclude anyone from building a RESTful system. On the
> contrary, if the system is simple to model it will naturally follow the
> REST approach. If the system is not so simple then two options exist: use
> RESTless interactions (transactions, cookies, POSTs) or rewrite the system
> based on simpler forms of interactions, but at the cost of increasing
> complexity and message passing.
>
> With some limitations we can try this out with WSDL 1.1.
>
> 1. Define a lookup operation with WSDL that returns access URLs (e.g.
> getStockQuote). Provide two sets of protocol bindings, one using HTTP GET
> and one using HTTP POST, and see which one is easier to access/cache.
>
> 2. Define a complex scenario using high-level non-idempotent
> context-carrying operations (e.g. purchase order management) using the
> minimal set of operations. Start by using HTTP POST so there are no
> restrictions.
>
> 3. Reduce #2 to the constrained model used for #1 and see how many new
> states/messages are required to get it working.
>
> arkin
>
>
> * While such a protocol is doable and even provable using process
> calculus, it increases the number of messages exchanged and the
> complexity of the system, and further requires both parties to be
> full-fledged Web services. There are many cases where simplicity and
> asymmetry (e.g. browser and server) work best.
>
>
> > -----Original Message-----
> > From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> > Behalf Of Assaf Arkin
> > Sent: Monday, February 17, 2003 3:03 PM
> > To: David Orchard; www-ws-arch@w3.org
> > Subject: RE: Introducing the Service Oriented Architectural style, and
> > it's constraints and properties.
> >
> >
> >
> > > > I cannot imagine a cache engine that will look at the body of the
> > > > POST message to determine what information to cache, so I would tend
> > > > to agree that for caching purposes using GET is a better approach.
> > > > However, if my input message only contains an identification of a
> > > > resource (in addition to identifying the service outside the
> > > > message) and such identification can be encoded as a parameter in
> > > > the URL of the HTTP request, I could allow a cache engine (and other
> > > > technologies) to manage access to that resource.
> > > >
> > >
> > > Are you suggesting a scenario where there's a URI and an effectively
> > > empty POST request? The problem is: how can you cache that? You'd have
> > > to somehow mark the POST as being idempotent, so that the cache would
> > > know that it didn't have to get a response from the resource. That's
> > > exactly what GET does. I think the key part of the GET semantic that's
> > > explicit in RFC 2616 is that the GET is idempotent, therefore caches
> > > can do their mojo.
> >
> > I am actually making the case for using HTTP GET and not HTTP POST for
> > this example, but from the perspective of selecting the best protocol
> > binding for a particular operation. In other words, both POST and GET
> > (and also FTP and SMTP) are possible, but given the definition of the
> > operation I select the protocol binding that is most efficient to use,
> > and in this case HTTP GET rules.
> >
> > (Since GET already does what we need, I do not see much need to allow
> > caching for POST. Sure, you can use HTTP POST in the protocol binding;
> > it will simply be an inefficient choice.)
> >
> > The selection of which protocol to use is rather arbitrary. The
> > service-oriented architecture simply describes a service for retrieving
> > a message. When it comes time to select GET vs POST, the best practice
> > for protocol bindings will direct me to use GET, pointing out that POST
> > is possible but less efficient (and we all agree on why).
> >
> > The way I look at it, a request can identify the service and the
> > particular (output) message I am interested in. The message may exist
> > prior to my request, i.e. the input does not lead to a computation that
> > generates the message. And there are several protocols that can
> > retrieve such a message efficiently.
> >
> > (Warning: partial analogy follows) JMS has an interesting feature called
> > selectors. When you retrieve a particular message, e.g. the stock quote
> > for ticker XYZ, you can use a selector as the input to the request,
> > identifying which message you want. In effect you are performing an
> > operation that retrieves an existing message cached by the JMS engine,
> > but in your action you are supplying an input to identify that message
> > from all other messages available from that service.
> >
> > If I were to write a similar model using HTTP GET (with the Web server
> > representing a multi-consumer queue) I would simply replace the SQL
> > syntax of the JMS selector with a URL encoding that gives me the URL by
> > which the message can be retrieved with a single HTTP GET.
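The selector-to-URL substitution described above can be sketched as follows. This is a hypothetical illustration, not the JMS API: `selector_to_url` and `MessageStore` are made up, only equality criteria are handled (real JMS selectors use a richer subset of SQL-92 conditional expressions), and the in-memory dictionary stands in for a Web server acting as a multi-consumer queue.

```python
from urllib.parse import urlencode


def selector_to_url(base, criteria):
    """Stand-in for a JMS selector such as "ticker = 'XYZ'": only equality
    tests are handled. Keys are sorted so the same criteria always yield
    the same URL (and therefore the same cache key)."""
    return f"{base}?{urlencode(sorted(criteria.items()))}"


class MessageStore:
    """Hypothetical Web server acting as a multi-consumer queue: a published
    message stays available to any number of GET requests."""

    def __init__(self, base):
        self.base = base
        self._messages = {}

    def publish(self, criteria, message):
        self._messages[selector_to_url(self.base, criteria)] = message

    def get(self, url):
        """Simulated HTTP GET: the existing message, or None (think 404)."""
        return self._messages.get(url)


store = MessageStore("/quotes")
store.publish({"ticker": "XYZ"}, "quote for XYZ")
print(store.get(selector_to_url("/quotes", {"ticker": "XYZ"})))  # quote for XYZ
```

Because the URL is canonical, any intermediary that caches by URL can answer repeated requests for the same message.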
> >
> >
> > > "Direct manipulation of Resources" instead of "indirect manipulation".
> > > That is, if you indirectly manipulate a resource, there are too many
> > > possible places for the "real" resource/service identifier to be in
> > > the message for a proxy to figure out whether it can cache, etc. the
> > > representations.
> >
> > The matrix I use is slightly different. I separate operations into those
> > that involve computation and those that do not. I further separate
> > computation operations into those that are idempotent and those that are
> > not, those that are atomic and those that are not, etc.
> >
> > For now I would use the term 'lookup', but I'm not sure it's such a good
> > name. A lookup is something you can easily cache; a computation is
> > something you cannot easily cache (if you cache it, you cache it in the
> > service, not the intermediary).
> >
> > For 'lookup' the input has to be simple so you can encode it in a URL,
> > and if you encode it in a URL and use HTTP GET then you can cache it,
> > proxy it, etc. For computation the input may be simple enough to encode
> > in a URL, or more complex. In the latter case using HTTP POST would be
> > the most efficient means of sending the input.
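Why a lookup sent as GET can be cached by an intermediary while a computation sent as POST cannot can be sketched with a toy proxy. This is an illustrative sketch, not any real cache implementation: `CachingProxy` and `origin` are hypothetical, and expiry and Cache-Control handling are deliberately omitted.

```python
class CachingProxy:
    """Toy intermediary: caches GET responses keyed by the full URL and
    always forwards POST. Expiry and Cache-Control are deliberately ignored."""

    def __init__(self, origin):
        self.origin = origin        # callable(method, url, body) -> response
        self.cache = {}
        self.origin_calls = 0

    def request(self, method, url, body=None):
        if method == "GET" and url in self.cache:
            return self.cache[url]  # lookup answered by the intermediary
        self.origin_calls += 1
        response = self.origin(method, url, body)
        if method == "GET":
            self.cache[url] = response
        return response


def origin(method, url, body):
    """Hypothetical origin service that just echoes the request."""
    return f"{method} {url} {body}"


proxy = CachingProxy(origin)
proxy.request("GET", "/stockquote?symbol=BEAS")
proxy.request("GET", "/stockquote?symbol=BEAS")   # served from cache
proxy.request("POST", "/purchaseOrder", "<po/>")
proxy.request("POST", "/purchaseOrder", "<po/>")  # forwarded again
print(proxy.origin_calls)  # 3
```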
> >
> > So the matrix would look like:
> >
> > method \ oper | lookup      | computation
> > --------------+-------------+------------
> > GET           | always      | sometimes
> > POST          | non-optimal | optimal
> >
> > If the operation is a lookup I would select GET since it's more
> > efficient; if the operation is a computation I would select POST since
> > it's more generic. If the operation is a lookup I would not select SMTP,
> > and if it's a computation I would not select FTP. So the decision of
> > which protocol binding to use depends on the operation and not vice
> > versa.
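The method/operation matrix above can be encoded as data, with the binding chosen from the kind of operation. This is a hypothetical sketch: `select_method` is made up, and the numeric ranking of the ratings is an assumption, since the message does not rank "always" against "optimal".

```python
# The method/operation matrix, encoded as data.
SUITABILITY = {
    ("GET", "lookup"): "always",
    ("GET", "computation"): "sometimes",
    ("POST", "lookup"): "non-optimal",
    ("POST", "computation"): "optimal",
}

# Assumed ordering of the ratings; the original message does not rank them.
RANK = {"non-optimal": 0, "sometimes": 1, "optimal": 2, "always": 2}


def select_method(kind):
    """Pick the HTTP method from the kind of operation, not vice versa."""
    candidates = [(RANK[rating], method)
                  for (method, k), rating in SUITABILITY.items() if k == kind]
    if not candidates:
        raise ValueError(f"unknown operation kind: {kind!r}")
    return max(candidates)[1]


print(select_method("lookup"))       # GET
print(select_method("computation"))  # POST
```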
> >
> >
> > > > Another issue concerns firewalls. Practically speaking, controlling
> > > > individual access to URLs at the firewall level is impractical.
> > > > Firewalls only work well when there is a coarse-grain identification
> > > > of services, e.g. a partial path in the URL, or a path without
> > > > parameters.
> > > >
> > >
> > > Fair enough. I should probably say something like "Security
> > > intermediary". I was thinking of authentication servers as well.
> >
> > There are two ends to the security spectrum. You can control access to a
> > service and you can control access to the entity (warning: this is a
> > generalization).
> >
> > You can easily control access to the service at the front-end (the
> > firewall); you can do so even before the message hits the Web service.
> > For efficiency you will only look at the service identification, not the
> > entire URL. You can easily control access to the data at the back-end
> > (the database). For efficiency you will look at the entity identifier,
> > ignoring the service used to access it.
> >
> > I don't believe you can easily do one with the other.
> >
> > An authentication service could perform authentication at the
> > front-end, but that security token will be used at the front-end to
> > limit access to the service and at the back-end to limit access to the
> > entity. So the interesting point is how you combine the two access
> > methods with the service identification, resource identification and
> > security token.
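Combining the two access checks described above with one security token can be sketched as follows. Everything here is hypothetical: the ACL tables, the token values (reduced to the customer class they grant) and the functions are made up for illustration. The front-end check looks only at the service part of the path; the back-end check looks only at the entity identifier.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical policies. Tokens are reduced to the customer class they grant.
SERVICE_ACL = {"/stockservice": {"high-paying", "freebie"}}  # front-end, coarse
ENTITY_ACL = {"BEAS": {"high-paying"}}                       # back-end, fine


def frontend_allows(url, token):
    """Firewall-style check: only the service part of the path is examined."""
    service = "/" + urlsplit(url).path.strip("/").split("/")[0]
    return token in SERVICE_ACL.get(service, set())


def backend_allows(entity_id, token):
    """Database-style check: only the entity identifier is examined."""
    return token in ENTITY_ACL.get(entity_id, set())


def authorize(url, token):
    """The same token is checked at both ends against different identifiers."""
    if not frontend_allows(url, token):
        return False
    entity_id = parse_qs(urlsplit(url).query).get("symbol", [None])[0]
    return backend_allows(entity_id, token)


url = "/stockservice/getStockQuote?symbol=BEAS"
print(authorize(url, "high-paying"))  # True
print(authorize(url, "freebie"))      # False: passes the firewall, not the database
```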
> >
> > I intentionally avoided using the term resource. As Francis just pointed
> > out, the word is overloaded: both the service and the entity are
> > resources, and each can (and should) be identified by some URI. What I am
> > looking for is a way to separate the service resource from the entity
> > resource, without precluding a combined access resource from existing.
> >
> >
> > > Aha... So URLs are opaque to the consumer, that is, don't make any
> > > assumptions. But URLs can certainly be non-opaque to the URL provider.
> > > There are many reasons for doing this, particularly partitioning of the
> > > security realms and simplicity in application development (like /en
> > > and /fr subtrees).
> >
> > Definitely.
> >
> > However, I think "opaque" might be understood the wrong way, as
> > preventing the consumer from constructing opaque URLs.
> >
> > In the case of HTTP GET, the provider tells the consumer how to
> > construct a request URL that contains the service end-point and the
> > entity identifier. It essentially constructs an access resource
> > identifier. The consumer has no other knowledge about anything inside
> > the URLs. There might be additional information there that is
> > interesting to the provider (security domain, language, etc). But the
> > consumer can understand the relation between a service resource and an
> > entity resource, and that the lifetime of the access resource may be
> > shorter than that of the two other resources.
> >
> >
> > > I'm not quite following; I can't quite picture the URLs. I'm working
> > > on a few different scenarios to show these two styles, maybe that will
> > > help with these - not quite darned ready to post yet...
> > >
> > > How would the service identification be different than the resource
> > > identification? Taking your first and third scenarios, is the
> > > difference between these that the service identifier (say the
> > > nefarious getStockQuote) is part of the URI in the first case, and not
> > > part of the URI in the second case? The URLs might look something
> > > like: /stockservice/getStockQuote?symbol=BEAS and
> > > /stockquote?symbol=BEAS. If this is right, I'm not sure that you can
> > > do a cache of the 3rd result, because the service identifier
> > > (stockPriceServiceHighPayingCustomer or
> > > stockPriceServiceFreebieMoocherCustomers) would have to be inside the
> > > message somewhere, so the cache wouldn't know which representation to
> > > return. Or did I put the service identifier in the URI and not the
> > > resource? :-)
> >
> > I would say tns:stockQuote is the service and /stockservice/getStockQuote
> > is one of its end-points. So in my Web service stack I define a new
> > service (tns:stockQuote) and then attach some security policy. The Web
> > service stack knows all the end-points and protocol details, so it can
> > exert access control on the specific services (directly, using a
> > firewall, allowing caching, etc) and individually for each protocol
> > (e.g. ACL for HTTP and spam filter for SMTP).
> >
> > The entity identifier is {symbol,BEAS}. Given a service definition with
> > a set of end-points and protocol bindings I can construct messages and
> > send them to the service's access URL. I can have an HTTP POST binding
> > where the input goes inside the SOAP message, I can have an HTTP GET
> > binding where the input goes inside the URL (as per your example) and I
> > can have an SMTP binding where the input goes in the SMTP subject
> > header.
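The three bindings for the same entity identifier described above can be sketched in Python. This is an illustration under assumptions, not a WSDL binding: the functions are hypothetical, the `getStockQuote` body element and the `quotes@example.com` address are made up, and only the SOAP 1.1 envelope namespace is a real value.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

ENTITY = {"symbol": "BEAS"}  # the entity identifier {symbol,BEAS}
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def get_binding(endpoint, entity):
    """Input goes inside the URL."""
    return {"method": "GET", "url": f"{endpoint}?{urlencode(entity)}"}


def post_binding(endpoint, entity):
    """Input goes inside a SOAP message (the body element name is made up)."""
    fields = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in entity.items())
    envelope = (f'<soap:Envelope xmlns:soap="{SOAP_ENV}"><soap:Body>'
                f"<getStockQuote>{fields}</getStockQuote>"
                "</soap:Body></soap:Envelope>")
    return {"method": "POST", "url": endpoint, "body": envelope}


def smtp_binding(address, entity):
    """Input goes in the Subject header."""
    subject = " ".join(f"{k}={v}" for k, v in entity.items())
    return {"to": address, "subject": subject}


for msg in (get_binding("/stockservice/getStockQuote", ENTITY),
            post_binding("/stockservice/getStockQuote", ENTITY),
            smtp_binding("quotes@example.com", ENTITY)):
    print(msg)
```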
> >
> > I am making the separation between service and entity to allow multiple
> > end-points and protocols to access the same entity. Let's say I have
> > 1000 entities (resources) that I store in my database so I can later
> > operate on them. The service end-point changes to use HTTP + XML-Sig
> > instead of HTTPS. If I encode the actual URL in the database I need a
> > way to change all 1000 URLs in the database. And since we agree these
> > URLs are opaque, I have no way of doing that.
> >
> > On the other hand, if I encode service ID + entity ID, then I create the
> > URL before accessing the actual resource over a specific end-point.
> > There is one service definition to change, and once I change it, all
> > future access to the entity will use the new end-point/protocol.
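The service ID + entity ID scheme described above can be sketched as follows. This is a hypothetical illustration: the `SERVICES` registry, the `resolve` helper, the example.com host names and the generated symbols are all made up. The point it shows is that the stored records never contain a final URL, so moving the end-point is a single edit to the service definition.

```python
from urllib.parse import urlencode

# One service definition: the only place that records the current end-point.
# The host names are hypothetical.
SERVICES = {"tns:stockQuote": "https://example.com/stockservice/getStockQuote"}

# What the database stores: service ID + entity ID, not the final URL.
stored = [("tns:stockQuote", {"symbol": f"SYM{i}"}) for i in range(1000)]


def resolve(service_id, entity_id):
    """Build the access URL at request time from the current definition."""
    return f"{SERVICES[service_id]}?{urlencode(entity_id)}"


print(resolve(*stored[0]))  # https://example.com/stockservice/getStockQuote?symbol=SYM0

# The end-point moves (say, HTTP + XML-Sig instead of HTTPS): one edit.
SERVICES["tns:stockQuote"] = "http://example.com/signed/getStockQuote"
print(resolve(*stored[0]))  # http://example.com/signed/getStockQuote?symbol=SYM0
```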
> >
> > arkin
> >
> > >
> > > Cheers,
> > > Dave
>
>
Received on Tuesday, 18 February 2003 00:21:15 UTC