RE: Introducing the Service Oriented Architectural style, and it's constraints and properties. from Assaf Arkin on 2003-02-18 (www-ws-arch@w3.org from February 2003)

From: Assaf Arkin <arkin@intalio.com>
Date: Tue, 18 Feb 2003 13:47:24 -0800
To: "David Orchard" <dorchard@bea.com>, <www-ws-arch@w3.org>
Message-ID: <IGEJLEPAJBPHKACOOKHNAEGODDAA.arkin@intalio.com>
> -----Original Message-----
> From: David Orchard [mailto:dorchard@bea.com]
> Sent: Monday, February 17, 2003 9:19 PM
> To: 'Assaf Arkin'; www-ws-arch@w3.org
> Subject: RE: Introducing the Service Oriented Architectural style, and
> it's constraints and properties.
>
>
> Assaf,
>
> I won't have a chance to respond in depth to this for a number of
> days - my
> last email response to you took me almost 2 hours to write and I'm way
> booked up this week.  As I remember from earlier WSCI days, you
> can pump out
> the text :-)

Sorry about the verbosity ;-)

I've been trying to look at this from the perspective of implementation as
well as formal models, and I can't see why SOA is antagonistic to REST, or why
WS would preclude the existence of REST services.


> This looks like very dense and interesting stuff.  Perhaps
> being a bit too blunt, what's the point?  Are we disagreeing?  Agreeing?

I basically agree with all the points you made, but I'm not sure the SOA vs
REST comparison is accurate.

The short answer is: some systems fit within the REST problem space, and
nothing in the SOA precludes them from adhering to the REST constraints.

It is not wise to enforce the REST constraints in all cases (my regression
into process calculus is just a way to formalize this). SOA should allow
other systems to be defined that don't fit within the constraints of REST.

I think a better comparison would be to look at how one can build a RESTful
Web service to prove that it is in fact possible, and list the constraints
of REST to indicate why SOA is preferred in allowing more types of services
to exist.


> Are you proposing your 2nd paragraph etc. as the text for our ws-arch
> document, instead of what I proposed?  If so, what are the properties that
> it exhibits, and when is it better or worse than other architectures?  I
> think you catch my drift.  It looks like very interesting stuff -
> I actually
> quite like the notion of shared contexts, though I'm not sure that is
> central to SOA vs anything else - but I'm just not sure what to do with it
> wrt our working group's deliverables and what I've proposed.

We are talking a lot about stateless Web services as being more scalable,
and in the same breath about long-lasting interactions. The notion of
context bridges the gap between the stateless service and the long lasting
interaction (a context that spans multiple operations).

What contexts bring to the table is the ability to define stateless services
that participate in some long-lasting interaction in very abstract terms.
At the SOA level we only talk about abstract contexts, while other
specifications (conversation, transaction, choreography, trust) define
specific contexts.
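
As a rough illustration of that point, here is a minimal Python sketch of a
stateless service whose operations take a context plus input and return the
same or a derived context. The names (Context, add_item, checkout) and the
fields are hypothetical and are not taken from any specification; a concrete
specification (transaction, conversation, etc.) would define its own context.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Context:
    """Abstract context; the fields here are assumptions for illustration."""
    conversation_id: str
    items: Tuple[str, ...] = ()


def add_item(ctx: Context, item: str) -> Tuple[Context, str]:
    """Stateless operation: everything it needs arrives with the call.

    Returns a derived context instead of mutating server-side state.
    """
    derived = replace(ctx, items=ctx.items + (item,))
    return derived, f"added {item}"


def checkout(ctx: Context) -> Tuple[Context, str]:
    """A later operation in the same interaction; returns the same context."""
    return ctx, f"order {ctx.conversation_id}: {', '.join(ctx.items)}"


# One long-lasting interaction spanning three stateless operations.
ctx = Context(conversation_id="c-1")
ctx, _ = add_item(ctx, "widget")
ctx, _ = add_item(ctx, "gadget")
ctx, summary = checkout(ctx)
print(summary)
```

The service keeps nothing between calls; the context that travels with each
message is what ties the operations into one interaction.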

arkin

>
> Cheers,
> Dave
>
> > -----Original Message-----
> > From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> > Behalf Of Assaf Arkin
> > Sent: Monday, February 17, 2003 8:58 PM
> > To: Assaf Arkin; David Orchard; www-ws-arch@w3.org
> > Subject: RE: Introducing the Service Oriented Architectural style, and
> > it's constraints and properties.
> >
> >
> >
> > I'm going to comment on my comments. All I can say is, a whiteboard would
> > come in useful right now ;-)
> >
> > In generic terms I would define a service-oriented
> > architecture as one in
> > which services perform operations within an identified
> > context given some
> > input data. In its output a service can provide the same or a derived
> > context. You can see that in CORBA, COM, SOAP + WS-Security, ebXML
> > messaging, etc.
> >
> > The context could be something like a security token, a transaction, a
> > session, etc. Contexts can overlap at different points in
> > time, e.g. two
> > operations in the same transaction, two transactions in the
> > same session, a
> > transaction that involves two operations in different
> > security contexts,
> > etc.
> >
> > In abstract terms you would define a service, an operation of
> > that service,
> > a context (Sanjiva's proposal, correlations) and some input
> > (message body).
> > You would then define multiple protocol bindings that encode that
> > information in various ways over the wire.
> >
> > At the lowest level we can look at IP as being a protocol
> > with ip:port being
> > the channel and everything else passed in the message. With
> > HTTP we have
> > more capable channels encoded as URLs which are carried in
> > the message. It
> > becomes more interesting to look at the URL as the channel
> > rather than the
> > ip:port portion of it. With SOAP we can further extend the channel to
> > include additional context information (transaction, security, etc).
> >
> > But with WSDL we shouldn't make that distinction. We should
> > simply define
> > the channel as composition of service + operation + context
> > and input/output
> > for the operation. We can then encode it as TCP, as HTTP over
> > TCP, as SOAP
> > over HTTP, or have multiple encodings at the same time.
> >
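
A minimal Python sketch of the idea above: one abstract channel (service +
operation + context) encoded by two different protocol bindings. The Channel
class, the function names, the example.org base URL and the form-encoded POST
body are all assumptions made for illustration; a real POST binding would more
likely carry a SOAP envelope.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class Channel:
    """Abstract channel: service + operation + context (hypothetical model)."""
    service: str
    operation: str
    context: Tuple[Tuple[str, str], ...] = ()


def encode_get(base: str, ch: Channel, params: Dict[str, str]) -> str:
    """GET binding: the whole channel and the input are captured in the URL."""
    query = urlencode(list(ch.context) + sorted(params.items()))
    return f"{base}/{ch.service}/{ch.operation}?{query}"


def encode_post(base: str, ch: Channel, params: Dict[str, str]) -> Tuple[str, str]:
    """POST binding: the URL names only the service; the rest goes in the body."""
    fields = [("operation", ch.operation)] + list(ch.context) + sorted(params.items())
    return f"{base}/{ch.service}", urlencode(fields)


ch = Channel("stockservice", "getStockQuote", (("session", "s1"),))
get_url = encode_get("http://example.org", ch, {"symbol": "BEAS"})
post_url, post_body = encode_post("http://example.org", ch, {"symbol": "BEAS"})
print(get_url)
print(post_url, post_body)
```

Both encodings carry the same information; only the GET form exposes it where
an intermediary such as a cache can see it.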
> > Deciding whether to use HTTP GET or POST is merely a matter
> > of optimization.
> > If all the information can be captured in a URL and the
> > output can be cached
> > it makes more sense to use GET than to "fix" caches/proxies
> > to support POST.
> > If you use POST you simply lose caching capability (or you
> > need to fix a lot
> > of caches), you are less efficient but still functional. On
> > the other hand,
> > if the information cannot fit in a URL you can have some
> > GET-based protocol
> > to get around it(*), or you can use POST more efficiently.
> >
> > A simple system can be constructed using a low-level model.
> > In that model
> > channels denoted by service + operation are known a priori,
> > but channels
> > denoted by service + operation + context must be
> > communicated. Such a simple
> > system would look like REST if you opted to use HTTP GET in
> > the protocol
> > bindings, or pi-calculus if you tried to describe it formally.
> >
> > A more complex system will have to make a decision on where to place
> > complexity. It could use the low-level model resulting in
> > complexity of the
> > process, or a high-level model resulting in simplification of
> > the system but
> > requiring a more complex model. You can reduce processing and message
> > passing by electing to use the high-level model.
> >
> > In a high-level model you would allow the sender to construct
> > the channel
> > given a knowledge of the context, in effect communicating
> > partial channels
> > (service and context separately) and combining them to form
> > new channels.
> > After playing with low-level models for a while and realizing
> > the inherent
> > complexity, this model was selected for use in WSCI and BPML,
> > and if my
> > understanding is correct also used in BPEL, WS security and
> > other related
> > specifications.
> >
> > The model doesn't preclude anyone from building a RESTful
> > system. On the
> > contrary, if the system is simple to model it will naturally
> > follow the REST
> > approach. If the system is not so simple then two options exist: use
> > RESTless interactions (transactions, cookies, POSTs) or
> > rewrite the system
> > based on simpler forms of interactions but at the cost of increasing
> > complexity and message passing.
> >
> > With some limitations we can try this out with WSDL 1.1.
> >
> > 1. Define a lookup operation with WSDL that returns access URLs (e.g.
> > getStockQuote). Provide two sets of protocol bindings, one
> > using HTTP GET and
> > one using HTTP POST, and see which one is easier to access/cache.
> >
> > 2. Define a complex scenario using high-level non-idempotent
> > context-carrying operations (e.g. purchase order management) using the
> > minimal set of operations. Start by using HTTP POST so there are no
> > restrictions.
> >
> > 3. Reduce no#2 into a constrained model as used for no#1 and
> > see how many
> > new states/messages are required to get it working.
> >
> > arkin
> >
> >
> > * While such a protocol is doable and even provable using
> > process calculus
> > it increases the number of messages exchanged, the complexity
> > of the system
> > and further requires both parties to be full-fledged Web
> > services. There are
> > many cases where having simplicity and asymmetry (e.g.
> > browser and server)
> > works best.
> >
> >
> > > -----Original Message-----
> > > From: www-ws-arch-request@w3.org
> > [mailto:www-ws-arch-request@w3.org]On
> > > Behalf Of Assaf Arkin
> > > Sent: Monday, February 17, 2003 3:03 PM
> > > To: David Orchard; www-ws-arch@w3.org
> > > Subject: RE: Introducing the Service Oriented Architectural
> > style, and
> > > it's constraints and properties.
> > >
> > >
> > >
> > > > > I cannot imagine a cache engine that will look at the
> > body of the POST
> > > > > message to determine what information to cache, so I would
> > > > > tend to agree
> > > > > that for caching purposes using GET is a better approach.
> > > > > However, if my
> > > > > input message only contains an identification of a resource
> > > > > (in addition to
> > > > > identifying service outside the message) and such
> > > > > identification can be
> > > > > encoded as a parameter in the URL of the HTTP request, I
> > > > > could allow a cache
> > > > > engine (and other technologies) to manage access to
> > that resource.
> > > > >
> > > >
> > > > Are you suggesting a scenario where there's a URI and an
> > > effectively empty
> > > > POST request?  The problem is that how can you cache that?
> > > You'd have to
> > > > somehow mark the POST as being idempotent, so that the cache
> > > > would know that
> > > > it didn't have to get a response from the resource.  That's
> > > > exactly what GET
> > > > does.  I think the key part of the GET semantic that's explicit
> > > in 2616 is
> > > > that the GET is idempotent, therefore caches can do their mojo.
> > >
> > > I am actually making the case for using HTTP GET and not HTTP
> > > POST for this
> > > example, but from the perspective of selecting the best
> > protocol bindings
> > > for a particular protocol. In other words, both POST and
> > GET (and also FTP
> > > and SMTP) are possible, but given the definition of the
> > operation I select
> > > the protocol binding that is most efficient to use and in this
> > > case HTTP GET
> > > rules.
> > >
> > > (Since GET already does what we need, I do not see much
> > need to allow
> > > caching for POST. Sure, you can use HTTP POST in the
> > protocol binding, it
> > > will simply be an inefficient choice.)
> > >
> > > The selection of which protocol to use is rather arbitrary. The
> > > service-oriented architecture simply describes a service
> > for retrieving a
> > > message. When it comes time to select GET vs POST, the best
> > practice for
> > > protocol bindings will direct me to use GET pointing out
> > that POST is
> > > possible but less efficient (and we all agree on the why).
> > >
> > > The way I look at it, a request can identify the service and the
> > > particular
> > > (output) message I am interested in. The message may exist
> > prior to my
> > > request, i.e. the input does not lead to a computation that
> > generates the
> > > message. And there are several protocols that can retrieve
> > such message
> > > efficiently.
> > >
> > > (Warning: partial analogy follows) JMS has an interesting
> > feature called
> > > selectors. When you retrieve a particular message, e.g.
> > stock quote for
> > > ticker XYZ, you can use a selector as the input to the request,
> > > identifying
> > > which message you use. In effect you are performing an
> > operation that
> > > retrieves an existing message cached by the JMS engine, but
> > in your action
> > > you are supplying an input to identify that message from all
> > > other messages
> > > available from that service.
> > >
> > > If I were to write a similar model using HTTP GET (with the
> > Web server
> > > representing a multi-consumer queue) I would simply switch the
> > > SQL syntax of
> > > the JMS selector with a URL encoding that would give me the URL
> > > by which the
> > > message can be retrieved with a single HTTP GET.
> > >
> > >
> > > > "Direct manipulation of Resources" instead of "indirect
> > > manipulation".
> > > > That is, if you indirectly manipulate a resource, there's too
> > > > many variable
> > > > places for the "real" resource/service identifier to be in the
> > > > message for a
> > > > proxy to figure out whether it can cache, etc. the
> > representations.
> > >
> > > The matrix I use is slightly different. I separate
> > operations into those
> > > that involve computation and those that do not. I further separate
> > > computation operations to those that are idempotent and those
> > > that are not,
> > > those that are atomic and those that are not, etc.
> > >
> > > For now I would use the term 'lookup', but I'm not sure
> > it's such a good
> > > name. A lookup is something you can easily cache, a computation
> > > is something
> > > you cannot easily cache (if you cache it, you cache it in
> > the service not
> > > the intermediary).
> > >
> > > For 'lookup' the input has to be simple so you can encode
> > it in a URL, and
> > > if you encode it in a URL and use HTTP GET then you can cache it,
> > > proxy it,
> > > etc. For computation the input may be simple so you can encode it
> > > in a URL,
> > > or more complex. In the latter case using HTTP POST would be the most
> > > efficient means for sending the input.
> > >
> > > So the matrix would look like:
> > >
> > > oper/
> > > method | lookup      | computation
> > > --------------------------------
> > > GET    | always      | sometimes
> > > POST   | non-optimal | optimal
> > >
> > > If the operation is a lookup I would select GET since it's
> > more efficient,
> > > if the operation is a computation I would select POST since
> > it's more
> > > generic. If the operation is a lookup I would not select SMTP,
> > > and if it's a
> > > computation I would not select FTP. So the decision which
> > protocol binding
> > > to use depends on the operation and not vice versa.
> > >
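
The matrix above can be read as a small decision function. This is only a
sketch in Python: the Kind enum, choose_binding and the 2000-character URL
limit are assumptions for illustration, not part of any binding specification.

```python
from enum import Enum


class Kind(Enum):
    LOOKUP = "lookup"            # retrieves an existing message, easy to cache
    COMPUTATION = "computation"  # input leads to a computation


# Assumed practical limit on URL length; not taken from any specification.
MAX_URL_LENGTH = 2000


def choose_binding(kind: Kind, encoded_input: str) -> str:
    """Pick the HTTP method from the kind of operation, not the other way around."""
    fits_in_url = len(encoded_input) <= MAX_URL_LENGTH
    if kind is Kind.LOOKUP and fits_in_url:
        return "GET"   # cacheable and proxy-friendly
    return "POST"      # generic, works for any input size


print(choose_binding(Kind.LOOKUP, "symbol=BEAS"))
print(choose_binding(Kind.COMPUTATION, "symbol=BEAS"))
```

The sketch always sends computations over POST, which is the "optimal" cell of
the matrix; the "sometimes" cell (a computation over GET) is left out for
simplicity.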
> > >
> > > > > Another issue concerns firewalls. Practically speaking
> > controlling
> > > > > individual access to URLs at the firewall level is
> > > > > impractical. Firewalls
> > > > > only work well when there is a coarse-grain identification of
> > > > > services, e.g.
> > > > > a partial path in the URL, or path without parameters.
> > > > >
> > > >
> > > > Fair enough.  I should probably say something like "Security
> > > > intermediary".
> > > > I was thinking of authentication servers as well.
> > >
> > > There are two ends to the security spectrum. You can
> > control access to a
> > > service and you can control access to the entity (warning: this is a
> > > generalization).
> > >
> > > You can easily control access to the service at the front-end (the
> > > firewall), you can do so even before the message hits the
> > Web service. For
> > > efficiency you will only look at the service identification
> > not the entire
> > > URL. You can easily control access to the data at the back-end (the
> > > database). For efficiency you will look at the entity
> > identifier ignoring
> > > the service used to access it.
> > >
> > > I don't believe you can easily do one with the other.
> > >
> > > An authentication service could perform authentication at the
> > > front-end, but
> > > that security token will be used at the front-end to limit
> > access to the
> > > service and at the back-end to limit access to the entity. So the
> > > interesting point is how you combine two access methods
> > with the service
> > > identification, resource identification and security token.
> > >
> > > I intentionally avoided using the term resource. As Francis just
> > > pointed out
> > > the word is overloaded, both the service and the entity are
> > resources, and
> > > each can (and should be) identified by some URI. What I am
> > > looking for is a
> > > way to separate the service resource from the entity
> > resource, without
> > > precluding a combination access resource from existing.
> > >
> > >
> > > > Aha...  So URLs are opaque to the consumer, that is don't make any
> > > > assumptions.  But URLs can certainly be non-opaque to the
> > URL provider.
> > > > There are many reasons, particularly partitioning of the security
> > > > realms, and
> > > > simplicity in application development (like /en and /fr
> > > > subtrees), for doing
> > > > this.
> > >
> > > Definitely.
> > >
> > > However, I think opaque might be understood the wrong way in
> > > preventing the
> > > consumer from constructing opaque URLs.
> > >
> > > In the case of HTTP GET, the provider tells the consumer how to
> > > construct a
> > > request URL that contains the service end-point and the
> > entity identifier.
> > > It essentially constructs an access resource identifier. The
> > > consumer has no
> > > other knowledge about anything inside the URLs. There might
> > be additional
> > > information there interesting to the provider (security
> > domain, language,
> > > etc). But the consumer can understand the relation between a service
> > > resource and an entity resource and that the lifetime of the
> > > access resource
> > > may be shorter than the two other resources.
> > >
> > >
> > > > I'm not quite following, I can't quite picture the URLs.  I'm
> > > working on a
> > > > few different scenarios to show these two styles, maybe that will
> > > > help with
> > > > these - not quite darned ready to post yet..
> > > >
> > > > How would the service identification be different than
> > the resource
> > > > identification?  Taking your first and third scenarios, is the
> > > difference
> > > > between these that the service identifier (say the nefarious
> > > > getStockQuote)
> > > > is part of the URI in the first case, and not part of the URI in
> > > > the second
> > > > case?  The URLs might look something like:
> > > > /stockservice/getStockQuote?symbol=BEAS and
> > /stockquote?symbol=BEAS.  If
> > > > this is right, I'm not sure that you can do a cache of
> > the 3rd result,
> > > > because the service identifier
> > (stockPriceServiceHighPayingCustomer or
> > > > stockPriceServiceFreebieMoocherCustomers) would have to
> > be inside the
> > > > message somewhere, so the cache wouldn't know which
> > representation to
> > > > return.  Or did I put the service identifier in the URI
> > and not the
> > > > resource? :-)
> > >
> > > I would say tns:stockQuote is the service and
> > > /stockservice/getStockQuote is
> > > one of its end-points. So in my Web service stack I define
> > a new service
> > > (tns:stockQuote) and then attach some security policy. The Web
> > > service stack
> > > knows all the end-points and protocol details so it can exert
> > > access control
> > > on the specific services (directly, using a firewall, allowing
> > > caching, etc)
> > > and individually for each protocol (e.g. ACL for HTTP and
> > spam filter for
> > > SMTP).
> > >
> > > The entity identifier is {symbol,BEAS}. Given a service
> > definition with a
> > > set of end-points and protocol bindings I can construct
> > messages and send
> > > them to the service's access URL. I can have an HTTP POST binding
> > > where the
> > > input goes inside the SOAP message, I can have HTTP GET
> > binding where the
> > > input goes inside the URL (as per your example) and I can have
> > > SMTP binding
> > > where the input goes in the SMTP subject header.
> > >
> > > I am making the separation between service and entity to
> > allow multiple
> > > end-points and protocols to access the same entity. Let's
> > say I have 1000
> > > entities (resources) that I store in my database so I can
> > later operate on
> > > them. The service end-point changes to use HTTP + XML-Sig instead
> > > of HTTPS.
> > > If I encode the actual URL in the database I need a way to
> > change all 1000
> > > URLs in the database. And since we agree these URLs are
> > opaque, I have no
> > > way of doing that.
> > >
> > > On the other hand, if I encode service ID + entity ID, then I
> > > create the URL
> > > before accessing the actual resource over a specific end-point.
> > > There is one
> > > service definition to change and once I change it, all future
> > > access to the
> > > entity will use the new end-point/protocol.
> > >
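
A minimal Python sketch of the approach above, storing service ID + entity ID
and building the access URL only at the moment of use. The endpoints
dictionary, the stored list and the example.org URLs are hypothetical
stand-ins for a service definition and a database.

```python
from urllib.parse import urlencode

# Hypothetical service definition: one end-point per service ID.
endpoints = {"tns:stockQuote": "https://example.org/stockservice/getStockQuote"}

# What the database stores: (service ID, entity ID), never a full URL.
stored = [("tns:stockQuote", {"symbol": "BEAS"})]


def access_url(service_id: str, entity_id: dict) -> str:
    """Build the access URL just before use, from the current end-point."""
    return f"{endpoints[service_id]}?{urlencode(entity_id)}"


before = access_url(*stored[0])

# The end-point changes once, in the service definition only.
endpoints["tns:stockQuote"] = "http://example.org/signed/getStockQuote"

after = access_url(*stored[0])
print(before)
print(after)
```

The stored records are untouched; every later access picks up the new
end-point because the URL is derived rather than persisted.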
> > > arkin
> > >
> > > >
> > > > Cheers,
> > > > Dave
> >
> >
Received on Tuesday, 18 February 2003 16:49:39 UTC