- From: David Orchard <dorchard@bea.com>
- Date: Mon, 17 Feb 2003 21:18:37 -0800
- To: "'Assaf Arkin'" <arkin@intalio.com>, <www-ws-arch@w3.org>
Assaf,

I won't have a chance to respond in depth to this for a number of days - my
last email response to you took me almost 2 hours to write and I'm way
booked up this week. As I remember from earlier WSCI days, you can pump out
the text :-)

This looks like very dense and interesting stuff. Perhaps being a bit too
blunt, what's the point? Are we disagreeing? Agreeing? Are you proposing
your 2nd paragraph etc. as the text for our ws-arch document, instead of
what I proposed? If so, what are the properties that it exhibits, and when
is it better or worse than other architectures? I think you catch my drift.

It looks like very interesting stuff - I actually quite like the notion of
shared contexts, though I'm not sure that is central to SOA vs anything
else - but I'm just not sure what to do with it wrt our working group's
deliverables and what I've proposed.

Cheers,
Dave

> -----Original Message-----
> From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> Behalf Of Assaf Arkin
> Sent: Monday, February 17, 2003 8:58 PM
> To: Assaf Arkin; David Orchard; www-ws-arch@w3.org
> Subject: RE: Introducing the Service Oriented Architectural style, and
> it's constraints and properties.
>
> I'm going to comment on my comments. All I can say is, a whiteboard would
> come in useful right now ;-)
>
> In generic terms I would define a service-oriented architecture as one in
> which services perform operations within an identified context given some
> input data. In its output a service can provide the same or a derived
> context. You can see that in CORBA, COM, SOAP + WS-Security, ebXML
> messaging, etc.
>
> The context could be something like a security token, a transaction, a
> session, etc. Contexts can overlap at different points in time, e.g. two
> operations in the same transaction, two transactions in the same session,
> a transaction that involves two operations in different security contexts,
> etc.
>
> In abstract terms you would define a service, an operation of that
> service, a context (Sanjiva's proposal, correlations) and some input
> (message body). You would then define multiple protocol bindings that
> encode that information in various ways over the wire.
>
> At the lowest level we can look at IP as being a protocol with ip:port
> being the channel and everything else passed in the message. With HTTP we
> have more capable channels encoded as URLs which are carried in the
> message. It becomes more interesting to look at the URL as the channel
> rather than the ip:port portion of it. With SOAP we can further extend the
> channel to include additional context information (transaction, security,
> etc).
>
> But with WSDL we shouldn't make that distinction. We should simply define
> the channel as the composition of service + operation + context and
> input/output for the operation. We can then encode it as TCP, as HTTP over
> TCP, as SOAP over HTTP, or have multiple encodings at the same time.
>
> Deciding whether to use HTTP GET or POST is merely a matter of
> optimization. If all the information can be captured in a URL and the
> output can be cached it makes more sense to use GET than to "fix"
> caches/proxies to support POST. If you use POST you simply lose caching
> capability (or you need to fix a lot of caches), you are less efficient
> but still functional. On the other hand, if the information cannot fit in
> a URL you can have some GET-based protocol to get around it(*), or you can
> use POST more efficiently.
>
> A simple system can be constructed using a low-level model. In that model
> channels denoted by service + operation are known a priori, but channels
> denoted by service + operation + context must be communicated. Such a
> simple system would look like REST if you opted to use HTTP GET in the
> protocol bindings, or pi-calculus if you tried to describe it formally.
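
As a rough illustration of the channel idea above, the following Python sketch models a channel as service + operation + context and picks an HTTP binding from it. The `Channel` class, the `choose_binding` helper, the path layout and the 2000-character URL limit are all assumptions made for this example; none of them come from WSDL or from the thread.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Assumed practical URL length limit; a hypothetical value for this sketch.
MAX_URL_LENGTH = 2000


@dataclass(frozen=True)
class Channel:
    """Hypothetical abstract channel: service + operation + context."""
    service: str
    operation: str
    context: tuple = ()  # pairs such as (("session", "s1"),)


def choose_binding(channel, params, cacheable):
    """Return (method, url, body) for one abstract request.

    GET is chosen when the output is cacheable and everything fits in the
    URL; otherwise the same information is carried in a POST body.
    """
    base = f"/{channel.service}/{channel.operation}"
    query = urlencode(list(channel.context) + sorted(params.items()))
    url = f"{base}?{query}" if query else base
    if cacheable and len(url) <= MAX_URL_LENGTH:
        return ("GET", url, None)
    return ("POST", base, query)
```

The abstract request is the same in both branches; only the encoding over the wire changes, which is the sense in which the GET/POST choice is an optimization.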
>
> A more complex system will have to make a decision on where to place
> complexity. It could use the low-level model resulting in complexity of
> the process, or a high-level model resulting in simplification of the
> system but requiring a more complex model. You can reduce processing and
> message passing by electing to use the high-level model.
>
> In a high-level model you would allow the sender to construct the channel
> given a knowledge of the context, in effect communicating partial channels
> (service and context separately) and combining them to form new channels.
> After playing with low-level models for a while and realizing the inherent
> complexity, this model was selected for use in WSCI and BPML, and if my
> understanding is correct also used in BPEL, WS security and other related
> specifications.
>
> The model doesn't preclude anyone from building a RESTful system. On the
> contrary, if the system is simple to model it will naturally follow the
> REST approach. If the system is not so simple then two options exist: use
> RESTless interactions (transactions, cookies, POSTs) or rewrite the system
> based on simpler forms of interactions but at the cost of increasing
> complexity and message passing.
>
> With some limitations we can try this out with WSDL 1.1.
>
> 1. Define a lookup operation with WSDL that returns access URLs (e.g.
> getStockQuote). Provide two sets of protocol bindings, one using HTTP GET
> and one using HTTP POST, and see which one is easier to access/cache.
>
> 2. Define a complex scenario using high-level non-idempotent
> context-carrying operations (e.g. purchase order management) using the
> minimal set of operations. Start by using HTTP POST so there are no
> restrictions.
>
> 3. Reduce no#2 into a constrained model as used for no#1 and see how many
> new states/messages are required to get it working.
>
> arkin
>
>
> * While such a protocol is doable and even provable using process calculus
> it increases the number of messages exchanged, the complexity of the
> system and further requires both parties to be full-fledged Web services.
> There are many cases where having simplicity and asymmetry (e.g. browser
> and server) works best.
>
>
> > -----Original Message-----
> > From: www-ws-arch-request@w3.org [mailto:www-ws-arch-request@w3.org]On
> > Behalf Of Assaf Arkin
> > Sent: Monday, February 17, 2003 3:03 PM
> > To: David Orchard; www-ws-arch@w3.org
> > Subject: RE: Introducing the Service Oriented Architectural style, and
> > it's constraints and properties.
> >
> > > > I cannot imagine a cache engine that will look at the body of the
> > > > POST message to determine what information to cache, so I would
> > > > tend to agree that for caching purposes using GET is a better
> > > > approach. However, if my input message only contains an
> > > > identification of a resource (in addition to identifying service
> > > > outside the message) and such identification can be encoded as a
> > > > parameter in the URL of the HTTP request, I could allow a cache
> > > > engine (and other technologies) to manage access to that resource.
> > >
> > > Are you suggesting a scenario where there's a URI and an effectively
> > > empty POST request? The problem is, how can you cache that? You'd
> > > have to somehow mark the POST as being idempotent, so that the cache
> > > would know that it didn't have to get a response from the resource.
> > > That's exactly what GET does. I think the key part of the GET semantic
> > > that's explicit in 2616 is that the GET is idempotent, therefore
> > > caches can do their mojo.
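
The caching point in this exchange can be sketched with a toy intermediary that keys only on the URL and never looks inside a request body. `UrlCache` and its `origin` callable are hypothetical names for this illustration, and the sketch deliberately ignores everything a real HTTP cache must honor (expiry, validation, Cache-Control).

```python
class UrlCache:
    """Toy intermediary: caches GET responses by URL, forwards the rest."""

    def __init__(self, origin):
        # origin is any callable (method, url, body) -> response
        self.origin = origin
        self.store = {}
        self.origin_calls = 0

    def request(self, method, url, body=None):
        # Only GET can be answered from the cache: the URL alone
        # identifies what is being asked for.
        if method == "GET" and url in self.store:
            return self.store[url]
        self.origin_calls += 1
        response = self.origin(method, url, body)
        if method == "GET":
            self.store[url] = response
        # POST is always forwarded: the body is never inspected, so the
        # cache cannot tell whether two POSTs ask for the same thing.
        return response
```

Two identical GETs reach the origin once, while two identical POSTs reach it twice, even if the POST body carries nothing more than a resource identifier.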
> >
> > I am actually making the case for using HTTP GET and not HTTP POST for
> > this example, but from the perspective of selecting the best protocol
> > bindings for a particular protocol. In other words, both POST and GET
> > (and also FTP and SMTP) are possible, but given the definition of the
> > operation I select the protocol binding that is most efficient to use
> > and in this case HTTP GET rules.
> >
> > (Since GET already does what we need, I do not see much need to allow
> > caching for POST. Sure, you can use HTTP POST in the protocol binding,
> > it will simply be an inefficient choice.)
> >
> > The selection of which protocol to use is rather arbitrary. The
> > service-oriented architecture simply describes a service for retrieving
> > a message. When it comes time to select GET vs POST, the best practice
> > for protocol bindings will direct me to use GET pointing out that POST
> > is possible but less efficient (and we all agree on the why).
> >
> > The way I look at it, a request can identify the service and the
> > particular (output) message I am interested in. The message may exist
> > prior to my request, i.e. the input does not lead to a computation that
> > generates the message. And there are several protocols that can retrieve
> > such a message efficiently.
> >
> > (Warning: partial analogy follows) JMS has an interesting feature called
> > selectors. When you retrieve a particular message, e.g. stock quote for
> > ticker XYZ, you can use a selector as the input to the request,
> > identifying which message you use. In effect you are performing an
> > operation that retrieves an existing message cached by the JMS engine,
> > but in your action you are supplying an input to identify that message
> > from all other messages available from that service.
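
To make the selector analogy a little more concrete, here is a small sketch that renders the same identifying criteria two ways: as a JMS-style selector string (JMS selectors use an SQL-92-like conditional syntax) and as a query string on a URL suitable for HTTP GET. The helper names and the `/quotes` path are hypothetical; only string formatting is shown, not an actual JMS or HTTP call.

```python
from urllib.parse import urlencode


def as_selector(criteria):
    """Render criteria as a JMS-style selector, e.g. ticker = 'XYZ'."""
    parts = []
    for name, value in sorted(criteria.items()):
        escaped = str(value).replace("'", "''")  # SQL-style quote escaping
        parts.append(f"{name} = '{escaped}'")
    return " AND ".join(parts)


def as_get_url(base, criteria):
    """Render the same criteria as a URL retrievable with one HTTP GET."""
    return f"{base}?{urlencode(sorted(criteria.items()))}"


criteria = {"ticker": "XYZ"}
selector = as_selector(criteria)       # input to a queue receive
url = as_get_url("/quotes", criteria)  # input to an HTTP GET
```

In both cases the input only identifies an existing message; nothing about it implies a computation on the service side.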
> >
> > If I were to write a similar model using HTTP GET (with the Web server
> > representing a multi-consumer queue) I would simply switch the SQL
> > syntax of the JMS selector with a URL encoding that would give me the
> > URL by which the message can be retrieved with a single HTTP GET.
> >
> > > "Direct manipulation of Resources" instead of "indirection
> > > manipulation". That is, if you indirectly manipulate a resource,
> > > there's too many variable places for the "real" resource/service
> > > identifier to be in the message for a proxy to figure out whether it
> > > can cache, etc. the representations.
> >
> > The matrix I use is slightly different. I separate operations into those
> > that involve computation and those that do not. I further separate
> > computation operations to those that are idempotent and those that are
> > not, those that are atomic and those that are not, etc.
> >
> > For now I would use the term 'lookup', but I'm not sure it's such a good
> > name. A lookup is something you can easily cache, a computation is
> > something you cannot easily cache (if you cache it, you cache it in the
> > service not the intermediary).
> >
> > For 'lookup' the input has to be simple so you can encode it in a URL,
> > and if you encode it in a URL and use HTTP GET then you can cache it,
> > proxy it, etc. For computation the input may be simple so you can encode
> > it in a URL, or more complex. In the latter case using HTTP POST would
> > be the most efficient means for sending the input.
> >
> > So the matrix would look like:
> >
> > oper/
> > method | lookup      | computation
> > ------------------------------------
> > GET    | always      | sometimes
> > POST   | non-optimal | optimal
> >
> > If the operation is a lookup I would select GET since it's more
> > efficient, if the operation is a computation I would select POST since
> > it's more generic.
> > If the operation is a lookup I would not select SMTP, and if it's a
> > computation I would not select FTP. So the decision which protocol
> > binding to use depends on the operation and not vice versa.
> >
> > > > Another issue concerns firewalls. Practically speaking, controlling
> > > > individual access to URLs at the firewall level is impractical.
> > > > Firewalls only work well when there is a coarse-grain identification
> > > > of services, e.g. a partial path in the URL, or path without
> > > > parameters.
> > >
> > > Fair enough. I should probably say something like "Security
> > > intermediary". I was thinking of authentication servers as well.
> >
> > There are two ends to the security spectrum. You can control access to a
> > service and you can control access to the entity (warning: this is a
> > generalization).
> >
> > You can easily control access to the service at the front-end (the
> > firewall), you can do so even before the message hits the Web service.
> > For efficiency you will only look at the service identification not the
> > entire URL. You can easily control access to the data at the back-end
> > (the database). For efficiency you will look at the entity identifier
> > ignoring the service used to access it.
> >
> > I don't believe you can easily do one with the other.
> >
> > An authentication service could perform authentication at the front-end,
> > but that security token will be used at the front-end to limit access to
> > the service and at the back-end to limit access to the entity. So the
> > interesting point is how you combine two access methods with the service
> > identification, resource identification and security token.
> >
> > I intentionally avoided using the term resource. As Francis just pointed
> > out the word is overloaded, both the service and the entity are
> > resources, and each can (and should be) identified by some URI.
> > What I am looking for is a way to separate the service resource from the
> > entity resource, without precluding a combination access resource to
> > exist.
> >
> > > Aha... So URLs are opaque to the consumer, that is don't make any
> > > assumptions. But URLs can certainly be non-opaque to the URL provider.
> > > There are many reasons, particularly partitioning of the security
> > > realms, and simplicity in application development (like /en and /fr
> > > subtrees), for doing this.
> >
> > Definitely.
> >
> > However, I think opaque might be understood the wrong way in preventing
> > the consumer from constructing opaque URLs.
> >
> > In the case of HTTP GET, the provider tells the consumer how to
> > construct a request URL that contains the service end-point and the
> > entity identifier. It essentially constructs an access resource
> > identifier. The consumer has no other knowledge about anything inside
> > the URLs. There might be additional information there interesting to the
> > provider (security domain, language, etc). But the consumer can
> > understand the relation between a service resource and an entity
> > resource and that the lifetime of the access resource may be shorter
> > than the two other resources.
> >
> > > I'm not quite following, I can't quite picture the URLs. I'm working
> > > on a few different scenarios to show these two styles, maybe that will
> > > help with these - not quite darned ready to post yet..
> > >
> > > How would the service identification be different than the resource
> > > identification? Taking your first and third scenarios, is the
> > > difference between these that the service identifier (say the
> > > nefarious getStockQuote) is part of the URI in the first case, and not
> > > part of the URI in the second case? The URLs might look something
> > > like: /stockservice/getStockQuote?symbol=BEAS and
> > > /stockquote?symbol=BEAS.
> > > If this is right, I'm not sure that you can do a cache of the 3rd
> > > result, because the service identifier
> > > (stockPriceServiceHighPayingCustomer or
> > > stockPriceServiceFreebieMoocherCustomers) would have to be inside the
> > > message somewhere, so the cache wouldn't know which representation to
> > > return. Or did I put the service identifier in the URI and not the
> > > resource? :-)
> >
> > I would say tns:stockQuote is the service and
> > /stockservice/getStockQuote is one of its end-points. So in my Web
> > service stack I define a new service (tns:stockQuote) and then attach
> > some security policy. The Web service stack knows all the end-points and
> > protocol details so it can exert access control on the specific services
> > (directly, using a firewall, allowing caching, etc) and individually for
> > each protocol (e.g. ACL for HTTP and spam filter for SMTP).
> >
> > The entity identifier is {symbol,BEAS}. Given a service definition with
> > a set of end-points and protocol bindings I can construct messages and
> > send them to the service's access URL. I can have an HTTP POST binding
> > where the input goes inside the SOAP message, I can have HTTP GET
> > binding where the input goes inside the URL (as per your example) and I
> > can have SMTP binding where the input goes in the SMTP subject header.
> >
> > I am making the separation between service and entity to allow multiple
> > end-points and protocols to access the same entity. Let's say I have
> > 1000 entities (resources) that I store in my database so I can later
> > operate on them. The service end-point changes to use HTTP + XML-Sig
> > instead of HTTPS. If I encode the actual URL in the database I need a
> > way to change all 1000 URLs in the database. And since we agree these
> > URLs are opaque, I have no way of doing that.
> >
> > On the other hand, if I encode service ID + entity ID, then I create the
> > URL before accessing the actual resource over a specific end-point.
> > There is one service definition to change and once I change it, all
> > future access to the entity will use the new end-point/protocol.
> >
> > arkin
> >
> > > Cheers,
> > > Dave
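
The service ID + entity ID idea from the quoted message can be sketched as follows. Instead of storing finished URLs, each record stores a (service ID, entity ID) pair and the access URL is built at request time from a single service definition. The `endpoints` registry, the `access_url` helper and the example.com addresses are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical service registry: one entry per service, edited in one place.
endpoints = {
    "tns:stockQuote": "https://example.com/stockservice/getStockQuote",
}


def access_url(service_id, entity_id, registry=endpoints):
    """Build the access URL from a service ID and an entity ID pair."""
    name, value = entity_id
    return f"{registry[service_id]}?{urlencode({name: value})}"


# What would be stored in the database: IDs, not URLs.
stored_entities = [("tns:stockQuote", ("symbol", "BEAS"))]

# If the end-point or protocol changes, only the registry entry is edited;
# every stored entity resolves to the new access URL on its next use.
```

With this layout a change of end-point touches one registry entry rather than every stored record, at the cost of one lookup per access.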
Received on Tuesday, 18 February 2003 00:21:15 UTC