- From: Assaf Arkin <arkin@intalio.com>
- Date: Mon, 17 Feb 2003 15:03:13 -0800
- To: "David Orchard" <dorchard@bea.com>, <www-ws-arch@w3.org>
> > I cannot imagine a cache engine that will look at the body of the POST message to determine what information to cache, so I would tend to agree that for caching purposes using GET is a better approach. However, if my input message only contains an identification of a resource (in addition to identifying the service outside the message) and such identification can be encoded as a parameter in the URL of the HTTP request, I could allow a cache engine (and other technologies) to manage access to that resource.
> Are you suggesting a scenario where there's a URI and an effectively empty POST request? The problem is: how can you cache that? You'd have to somehow mark the POST as being idempotent, so that the cache would know that it didn't have to get a response from the resource. That's exactly what GET does. I think the key part of the GET semantic that's explicit in 2616 is that GET is idempotent, therefore caches can do their mojo.
I am actually making the case for using HTTP GET and not HTTP POST in this example, but from the perspective of selecting the best protocol binding for a particular operation. In other words, both POST and GET (and also FTP and SMTP) are possible, but given the definition of the operation I select the protocol binding that is most efficient to use, and in this case HTTP GET rules. (Since GET already does what we need, I do not see much need to allow caching for POST. Sure, you can use HTTP POST in the protocol binding; it will simply be an inefficient choice.)
The choice of which protocol to use is rather arbitrary. The service-oriented architecture simply describes a service for retrieving a message. When it comes time to select GET vs POST, the best practice for protocol bindings will direct me to use GET, pointing out that POST is possible but less efficient (and we all agree on why).
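The caching argument above rests on GET being defined as safe and idempotent (RFC 2616, section 9.1), while RFC 2616 treats POST responses as uncacheable unless the response explicitly says otherwise. A minimal sketch, assuming a toy in-memory intermediary (the `CachingProxy` class and the `quote_origin` stand-in are hypothetical, not any real proxy's API), shows why the GET binding is the efficient one for a retrieval: repeated GETs for the same URL reach the origin once, while every POST is forwarded.

```python
from typing import Callable, Dict, Optional

# Hypothetical origin server: (method, url, body) -> response representation.
Origin = Callable[[str, str, Optional[str]], str]


class CachingProxy:
    """Toy intermediary that caches GET responses by URL and never caches POST."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, str] = {}
        self.origin_calls = 0  # how many requests actually reached the origin

    def request(self, method: str, url: str, body: Optional[str] = None) -> str:
        if method == "GET":
            # GET is safe and idempotent, and the URL fully identifies what is
            # requested, so a repeated URL can be answered from the cache.
            if url not in self.cache:
                self.origin_calls += 1
                self.cache[url] = self.origin(method, url, body)
            return self.cache[url]
        # POST is forwarded every time: the proxy does not inspect the body
        # and cannot assume that replaying a stored response is correct.
        self.origin_calls += 1
        return self.origin(method, url, body)


def quote_origin(method: str, url: str, body: Optional[str]) -> str:
    # Stand-in for a stock-quote service.
    return f"quote for {url if method == 'GET' else body}"


proxy = CachingProxy(quote_origin)
for _ in range(2):
    proxy.request("GET", "/stockquote?symbol=XYZ")
    proxy.request("POST", "/stockquote", "symbol=XYZ")
print(proxy.origin_calls)  # 3: one GET (then a cache hit) plus two POSTs
```

Both bindings return the same quote; the difference is only in how much work the intermediary can take off the service.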
The way I look at it, a request can identify the service and the particular (output) message I am interested in. The message may exist prior to my request, i.e. the input does not lead to a computation that generates the message. And there are several protocols that can retrieve such a message efficiently.
(Warning: partial analogy follows.) JMS has an interesting feature called selectors. When you retrieve a particular message, e.g. the stock quote for ticker XYZ, you can use a selector as the input to the request, identifying which message you want. In effect you are performing an operation that retrieves an existing message cached by the JMS engine, but in your action you are supplying an input that distinguishes that message from all other messages available from that service. If I were to write a similar model using HTTP GET (with the Web server representing a multi-consumer queue), I would simply replace the SQL-based syntax of the JMS selector with a URL encoding, giving me the URL from which the message can be retrieved with a single HTTP GET.
> "Direct manipulation of Resources" instead of "indirect manipulation". That is, if you indirectly manipulate a resource, there are too many variable places for the "real" resource/service identifier to be in the message for a proxy to figure out whether it can cache (etc.) the representations.
The matrix I use is slightly different. I separate operations into those that involve computation and those that do not. I further separate computation operations into those that are idempotent and those that are not, those that are atomic and those that are not, etc. For now I will use the term 'lookup' for an operation that involves no computation, but I'm not sure it's such a good name. A lookup is something you can easily cache; a computation is something you cannot easily cache (if you cache it, you cache it in the service, not the intermediary).
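The selector analogy and the lookup/computation split can be combined into a small binding-selection sketch. Everything here is an illustrative assumption: the `Operation` type, the helper names, and the `example.com` end-points are hypothetical, the POST body is JSON purely for brevity (the thread has a SOAP message in mind), and a computation always gets POST even though GET would "sometimes" work.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import urlencode


@dataclass
class Operation:
    name: str
    is_lookup: bool  # True: retrieve an existing message; False: run a computation


def is_url_encodable(params: Dict[str, object]) -> bool:
    # "Simple" input: flat name/value pairs of scalars that fit in a query string.
    return all(isinstance(v, (str, int, float)) for v in params.values())


def choose_binding(op: Operation, params: Dict[str, object]) -> str:
    if op.is_lookup:
        # Lookup: GET is always possible and the efficient choice.
        if not is_url_encodable(params):
            raise ValueError(f"{op.name}: lookup input must fit in a URL")
        return "GET"
    # Computation: POST is the generic choice.
    return "POST"


def build_request(endpoint: str, op: Operation, params: Dict[str, object]) -> Tuple[str, str, str]:
    """Return (method, url, body) for the chosen binding."""
    if choose_binding(op, params) == "GET":
        # Plays the role of a JMS selector such as ticker = 'XYZ', but in the URL.
        return "GET", f"{endpoint}?{urlencode(params)}", ""
    return "POST", endpoint, json.dumps(params)


quote = Operation("getStockQuote", is_lookup=True)
pricing = Operation("priceOption", is_lookup=False)
print(build_request("http://example.com/stockquote", quote, {"ticker": "XYZ"}))
# ('GET', 'http://example.com/stockquote?ticker=XYZ', '')
print(build_request("http://example.com/options", pricing, {"ticker": "XYZ", "strike": 50}))
# ('POST', 'http://example.com/options', '{"ticker": "XYZ", "strike": 50}')
```

The point of the design is that the binding follows from the operation's definition; the caller never picks GET or POST directly.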
For a 'lookup' the input has to be simple enough to encode in a URL, and if you encode it in a URL and use HTTP GET then you can cache it, proxy it, etc. For a computation the input may be simple enough to encode in a URL, or it may be more complex. In the latter case HTTP POST would be the most efficient means of sending the input. So the matrix would look like:
    method / operation | lookup      | computation
    -------------------+-------------+------------
    GET                | always      | sometimes
    POST               | non-optimal | optimal
If the operation is a lookup I would select GET since it's more efficient; if the operation is a computation I would select POST since it's more generic. If the operation is a lookup I would not select SMTP, and if it's a computation I would not select FTP. So the choice of protocol binding depends on the operation and not vice versa.
> > Another issue concerns firewalls. Practically speaking, controlling individual access to URLs at the firewall level is impractical. Firewalls only work well when there is a coarse-grained identification of services, e.g. a partial path in the URL, or a path without parameters.
> Fair enough. I should probably say something like "Security intermediary". I was thinking of authentication servers as well.
There are two ends to the security spectrum. You can control access to a service and you can control access to the entity (warning: this is a generalization). You can easily control access to the service at the front-end (the firewall); you can do so even before the message hits the Web service. For efficiency you will only look at the service identification, not the entire URL. You can easily control access to the data at the back-end (the database). For efficiency you will look at the entity identifier, ignoring the service used to access it. I don't believe you can easily do one with the other.
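The two ends of the security spectrum can be sketched as two independent checks. This is a hypothetical illustration: the policy tables, the function names, and the `principal` argument (standing in for the identity carried by a security token) are assumptions, not part of any real firewall or database API. The front-end check reads only the URL path, and the back-end check reads only the entity identifier.

```python
from typing import Dict, Set, Tuple
from urllib.parse import parse_qsl, urlsplit

EntityId = Tuple[str, str]

# Hypothetical policies: who may use a service end-point, and who may see an
# entity no matter which service or protocol is used to reach it.
SERVICE_ACL: Dict[str, Set[str]] = {"/stockservice/getStockQuote": {"alice", "bob"}}
ENTITY_ACL: Dict[EntityId, Set[str]] = {("symbol", "BEAS"): {"alice"}}


def front_end_allows(principal: str, url: str) -> bool:
    # Firewall-style check: only the path (service identification) is examined;
    # the query string carrying the entity identifier is ignored.
    return principal in SERVICE_ACL.get(urlsplit(url).path, set())


def back_end_allows(principal: str, entity: EntityId) -> bool:
    # Database-style check: only the entity identifier matters.
    return principal in ENTITY_ACL.get(entity, set())


def handle(principal: str, url: str) -> str:
    if not front_end_allows(principal, url):
        return "403 at front-end"
    query = dict(parse_qsl(urlsplit(url).query))
    if not back_end_allows(principal, ("symbol", query.get("symbol", ""))):
        return "403 at back-end"
    return "200"


url = "/stockservice/getStockQuote?symbol=BEAS"
print(handle("alice", url))  # 200
print(handle("bob", url))    # 403 at back-end
print(handle("carol", url))  # 403 at front-end
```

Neither check can stand in for the other: the front-end never parses the entity identifier, and the back-end never sees which end-point was used.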
An authentication service could perform authentication at the front-end, but that security token will be used both at the front-end to limit access to the service and at the back-end to limit access to the entity. So the interesting point is how you combine the two access methods with the service identification, resource identification and security token.
I intentionally avoided using the term resource. As Francis just pointed out, the word is overloaded: both the service and the entity are resources, and each can (and should) be identified by some URI. What I am looking for is a way to separate the service resource from the entity resource, without precluding a combined access resource from existing.
> Aha... So URLs are opaque to the consumer, that is, don't make any assumptions. But URLs can certainly be non-opaque to the URL provider. There are many reasons for doing this, particularly partitioning of the security realms and simplicity in application development (like /en and /fr subtrees).
Definitely. However, I think "opaque" might be misunderstood as preventing the consumer from constructing URLs. In the case of HTTP GET, the provider tells the consumer how to construct a request URL that contains the service end-point and the entity identifier. The consumer essentially constructs an access resource identifier. The consumer has no other knowledge about anything inside the URL. There might be additional information there that is interesting to the provider (security domain, language, etc.). But the consumer can understand the relation between a service resource and an entity resource, and that the lifetime of the access resource may be shorter than that of the other two resources.
> I'm not quite following, I can't quite picture the URLs. I'm working on a few different scenarios to show these two styles, maybe that will help with these - not quite darned ready to post yet..
> How would the service identification be different from the resource identification?
> Taking your first and third scenarios, is the difference between these that the service identifier (say the nefarious getStockQuote) is part of the URI in the first case, and not part of the URI in the second case? The URLs might look something like /stockservice/getStockQuote?symbol=BEAS and /stockquote?symbol=BEAS. If this is right, I'm not sure that you can cache the 3rd result, because the service identifier (stockPriceServiceHighPayingCustomer or stockPriceServiceFreebieMoocherCustomers) would have to be inside the message somewhere, so the cache wouldn't know which representation to return. Or did I put the service identifier in the URI and not the resource? :-)
I would say tns:stockQuote is the service and /stockservice/getStockQuote is one of its end-points. So in my Web service stack I define a new service (tns:stockQuote) and then attach some security policy. The Web service stack knows all the end-points and protocol details, so it can exert access control on the specific services (directly, using a firewall, allowing caching, etc.) and individually for each protocol (e.g. an ACL for HTTP and a spam filter for SMTP). The entity identifier is {symbol, BEAS}.
Given a service definition with a set of end-points and protocol bindings, I can construct messages and send them to the service's access URL. I can have an HTTP POST binding where the input goes inside the SOAP message, an HTTP GET binding where the input goes inside the URL (as per your example), and an SMTP binding where the input goes in the SMTP subject header.
I am making the separation between service and entity to allow multiple end-points and protocols to access the same entity. Let's say I have 1000 entities (resources) that I store in my database so I can later operate on them. The service end-point changes to use HTTP + XML-Sig instead of HTTPS. If I encode the actual URL in the database, I need a way to change all 1000 URLs in the database.
And since we agree these URLs are opaque, I have no way of doing that. On the other hand, if I encode service ID + entity ID, then I create the URL before accessing the actual resource over a specific end-point. There is one service definition to change, and once I change it, all future access to the entity will use the new end-point/protocol.
arkin
>
> Cheers,
> Dave
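The closing argument (store service ID + entity ID, and build the concrete request only at access time from a single service definition) can be sketched as follows. This is a hypothetical illustration: the registry, the `Binding`/`Service` types, the protocol labels, and the `example.com` addresses are assumptions, and the POST body is a minimal XML stand-in rather than a real SOAP envelope. It shows the three bindings from the message (input in the URL, in the body, in the SMTP subject) and a single end-point change reaching all 1000 stored entities.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
from urllib.parse import urlencode

EntityId = Tuple[str, str]  # e.g. ("symbol", "BEAS")


@dataclass
class Binding:
    protocol: str  # hypothetical labels: "http-get", "http-post", "smtp"
    address: str   # end-point URL or mailbox


@dataclass
class Service:
    name: str
    bindings: List[Binding]


# Hypothetical registry: the single place where end-points are defined.
REGISTRY: Dict[str, Service] = {
    "tns:stockQuote": Service("tns:stockQuote", [
        Binding("http-get", "https://example.com/stockservice/getStockQuote"),
        Binding("http-post", "https://example.com/stockservice"),
        Binding("smtp", "quotes@example.com"),
    ]),
}


def find_binding(service_id: str, protocol: str) -> Binding:
    return next(b for b in REGISTRY[service_id].bindings if b.protocol == protocol)


def build_message(service_id: str, entity: EntityId, protocol: str) -> Dict[str, str]:
    """Bind (service ID, entity ID) to a concrete request only at access time."""
    binding = find_binding(service_id, protocol)
    name, value = entity
    if protocol == "http-get":
        # Entity identifier goes into the URL.
        return {"method": "GET", "url": f"{binding.address}?{urlencode({name: value})}"}
    if protocol == "http-post":
        # Entity identifier goes into the body (minimal XML, not a real SOAP envelope).
        return {"method": "POST", "url": binding.address, "body": f"<{name}>{value}</{name}>"}
    if protocol == "smtp":
        # Entity identifier goes into the subject header.
        return {"to": binding.address, "subject": f"{name}={value}"}
    raise ValueError(f"unsupported protocol: {protocol}")


# The database keeps (service ID, entity ID) pairs, never concrete URLs.
stored = [("tns:stockQuote", ("symbol", f"SYM{i}")) for i in range(1000)]

# Switching from HTTPS to HTTP + XML-Sig is one change to the service definition...
find_binding("tns:stockQuote", "http-get").address = "http://example.com/signed/getStockQuote"

# ...and every stored entity picks up the new end-point the next time it is accessed.
service_id, entity = stored[0]
print(build_message(service_id, entity, "http-get")["url"])
# http://example.com/signed/getStockQuote?symbol=SYM0
```

The consumer still treats each generated URL as opaque; what it keeps is the service/entity pair from which the provider's rules produce that URL.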
Received on Monday, 17 February 2003 18:05:26 UTC