Re: update= vs query= from Lee Feigenbaum on 2009-09-23 (public-rdf-dawg@w3.org from July to September 2009)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 23 Sep 2009 14:09:03 -0400
To: Chimezie Ogbuji <ogbujic@ccf.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4ABA643F.5040109@thefigtrees.net>
Chimezie, I think I understand what you're saying, but I wanted to ask you 
whether defining update in the /Protocol document as its own interface 
(so it can be bound to its own URI endpoint) but _also_ defining the 
update operation to have an update= (rather than query=) parameter would 
satisfy your HTTP sensibilities.

This would (I think) mean that admins who wanted to could deploy on 
separate URIs and maintain the "URI drives operation" setup that you're 
advocating, while other admins could have a single endpoint which 
dispatches to a SPARQL/Query or SPARQL/Update processor based on whether 
query= or update= is sent up.
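A minimal Python sketch of that single-endpoint dispatch, assuming the form-encoded parameters arrive as one string (the part of the Request-URI after '?' for a GET, or an application/x-www-form-urlencoded body for a POST). The function name choose_processor and the rule that exactly one of query= or update= must be present are hypothetical, not something the Protocol document defines:

```python
from urllib.parse import parse_qs


def choose_processor(encoded_params: str) -> str:
    """Return "query" or "update" for a hypothetical combined SPARQL endpoint.

    encoded_params is the form-encoded parameter string: the part of the
    Request-URI after '?' for a GET, or the request body for a POST.
    """
    # keep_blank_values so that an empty "update=" still counts as present
    params = parse_qs(encoded_params, keep_blank_values=True)
    has_query = "query" in params
    has_update = "update" in params
    if has_query == has_update:
        # Hypothetical rule: exactly one of the two parameters must be sent.
        raise ValueError("expected exactly one of query= or update=")
    return "query" if has_query else "update"


print(choose_processor("query=SELECT+*+WHERE+%7B%3Fs+%3Fp+%3Fo%7D"))
```

A fuller version would also refuse update= on a GET, since updates are unsafe operations and should not be bound to GET.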

What do you think?

Lee

PS I'm pretty sure the query string gets sent in the GET request as part 
of the URI being requested.

GET /service?foo=bar HTTP/1.1
Host: ...

The only thing that doesn't get sent is the fragment identifier.
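To illustrate, a small Python sketch that derives the request line a client sends for a given URI, using the standard urllib.parse.urlsplit (the helper name request_line is hypothetical). The path and query string go into the request-target; the fragment stays on the client:

```python
from urllib.parse import urlsplit


def request_line(uri: str, method: str = "GET") -> str:
    """Build the HTTP/1.1 request line a client would send for `uri`.

    The request-target is the path plus any query string; the fragment
    (everything after '#') is never transmitted. The host part of the URI
    goes into the separate Host header, which is not built here.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


print(request_line("http://sparqlservice.com/service1?query=...#results"))
print(request_line("http://sparqlservice.com/service1"))
```

The first URI yields GET /service1?query=... HTTP/1.1, whereas the second yields GET /service1 HTTP/1.1.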

Chimezie Ogbuji wrote:
> On 9/23/09 9:26 AM, "Lee Feigenbaum" <lee@thefigtrees.net> wrote:
>>>> in either case, it was suggested that a natural way to extend the
>>>> protocol would be to have update statements over HTTP be indicated with
>>>> update= rather than query=. Chimezie wondered on the telecon and below
>>>> whether that was necessary.
>>> I was concerned about whether it is necessary, but my motivation was also about
>>> avoiding the practice of embedding an operation as a query string /
>>> parameter to a URI request.  It bypasses the use of the 'natural' operations
>>> (GET/POST/etc.) to determine what action is taken, and it also violates the
>>> URI opacity principle, since a server has to parse the request to
>>> determine what operation to take.
>> I'm not clear on how doing everything with query= is better in this
>> regard. Or are you proposing that we'd require a separate URI to handle
>> SPARQL/Update vs. SPARQL/Query strings?
>>
>> Maybe an example would help?
> 
> I was just emphasizing that it is typically bad practice for a service
> request handler to have to parse the URI request to determine the nature of
> the operation.  
> 
> Okay, so assume we want to host a SPARQL service at sparqlservice.com. We
> currently define an abstract query operation (on a SparqlQuery
> interface) that is bound to HTTP via GET, and it has two
> arguments: the query string and the dataset description.  The latter is
> optional.
> 
> The service administrator can assign an arbitrary addressable resource to
> handle requests to this HTTP binding.  So he/she chooses to use:
> 
> - http://sparqlservice.com/service1
> 
> The query string from the client is just an argument to the operation.  The
> part before the query string (the request URI) and the HTTP method, alone,
> are what determine the binding to the SparqlQuery interface.  So, if you
> have a URL that encodes the GET request to an HTTP binding of a SPARQL
> query, the invocation (at the HTTP level) is the same with or without the
> query string.  It would be analogous to calling the same method twice with
> different arguments:
> 
> http://sparqlservice.com/service1?query=...
> http://sparqlservice.com/service1
> 
> Will result in the same HTTP request:
> 
> GET /service1 HTTP/1.1
> Host: sparqlservice.com
> 
> [[Someone please correct me if I've misunderstood the way URIs with query
> strings are translated into Request-URIs for the corresponding HTTP
> requests]]
> 
> So, I'm saying 2 things: 1) it is generally a bad practice for the server to
> have to 'peek' into the original URI to determine how to route the operation
> (if it is properly bound to the protocol), and 2) The service administrator
> has the option of binding to 2 *separate* handlers for HTTP bindings to both
> the SPARQL/Update and SPARQL/Query interfaces.
> 
> If we simply define an additional abstract interface for SPARQL updates and
> (for example) bind it to HTTP via the POST method (since we certainly don't
> want to bind an unsafe operation to GET), then the service administrator
> would have the choice of allocating a separate resource to handle invocations
> to each binding. The advantage is that they would then have 'protocol-level'
> separation between the bound operations and wouldn't have to implement
> anything for that purpose.
> 
>> Hmm, I'm not sure about that. As I understand it (not too well,
>> admittedly), a WSDL interface is composed (potentially) of many WSDL
>> operations. The interface then gets bound to HTTP via the WSDL binding
>> mechanism. Within that binding, each operation gets bound to a
>> particular HTTP method, with particular faults, content encodings,
>> parameter separators, etc. (the syntax summary in 6.2 at
>> http://www.w3.org/TR/wsdl20-adjuncts/#http-binding is helpful here).
> 
> Okay.
>  
>> Then someone comes along and deploys an endpoint, and describes the
>> endpoint in WSDL and gives a URI for the interface (i.e., for all of the
>> operations). But the thing I was surprised about is, message
>> serialization is only based on the contents of an operation's input
>> message, and there's no place at which the operation's name comes into
>> play. 
> 
> Did my example above help? Admittedly, my understanding is purely from the
> semantics of HTTP (I'm not familiar with the WSDL 2 adjuncts).  So, the
> operation's *name* is completely specified by the endpoint URI and method in
> the HTTP request.  
> 
> GET ..sparql endpoint URI.. HTTP/1.1
> 
>> So if my endpoint is /service and I have 3 operations, they all
>> get invoked as some variant of /service?param1=...&param2=... and the
>> only thing that my server has to distinguish between the operations is
>> the parameters to each.
> 
> So, I would argue that this is bad configuration for an HTTP endpoint, since
> (from an HTTP perspective), you have one resource handling multiplexed
> operations and thus the handler is then responsible for distinguishing
> between operations (via peeking into the parameters) rather than relying on
> the mechanism that the protocol being bound to (HTTP in this case) provides
> for distinguishing operations (via the combination of Request-URI and HTTP
> method).  Each operation really should be a separately addressable
> resource.
> 
>> Anyway, I'm not sure this really matters.
> 
> I get the impression that this is more of a question of best practice for
> the service administrator to consider, since they have options in how they
> bind resources to query and update operations.
> 
>>> But it does so by parsing at least part of the URI, which I don't think is a
>>> good practice.  If only the SPARQL/Query interface was bound to an HTTP
>>> 'service', then the server would simply not know how to handle a
>>> SPARQL/Update request.
>> In the case where it's all query=, it needs to parse a level deeper,
>> right? Unless you're suggesting that we require that SPARQL/Update and
>> SPARQL/Query never be both available at the same endpoint URI?
> 
> I'm not suggesting that we enforce that constraint, but it seems (to me) to
> be a good practice in general.
> 
> -- Chimezie
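For comparison, a minimal Python sketch of the separate-resource setup Chimezie describes, in which the HTTP method and the path of the Request-URI alone select the operation, and the parameters are passed through uninspected. The paths /service1 and /update1, the handler functions, and the 404/405 choices are assumptions for illustration only:

```python
from typing import Callable, Dict, Tuple


# Hypothetical handlers standing in for real SPARQL processors.
def run_query(params: str) -> str:
    return "query processor: " + params


def run_update(body: str) -> str:
    return "update processor: " + body


# Each operation is its own resource: the (method, path) pair alone picks
# the handler. The paths are hypothetical examples.
ROUTES: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("GET", "/service1"): run_query,   # safe operation bound to GET
    ("POST", "/update1"): run_update,  # unsafe operation bound to POST
}


def route(method: str, request_target: str, body: str = "") -> Tuple[int, str]:
    """Return (status, response text) for one request.

    The query string is split off the request-target but never inspected;
    it is handed to the handler as an opaque argument.
    """
    path, _, query = request_target.partition("?")
    handler = ROUTES.get((method, path))
    if handler is None:
        # 405 if the resource exists under another method, else 404.
        known_paths = {p for (_, p) in ROUTES}
        status = 405 if path in known_paths else 404
        return status, "no operation bound to this method and URI"
    # GET carries its arguments in the query string, POST in the body.
    return 200, handler(query if method == "GET" else body)


print(route("GET", "/service1?query=..."))
print(route("POST", "/update1", "update=..."))
print(route("GET", "/update1"))
```

The router still splits the query string off the request-target, since it arrives as part of the Request-URI, but it never looks inside it to decide which operation to invoke.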
Received on Wednesday, 23 September 2009 18:09:55 UTC