Re: update= vs query= from Chimezie Ogbuji on 2009-09-23 (public-rdf-dawg@w3.org from July to September 2009)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Wed, 23 Sep 2009 13:59:13 -0400
To: "Lee Feigenbaum" <lee@thefigtrees.net>
cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-ID: <C6DFDA31.C9B6%ogbujic@ccf.org>
On 9/23/09 9:26 AM, "Lee Feigenbaum" <lee@thefigtrees.net> wrote:
>>> in either case, it was suggested that a natural way to extend the
>>> protocol would be to have update statements over HTTP be indicated with
>>> update= rather than query=. Chimezie wondered on the telecon and below
>>> whether that was necessary.
>> 
>> I was concerned about whether it is necessary, but my motivation was also
>> about avoiding the practice of embedding an operation as a query string /
>> parameter to a URI request.  It bypasses the use of the 'natural' operations
>> (GET/POST/etc..) to determine what action is taken and it also violates the
>> opaqueness of URI principle since a server has to parse the request to
>> determine what operation to take.
> 
> I'm not clear on how doing everything with query= is better in this
> regard. Or are you proposing that we'd require a separate URI to handle
> SPARQL/Update vs. SPARQL/Query strings?
> 
> Maybe an example would help?

I was just emphasizing that it is typically bad practice for a service
request handler to have to parse the URI request to determine the nature of
the operation.  

Okay, so assume we want to host a SPARQL service at sparqlservice.com.  We
currently have an abstract query operation (on a SparqlQuery interface) that
is bound to HTTP via the use of GET and it has two arguments: the query
string and the dataset description.  The latter is optional.

The service administrator can assign an arbitrary addressable resource to
handle requests to this HTTP binding.  So he/she chooses to use:

- http://sparqlservice.com/service1

The query string from the client is just an argument to the operation.  The
part before the query string (the path of the request URI) and the HTTP
method alone are what determine the binding to the SparqlQuery interface.
So, if you have a URL that encodes the GET request to an HTTP binding of a
SPARQL query, the invocation (at the HTTP level) targets the same resource
with or without the query string.  It would be analogous to calling the same
method twice with different arguments:

http://sparqlservice.com/service1?query=...
http://sparqlservice.com/service1

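Both are sent with the same method to the same path; only the query
component of the Request-URI differs:

GET /service1?query=... HTTP/1.1
Host: sparqlservice.com

GET /service1 HTTP/1.1
Host: sparqlservice.com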

[[Someone please correct me if I've misunderstood the way URIs with query
strings are translated into Request-URIs for the corresponding HTTP
requests]]
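
A quick way to check this is a minimal sketch in Python that derives the
request line a client would send for each URL, using only the standard
library's urllib.parse.  The helper names (request_line, route_key) and the
example query value (an encoded "ASK {}") are made up for illustration.  As
far as I can tell from RFC 2616, the query component travels inside the
Request-URI, so the two requests share a method and path but are not
byte-for-byte identical; the routing argument only needs the method and path.

```python
from typing import Tuple
from urllib.parse import urlsplit


def request_line(method: str, url: str) -> str:
    """Build the HTTP/1.1 request line a client would send for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # The query component is carried along in the Request-URI.
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


def route_key(method: str, url: str) -> Tuple[str, str]:
    """The (method, path) pair a server can route on without parsing arguments."""
    return (method, urlsplit(url).path or "/")


# Hypothetical example value for the query argument: "ASK {}" percent-encoded.
with_query = "http://sparqlservice.com/service1?query=ASK%20%7B%7D"
without_query = "http://sparqlservice.com/service1"

print(request_line("GET", with_query))
print(request_line("GET", without_query))
print(route_key("GET", with_query) == route_key("GET", without_query))
```

The request lines differ in the query part, while the (method, path) pair is
the same for both URLs.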

So, I'm saying 2 things: 1) it is generally a bad practice for the server to
have to 'peek' into the original URI to determine how to route the operation
(if it is properly bound to the protocol), and 2) the service administrator
has the option of binding 2 *separate* handlers, one for each of the HTTP
bindings to the SPARQL/Update and SPARQL/Query interfaces.

If we simply define an additional abstract interface for SPARQL updates and
(for example) bind to HTTP via the POST method (since we certainly don't
want to bind an unsafe operation to GET), then the service administrator
would have the choice of allocating a separate resource to handle invocations
to each binding. The advantage is that they would then have 'protocol-level'
separation between the bound operations and wouldn't have to implement
anything for that purpose.
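
To make the 'protocol-level' separation concrete, here is a rough sketch in
Python of a routing table keyed only on the HTTP method and the request path.
Everything in it is an assumption for illustration: the paths
/service1/query and /service1/update, the handler names, and the tuple return
values are hypothetical and not part of any SPARQL protocol draft.

```python
from typing import Callable, Dict, Optional, Tuple


def run_query(argument: str) -> Tuple[str, str]:
    """Stand-in for the SparqlQuery binding; the argument is passed through as-is."""
    return ("query", argument)


def run_update(argument: str) -> Tuple[str, str]:
    """Stand-in for a hypothetical SPARQL/Update binding."""
    return ("update", argument)


# One resource per operation: the (method, path) pair alone selects the handler.
ROUTES: Dict[Tuple[str, str], Callable[[str], Tuple[str, str]]] = {
    ("GET", "/service1/query"): run_query,
    ("POST", "/service1/update"): run_update,
}


def dispatch(method: str, path: str, argument: str) -> Optional[Tuple[str, str]]:
    """Route on method and path only; the argument is never inspected here."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return None  # nothing bound here, e.g. an update sent via GET
    return handler(argument)


print(dispatch("GET", "/service1/query", "query=ASK%20%7B%7D"))
print(dispatch("POST", "/service1/update", "some-update-string"))
print(dispatch("GET", "/service1/update", "some-update-string"))
```

The router never looks inside the argument, and an unsafe operation sent with
GET simply finds no binding.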

> Hmm, I'm not sure about that. As I understand it (not too well,
> admittedly), a WSDL interface is composed (potentially) of many WSDL
> operations. The interface then gets bound to HTTP via the WSDL binding
> mechanism. Within that binding, each operation gets bound to a
> particular HTTP method, with particular faults, content encodings,
> parameter separators, etc. (the syntax summary in 6.2 at
> http://www.w3.org/TR/wsdl20-adjuncts/#http-binding is helpful here).

Okay.
 
> Then someone comes along and deploys an endpoint, and describes the
> endpoint in WSDL and gives a URI for the interface (i.e., for all of the
> operations). But the thing I was surprised about is, message
> serialization is only based on the contents of an operation's input
> message, and there's no place at which the operation's name comes into
> play. 

Did my example above help? Admittedly, my understanding is purely from the
semantics of HTTP (I'm not familiar with the WSDL 2 adjuncts).  So, the
operation's *name* is completely specified by the endpoint URI and method in
the HTTP request.  

GET ..sparql endpoint URI.. HTTP/1.1

> So if my endpoint is /service and I have 3 operations, they all
> get invoked as some variant of /service?param1=...&param2=... and the
> only thing that my server has to distinguish between the operations is
> the parameters to each.

So, I would argue that this is a bad configuration for an HTTP endpoint, since
(from an HTTP perspective), you have one resource handling multiplexed
operations and thus the handler is then responsible for distinguishing
between operations (via peeking into the parameters) rather than relying on
the mechanism that the protocol being bound to (HTTP in this case) provides
for distinguishing operations (via the combination of Request-URI and HTTP
method).  Each operation really should be a separately addressable
resource.
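
For contrast, this is roughly what the multiplexed configuration asks of the
handler, sketched in Python with the standard library's parse_qs.  The
function name and the parameter names "query" and "update" are assumptions
taken from this thread's update= vs query= discussion, not from any
specification.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def multiplexed_handler(raw_query: str) -> Tuple[str, Optional[str]]:
    """Single endpoint: the operation is only known after parsing the parameters."""
    params = parse_qs(raw_query)
    if "update" in params:
        return ("update", params["update"][0])
    if "query" in params:
        return ("query", params["query"][0])
    return ("unknown", None)


print(multiplexed_handler("query=ASK%20%7B%7D"))
print(multiplexed_handler("update=some-update-string"))
print(multiplexed_handler(""))
```

Every additional operation adds another branch to this function, whereas with
one resource per operation the method and Request-URI already carry that
information.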

> Anyway, I'm not sure this really matters.

I get the impression that this is more of a question of best practice for
the service administrator to consider, since they have options in how they
bind resources to query and update operations.

>> But it does so by parsing at least part of the URI, which I don't think is a
>> good practice.  If only the SPARQL/Query interface was bound to an HTTP
>> 'service', then the server would simply not know how to handle a
>> SPARQL/Update request.
> 
> In the case where it's all query=, it needs to parse a level deeper,
> right? Unless you're suggesting that we require that SPARQL/Update and
> SPARQL/Query never be both available at the same endpoint URI?

I'm not suggesting that we enforce that constraint, but it seems (to me) to
be a good practice in general.

-- Chimezie


Received on Wednesday, 23 September 2009 18:00:39 UTC