[ISSUE-32] Implications of updates on protocol, regarding HTTP methods

This email discharges my action
http://www.w3.org/2009/sparql/track/actions/55


The initial SPARQL language and protocol (SPARQL/Query 1.0,
SPARQL/Protocol 1.0) both describe read-only operations, which leave the
state of the server unchanged. SPARQL/Update is expected to use
SPARQL/Protocol as well; however, it is designed to modify state on the
server, which has implications for the protocol.

The SPARQL/Protocol 1.0 interface describes bindings for SOAP[1] and
HTTP[2]. SOAP places no requirements on server state in response to
an operation, but HTTP does. Given that HTTP is such a commonly
implemented and used binding, this description will focus on
SPARQL/Protocol bound to HTTP.

SPARQL/Protocol 1.0 defines the use of the GET and POST methods,
referred to as queryHttpGet and queryHttpPost respectively. No other
HTTP methods are described. queryHttpGet should be used in all cases
except where the query exceeds practical limits (such as the maximum
length of a request URI), in which case queryHttpPost is used, with the
query provided in the body of the request. In this way, queryHttpPost
serves as a fallback for queryHttpGet, duplicating its functionality.
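A minimal Python sketch of that GET-with-POST-fallback behaviour, assuming a fixed 2048-character limit on the request URI. The limit, the function name and the example endpoint are illustrative assumptions; SPARQL/Protocol 1.0 does not specify a length threshold. The POST branch sends the same `query` parameter form-encoded in the body, as queryHttpPost does.

```python
from urllib.parse import urlencode

# Assumed practical limit on request-URI length; SPARQL/Protocol 1.0 does
# not fix a number, so this value is purely illustrative.
MAX_URL_LENGTH = 2048


def build_query_request(endpoint, query, max_url_length=MAX_URL_LENGTH):
    """Return (method, url, headers, body) for a SPARQL query.

    Prefers queryHttpGet; falls back to queryHttpPost when the encoded
    query would make the request URI longer than max_url_length.
    """
    params = urlencode({"query": query})
    get_url = endpoint + "?" + params
    if len(get_url) <= max_url_length:
        # queryHttpGet: the whole query travels in the URI.
        return "GET", get_url, {}, None
    # queryHttpPost: the same parameter, form-encoded in the request body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return "POST", endpoint, headers, params.encode("utf-8")
```

For example, `build_query_request("http://example.org/sparql", "ASK { ?s ?p ?o }")` yields a GET, while a query of several thousand characters yields a POST to the bare endpoint. Both branches carry the same information, which is the duplication described above.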

RFC 2616[3] describes the HTTP GET method as "Safe", as shown in this excerpt from section 9.1.1:
  "In particular, the convention has been established that the GET and
   HEAD methods SHOULD NOT have the significance of taking an action
   other than retrieval. These methods ought to be considered "safe".
   This allows user agents to represent other methods, such as POST, PUT
   and DELETE, in a special way, so that the user is made aware of the
   fact that a possibly unsafe action is being requested.

   Naturally, it is not possible to ensure that the server does not
   generate side-effects as a result of performing a GET request; in
   fact, some dynamic resources consider that a feature. The important
   distinction here is that the user did not request the side-effects,
   so therefore cannot be held accountable for them."


SPARQL/Update operations are specifically designed to modify data on
the server, namely to:
* Create a graph
* Delete a graph
* Clear statements from a graph
* Create statements
* Delete statements

The HTTP GET method is inappropriate for use with these operations as
they are all modifying operations, and are therefore "Unsafe". Some
other method is required to execute these operations via HTTP.
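The constraint can be expressed as a simple server-side check. This is a hypothetical Python sketch: the `UpdateOperation` names and the `check_method` function are invented for illustration, and only the set of safe methods comes from RFC 2616. It rejects GET and HEAD for every modifying operation listed above and leaves the choice among the remaining methods open.

```python
from enum import Enum

# Methods that RFC 2616, section 9.1.1, says ought to be considered "safe".
SAFE_METHODS = frozenset({"GET", "HEAD"})


class UpdateOperation(Enum):
    """Hypothetical names for the modifying operations listed above."""
    CREATE_GRAPH = "create a graph"
    DELETE_GRAPH = "delete a graph"
    CLEAR_GRAPH = "clear statements from a graph"
    CREATE_STATEMENTS = "create statements"
    DELETE_STATEMENTS = "delete statements"


def check_method(method, operation):
    """Return the upper-cased method, or raise if it is a safe method.

    Every UpdateOperation changes server state, so it is "unsafe" and
    must not be carried by GET or HEAD. Other methods (POST, PUT,
    DELETE, ...) are accepted; choosing among them is a separate question.
    """
    if not isinstance(operation, UpdateOperation):
        raise TypeError("expected an UpdateOperation")
    method = method.upper()
    if method in SAFE_METHODS:
        raise ValueError(f"{method} is safe and cannot be used to {operation.value}")
    return method
```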

It should be noted that RFC 2616 places no "Safety" requirement on the
POST method, so modifying operations are permitted with this method.

Also of note is that many implementations extend SPARQL/Protocol 1.0
to provide read-write services. Some implement a REST interface that
provides the above actions through HTTP methods such as PUT and DELETE
(for instance, Sesame and Mulgara). Others accept commands in an
"update language", such as HP's SPARQL/Update, via the PUT and POST
methods (again, Sesame and Mulgara, among others). Note that this use
of POST is not the one described in SPARQL/Protocol 1.0, since that
protocol defines queryHttpPost, and an update operation is not a
"query".


Options for modifying the existing protocol for SPARQL/Update:

Option 1: Write a new, unrelated protocol for the HTTP binding that
SPARQL/Update would operate on.
This would appear to duplicate some work, and would still need to
address how modifying operations are to be invoked.

Option 2: All modifying operations go through POST.
While not mandated, the standard use of POST is to provide all data in
the body. This is how queryHttpPost works. This may be inconvenient
for applications that want to execute a simple operation that could
be encapsulated in the URI.
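Under Option 2, a client would build every update request the same way, whatever its size. A minimal Python sketch, assuming the command is form-encoded like queryHttpPost; the parameter name `update`, the function name and the example endpoint are hypothetical, since no parameter name has been agreed for update commands.

```python
from urllib.parse import urlencode


def build_update_request(endpoint, update, param_name="update"):
    """Option 2 sketch: every modifying operation is sent via POST.

    The command travels in the request body, mirroring queryHttpPost.
    The default form parameter name "update" is a hypothetical choice.
    """
    body = urlencode({param_name: update}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # The URI names only the service; it never carries the command,
    # even when the command would be short enough to fit there.
    return "POST", endpoint, headers, body
```

For example, `build_update_request("http://example.org/update", "CLEAR GRAPH <http://example.org/g1>")` (a command written in the style of HP's SPARQL/Update, used here only as an illustration) still produces a POST with the command in the body, which is the inconvenience noted for short operations.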

Option 3: All modifying operations go through PUT with a fallback to
POST for large commands.
This is similar to the definition of query, which uses GET and POST.
However, it is awkward to use PUT or POST for a command that deletes
resources, such as triples or graphs, as the expected semantics of
these methods are to add data to a server. Also, PUT is more tightly
defined than POST: the request URI identifies the resource enclosed in
the request, so a different resource MUST be referred to with a
different URI.

Option 4: Use appropriate methods for each action.
This means using PUT to create resources, DELETE to remove them, and
GET and HEAD to query them. However, this is what the REST protocol
will be doing, and it makes the notion of a SPARQL/Update language
superfluous.
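To illustrate why Option 4 leaves little for an update language to do, here is a hypothetical Python sketch in which each action on a graph is just an HTTP method applied to that graph's URI. The `/graphs/{encoded graph URI}` layout, the action names and the function name are assumptions for illustration, not the interface of Sesame, Mulgara or any other store.

```python
from urllib.parse import quote

# Option 4 sketch: each action is an HTTP method on the resource itself.
# The "/graphs/{encoded graph URI}" layout is a hypothetical convention.
ACTION_METHODS = {
    "create": "PUT",      # create (or replace) the graph resource
    "remove": "DELETE",   # remove the graph resource
    "retrieve": "GET",    # fetch the graph's statements
    "exists": "HEAD",     # headers only, e.g. to test for existence
}


def rest_request(action, base, graph_uri):
    """Return (method, url) for a REST-style action on one graph."""
    method = ACTION_METHODS[action]
    url = f"{base}/graphs/{quote(graph_uri, safe='')}"
    return method, url
```

The method and the URI together fully describe each request, so no command text needs to be sent at all.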


Option 2 appears to offer the least difficulty. Are other options available?

Regards,
Paul Gearon

[1] http://www.w3.org/TR/rdf-sparql-protocol/#query-bindings-soap
[2] http://www.w3.org/TR/rdf-sparql-protocol/#query-bindings-http
[3] http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1

Received on Wednesday, 29 July 2009 18:05:50 UTC