Open issues in HTTP/Update protocol (for discussion) from Chimezie Ogbuji on 2010-11-30 (public-rdf-dawg@w3.org from October to December 2010)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Tue, 30 Nov 2010 13:23:54 -0500
To: "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-ID: <C91AAB6A.14C38%ogbujic@ccf.org>
Subject: Open HTTP Update issues for discussion

I've created a wiki page documenting the (open) issues and comments regarding
the HTTP/Update specification:


http://www.w3.org/2009/sparql/wiki/HTTP-UPDATE-ISSUES

Among those, there are 3 that require some further discussion in the WG:

* ISSUE-56 (PATCH HTTP/Update and SPARQL Update payload) [1]

There is a comment explicitly asking that the specification be clearer about
the behavior of PATCH, or at least about whether we intend it to be
normative.  My recommendation is to leave the behavior for PATCH as
informative (given the relative youth of the PATCH verb), but we should
still clarify the behavior.  In particular, SPARQL Update should be
RECOMMENDED for use as a patch document.  A status code of 400 (Bad Request)
should be RECOMMENDED as a response to requests where the SPARQL Update
request addresses a graph other than the one targeted by the PATCH request;
a 404 should be returned if the graph addressed in the Update request is
*not* the same graph identified in the PATCH request, etc.  So, the
informative behavior would be to facilitate the use of a subset of the
SPARQL Update request (the subset that only targets individual graphs) as a
'patch document' (with an appropriate media type) to the extent that it
matches the semantics of the HTTP request.
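
A rough sketch of the 400 check described above, in Python.  Everything
here is an assumption made for illustration: the function name, the naive
regular expression standing in for a real SPARQL Update parser, and the
choice to treat an update that names no graph at all as applying to the
PATCH target.  Only the 400 case is covered.

```python
import re

# Naive pattern: an absolute IRI reference following a keyword that names a
# graph. A real implementation would use a SPARQL Update parser; this regex
# ignores prefixed names, relative IRIs, comments and string literals.
_GRAPH_REF = re.compile(
    r"\b(?:GRAPH|WITH|USING(?:\s+NAMED)?)\s*<([^>]*)>", re.IGNORECASE
)


def patch_precheck(target_graph, update_text):
    """Return 400 if the update names a graph other than target_graph.

    Returns None when no other graph is named, meaning the update may be
    applied to the graph identified by the PATCH request URI.
    """
    named = set(_GRAPH_REF.findall(update_text))
    if named - {target_graph}:
        return 400  # Bad Request: the patch document addresses another graph
    return None
```

For example, a PATCH to http://example.org/g1 whose body begins with
WITH <http://example.org/g2> would be rejected with 400 under this sketch.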

* Confusion regarding recommended behavior of OPTIONS method [2]

This was triggered by the following observation from Andy:

[[
We seem to have update and/or query service at the same place as the graph
store. This would be OK, as services are split by HTTP query string, but not
if the graph store is GETtable and there is a service description to return
[...] I'd suggest it's the graph store (RDF dataset) that is on the web. The
service model is more appropriate for SPARQL Update language and query.
]]

The language regarding OPTIONS was meant to provide a way (one that has a
place in the HTTP semantics) for a user to request a service description
document as a means of addressing any (httpRange-14 related) ambiguity there
may be about the resources behind a SPARQL service.  I guess the question
here is: what does the OPTIONS request target?

With the exception of this section (and accounting for support of the use of
?default to address the default graph), the document specifies only 3 kinds
of things that can be identified by requests in the protocol in the
following cases:

1. Requests to named graphs (directly)
2. Requests to default and named graphs (indirectly)
3. POST requests to the graph store

The first case is straightforward: the request URI identifies the RDF
knowledge that corresponds with the named graph.  The second is not
(currently) clear.  One of the reasons why Figure 2 has an empty oval is
that it is not clear what the full request URI is identifying.  So, for the
example at the end of section 4:

PUT /rdf-graphs/service/?graph=1  HTTP/1.1

what does http://www.example.com/rdf-graphs/service identify? The name
suggests that it is identifying the HTTP Update service itself, but this is
not explicitly stated.  It probably should be, since clarification on what
is being identified is a key component of a 'RESTful' protocol/interface.

The third case is only relevant for the second HTTP POST scenario, which is
meant to mirror the "9.2.  Creating Resources with POST" [3] action in
AtomPub.  The outstanding issue with that section is how the user comes to learn
the URI of the graph store a priori.  The editor's note suggests that a
requested service description would include this information (which implies
that the graph store and the service are different resources).

My recommendation is to add language making clear that there is a 4th kind
of resource addressable within this protocol: the service.  It is known a
priori, and it is the target of requests that indirectly route actions to
named (and default) graphs.  It is also the target of both OPTIONS and GET
requests, and the response to such a request is a service description (for
GET, a service description is only returned if neither ?graph nor ?default
is specified).

Also, a request to the graph store other than POST should respond with a
status of 405 (Method Not Allowed) and the same is true of any request to
the service other than those described above.  Off the top of my head, I
can't think of what would make sense as a GET response to the graph store
other than (perhaps) a TriX/TriG RDF document that serializes the default
graph and all the named graphs (or a document in any other RDF syntax that
can represent named graphs).  This seems a bit out of scope for the protocol (or at
least is a significant modification to make this late in the process), but I
would be interested in other people's opinions about what is appropriate
behavior for addressing the graph store.
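
The routing recommended in the two preceding paragraphs might be sketched
roughly as follows.  This is an illustration only: the graph store path, the
returned labels, and the set of methods accepted for indirect graph requests
are all assumptions, and only the service path is taken from the section 4
example.

```python
from urllib.parse import parse_qs, urlsplit

SERVICE_PATH = "/rdf-graphs/service"    # path from the section 4 example
GRAPH_STORE_PATH = "/rdf-graphs/store"  # hypothetical graph store location
GRAPH_METHODS = {"GET", "PUT", "POST", "DELETE"}  # assumed set of routed methods


def route(method, request_uri):
    """Classify a request as one of the kinds of resource access discussed."""
    parts = urlsplit(request_uri)
    # keep_blank_values=True so that a bare "?default" is still seen as a key.
    query = parse_qs(parts.query, keep_blank_values=True)
    path = parts.path.rstrip("/")
    indirect = "graph" in query or "default" in query

    if path == GRAPH_STORE_PATH:
        # Only POST is defined against the graph store itself.
        return "create-graph" if method == "POST" else "405 Method Not Allowed"

    if path == SERVICE_PATH:
        if method == "OPTIONS":
            return "service-description"
        if indirect and method in GRAPH_METHODS:
            return "graph-operation"  # routed to the named or default graph
        if method == "GET" and not indirect:
            return "service-description"
        return "405 Method Not Allowed"

    # Any other request URI is taken to identify a named graph directly.
    return "direct-graph-request"
```

Under these assumptions, GET on the bare service URI yields a service
description, while PUT on the bare service URI yields 405.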

* 1.1.7 Proper escaping of URIs and payload on update operations

See:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2010Oct/0061.html

Unfortunately, I don't quite understand how the protocol (as described) is
vulnerable in this way. Section 4.2 (Indirect Graph Identification) already
discusses percent-encoding of embedded graph URIs and all non-idempotent
operations use the body of the request *and* the request URI to determine
the action to take, so I don't see the vulnerability to injection attacks.
Perhaps others understand this concern and can clarify?
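
To make the percent-encoding point concrete, here is a small sketch of how
a client might embed a graph URI in the ?graph parameter and how a server
would recover it.  The helper names are hypothetical, and the service URI
is the one from the section 4 example; this illustrates the encoding step
only and is not text from the specification.

```python
from urllib.parse import parse_qs, quote, urlsplit

SERVICE = "http://www.example.com/rdf-graphs/service"


def indirect_request_uri(graph_iri):
    """Build a request URI that identifies graph_iri indirectly."""
    # safe="" percent-encodes every reserved character in the embedded IRI,
    # including "/", ":", "?", "&", "=" and "#".
    return SERVICE + "?graph=" + quote(graph_iri, safe="")


def embedded_graph_iri(request_uri):
    """Recover the embedded graph IRI from the ?graph parameter."""
    return parse_qs(urlsplit(request_uri).query)["graph"][0]
```

With this encoding, a graph IRI that itself contains "&default" or a "#"
fragment cannot introduce extra query parameters or truncate the request
URI, because those characters never appear unescaped in the query string.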

[1] http://www.w3.org/2009/sparql/wiki/HTTP-UPDATE-ISSUES#ISSUE-56:_Does_HTTP_PATCH_affect_either_the_SPARQL_Protocol_or_the_SPARQL_Uniform_etc._HTTP_etc._Protocol.3F
[2] http://www.w3.org/2009/sparql/wiki/HTTP-UPDATE-ISSUES#.28No_formal_issue.29:_Confusion_regarding_recommended_behavior_of_OPTION_method
[3] http://tools.ietf.org/html/rfc5023#section-9.2


Received on Tuesday, 30 November 2010 18:24:53 UTC