Re: [ISSUE-32] Implications of updates on protocol, regarding HTTP methods

On Wed, Jul 29, 2009 at 3:10 PM, Seaborne, Andy<andy.seaborne@hp.com> wrote:
>
>> -----Original Message-----
>> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
>> On Behalf Of Paul Gearon
>> Sent: 29 July 2009 19:05
>> To: public-rdf-dawg@w3.org
>> Subject: [ISSUE-32] Implications of updates on protocol, regarding HTTP
>> methods
>>
>> This email discharges my action
>> http://www.w3.org/2009/sparql/track/actions/55
>>
>>
>> The initial SPARQL language and protocol (SPARQL/Query 1.0,
>> SPARQL/Protocol 1.0) both describe read-only operations, which left no
>> change of state on the server. SPARQL/Update is expected to use the
>> SPARQL/Protocol as well, however it is designed to modify state on the
>> server, which in turn has implications for the protocol.
>
> Expected? The expectation from SQL is SELECT and INSERT/DELETE go down the
> same connection does create a user perception but that's also the source of injection
> attacks.

I did make a bit of a bald assertion in saying that we expect to use
the same protocol, didn't I?  :-)  I said this, because I inferred it
from the conversation. I was concerned because the protocol cannot
simply be re-purposed, which is what started this issue.

> Option 1 does not make that assumption.

True. I also note your wording of "go down the same connection". I'm
expecting HTTP either way, though queries vs. updates may be different
methods. They may even be the same method (such as POST) but with
different parameters (update= instead of query=).
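To illustrate the "same method, different parameters" idea, here is a
minimal sketch in Python of how the two POST bodies could differ only
in the parameter name. The helper name build_post_body and the sample
operations are hypothetical; update= is just the candidate parameter
name floated above, not anything specified.

```python
from urllib.parse import urlencode, parse_qs

def build_post_body(operation: str, is_update: bool) -> str:
    """Form-encode an operation under 'update' or 'query' (hypothetical helper)."""
    param = "update" if is_update else "query"
    return urlencode({param: operation})

# Illustrative operation strings only; they are never parsed here.
query_body = build_post_body("SELECT * WHERE { ?s ?p ?o }", is_update=False)
update_body = build_post_body("DELETE DATA { <urn:a> <urn:b> <urn:c> }", is_update=True)

# A server can tell the two apart from the parameter name alone.
query_params = parse_qs(query_body)
update_params = parse_qs(update_body)
print(list(query_params))
print(list(update_params))
```

Both bodies would be sent as ordinary form-encoded POSTs, so the HTTP
handling stays identical and only the routing decision changes.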

> ...
>>
>> Options for modifying the existing protocol for SPARQL/Update:
>>
>> Option 1: Write an unrelated new protocol for the HTTP binding
>> SPARQL/Update to operate on.
>> This would appear to be duplicating some work, and still needs to
>> address how modifying operations need to be called.
>
> It has the advantage that different security can be applied at the HTTP level to the
> different endpoints.  It also can help avoid confusion resulting on the same endpoint
> between update and query operations causing injection attacks on the query endpoint.

I was never thinking that a writable interface should share the same
connection as a query interface. I see the query-only interface to be
an extremely valuable part of the specification. Any attempt to update
it to handle write operations after authorization is guaranteed to
create security holes in some implementations (that's even presuming
the protocol doesn't include such holes).

On the other hand, I thought that it might be convenient to have
querying capabilities on the writing endpoint. If you have access to
write, then you already have all the permission you need to do
damage. Besides, since INSERT and DELETE can be described using a
query (an effective CONSTRUCT), and these should be capable of
subqueries, then you'll be capable of doing complex queries anyway,
even if you're not returning the results.

> What work is duplicated?  Sure there is some but my experience with Joseki is that it is
> very little and it wasn't designed with update in mind originally.

In retrospect, I agree. Writing operations are simply a matter of
processing the request, and responses will be status codes, all of
which will be handled by HTTP libraries.

> In Joseki, the same code routes query and update requests on different endpoints with
> the sole difference that update operations direct the request, after associating with the
> datasets, to an update processor, not a query processor.  The HTTP POST handling is
> the same.

Mulgara also has different servlets for handling the read-only and
read-write endpoints. The read-only one is obvious. The read-write
endpoint sends the command off to the parser and then chooses what to
do with the resulting operation object. If the HTTP method is GET and
the object returned from parsing is not a
SELECT/CONSTRUCT/ASK/DESCRIBE, then a 405 is returned. For a POST, it
executes the operation no matter what it is.

I'm not saying this is great. Just that it's a way to do it.
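The read-write dispatch described above can be sketched roughly as
follows. This is a Python illustration, not Mulgara's actual code: the
function names are hypothetical, and the "parser" is a stand-in that
only looks at the first keyword, so it ignores real-world details such
as a leading PREFIX or BASE declaration. Returning 405 for methods
other than GET and POST is also an assumption.

```python
READ_ONLY_FORMS = {"SELECT", "CONSTRUCT", "ASK", "DESCRIBE"}

def operation_kind(command: str) -> str:
    """Hypothetical stand-in for a parser: return the first keyword, upper-cased."""
    parts = command.split()
    return parts[0].upper() if parts else ""

def read_write_status(method: str, command: str) -> int:
    """Return the HTTP status a read-write endpoint would answer with."""
    kind = operation_kind(command)
    if method == "GET":
        # GET may only carry one of the four read-only query forms.
        return 200 if kind in READ_ONLY_FORMS else 405
    if method == "POST":
        # POST executes whatever the parser produced.
        return 200
    # Assumption: any other method is rejected.
    return 405

print(read_write_status("GET", "SELECT * WHERE { ?s ?p ?o }"))
print(read_write_status("GET", "INSERT DATA { <urn:a> <urn:b> <urn:c> }"))
print(read_write_status("POST", "INSERT DATA { <urn:a> <urn:b> <urn:c> }"))
```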

>> Option 2: All modifying operations go through POST.
>> While not mandated, the standard use of POST is to provide all data in
>> the body. This is how the query operation works.
>
> Just clarity - it's a WWW-encoded form, so the body is "query=...."
>
>> This may be inconvenient
>> for applications that may want to execute a simple operation that can
>> be encapsulated in the URI.
>>
>> Option 3: All modifying operations go through PUT with a fallback to
>> POST for large commands.
>> This is similar to the definition of query which uses GET and POST.
>> However, this is awkward if doing a PUT or a POST for a command that
>> is trying to delete resources, such as triples or graphs, as the
>> expected semantics of these methods is to add data to a server. Also,
>> PUT is more tightly defined than POST, expecting a resource in the
>> URI, while a different resource MUST be referred to with a different
>> URI.
>>
>> Option 4: Use appropriate methods for each action.
>> This means using PUT to create resources, DELETE to remove them, GET
>> and HEAD to query them. However, this is what the REST protocol will
>> be doing, and makes the notion of a SPARQL/Update language
>> superfluous.
>>
>>
>> Option 2 appears to offer the least difficulty. Are other options available?
>
> Options 1 and 2 (different endpoint, same endpoint) are both possible if we ensure an
> update can't look like a query.

While I do agree with this, I'm more of a mind that any kind of update
can only occur on a writable endpoint. If that is the case, then there
is no reason to hide a write operation in a query, since writes are
permitted anyway. Injecting a write into a query on such an endpoint
would then be safe, though useless.

> Given that POSTing a query is still "query=" in the body,
> it's possible to do that.  But I think we should decide one or the other.

Well, if I were to indulge in wishful thinking, then I'd really like
to reserve POST for writing and GET for reading (meaning that a
read-only endpoint need only respond to GET), but I know that we can't
rewrite the original protocol. I also know that POST was defined this
way for compatibility with web forms, which is a valid use case. So
now back to reality...

Given the existing protocol, I'd like to see GET be for queries alone,
and POST accept parameters of query= and update= (or some similar
name). On a read-only endpoint a POSTed parameter other than query=
will return a 403. I believe this is option 2 since it is extending
the read-only protocol to do read-write.
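The rule proposed above could be sketched like this in Python. The
function name and the writable flag are hypothetical. Several choices
are assumptions rather than part of the proposal: only update= is
treated as the write parameter (so other parameters can pass through),
a GET carrying update= or a request with neither parameter gets a 400,
and other methods get a 405.

```python
from urllib.parse import parse_qs

def proposed_status(method: str, form: str, writable: bool) -> int:
    """Status for a request whose form-encoded parameters are in `form`."""
    params = parse_qs(form)
    if method == "GET":
        # GET is for queries alone (400 for anything else is an assumption).
        return 200 if "query" in params and "update" not in params else 400
    if method == "POST":
        if "update" in params and not writable:
            # A read-only endpoint refuses a POSTed update.
            return 403
        if "query" in params or "update" in params:
            return 200
        return 400  # assumption: neither parameter supplied
    return 405  # assumption: only GET and POST are supported

print(proposed_status("POST", "update=DELETE+DATA+%7B%7D", writable=False))
print(proposed_status("POST", "update=DELETE+DATA+%7B%7D", writable=True))
print(proposed_status("GET", "query=ASK+%7B%7D", writable=False))
```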

Regards,
Paul

Received on Wednesday, 29 July 2009 22:22:12 UTC