Fwd: SPARQL 1.1 security considerations

FYI, we received some internal Last Call (LC) feedback on the Protocol document (even before actually publishing the LC draft ;-) )

Axel

Begin forwarded message:

> Resent-From: team-sparql-chairs@w3.org
> From: "Thomas Roessler" <tlr@w3.org>
> Date: 1 January 2012 13:47:31 GMT+01:00
> To: "Sandro Hawke" <sandro@w3.org>
> Cc: "Thomas Roessler" <tlr@w3.org>, "Alexandre Bertails" <bertails@w3.org>, "Technology and Society Domain" <t-and-s@w3.org>, "team-sparql-chairs" <team-sparql-chairs@w3.org>
> Subject: SPARQL 1.1 security considerations
> archived-at: <http://www.w3.org/mid/6D3A0034-89D2-44AE-A092-B9D9C5B52F74@w3.org>
> 
> hi Sandro,
> 
> you had asked for a quick review of SPARQL 1.1 protocol and its security considerations:
>         http://www.w3.org/2009/sparql/docs/protocol-1.1/Overview.xml#policy-security 
> 
> I've reviewed the protocol spec basically in isolation, so I may be missing points here that are dealt with in other parts of the set of SPARQL specifications.  Also, while I'm sending this to Team + chairs lists, please feel free to forward in public.
> 
> 
> A few comments as I go through the spec:
> 
> - 2.1.2: This actually adds yet another definition of application/x-www-form-urlencoded to the set.  The current definition of that type is in HTML 4.01, and another one is in HTML 5.  RFC 3986 doesn't actually define this media type.  Also, you'll want to say what happens to non-ASCII UTF-8 characters, since RFC 3986 only talks about ASCII.
> 
> - The protocol supports no fewer than three query styles (GET, POST with forms, POST with application/sparql-query).  Each of these appears to be optional for the client.  Are any mandatory for the server?  Is this unspecified?  If it's unspecified, how do you get interoperability?
> 
> - I see that SPARQL update lets me give the graph URI.  It appears as though that URI is completely divorced from the SPARQL endpoint URI, and as though the relationship between the two is out of scope for the specification.  Is that correct?  Is there any expected relationship between the endpoint URI and the graph URI?
> 
> - In 2.3, I do not understand what "The SPARQL Protocol does not dereference query URIs" means, in particular given the previous point.
> 
> - Skimming the examples, I note that there are apparently some expectations about "ambiguous queries".  Are these considerations covered in the main SPARQL query language spec?  Or are they unspecified?  It feels as though some glue between the different specs might be missing.
> 
> In short, clarifying the relationship between endpoint URIs and the URIs used to identify graphs would be hugely helpful to this reader.
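
As a rough illustration of the three query styles and the non-ASCII question raised above, the following Python sketch builds (but does not send) one request of each kind. The endpoint URL, the function name, and the dict layout are assumptions made up for this example; the `query` parameter name and the application/sparql-query media type are the ones the Protocol draft uses. It assumes non-ASCII characters are percent-encoded as UTF-8 bytes, which is exactly the point the spec would need to state.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_query_requests(query, endpoint=ENDPOINT):
    """Return the three request shapes as plain dicts; nothing is sent."""
    # Assumption: non-ASCII characters are percent-encoded as UTF-8 bytes.
    encoded = urlencode({"query": query}, encoding="utf-8")
    return {
        # 1. Query via GET: the query travels in the URL query string.
        "get": {
            "method": "GET",
            "url": endpoint + "?" + encoded,
            "headers": {},
            "body": None,
        },
        # 2. Query via POST with a form-encoded body.
        "post_form": {
            "method": "POST",
            "url": endpoint,
            "headers": {"Content-Type": "application/x-www-form-urlencoded"},
            "body": encoded.encode("ascii"),
        },
        # 3. Query via POST with the unencoded query as the body.
        "post_direct": {
            "method": "POST",
            "url": endpoint,
            "headers": {"Content-Type": "application/sparql-query"},
            "body": query.encode("utf-8"),
        },
    }


QUERY = 'SELECT * WHERE { ?s ?p "café" }'
requests_by_style = build_query_requests(QUERY)
```

A server that implements only one of these shapes would reject the other two, which is the interoperability concern in the second bullet.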
> 
> 
> 
> As far as the security considerations are concerned, a few observations and questions:
> 
> 1. It appears from some parts of the specification that an UPDATE sent to a SPARQL endpoint can cause that endpoint to send an UPDATE to another SPARQL endpoint.  It doesn't look as though SPARQL includes any considerations around authentication and authorization for these sorts of scenarios.  Is the first endpoint supposed to just pass on credentials?  Something else?   Unspecified?  It would be useful to explain the delegation story in the security considerations a bit more, even if it boils down to "haven't dealt with it yet".
> 
> 
> 2. There doesn't seem to be any mechanism to authenticate (the elements of) responses when data are aggregated across multiple triple stores.  This appears to be a design decision.  It is worth calling that out.
> 
> 
> 3. The POST with application/x-www-form-urlencoded binding is susceptible to cross-protocol attacks that are worth spelling out.  Specifically, these requests can be caused by any site on the Web through form submission.
> 
> Suppose example.org has a wide-open triple store for experimentation purposes, where wide-open includes the ability to update the triple store.  example.org further uses HTTP basic (or digest) authentication for both its normal web site and the triple store.  A user, Eric, navigates a browser to example.org/Team first.  The browser caches his user name and password, just for this session.  He then navigates to evil.example.com.  That site causes a form submission.  Unknown to Eric, the form causes a POST request to example.org's triple store, and an update of that triple store, using Eric's credentials.
> 
> 
> 4. Speaking of cross-protocol attacks, the lack of a specification for the delegation piece may create a temptation for extremely unsafe deployments.  One example of such a deployment would be a case where you just use HTTP Basic Auth and pass on credentials to whoever else is asked.
> 
> In that case, assume the same scenario as in 3.  evil.example.com can now either use a POST request (from a web form that's submitted using JavaScript) to cause a cascade of requests to some other SPARQL endpoint of the attacker's choice, and hope that the credentials travel along, or it could just use HTTP GET to query for some data (an img tag or an invisible iframe will do fine for this), and again hope that the credentials might travel along.
> 
> 
> 5. Data exfiltration with UPDATE: I'm not sure whether this one's possible or not, but can I cause SPARQL endpoint A to send an UPDATE request to SPARQL endpoint B, depending on or possibly incorporating data only known to A?  If yes, then evil.example.com could use the attack in 3 to exfiltrate data from A to B without Eric ever noticing.
> 
> 
> 6. In the same spirit, has anybody looked at what data leakage can occur in query federation, or otherwise when several SPARQL endpoints are combined?  E.g., I could imagine endpoint A sending a specific set of queries to endpoint B that leaks certain data from A.  If that can be combined with some of the other effects I mention above (either GET or POST could work here), then that'd be another interesting data leak.
> 
> 
> Note that the common theme of 3-5 is the possible re-use of credentials between HTML resources and SPARQL endpoints, and the fact that I can use HTML to speak the SPARQL protocol from a web app.  So that case is definitely worth discussing systematically.
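
To make the scenario in 3 concrete, here is a minimal Python sketch of why a form-encoded update submitted by a page on evil.example.com can look the same to the endpoint as one sent by a legitimate client. Everything named here (the helper functions, the placeholder credentials, the naive server check) is hypothetical and only models the situation described above; it is not taken from the spec or from any real implementation.

```python
from urllib.parse import urlencode, parse_qs

UPDATE = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }'


def form_body(fields):
    """Serialize fields the way an HTML form submission would (form-urlencoded)."""
    return urlencode(fields, encoding="utf-8").encode("ascii")


def naive_endpoint_accepts(headers, body):
    """Hypothetical naive server: any credentialed form POST with an 'update' field runs."""
    if headers.get("Content-Type") != "application/x-www-form-urlencoded":
        return None
    if "Authorization" not in headers:
        return None
    updates = parse_qs(body.decode("ascii")).get("update")
    return updates[0] if updates else None


# Placeholder credentials; the browser attaches the cached value to every
# request for example.org, whichever page triggered the request.
CACHED_AUTH = {"Authorization": "Basic ZXJpYzpzZWNyZXQ="}
FORM_TYPE = {"Content-Type": "application/x-www-form-urlencoded"}

# Request sent on purpose by Eric's own SPARQL client.
legit = ({**FORM_TYPE, **CACHED_AUTH}, form_body({"update": UPDATE}))

# Request caused by a hidden, auto-submitted form on evil.example.com.
forged = ({**FORM_TYPE, **CACHED_AUTH}, form_body({"update": UPDATE}))

legit_result = naive_endpoint_accepts(*legit)
forged_result = naive_endpoint_accepts(*forged)
```

In this model the two requests are byte-identical in the parts the naive server inspects, so both updates are accepted under Eric's credentials.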
> 
> Another common theme is the apparent lack of even a conceptual security framework that would permit us to say what confidentiality and integrity guarantees are commonly expected.  That, too, deserves a slightly more systematic discussion.
> 
> Going one level up, it looks like naive SPARQL deployments are, in general, highly susceptible to creating confused deputy problems:
>         http://en.wikipedia.org/wiki/Confused_deputy_problem
> 
> 
> The current security considerations start with denial of service attacks, and then become a bit confused as they discuss requests made by different endpoints on behalf of each other and on behalf of the user.  There's nothing in here that's patently wrong, but I don't think these security considerations actually answer the questions I'd ask.
> 
> Happy New Year,
> --
> Thomas Roessler, W3C  <tlr@w3.org>  (@roessler)

Received on Tuesday, 3 January 2012 03:53:46 UTC