Re: SPARQL Security - Best Practices?

On 2 Sep 2008, at 2:44 PM, Jacek Kopecky wrote:

> Hi Richard,
> if I understand it correctly, a data store is allowed to provide any
> named graphs it wishes to. Could your problem be solved with a special
> named graph for the merge of all the data (allowed for a user)? I mean
> something like this:
>
> SELECT *
> FROM <http://localhost/special/all>
> WHERE {
>     ?s foo:bar ?baz ;
>        zob:zab ?bing .
> }

Sure; the implementation can mess around with the dataset all it  
likes, and that includes making available computed graphs. (The same  
thing can be done by having the server compute the appropriate dataset  
for the user, rather than computing a virtual graph.)


My point was that "standard" SPARQL doesn't include computed graphs
like this, and doesn't offer a convenient alternative. You can't expect
them from just any implementation that passes the SPARQL test suite.


Computing a virtual graph also loses the original graph names, which  
might be needed. I haven't done the due diligence, but I don't think  
there's a good way to express rich queries -- many triple patterns,  
retrieving graph names -- and avoid the problem I mentioned. (Even  
computing a dataset with FROM loses the graph names, and FROM NAMED  
doesn't do what we need.)
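
A rough sketch of the trade-off, reusing the pattern from the query
quoted above. The graph URIs and the foo:/zob: namespace URIs are
invented for illustration, and query (c) is only my reading of what a
cross-graph pattern ends up looking like:

```sparql
# Three separate queries, shown together for comparison.
# All URIs below are hypothetical.

# (a) FROM merges the listed graphs into the default graph.
#     The pattern can match across graphs, but the graph names are gone.
PREFIX foo: <http://example.org/foo#>
PREFIX zob: <http://example.org/zob#>
SELECT ?s ?baz ?bing
FROM <http://example.org/graphs/a>
FROM <http://example.org/graphs/b>
WHERE {
    ?s foo:bar ?baz ;
       zob:zab ?bing .
}

# (b) FROM NAMED keeps the names, but both triples must now be found
#     in the same graph ?g.
PREFIX foo: <http://example.org/foo#>
PREFIX zob: <http://example.org/zob#>
SELECT ?g ?s ?baz ?bing
FROM NAMED <http://example.org/graphs/a>
FROM NAMED <http://example.org/graphs/b>
WHERE {
    GRAPH ?g {
        ?s foo:bar ?baz ;
           zob:zab ?bing .
    }
}

# (c) Matching across graphs while keeping the names means one GRAPH
#     clause per triple pattern.
PREFIX foo: <http://example.org/foo#>
PREFIX zob: <http://example.org/zob#>
SELECT ?g1 ?g2 ?s ?baz ?bing
FROM NAMED <http://example.org/graphs/a>
FROM NAMED <http://example.org/graphs/b>
WHERE {
    GRAPH ?g1 { ?s foo:bar ?baz . }
    GRAPH ?g2 { ?s zob:zab ?bing . }
}
```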


Someone not too familiar with SPARQL could read the spec and decide to  
implement access control by putting access control information in the  
default graph, implementing access constraints purely through triple  
patterns and filters. They would soon realize that madness is the end  
destination for such a scheme! The neat queries they saw in the SPARQL  
spec turn into twisted monstrosities, because they need to specify  
huge datasets or manually split up patterns into a maze of GRAPH  
clauses. The alternative is to use some implementation-specific  
extension... tempting!
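
For illustration, here is my guess at what the two-pattern query from
earlier in the thread turns into under such a scheme. The acl:
vocabulary, the user URI, and the foo:/zob: namespace URIs are all
hypothetical; the sketch assumes the access control triples sit in the
default graph and the data sits in named graphs:

```sparql
# Hypothetical vocabulary and URIs throughout.
PREFIX foo: <http://example.org/foo#>
PREFIX zob: <http://example.org/zob#>
PREFIX acl: <http://example.org/acl#>

SELECT ?s ?baz ?bing ?g1 ?g2
WHERE {
    # Access checks against the default graph, one per graph variable.
    ?g1 acl:readableBy <http://example.org/users/alice> .
    ?g2 acl:readableBy <http://example.org/users/alice> .

    # Each triple pattern gets its own GRAPH clause so that it can
    # match in any readable graph.
    GRAPH ?g1 { ?s foo:bar ?baz . }
    GRAPH ?g2 { ?s zob:zab ?bing . }
}
```

Every further triple pattern adds another GRAPH clause and another
access check, which is where the maze comes from.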


> This would be a data-store-specific extension, but it would work with
> the standard SPARQL query lang.


Whenever I see "implementation-specific extension", I would suggest
skipping the intermediate stages (computed graphs, syntax
extensions...) and jumping straight to user-specific views of a store,
where the triple store itself controls visibility at the per-triple
level.

SPARQL can run a level above, ignoring access control entirely -- no  
changes to queries! This also frees up the dataset for user-specific  
use, and preserves graph names.

If you're going to be a monkey, be a gorilla! :)

The interesting questions are then "is there some part of this that  
could be meaningfully standardized?", and "are there any lessons to  
learn from this wrt SPARQL itself?". After all, once you've done  
access control you then have to tackle change management, provenance...


> Actually, if the query engine accesses
> named graphs from a Web server, the "union of all allowed graphs" could
> be just another resource on that server, and the query engine would need
> no extensions.


That's fine for trivial datasets, but it still suffers from the  
"disappearing graph names" problem. It also shifts the work from an  
implementation extension to a web server extension... the web server  
has to compute that union graph. The approach does seem to have  
benefits, though.

-R

Received on Tuesday, 2 September 2008 22:12:43 UTC