Re: SPARQL Security - Best Practices?

One issue I have encountered in the past is that a query like

   SELECT * {
     GRAPH ?g {
       ?s foo:bar ?baz ;
          zob:zab ?bing .
     }
     FILTER (allowed(?g))
   }

will only return answers where *both* triple patterns match in the  
same permitted graph. The user's intent is "match these two triple  
patterns in the union of triples from allowed graphs", but the query  
actually means "for each allowed graph, try to match these two triple  
patterns". Fewer results are returned than they expect.

If your data is spread across multiple graphs that the user can see --  
e.g., some of their triples are private and some public -- then you  
hit this problem.
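
To make the difference concrete, here is a minimal Python sketch of the two readings. It is not how any SPARQL engine is implemented. The toy dataset, the graph names, and the ALLOWED set (standing in for allowed(?g)) are all made up for illustration.

```python
# Hypothetical toy dataset: graph name -> set of (subject, predicate, object).
DATASET = {
    "private": {("alice", "foo:bar", "1")},
    "public": {("alice", "zob:zab", "2")},
    "secret": {("bob", "foo:bar", "3"), ("bob", "zob:zab", "4")},
}
ALLOWED = {"private", "public"}  # assumed stand-in for allowed(?g)


def match_both(triples):
    """Return (s, baz, bing) rows where ?s has both foo:bar and zob:zab."""
    return {
        (s1, baz, bing)
        for (s1, p1, baz) in triples
        if p1 == "foo:bar"
        for (s2, p2, bing) in triples
        if p2 == "zob:zab" and s2 == s1
    }


def per_graph(dataset, allowed):
    """GRAPH ?g { ... } FILTER (allowed(?g)): match inside each graph separately."""
    rows = set()
    for g in allowed:
        rows |= match_both(dataset[g])
    return rows


def over_union(dataset, allowed):
    """What the user meant: match against the merge of all allowed graphs."""
    merged = set().union(*(dataset[g] for g in allowed))
    return match_both(merged)


print(per_graph(DATASET, ALLOWED))   # set()
print(over_union(DATASET, ALLOWED))  # {('alice', '1', '2')}
```

The subject "alice" has one triple in each visible graph, so the per-graph reading finds nothing while the union reading finds the row the user expected.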

This limitation results in ugly workarounds such as


   SELECT * {
     GRAPH ?g1 {
       ?s foo:bar ?baz
     }
     GRAPH ?g2 {
       ?s zob:zab ?bing .
     }
     FILTER (allowed(?g1) && allowed(?g2))
   }

GRAPH is the wrong construct to use for this sort of query. Probably  
the right solution is to include the access control information in the  
dataset construction ("FROM = every graph the user can see"):

   SELECT *
   FROM <allowed-1>
   FROM <allowed-2>
   ...
   WHERE {
     ?s foo:bar ?baz ;
        zob:zab ?bing .
   }

but that means the query is specific to the user (or you have to use  
out-of-band dataset selection).
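
One way to picture the per-user query is to generate the FROM clauses on the server from whatever access control list applies. The following is only a sketch under assumptions: build_query and the example graph IRIs are hypothetical, the list of visible graphs is assumed to come from your own access control layer, and PREFIX declarations are omitted as in the queries above.

```python
import re

# Characters the SPARQL grammar does not allow inside <...> (IRIREF).
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def build_query(allowed_graphs):
    """Return the two-pattern query with one FROM clause per visible graph."""
    if not allowed_graphs:
        # With no FROM clause the endpoint's default dataset would be used,
        # which is probably not what an access-controlled system wants.
        raise ValueError("no visible graphs for this user")
    for iri in allowed_graphs:
        if _BAD_IRI_CHARS.search(iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    from_lines = "\n".join(f"FROM <{iri}>" for iri in allowed_graphs)
    return (
        "SELECT *\n"
        f"{from_lines}\n"
        "WHERE {\n"
        "  ?s foo:bar ?baz ;\n"
        "     zob:zab ?bing .\n"
        "}"
    )


# Hypothetical graph IRIs for one user.
print(build_query(["http://example.org/g/private-42", "http://example.org/g/public"]))
```

The IRI check matters because the graph names are spliced into the query text; a name containing ">" could otherwise smuggle in extra clauses.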

Perhaps SPARQL 2.0 will have some construct that allows filtering the  
dataset within the query, or will otherwise address this issue.  
Individual implementations, of course, can provide access control  
through other means.

A couple of years ago I was working on a system that relied heavily on  
very complex access control. My ultimate conclusion was that standard  
SPARQL was not very well suited to this kind of thing. That's an  
interesting conclusion for a SPARQL implementor to draw, but there you  
are :)

-R

Received on Tuesday, 2 September 2008 21:16:04 UTC