Re: resolving ISSUE-47: Can SPARQL-based constraints access the shape graph, and how? from Holger Knublauch on 2015-06-15 (public-data-shapes-wg@w3.org from June 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 15 Jun 2015 14:43:58 +1000
To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <557E580E.1040200@topquadrant.com>
On 6/15/2015 14:16, Dimitris Kontokostas wrote:
>
> > Here is another case, for named graph access in general. Assume you 
> want to validate that certain terms from a given query graph are also 
> present as SKOS concepts in some other graph. That SKOS named graph is 
> not accessible to the server running dbpedia. With the general SPARQL 
> endpoint scenario this is not implementable unless the endpoint can 
> call out to external named graphs. So endpoints are very limited 
> already, but this limitation shouldn't propagate into every use case.
> >
>
> This is just applying shacl in the union of two graphs. Not related to 
> the shacl graph or ?shapesGraph
>

Let me clarify what I mean: Assume you have a local thesaurus stored in 
a named graph in the dataset that starts the validation. Now you want 
to validate that certain values from your data graph are also present in 
the reference thesaurus. You would have queries such as:

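# ex: is a placeholder namespace used only for this example
PREFIX ex:   <http://example.org/ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?this
WHERE {
     ?this ex:someProperty ?value .
     FILTER NOT EXISTS {
         GRAPH <http://my.local.graph> {
             ?something skos:prefLabel ?value .
         }
     }
}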

As soon as you have such scenarios, the SPARQL endpoint would need to be 
able to access the same named graphs as the local dataset. Running them 
over a union graph would work, but then your endpoint becomes just 
another wrapped graph (which is fine by me but will have poor 
performance). Scenarios where the data graph is outside of the query 
dataset are generally problematic and need to be detected in advance. 
Below you state that you want to forbid ?shapesGraph because people may 
use it for the wrong reasons. OK, but then you would also need to remove 
GRAPH from SPARQL, because it may also lead to unsupported scenarios. 
?shapesGraph is just one named graph among others; the problem is more 
general.
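
The advance detection mentioned above could be approximated with a
keyword scan. The following is only a minimal Python sketch under stated
assumptions: the function names are hypothetical, the scan is a
regular-expression heuristic rather than a real SPARQL parser, and it
only looks for GRAPH (not FROM NAMED or SERVICE).

```python
import re

# Text that may contain the word GRAPH without being the keyword:
# IRI references, quoted strings, and comments. Heuristic only.
_SKIP = re.compile(
    r"<[^<>\s]*>"                # IRI reference, e.g. <http://my.local.graph>
    r'|"(?:[^"\\\n]|\\.)*"'      # double-quoted string literal
    r"|'(?:[^'\\\n]|\\.)*'"      # single-quoted string literal
    r"|#[^\n]*"                  # comment up to the end of the line
)

# GRAPH as a standalone keyword, not part of ?graph, ex:graph or graph:x.
_GRAPH = re.compile(r"(?<![?$:\w-])GRAPH(?![\w:-])", re.IGNORECASE)


def uses_graph_keyword(query: str) -> bool:
    """Return True if the query appears to contain a GRAPH clause."""
    return _GRAPH.search(_SKIP.sub(" ", query)) is not None


def plan(query: str, endpoint_sees_named_graphs: bool) -> str:
    """Decide up front whether a constraint may be sent to an endpoint."""
    if uses_graph_keyword(query) and not endpoint_sees_named_graphs:
        return "unsupported"  # report this instead of returning wrong results
    return "run"


# The thesaurus query from this message (prefixes omitted, as the scan
# does not need them).
THESAURUS_QUERY = """
SELECT ?this
WHERE {
    ?this ex:someProperty ?value .
    FILTER NOT EXISTS {
        GRAPH <http://my.local.graph> {
            ?something skos:prefLabel ?value .
        }
    }
}
"""

print(plan(THESAURUS_QUERY, endpoint_sees_named_graphs=False))  # prints "unsupported"
```

Whether the endpoint can see the named graphs (the second argument) is
something the caller would have to know from its deployment; the sketch
does not try to discover it.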

> Sound and complete recursion is hard to optimize but a workaround 
> would be possible with precomputed prevalence and detection queries up 
> to a fixed level.
>
> Regarding recursion, IMHO it might be convenient for some cases but 
> there are very few cases where it is actually needed and supporting it 
> will not be easy.
>

We should probably hold a straw poll on whether we can disallow 
recursion in general, just to get an update on where people stand. The 
last time I looked there was a lot of support for recursion, e.g. from 
the ShEx people. I personally have no strong opinion.

> > nor SHACL functions nor blank node treatment.
>
> I don't see anything special with blank nodes. OK Jena has some 
> additional utility functions but the spec shouldn't rely on third 
> party libraries.
>

The difference is that if you get constraint violations about blank 
nodes from a SPARQL endpoint, there is no way to communicate to the 
user where to find this node, because it will have a different ID each 
time. In graphs and datasets this is more consistent. Does this mean we 
should now generally disallow running SHACL over blank nodes, only 
because SPARQL endpoints don't fully support them? I don't think so, but 
this would be a consequence of the lowest-common-denominator approach.

I believe we need to generally accept that execution environments 
will be heterogeneous, and not every scenario can handle all possible 
SHACL files. Once we have accepted this, it is a matter of defining the 
boundaries and fallback mechanisms. I believe it was Ted in the last 
meeting who suggested that on the web, languages should fail gracefully 
if they encounter cases that they cannot handle. Running SHACL against 
SPARQL endpoints will have such limitations.
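
One way to make such boundaries explicit is a capability check before
validation starts. The sketch below is a hypothetical Python
illustration, not something defined by SHACL: the feature labels and the
capability table are assumptions derived from the topics in this thread
(named graph access, stable blank node IDs, recursion), and the real
limits depend on the deployment.

```python
# Hypothetical feature labels, taken from the topics in this thread.
NAMED_GRAPHS = "named-graphs"
STABLE_BLANK_NODES = "stable-blank-nodes"
RECURSION = "recursion"

# Hypothetical capability table (an assumption for illustration only).
ENVIRONMENTS = {
    "local-dataset": {NAMED_GRAPHS, STABLE_BLANK_NODES, RECURSION},
    "sparql-endpoint": set(),
}


def missing_features(required, environment):
    """Return the required features the environment lacks, sorted."""
    return sorted(set(required) - ENVIRONMENTS[environment])


def choose_environment(required, preferred_order):
    """Return the first environment that supports everything, else None."""
    for environment in preferred_order:
        if not missing_features(required, environment):
            return environment
    return None  # caller reports the missing features instead of guessing


required = {NAMED_GRAPHS}
print(missing_features(required, "sparql-endpoint"))
print(choose_environment(required, ["sparql-endpoint", "local-dataset"]))
```

Returning None rather than running anyway is the "fail gracefully" part:
the caller can tell the user which features were missing.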

Regards,
Holger
Received on Monday, 15 June 2015 04:46:14 UTC