Re: shapes-ISSUE-130 (rdf dataset assumption): SHACL should not assume that the data graph is in an RDF dataset [SHACL Spec] from Holger Knublauch on 2016-03-23 (public-data-shapes-wg@w3.org from March 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 24 Mar 2016 09:51:18 +1000
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <56F32BF6.5000206@topquadrant.com>
On 24/03/2016 9:04, Peter F. Patel-Schneider wrote:
> On 03/21/2016 04:09 PM, Holger Knublauch wrote:
>> On 22/03/2016 5:39, Peter F. Patel-Schneider wrote:
>>> My understanding is that access to the shapes graph during query execution
>>> time is only needed at first glance to handle extension constructs like
>>>      GRAPH $shapesGraph ....
>>>
>>> It can be convenient to use this construct in extension code so as to permit
>>> the investigation of triples in the shapes graph.  This can be used, for
>>> example, in the implementation of sh:classIn.  However, other implementation
>>> methods are possible as you can attest to.
>>>
>>> There are then two questions:
>>>
>>> 1/ Is there some portion of the core of SHACL where access to the shapes graph
>>> is needed at query execution time?  The answer to this is no.  Our two
>>> implementations are partial evidence for this answer.
>>>
>>> 2/ Are there things that can be done with this access in the SPARQL extension
>>> that cannot be done without it?  The answer to this is probably yes in some
>>> sense but probably no in another.  Access to the shapes graph during query
>>> execution allows the queries to pick up arbitrary information from the shapes
>>> graph, thus the yes answer.  However, if there are only a few useful things to
>>> be done on the shapes graph then each of them can probably be set up before
>>> query execution, thus the no answer.
>>>
>>> For example, for constructs that have lists as arguments it is convenient to
>>> just query the shapes graph to get the elements of the list.  However, it is
>>> also possible at query preparation time to grab the elements of the list and
>>> send them into the query.  This can be done in several ways, including via a
>>> VALUES construct and via looping control in the query engine.
>> No, this is not possible,
> It is indeed possible at query preparation time to grab the elements of a list
> from the shapes graph and send them into the query.  This is what my
> implementation of SHACL does.  It is also possible to do this from templates,
> which is what my in-progress templated implementation of SHACL does.
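
[Illustration] The query-preparation approach described in the quoted text can be sketched roughly as follows. This is a minimal sketch, not either implementation mentioned in the thread. The shapes graph is modelled as a plain list of (subject, predicate, object) string tuples. The names list_members, values_clause and prepare_query, the example.org IRIs, the blank node labels and the shape of the final query are all assumptions made for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def list_members(triples, head):
    """Walk an RDF collection starting at `head`; return its members in order."""
    index = {(s, p): o for s, p, o in triples}
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cyclic RDF list")
        seen.add(node)
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


def values_clause(var, iris):
    """Render the IRIs as a SPARQL VALUES block that binds `var`."""
    terms = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{var} {{ {terms} }}"


def prepare_query(triples, list_head):
    """Hypothetical query: the list is read once, before execution,
    so the query itself never touches the shapes graph."""
    classes = list_members(triples, list_head)
    return (
        "SELECT ?this WHERE {\n"
        f"  {values_clause('class', classes)}\n"
        "  ?this a ?class .\n"
        "}"
    )


# Hypothetical shapes-graph fragment holding a two-element list.
SHAPES = [
    ("_:b1", RDF_FIRST, "http://example.org/Person"),
    ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "http://example.org/Organization"),
    ("_:b2", RDF_REST, RDF_NIL),
]

if __name__ == "__main__":
    print(prepare_query(SHAPES, "_:b1"))
```

The sketch covers only the case where the shapes-graph information needed is a list known at preparation time. It does not address arbitrary queries against the shapes graph.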

It is of course easy to hard-code the SHACL core in any way or form, 
even without using SPARQL at all, but that's not what we are discussing 
here.

For the extension mechanism (incl. templates) we could also invent some 
additional RDF vocabulary or text substitution mechanisms that describe 
how to inject certain required triples into SPARQL strings. But that's 
just substituting a flexible, generic solution ($shapesGraph) with a 
likely fragile approach of unknown complexity. SPIN always has 
$shapesGraph access activated, and I am afraid people will be surprised 
if they move to SHACL. In proposal 3, the SPARQL queries are embedded 
into objects that may carry extra properties. Some of these extra 
properties could be defined by 3rd parties to improve interoperability 
with limited environments such as remote endpoints (which will never be 
able to cope with blank nodes anyway). IMHO such work doesn't need to 
happen in the WG, or could become a separate deliverable time permitting.

A more general solution to the SPARQL endpoint situation would be 
ISSUE-71 - a network protocol similar to the SPARQL HTTP protocol. That 
would solve a number of problems and produce best performance.

Holger


>
>> and I hope we don't reopen the longish $shapesGraph
>> debate all over again. Extensions may write arbitrary queries against
>> background data stored in the shapes graph, and there is no generic way of
>> transforming any SPARQL query to avoid this. VALUES only allows you to
>> iterate.
> I already said above that at query execution time it may be desirable for
> queries to pick up (and process) arbitrary information from the shapes graph.
>
>> In other cases you'd need FILTER IN and in others you'd need
>> arbitrary Basic Graph Pattern matches against these external graphs. All these
>> are very real world use cases that we encounter all the time. Proposal 4
>> requires even more access to the shapes graph. (Even if it existed, formally
>> defining such an algorithm would become a show-stopper for this WG, given that
>> we already struggle with pre-binding).
>>
>> ShapesGraph access is supposed to be optional for the Core vocab, and that was
>> always the intention. Our definitions use $shapesGraph for convenience, but
>> actual implementations can of course hard-code these queries instead, or in
>> the worst case do outside loops against an endpoint.
>>
>> Holger
> peter
>
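
[Illustration] Two of the fallbacks mentioned in the quoted text, a FILTER IN constraint and an "outside loop" that issues one query per value, can be sketched as below. This is only a sketch under stated assumptions: the list members are assumed to have been read from the shapes graph already, the function names and example.org IRIs are hypothetical, and the generated query strings are illustrative rather than taken from the SHACL drafts.

```python
def filter_in_clause(var, iris):
    """Render a SPARQL FILTER that restricts `var` to the given IRIs."""
    terms = ", ".join(f"<{iri}>" for iri in iris)
    return f"FILTER (?{var} IN ({terms}))"


def per_member_queries(iris):
    """'Outside loop' fallback: build one query per list member, to be
    sent to the endpoint one after another by the caller."""
    return [f"SELECT ?this WHERE {{ ?this a <{iri}> }}" for iri in iris]


# Hypothetical values previously extracted from the shapes graph.
MEMBERS = ["http://example.org/A", "http://example.org/B"]

if __name__ == "__main__":
    print(filter_in_clause("value", MEMBERS))
    for query in per_member_queries(MEMBERS):
        print(query)
```

Neither helper handles the general case raised in the thread, where an extension matches arbitrary graph patterns against the shapes graph.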
Received on Wednesday, 23 March 2016 23:51:54 UTC