Re: shapes-ISSUE-74 (SPARQL endpoint support): Should SHACL support validating RDF graphs accessible via unmodified SPARQL endpoints [SHACL Spec] from Holger Knublauch on 2015-08-06 (public-data-shapes-wg@w3.org from August 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 7 Aug 2015 08:37:38 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <55C3E1B2.2080904@topquadrant.com>

On 8/7/2015 5:51, Arthur Ryman wrote:
> Holger/Peter,
>
> I think Peter's example graph is reasonable and graphs like it occur
> frequently in the real world. Developers often use blank nodes rather
> than create IRIs because it makes the RDF more compact looking, not
> because they are making an existential statement (which is what blank
> nodes are for). In many cases, you can avoid blank nodes and use hash
> IRIs instead to indicate substructure. However, if SHACL is going to
> be relevant to the real world, it will have to deal with blank nodes.
>
> That being said, if the problem is how to interpret the constraint
> violation results for even a single SPARQL query, then I don't see
> SHACL as being any more problematic in this respect than SPARQL which
> lets you return blank nodes in a result.

This is only partially correct: SPARQL lets you return blank nodes, yet
the SPARQL endpoint protocol replaces them with an arbitrary per-message
ID. So as long as you stay inside an API that operates on result sets
and node objects (say, in Java) it's OK, but as soon as the network
protocol gets in between, there is a problem: the ID from one response
cannot be used in a follow-up query to refer to the same node. This
situation propagates into SHACL. If it were only about the blank nodes
found in a shapes graph (e.g. an anonymous sh:Shape), we could probably
create a workaround using temporary replacement URIs. But the general
case (of bnodes in the data) is tricky.

> If this becomes a problem in
> practice, then we could look at improving the error reporting in
> SHACL, e.g. by providing additional context for the blank nodes.

SPIN has a notion of violationPaths, which are supposed to point from a
root node (e.g. the focus node) to the values causing the violation.
This required a lot of extra machinery to work and significantly
complicates the result format. With the switch from CONSTRUCT to SELECT
queries, we dropped this feature in SHACL and limited the result format
to subject, predicate and object. Given that the violating value may be
anywhere deep in a graph, constructing such paths is very difficult and
I do not see how to generalize it in a user-friendly way. Furthermore,
even for a single subject/predicate combination, there may be multiple
bnodes and only one of them may cause the error.
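
As a rough illustration of that last point, the following Python sketch
uses a hypothetical variation of the ISSUE-74 graph in which ex:a has
two bnode values for ex:r. The helper names and the simplified check are
assumptions for this example; this is not how SPIN computes
violationPaths.

```python
from collections import deque

# Hypothetical data: ex:a has two blank-node values for ex:r, and only
# _:x has the ex:q value that shape S2 asks for.
TRIPLES = [
    ("ex:a", "ex:r", "_:x"),
    ("ex:a", "ex:r", "_:y"),
    ("_:x", "ex:q", "ex:b"),
]


def predicate_path(root, target):
    """Breadth-first search: list of predicates from root to target, or None."""
    queue = deque([(root, [])])
    seen = {root}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for s, p, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [p]))
    return None


def values_missing_q(focus):
    """Values of ex:r at focus that have no ex:q (a simplified S2 check)."""
    values = [o for s, p, o in TRIPLES if s == focus and p == "ex:r"]
    return [
        v for v in values
        if not any(s == v and p == "ex:q" for s, p, _ in TRIPLES)
    ]


violating = values_missing_q("ex:a")
path_bad = predicate_path("ex:a", "_:y")
path_ok = predicate_path("ex:a", "_:x")
print(violating, path_bad, path_ok)
```

Both bnodes are reached from ex:a by the same path, so the path alone
cannot say which of them caused the violation. Only the bnode itself
can, and that is exactly what does not survive the endpoint protocol.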

Holger


>
> -- Arthur
>
> On Thu, Jul 16, 2015 at 7:10 PM, Holger Knublauch
> <holger@topquadrant.com> wrote:
>> On 7/17/2015 4:01, Arthur Ryman wrote:
>>> Peter,
>>>
>>> I assume the point of this example is that it contains a blank node
>>> which makes it problematic to have two separate SPARQL calls.
>>>
>>> It seems to me that whatever mechanism is used to associate a shape
>>> with a node would provide a starting point from which one could
>>> navigate to all subsequent nodes using suitable property paths. This
>>> would provide enough context for subsequent SPARQL calls. However,
>>> this certainly complicates the implementation.
>>
>> The issue remains that even if we put everything into a single query, there
>> is no way to make sense of the results (i.e. pointers to specific blank
>> nodes) because SPARQL endpoints have no round-trippable bnode identifiers.
>> Given this and the aforementioned limitations of endpoints, my response to
>> this ticket is that the best thing we can achieve is to define an
>> Endpoint-safe subset of SHACL that excludes bnodes, user-defined functions,
>> recursion and mixing SPARQL with other languages. I find it
>> unfortunate that this topic is influencing the discussion about ?shapesGraph
>> access, which is unproblematic in Dataset-based architectures. (I still have
>> a task to open a separate wiki page on that).
>>
>> Holger
>>
>>
>>> -- Arthur
>>>
>>> On Sat, Jul 11, 2015 at 11:58 PM, RDF Data Shapes Working Group Issue
>>> Tracker <sysbot+tracker@w3.org> wrote:
>>>> shapes-ISSUE-74 (SPARQL endpoint support): Should SHACL support
>>>> validating RDF graphs accessible via unmodified SPARQL endpoints [SHACL
>>>> Spec]
>>>>
>>>> http://www.w3.org/2014/data-shapes/track/issues/74
>>>>
>>>> Raised by: Peter Patel-Schneider
>>>> On product: SHACL Spec
>>>>
>>>> Should it be possible to validate SHACL shapes on RDF graphs that are
>>>> only accessible via unmodified SPARQL endpoints?
>>>>
>>>> For example, suppose
>>>> G = { ex:a ex:r _:a .
>>>>       _:a ex:q ex:b . }
>>>> is a data graph to be validated against the shapes
>>>> S1 = ex:r S2 [1,1]
>>>> S2 = ex:q [1,1]
>>>>
>>>> Should it be possible to perform the validation if the only access to G
>>>> is via SPARQL queries?
>>>>
>>>> If this is possible, it should also be possible for very large data
>>>> graphs.
>>>>
>>>>
>>>>
>>
Received on Thursday, 6 August 2015 22:38:18 UTC