Re: ISSUE-23: Where should we query for class and subClassOf? from Holger Knublauch on 2016-01-20 (public-data-shapes-wg@w3.org from January 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 20 Jan 2016 14:02:43 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <569F06E3.6010308@topquadrant.com>
In development and data entry tools like TopBraid, people will often 
have shapes and data in the same graph (or imports closure). This makes 
the shapes basically part of the data. Despite this, we have a button to 
include or exclude validation of the shapes, so that users can bypass 
the time-consuming testing of the shapes themselves. This is achieved by 
excluding certain classes from validation - sh:Constraint and sh:Shape 
in particular. The algorithm that collects the shapes to validate can 
simply filter those out (see [1]), so that only the data-specific shapes 
remain. While I don't claim this algorithm is perfect, it shows that 
many issues can be solved by engineering, without putting the burden on 
the user to worry about where to put which triple.

Holger

[1] 
https://github.com/TopQuadrant/shacl/blob/master/src/main/java/org/topbraid/shacl/constraints/ModelClassesFilter.java
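
The filtering idea described above can be illustrated with a minimal Python sketch. This is not the code in ModelClassesFilter.java [1]. The triple representation (plain tuples of prefixed names), the function names, and the choice to also exclude subclasses of the listed classes are all assumptions made for this example.

```python
# Hypothetical sketch only; not the TopBraid implementation.
SUBCLASS_OF = "rdfs:subClassOf"
SCOPE_CLASS = "sh:scopeClass"
# Classes whose shapes are skipped when "validate the shapes" is switched off.
EXCLUDED_CLASSES = {"sh:Shape", "sh:Constraint"}


def superclasses(graph, cls):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def shapes_to_validate(graph, excluded=EXCLUDED_CLASSES):
    """Collect shapes whose scope class is neither excluded nor a subclass
    of an excluded class (the subclass check is an assumption)."""
    kept = set()
    for s, p, o in graph:
        if p == SCOPE_CLASS and not (superclasses(graph, o) & excluded):
            kept.add(s)
    return kept
```

With shapes and data in one graph, a call such as shapes_to_validate(graph) would return only the data-specific shapes, without the user having to move any triples.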

On 20/01/2016 1:47 PM, Arthur Ryman wrote:
> Holger/Irene,
>
> One of the use cases we'd like to support is using SHACL to validate
> SHACL. This means we need a way to distinguish data from metadata.
> Looking for metadata in the data graph isn't necessarily fatal. We
> need more discussion on this topic. The spec must be clear on this
> point.
>
> -- Arthur
>
> On Tue, Jan 19, 2016 at 8:56 PM, Irene Polikoff <irene@topquadrant.com> wrote:
>> I agree that requiring applications to copy triples from one graph to another before they can submit them to a SHACL engine complicates things, and it is a concern.
>>
>> Can it be assumed as a default that both the shapes graph and the data graph are queried and then, for cases where the performance of this may be an issue, have a way to set an indicator that only the shapes graph should be queried?
>>
>> My guess is that this will not often be a performance issue. Looking for parent classes of a given class is an easy query even for a large dataset.
>>
>> Sent from my iPhone
>>
>>> On Jan 19, 2016, at 8:09 PM, Holger Knublauch <holger@topquadrant.com> wrote:
>>>
>>> (I predict that the separation between shapes and data graph will become a FAQ topic for SHACL. It may have been better to leave this topic out in version 1 of the standard, as many users will stumble over it. Such things may hurt the standard's adoption because they complicate everything. In the end, the main benefit of having a shapes graph is optimizing performance (so that the data graph is not polluted), yet this may be considered premature optimization as engines can probably take care of this themselves.)
>>>
>>> Anyway, your specific suggestion MAY work for me, as the shapes graph is a conceptual/logical entity only and engines may inject any number of triples into the shapes graph prior to validation. This doesn't make life easier though, so I am not sure.
>>>
>>> Two issues come to my mind:
>>>
>>> 1) If we assume the rdfs:subClassOf triples reside in the shapes graph only, then the user/engine needs to take care to consider extensibility: the data graph may include extensions of the core ontology that a shapes graph was developed against. For example, someone may create a subclass ex2:Cat while the shape only knows about ex1:Animal. To prepare validation, some agent would need to make sure that ex2:Cat rdfs:subClassOf ex1:Animal is visible to the shapes graph.
>>>
>>> 2) sh:class currently looks at the data graph. If we change the behavior of sh:scopeClass then we arguably would also need to change sh:class to walk the shapes graph.
>>>
>>> Holger
>>>
>>>
>>>> On 20/01/2016 4:38 AM, Arthur Ryman wrote:
>>>> While reading the spec I noticed the following statement:
>>>>
>>>> "To determine class membership, the rdf:type and rdfs:subClassOf
>>>> triples are queried in the data graph."
>>>>
>>>> However, querying the data graph for class and subclass information is
>>>> inconsistent with the example SPARQL for determining which shapes are
>>>> classes:
>>>>
>>>> "As syntactic sugar for the scenario above, SHACL includes a rule that
>>>> if a class is also a shape (in the shapes graph), then the
>>>> sh:scopeClass triple pointing at itself can be omitted. This rule is
>>>> illustrated by the following SPARQL CONSTRUCT query, which may be
>>>> executed over the shapes graph prior to validation, to produce the
>>>> implicit sh:scopeClass triples."
>>>>
>>>> CONSTRUCT {
>>>> ?class sh:scopeClass ?class .
>>>> }
>>>> WHERE {
>>>> ?class rdfs:subClassOf*/rdf:type rdfs:Class .
>>>> ?class rdfs:subClassOf*/rdf:type sh:Shape .
>>>> }
>>>>
>>>> I propose that in both cases we query the shapes graph, NOT the data graph.
>>>>
>>>> Recall that we expect the application to provide a shapes graph and a
>>>> data graph as input to the SHACL validator. Therefore, the application
>>>> can always copy any rdfs:Class and rdfs:subClassOf triples into the
>>>> shapes graph. Although RDF does not require class definitions to be
>>>> separated from data instances, in practice these are often separated.
>>>> Both shapes and classes are more properly regarded as metadata than
>>>> data.
>>>>
>>>> -- Arthur
>>>
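
The extensibility concern in the quoted thread (ex2:Cat as a subclass of ex1:Animal that only the data graph knows about) can be made concrete with a small Python sketch. It is not taken from the SHACL spec or any engine. The tuple-based graphs, the node ex:Tom, the shape ex1:AnimalShape, and the function is_instance_of are hypothetical names chosen for illustration.

```python
# Hypothetical sketch; graphs are plain sets of (subject, predicate, object).
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def is_instance_of(node, cls, type_graph, class_graph):
    """True if node has an rdf:type in type_graph that is cls or reaches
    cls by walking rdfs:subClassOf triples found in class_graph."""
    pending = [o for s, p, o in type_graph if s == node and p == RDF_TYPE]
    seen = set(pending)
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        for s, p, o in class_graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                pending.append(o)
    return False


# The data graph extends the core ontology with ex2:Cat.
data_graph = {
    ("ex:Tom", RDF_TYPE, "ex2:Cat"),
    ("ex2:Cat", SUBCLASS_OF, "ex1:Animal"),
}
# The shapes graph only knows about ex1:Animal.
shapes_graph = {
    ("ex1:AnimalShape", "sh:scopeClass", "ex1:Animal"),
}

# Walking subclasses in the shapes graph alone misses ex:Tom.
shapes_only = is_instance_of("ex:Tom", "ex1:Animal", data_graph, shapes_graph)
# Walking them in the union of both graphs finds ex:Tom.
both = is_instance_of("ex:Tom", "ex1:Animal", data_graph, data_graph | shapes_graph)
```

Here shapes_only is False and both is True, which is the gap that some agent would have to close by copying the subclass triple into the shapes graph, or that an engine could close by querying both graphs by default.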
Received on Wednesday, 20 January 2016 04:03:20 UTC