Re: ISSUE-139: Cases where constraint components do not make sense from Holger Knublauch on 2016-04-10 (public-data-shapes-wg@w3.org from April 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Sun, 10 Apr 2016 11:02:20 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <5709A61C.5040603@topquadrant.com>

On 9/04/2016 1:21, Peter F. Patel-Schneider wrote:
> The first thing to consider is how this version of SHACL handles shapes. A
> shape is always considered against a set of nodes. For some components, like
> sh:minCount, it matters that the set is there but other components work on the
> elements of the set individually and thus can pretend that there are only
> individual nodes to be considered. (The current version of SHACL actually
> works this way for components in property and inverse property constraints,
> but the component itself is responsible for generating the set.)

I see no difference between our proposals in the above. Section 3 
has the term "value nodes" for these sets. Also, many constraint 
components can be expressed via Node validation functions (currently 
section 9.4 but with pending updates in Proposal 3). Using node 
validation functions, the loop that produces the bindings is outsourced 
into a more general solution. The spec uses the "full" SELECT query for 
consistency, but we could theoretically rewrite some of the definitions 
to only include the body of the validation functions. Not sure what 
approach would be better for the reader.

>
> The syntax you provide does not conform to Proposal 4. See
> https://www.w3.org/2014/data-shapes/wiki/Refactor for more information on this
> syntax.  I have rewritten your syntax in the closest analogue in Proposal 4
> and added comments to describe the meaning of each component.
> More information on the meaning of components can again be found at
> https://www.w3.org/2014/data-shapes/wiki/Refactor.
>
> MyShape a sh:Shape ;
>
>    sh:propValues ( (sh:inverse ex:father)
>  [ sh:datatype xsd:string ] ) ;
> # Elements of the set validate precisely when all their values for the
> # inverse of ex:father have datatype xsd:string

There is no use case for this feature. It can never happen that the 
inverses of a property have datatype xsd:string, because subjects can 
never be literals.
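
To make this concrete, here is a minimal Python sketch (not SHACL, and not 
part of either proposal). The classes IRI, BNode, Literal and Triple and the 
helpers inverse_values and all_have_datatype are hypothetical names made up 
for illustration. The only rule taken from RDF itself is that a subject is 
an IRI or a blank node, never a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


@dataclass(frozen=True)
class Triple:
    subject: object
    predicate: IRI
    obj: object

    def __post_init__(self):
        # RDF rule: a subject is an IRI or a blank node, never a literal.
        if isinstance(self.subject, Literal):
            raise ValueError("RDF subjects cannot be literals")


def inverse_values(graph, focus, predicate):
    """Values of the inverse path: every subject s with (s, predicate, focus)."""
    return [t.subject for t in graph if t.predicate == predicate and t.obj == focus]


def all_have_datatype(values, datatype):
    """Datatype-style check: every value is a literal with the given datatype."""
    return all(isinstance(v, Literal) and v.datatype == datatype for v in values)


graph = [
    Triple(IRI("ex:Alice"), IRI("ex:father"), IRI("ex:Bob")),
    Triple(IRI("ex:Carol"), IRI("ex:father"), IRI("ex:Bob")),
]
values = inverse_values(graph, IRI("ex:Bob"), IRI("ex:father"))
datatype_ok = all_have_datatype(values, "xsd:string")  # False: both values are IRIs
```

In this sketch the check can only succeed vacuously, when the focus node has 
no inverse values at all. As soon as one inverse value exists it is a 
subject, hence not a literal, and the check fails.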

>
>    sh:disjoint ( ex:thisProperty ex:otherProperty ) ;
> # Elements of the set validate precisely when their values for
> # ex:thisProperty are disjoint from their values for ex:otherProperty
>
>    sh:disjoint ( (sh:inverse ex:mother ) ex:otherProperty );
> # Elements of the set validate precisely when values for the
> # inverse of ex:mother are disjoint from their values for ex:otherProperty

(Your proposed syntax is remarkably inconsistent here. All other 
property constraints would be written like this:

sh:propValues ( ex:thisProperty [ sh:disjoint ex:otherProperty ] )

Why this inconsistency, and what happens if someone uses the above syntax?)

>
>    sh:hasValue 10 ;
> # The set as a whole validates precisely when it contains 10 as a member.

This is useless for node constraints. Do you have a real-world example 
where this is needed?

>
>    sh:minCount 5 ;
> # The set as a whole validates precisely when it has at least five members.

This is useless for node constraints. The size of the set of value nodes 
is always exactly 1.
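
A small Python sketch of that argument, assuming the reading used in this 
reply, where a node constraint sees only the focus node itself. The function 
names are hypothetical and do not come from the spec.

```python
def node_constraint_value_nodes(focus_node):
    """Assumption: in a node constraint the only value node is the focus node."""
    return {focus_node}


def min_count_ok(value_nodes, min_count):
    """Count-style check: the set has at least min_count members."""
    return len(value_nodes) >= min_count


# Outcome of the check for several thresholds on a single focus node.
results = {
    n: min_count_ok(node_constraint_value_nodes("ex:Bob"), n)
    for n in (0, 1, 2, 5)
}
```

Under that assumption the outcome depends only on the threshold, never on 
the data: thresholds of 0 or 1 always pass and anything larger always fails.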

>
>   sh:propValues ( (sh:inverse ex:brother)
>  sh:minInclusive 42 ) ;
> # Elements of the set validate precisely when all of their values for the
> # inverse of ex:brother are larger than 42.

This is useless because the subject of a triple can never be larger than 
a literal.

>
>   sh:uniqueLang true ;
> # The set as a whole validates precisely when it does not contain elements
> # that share a language tag.

This is useless for node constraints, because there can only ever be 
exactly one value.

>
>    sh:propValues ( (sh:inverse ex:parent)
>  [ sh:uniqueLang true ] ) ;
> # Elements of the set validate precisely when none of their values for the
> # inverse of ex:parent share a language tag.

This is useless because inverse values can never have a language tag.

>
>    sh:shape [ sh:filter ex:PersonShape ; sh:minCount 3 ] .
> # The set as a whole validates precisely when it has at least three elements
> # that validate against ex:PersonShape.

This is useless for node constraints because the size of the set is 
always 1.

Oh, and I forgot: Please explain the meaning and use case for

ex:MyShape a sh:Shape ;
     sh:property [
         sh:predicate ex:myProperty ;
         sh:closed true ;
         sh:ignoredProperties ( rdf:type ) ;
     ] .

If you re-read my original question below, you should notice that I have 
challenged you to explain when these cases make practical sense. *You 
have declined to answer my question.* Instead you have listed entirely 
theoretical scenarios that have zero practical relevance. It is 
counter-productive to support these completely useless scenarios. We 
would even need to provide test cases. Instead of pretending that these 
use cases are supported, our language should prevent such silly mistakes.

Even worse, your approach throws together different use cases and 
attempts to treat them all uniformly. This does not work in practice. 
Take sh:hasValue as an example. The SPARQL query is different depending 
on the context: For sh:property it would be

     ASK { $this $predicate $hasValue }

while for sh:constraint it would be

     ASK { FILTER (?this = $hasValue) }

A uniform solution would require the latter query in both cases, making 
the query incredibly inefficient (first loop over all values and then 
FILTER, compared to a direct look-up in the database index). Your own 
implementation does a very poor job here, producing an extremely 
inefficient SPARQL query:

     https://github.com/pfps/shacl/blob/master/shacl.py (line 245 hasValueC)
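
The cost difference can be sketched in plain Python instead of SPARQL. 
TinyStore and its methods are hypothetical names for a toy in-memory store. 
The sketch does not reproduce the linked implementation or any real SPARQL 
engine, and the scan is deliberately naive (it visits every value node 
before reporting a result).

```python
from collections import defaultdict


class TinyStore:
    """Toy triple store with a (subject, predicate) -> objects index."""

    def __init__(self, triples):
        self.spo = defaultdict(set)
        for s, p, o in triples:
            self.spo[(s, p)].add(o)

    def has_value_lookup(self, this, predicate, value):
        # Analogue of: ASK { $this $predicate $hasValue }
        # One index probe plus one set-membership test.
        return value in self.spo.get((this, predicate), frozenset())

    def has_value_scan(self, this, predicate, value):
        # Analogue of binding every value node first and then applying
        # FILTER (?this = $hasValue) to each binding.
        compared = 0
        found = False
        for node in self.spo.get((this, predicate), frozenset()):
            compared += 1
            if node == value:
                found = True
        return found, compared


store = TinyStore(("ex:Bob", "ex:score", i) for i in range(1000))
lookup_result = store.has_value_lookup("ex:Bob", "ex:score", 10)
scan_result = store.has_value_scan("ex:Bob", "ex:score", 10)
```

Both calls agree on the answer, but the scan compares against all 1000 
values of the property while the lookup does a single membership test.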

The situation is even worse for extensions, if users don't have the 
facility to define different queries for the three contexts (which is 
another "simplification" of your proposal).

In all these scenarios, Proposal 3 does a much better job:
- It is explicit about the contexts in which a constraint component can 
be used
- It allows different validators (SPARQL queries) for each context
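
As a rough illustration of these two points, the following Python sketch 
keeps one validator per component and context. REGISTRY and validator_for 
are hypothetical names, and this is not how Proposal 3 is specified. The 
table holds only the two sh:hasValue queries quoted above.

```python
# component -> context -> validator (here: a SPARQL query string).
# Illustrative contents only; not the draft's actual table of components.
REGISTRY = {
    "sh:hasValue": {
        "sh:property": "ASK { $this $predicate $hasValue }",
        "sh:constraint": "ASK { FILTER (?this = $hasValue) }",
    },
}


def validator_for(component, context):
    """Return the validator for this context, or fail loudly if none exists."""
    by_context = REGISTRY.get(component, {})
    if context not in by_context:
        raise ValueError(
            f"{component} has no validator registered for {context}"
        )
    return by_context[context]


property_query = validator_for("sh:hasValue", "sh:property")
node_query = validator_for("sh:hasValue", "sh:constraint")
```

Any lookup outside the registered pairs raises an error instead of silently 
running a generic query. Which contexts each component really supports is 
defined by the draft, not by this sketch.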

Holger


>
>
> On 04/07/2016 08:44 PM, Holger Knublauch wrote:
>> Hi Peter,
>>
>> in your Proposal 4 all constraint components (sh:minCount etc) are applicable
>> in all contexts (property constraints, inverse property constraints, node
>> constraints). The following examples (using current syntax) would become
>> valid. Could you please explain what the meaning of each of these cases would
>> be, and when these cases make practical sense?
>>
>> ex:MyShape a sh:Shape ;
>>
>>      sh:inverseProperty [
>>          sh:predicate ex:father ;
>>          sh:datatype xsd:string ;
>>      ] ;
>>
>>      sh:constraint [
>>          sh:disjoint ex:otherProperty ;
>>      ] ;
>>
>>      sh:inverseProperty [
>>          sh:predicate ex:mother ;
>>          sh:disjoint ex:otherProperty ;
>>      ] ;
>>
>>      sh:constraint [
>>          sh:hasValue 10 ;
>>      ] ;
>>
>>      sh:constraint [
>>          sh:minCount 5 ;
>>      ] ;
>>
>>      sh:inverseProperty [
>>          sh:predicate ex:brother ;
>>          sh:minInclusive 42 ;
>>      ] ;
>>
>>      sh:constraint [
>>          sh:uniqueLang true ;
>>      ] ;
>>
>>      sh:inverseProperty [
>>          sh:predicate ex:parent ;
>>          sh:uniqueLang true ;
>>      ] ;
>>
>>      sh:constraint [
>>          sh:qualifiedValueShape ex:PersonShape ;
>>          sh:qualifiedMinCount 3 ;
>>      ] ;
>>
>> For an overview of the current design, see the summary table at the beginning
>> of chapter 3:
>>
>>      http://w3c.github.io/data-shapes/shacl/#constraints
>>
>> Thanks,
>> Holger
>>
>>
Received on Sunday, 10 April 2016 01:02:55 UTC