Re: ISSUE-95: Proposal for model simplifications from Holger Knublauch on 2015-11-20 (public-data-shapes-wg@w3.org from November 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 20 Nov 2015 08:02:46 -0500
To: public-data-shapes-wg@w3.org
Message-ID: <564F19F6.9080802@topquadrant.com>

Hi Arthur,

I think your simplifications go a bit too far: this model loses too
many useful features. I don't see how it can be used to validate shapes
or to drive user interfaces, which leads to significant implementation
burdens and an inconsistent design. It is unclear how "assertions"
indicate where they can be used. The model also does not capture the
fact that many assertions need different implementations depending on
where they are used (e.g. there is no single SPARQL query for
sh:minCount). It is unclear how native (SPARQL) constraints would be
integrated. The parameters provide no useful information whatsoever,
and your statement that an assertion can be evaluated when all of its
parameters are present doesn't work for optional parameters (such as
sh:flags). The terminology is not ideal: "domain" is already used in
RDFS, and you introduce a new term ("parameter") where we already had
one ("argument"). Other issues are left unspecified; it is still only a
rough outline.

We do however converge on some aspects, so I hope we can work on a 
design together that we can all live with.

If we want to go back to the drawing board, I believe we should answer 
what this model is meant to be useful for. I see the following two roles:

1) The model should describe the structure of (valid) SHACL shapes graphs
2) The model should allow executable information to be attached to
drive the validation

The structure should be using things like classes, properties and 
subclass relationships. This would allow us to define both a SHACL file 
and an RDFS or even OWL file that share the same URIs for the same 
things. For example:

sh:MinCountConstraint
     a sh:Constraint ;
     sh:argument [
         sh:predicate sh:minCount ;
         sh:datatype xsd:integer ;
     ] .

versus

sh:MinCountConstraint
     a rdfs:Class ;
     rdfs:subClassOf [
         a owl:Restriction ;
         owl:onProperty sh:minCount ;
         owl:allValuesFrom xsd:integer ;
     ] .

And then

sh:PropertyConstraint
     rdfs:subClassOf sh:MinCountConstraint .

which basically means that the property sh:minCount can be used on the
values of sh:property. Both OWL and SHACL tools would understand that,
and we would have a proper description of valid SHACL documents.

To cover the 2) validation aspect, we could introduce a separate concept 
of executable objects, called "handlers" or "validators". These would be 
linked to from the constraints, e.g.

sh:MinCountConstraint
     sh:propertyValidator [
         a sh:SPARQLValidator ;
         sh:sparql "..." ;
     ] ;
     sh:inversePropertyValidator [
         ...
     ] ;

Turning these into resources would be a bit more verbose than what we
currently have, but could be considered cleaner and more flexible, e.g.
if a validator requires additional property values such as JavaScript
libraries, or needs to indicate different SPARQL bodies for different
architectures. We can then also point at validation functions directly:

sh:MinLengthConstraint
     sh:propertyValidator sh:hasMinLength ;
     sh:focusNodeValidator sh:hasMinLength .

Like in your design (and my old design), the validation will be executed 
if all non-optional arguments are present. Like in your design, 
unnecessary repetition is avoided. Like in your design, every constraint 
type has a stable URI that implementers can attach their validators to 
and that can be used in validation results (ISSUE-96).
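
As a minimal sketch of the "all non-optional arguments are present"
rule: this assumes a constraint node is modelled as a plain Python dict
and that each argument carries a hypothetical "optional" flag. Neither
the Argument class nor is_applicable is part of any proposal; sh:pattern
appears only because it is the constraint that sh:flags belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    """Hypothetical description of one argument of a constraint type."""
    predicate: str
    optional: bool = False


def is_applicable(arguments, constraint_node):
    """True if every non-optional argument has a value on the node."""
    return all(a.optional or a.predicate in constraint_node for a in arguments)


# Assumed argument list for a pattern constraint: sh:flags is optional.
PATTERN_ARGS = [Argument("sh:pattern"), Argument("sh:flags", optional=True)]

with_flags = {"sh:predicate": "ex:name", "sh:pattern": "^A", "sh:flags": "i"}
without_flags = {"sh:predicate": "ex:name", "sh:pattern": "^A"}
no_pattern = {"sh:predicate": "ex:name", "sh:flags": "i"}

applicable = [is_applicable(PATTERN_ARGS, node)
              for node in (with_flags, without_flags, no_pattern)]
```

Under these assumptions the first two nodes trigger the validator and
the third does not, because sh:flags alone does not identify the
constraint.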

I think this approach would have the same simplicity as yours.

Feedback appreciated.
Holger


On 11/19/2015 10:27, Arthur Ryman wrote:
> This proposal simplifies the SHACL model.
>
> This proposal is based on the observation that at present virtually
> all of our constraints combine two orthogonal aspects, namely 1)
> forming a set of nodes and then 2) making some assertion about that
> set. One exception is Closed Shape constraints which are a special
> case and outside the scope of this proposal.
>
> Section 3.1 Property Constraints [1] defines constraints based on sh:property.
> Section 3.2 Inverse Property Constraints [2] defines constraints based
> on sh:inverseProperty. It is currently empty but would be a virtual
> repeat of 3.1.
> Furthermore, we also need to be able to apply constraints directly to
> the focus node using sh:constraint.
>
> [1] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-property
> [2] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-inverse-property
>
> Define the base class sh:Constraint to be the set of all constraints.
> Define the following disjoint subclasses of sh:Constraint:
> sh:FocusNodeConstraint, sh:PropertyConstraint, and
> sh:InversePropertyConstraint.
>
> In RDFS we have:
> sh:Constraint a rdfs:Class .
> sh:FocusNodeConstraint rdfs:subClassOf sh:Constraint .
> sh:PropertyConstraint rdfs:subClassOf sh:Constraint .
> sh:InversePropertyConstraint rdfs:subClassOf sh:Constraint .
>
> A shapes graph does not need to state these classes explicitly.
> Instead we infer the classes using the following range statements:
>
> sh:constraint rdfs:range sh:FocusNodeConstraint .
> sh:property rdfs:range sh:PropertyConstraint .
> sh:inverseProperty rdfs:range sh:InversePropertyConstraint .
>
> Define the domain of a constraint to be the set of nodes that we are
> going to make assertions about.
> The domain of a constraint is defined as follows:
> Let D be the data graph.
> Let N be the focus node.
> Let P be the sh:predicate value for property and inverse property constraints.
>
> For sh:FocusNodeConstraint, domain = { N }, the singleton set
> consisting of just the focus node.
>
> For sh:PropertyConstraint, domain = { O | (N, P, O) in D }, the set of objects.
>
> For sh:InversePropertyConstraint, domain = { S | (S, P, N) in D }, the
> set of subjects.
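
The three domain definitions quoted above can be sketched as follows.
This is only an illustration: the data graph is assumed to be a Python
set of (subject, predicate, object) tuples, and the function name and
the "focus" / "property" / "inverse" tags are hypothetical.

```python
def domain(kind, data_graph, focus_node, predicate=None):
    """Return the set of nodes a constraint makes assertions about.

    kind is a hypothetical tag: "focus", "property" or "inverse".
    data_graph is a set of (subject, predicate, object) triples.
    """
    if kind == "focus":
        # Singleton set consisting of just the focus node.
        return {focus_node}
    if kind == "property":
        # Objects O such that (N, P, O) is in D.
        return {o for (s, p, o) in data_graph
                if s == focus_node and p == predicate}
    if kind == "inverse":
        # Subjects S such that (S, P, N) is in D.
        return {s for (s, p, o) in data_graph
                if p == predicate and o == focus_node}
    raise ValueError(f"unknown constraint kind: {kind!r}")


# Made-up example data, not taken from the proposal.
D = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:carol", "ex:knows", "ex:bob"),
}

focus_domain = domain("focus", D, "ex:alice")
name_domain = domain("property", D, "ex:alice", "ex:name")
knows_inverse_domain = domain("inverse", D, "ex:bob", "ex:knows")
```
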
>
> The above definitions factor out the first aspect of constraints. The
> second aspect is the assertions that we want to make about the domain.
> A given constraint node will contain zero or more assertions. For
> example, consider the following shape:
>
> ex:PersonShape a sh:Shape ;
>    sh:property [
>      sh:predicate ex:name ;
>      sh:minCount 1 ;
>      sh:maxCount 1 ;
>      sh:nodeKind sh:Literal
>    ] .
>
> ex:PersonShape has one constraint node and three assertions about the
> domain consisting of the objects of ex:name.
> Let D be a data graph, let N be the focus node, and let E be the
> domain (objects of ex:name at N).
> The assertions are:
> minCount 1: #E >= 1
> maxCount 1: #E <= 1
> nodeKind Literal: E subsetOf nodesOfKind(Literal)
>
> An assertion can be regarded as a set of domains.
> minCount(M) = { E | #E >= M }
> maxCount(M) = { E | #E <= M }
> nodeKind(K) = { E | E subsetOf nodesOfKind(K) }
> ...
> hasValue(V) = { E | V in E }
> etc.
>
> A constraint is satisfied when the domain E satisfies each assertion
> included in the constraint.
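
Reading each assertion as a set of domains, membership can be sketched
as a predicate over the domain E. This is a rough illustration under
stated assumptions: the Literal class is a hypothetical stand-in for RDF
literals (any other node is a plain string here), and the function names
are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal node."""
    value: str


def min_count(m):
    return lambda e: len(e) >= m          # #E >= M


def max_count(m):
    return lambda e: len(e) <= m          # #E <= M


def node_kind(kind):
    # E subsetOf nodesOfKind(K), with K modelled as a Python type.
    return lambda e: all(isinstance(n, kind) for n in e)


def has_value(v):
    return lambda e: v in e               # V in E


def satisfied(e, assertions):
    """A constraint holds when the domain satisfies every assertion."""
    return all(assertion(e) for assertion in assertions)


# The three assertions of the ex:PersonShape example.
person_name = [min_count(1), max_count(1), node_kind(Literal)]

E_ok = {Literal("Alice")}
E_two = {Literal("Alice"), Literal("Al")}
E_iri = {"ex:alice"}
```

With these definitions E_ok satisfies the constraint, E_two fails
maxCount, the empty set fails minCount, and E_iri fails nodeKind.
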
>
> Define the class sh:Assertion to be the set of assertions.
> Each of the assertions defined by SHACL (built-ins) is a subclass of
> sh:Assertion.
> Each assertion class must define a set of parameter properties using
> sh:parameter.
> An assertion is included in a constraint when all of its parameter
> properties are present in the constraint node.
>
> For example,
> sh:MinCount rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:minCount .
>
> sh:MaxCount rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:maxCount .
>
> sh:NodeKind rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:nodeKind .
>
> sh:QualifiedMinCount rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:qualifiedMinCount, sh:qualifiedShape .
>
> This mechanism requires that the parameter properties determine the
> assertion. This holds for most of the currently defined constraints.
> The exceptions are the logical combinators for shapes. In the current
> spec, these constraints are defined using an explicit rdf:type, e.g.
> sh:OrConstraint, and the parameter properties are common, e.g.
> sh:shapes is used in both sh:OrConstraint and sh:AndConstraint.
>
> The parameter names for the logical shape combinators should be
> renamed as follows so that they unambiguously determine the class of
> the assertion:
>
> sh:HasShape rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:shape .
>
> sh:NotShape rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:not .
>
> sh:AndShapes rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:and .
>
> sh:OrShapes rdfs:subClassOf sh:Assertion ;
>    sh:parameter sh:or .
>
> The set of assertions can be extended by anyone. To specify a custom
> assertion, define a subclass of sh:Assertion and include a property
> that implements the assertion in some supported extension language.
> The SHACL specification defines the property sh:sparql and associated
> language binding rules for SPARQL. For example,
>
> ex:MyAssertion rdfs:subClassOf sh:Assertion ;
>    sh:parameter ex:myParameter ;
>    sh:sparql "SELECT ... " .
>
> Summary
> - this proposal refactors constraints into domain and assertion parts,
> allowing assertions to be defined just once and reused for different
> domain types
> - the assertions included in a constraint are determined by matching
> parameter properties against all the instances of sh:Assertion that
> exist in the shapes graph
> - custom assertions are defined by adding at least one implementation
> in some extension language
> - the property sh:sparql and binding rules for extensions implemented
> in SPARQL are defined by SHACL.
>
> -- Arthur
>
Received on Friday, 20 November 2015 13:03:20 UTC