ISSUE-95: Proposal for model simplifications from Arthur Ryman on 2015-11-19 (public-data-shapes-wg@w3.org from November 2015)

From: Arthur Ryman <arthur.ryman@gmail.com>
Date: Thu, 19 Nov 2015 10:27:04 -0500
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <CAApBiOmw8wgvn7X4cQDE5vT-=ycd1xbYDTtZdh1+ViMPayNp5Q@mail.gmail.com>

This proposal simplifies the SHACL model.

This proposal is based on the observation that at present virtually
all of our constraints combine two orthogonal aspects, namely 1)
forming a set of nodes and then 2) making some assertion about that
set. The one exception is closed shape constraints, which are a special
case and are outside the scope of this proposal.

Section 3.1 Property Constraints [1] defines constraints based on sh:property.
Section 3.2 Inverse Property Constraints [2] defines constraints based
on sh:inverseProperty. It is currently empty but would be a near
repeat of 3.1.
We also need to be able to apply constraints directly to
the focus node using sh:constraint.

[1] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-property
[2] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-inverse-property

Define the base class sh:Constraint to be the set of all constraints.
Define the following disjoint subclasses of sh:Constraint:
sh:FocusNodeConstraint, sh:PropertyConstraint, and
sh:InversePropertyConstraint.

In RDFS we have:
sh:Constraint a rdfs:Class .
sh:FocusNodeConstraint rdfs:subClassOf sh:Constraint .
sh:PropertyConstraint rdfs:subClassOf sh:Constraint .
sh:InversePropertyConstraint rdfs:subClassOf sh:Constraint .

A shapes graph does not need to state these classes explicitly.
Instead we infer the classes using the following range statements:

sh:constraint rdfs:range sh:FocusNodeConstraint .
sh:property rdfs:range sh:PropertyConstraint .
sh:inverseProperty rdfs:range sh:InversePropertyConstraint .
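
As a rough illustration of the range inference above, here is a minimal
Python sketch. It assumes the shapes graph is held as a plain set of
(subject, predicate, object) tuples; the names RANGES and
infer_constraint_types are hypothetical and are not part of SHACL.

```python
# Hypothetical lookup table mirroring the three rdfs:range statements.
RANGES = {
    "sh:constraint": "sh:FocusNodeConstraint",
    "sh:property": "sh:PropertyConstraint",
    "sh:inverseProperty": "sh:InversePropertyConstraint",
}

def infer_constraint_types(shapes_graph):
    """Apply rdfs:range entailment: the object of a triple gets the range class."""
    return {
        (o, "rdf:type", RANGES[p])
        for (s, p, o) in shapes_graph
        if p in RANGES
    }

# A shape with one property constraint held in the blank node _:b1.
shapes = {
    ("ex:PersonShape", "sh:property", "_:b1"),
    ("_:b1", "sh:predicate", "ex:name"),
}
inferred = infer_constraint_types(shapes)
```

Here the blank node is typed as a sh:PropertyConstraint without the
shapes graph ever stating that class.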

Define the domain of a constraint to be the set of nodes that we are
going to make assertions about.
The domain is computed as follows:
Let D be the data graph.
Let N be the focus node.
Let P be the sh:predicate value for property and inverse property constraints.

For sh:FocusNodeConstraint, domain = { N }, the singleton set
consisting of just the focus node.

For sh:PropertyConstraint, domain = { O | (N, P, O) in D }, the set of objects.

For sh:InversePropertyConstraint, domain = { S | (S, P, N) in D }, the set
of subjects.
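
The three domain definitions can be sketched in Python as follows. This
is only an illustration under stated assumptions: the data graph D is a
set of (subject, predicate, object) tuples, and the function name
domain and the constants FOCUS, PROPERTY and INVERSE are made up for
this example.

```python
# Hypothetical tags for the three constraint classes.
FOCUS, PROPERTY, INVERSE = "focus", "property", "inverse"

def domain(kind, data_graph, focus_node, predicate=None):
    """Return the set of nodes a constraint makes assertions about."""
    if kind == FOCUS:
        # domain = { N }
        return {focus_node}
    if kind == PROPERTY:
        # domain = { O | (N, P, O) in D }
        return {o for (s, p, o) in data_graph
                if s == focus_node and p == predicate}
    if kind == INVERSE:
        # domain = { S | (S, P, N) in D }
        return {s for (s, p, o) in data_graph
                if o == focus_node and p == predicate}
    raise ValueError(f"unknown constraint kind: {kind}")

# Illustrative data graph.
D = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:carol", "ex:knows", "ex:bob"),
}
focus_domain = domain(FOCUS, D, "ex:alice")
name_domain = domain(PROPERTY, D, "ex:alice", "ex:name")
knowers_of_bob = domain(INVERSE, D, "ex:bob", "ex:knows")
```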

The above definitions factor out the first aspect of constraints. The
second aspect is the assertions that we want to make about the domain.
A given constraint node will contain zero or more assertions. For
example, consider the following shape:

ex:PersonShape a sh:Shape ;
  sh:property [
    sh:predicate ex:name ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal
  ] .

ex:PersonShape has one constraint node and three assertions about the
domain consisting of the objects of ex:name.
Let D be a data graph, let N be the focus node, and let E be the
domain (objects of ex:name at N).
The assertions are:
minCount 1: #E >= 1
maxCount 1: #E <= 1
nodeKind Literal: E subsetOf nodesOfKind(Literal)

An assertion can be regarded as the set of domains that satisfy it.
minCount(M) = { E | #E >= M }
maxCount(M) = { E | #E <= M }
nodeKind(K) = { E | E subsetOf nodesOfKind(K) }
...
hasValue(V) = { E | V in E }
etc.

A constraint is satisfied when the domain E satisfies each assertion
included in the constraint.
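
To make the set-based reading concrete, the sketch below models each
assertion as a membership test on a domain E, and a constraint as the
conjunction of its assertions. It is a simplification for
illustration: the Literal class is a hypothetical stand-in for RDF
literals, only the Literal node kind is handled, and all function
names are invented here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal; IRIs would be plain strings."""
    value: str

def min_count(m):
    # minCount(M) = { E | #E >= M }
    return lambda e: len(e) >= m

def max_count(m):
    # maxCount(M) = { E | #E <= M }
    return lambda e: len(e) <= m

def node_kind_literal():
    # nodeKind(Literal) = { E | E subsetOf nodesOfKind(Literal) }
    return lambda e: all(isinstance(node, Literal) for node in e)

def has_value(v):
    # hasValue(V) = { E | V in E }
    return lambda e: v in e

def satisfied(domain, assertions):
    """A constraint holds when the domain satisfies every included assertion."""
    return all(check(domain) for check in assertions)

# The three assertions of the ex:PersonShape example.
person_assertions = [min_count(1), max_count(1), node_kind_literal()]
one_name = {Literal("Alice")}
two_names = {Literal("Alice"), Literal("Ali")}
no_name = set()
```

With these definitions, only a domain with exactly one literal name
passes all three checks.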

Define the class sh:Assertion to be the set of assertions.
Each of the assertions defined by SHACL (built-ins) is a subclass of sh:Assertion.
Each assertion class must define a set of parameter properties using
sh:parameter.
An assertion is included in a constraint when all of its parameter
properties are present in the constraint node.

For example,
sh:MinCount rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:minCount .

sh:MaxCount rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:maxCount .

sh:NodeKind rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:nodeKind .

sh:QualifiedMinCount rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:qualifiedMinCount, sh:qualifiedShape .
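
The inclusion rule (an assertion is included when all of its parameter
properties are present) might be sketched like this in Python. The
registry dictionary and the function included_assertions are
hypothetical; the sketch also assumes sh:nodeKind is the parameter
property of sh:NodeKind, as in the ex:PersonShape example, and that a
constraint node is given as a dictionary from property to value.

```python
# Hypothetical registry: assertion class -> its sh:parameter properties.
ASSERTION_PARAMETERS = {
    "sh:MinCount": {"sh:minCount"},
    "sh:MaxCount": {"sh:maxCount"},
    "sh:NodeKind": {"sh:nodeKind"},
    "sh:QualifiedMinCount": {"sh:qualifiedMinCount", "sh:qualifiedShape"},
}

def included_assertions(constraint_properties, registry=ASSERTION_PARAMETERS):
    """Return the assertion classes whose parameters all occur on the node."""
    present = set(constraint_properties)
    return {name for name, params in registry.items() if params <= present}

# The constraint node from ex:PersonShape.
name_constraint = {
    "sh:predicate": "ex:name",
    "sh:minCount": 1,
    "sh:maxCount": 1,
    "sh:nodeKind": "sh:Literal",
}
# Only one of the two sh:QualifiedMinCount parameters is present.
partial = {"sh:predicate": "ex:knows", "sh:qualifiedMinCount": 1}
```

Note that under this rule a node carrying only some of an assertion's
parameters, like the second one, silently includes nothing; whether
that should instead be reported as an error is left open here.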

This mechanism requires that the parameter properties determine the
assertion. This holds for most of the currently defined constraints.
The exceptions are the logical combinators for shapes. In the current
spec, these constraints are defined using an explicit rdf:type, e.g.
sh:OrConstraint, and the parameter properties are common, e.g.
sh:shapes is used in both sh:OrConstraint and sh:AndConstraint.

The parameter names for the logical shape combinators should be
renamed as follows so that they unambiguously determine the class of
the assertion:

sh:HasShape rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:shape .

sh:NotShape rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:not .

sh:AndShapes rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:and .

sh:OrShapes rdfs:subClassOf sh:Assertion ;
  sh:parameter sh:or .

The set of assertions can be extended by anyone. To specify a custom
assertion, define a subclass of sh:Assertion and include a property
that implements the assertion in some supported extension language.
The SHACL specification defines the property sh:sparql and associated
language binding rules for SPARQL. For example,

ex:MyAssertion rdfs:subClassOf sh:Assertion ;
  sh:parameter ex:myParameter ;
  sh:sparql "SELECT ... " .

Summary
- this proposal refactors constraints into domain and assertion parts,
allowing assertions to be defined just once and reused for different
domain types
- the assertions included in a constraint are determined by matching
parameter properties against all the instances of sh:Assertion that
exist in the shapes graph
- custom assertions are defined by adding at least one implementation
in some extension language
- the property sh:sparql and binding rules for extensions implemented
in SPARQL are defined by SHACL.

-- Arthur
Received on Thursday, 19 November 2015 15:27:36 UTC