- From: Arthur Ryman <arthur.ryman@gmail.com>
- Date: Thu, 19 Nov 2015 10:27:04 -0500
- To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
This proposal simplifies the SHACL model. It is based on the observation that at present virtually all of our constraints combine two orthogonal aspects: 1) forming a set of nodes, and 2) making some assertion about that set. One exception is Closed Shape constraints, which are a special case and outside the scope of this proposal.

Section 3.1 Property Constraints [1] defines constraints based on sh:property. Section 3.2 Inverse Property Constraints [2] defines constraints based on sh:inverseProperty. It is currently empty but would be a virtual repeat of 3.1. Furthermore, we also need to be able to apply constraints directly to the focus node using sh:constraint.

[1] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-property
[2] http://www.w3.org/TR/2015/WD-shacl-20151008/#constraints-inverse-property

Define the base class sh:Constraint to be the set of all constraints. Define the following disjoint subclasses of sh:Constraint: sh:FocusNodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint. In RDFS we have:

    sh:Constraint a rdfs:Class .
    sh:FocusNodeConstraint rdfs:subClassOf sh:Constraint .
    sh:PropertyConstraint rdfs:subClassOf sh:Constraint .
    sh:InversePropertyConstraint rdfs:subClassOf sh:Constraint .

A shapes graph does not need to state these classes explicitly. Instead we infer the classes using the following range statements:

    sh:constraint rdfs:range sh:FocusNodeConstraint .
    sh:property rdfs:range sh:PropertyConstraint .
    sh:inverseProperty rdfs:range sh:InversePropertyConstraint .

Define the domain of a constraint to be the set of nodes that we are going to make assertions about. The domain of a constraint is defined as follows:

Let D be the data graph.
Let N be the focus node.
Let P be the sh:predicate value for property and inverse property constraints.

For sh:FocusNodeConstraint, domain = { N }, the singleton set consisting of just the focus node.
For sh:PropertyConstraint, domain = { O | (N, P, O) in D }, the set of objects.
For sh:InversePropertyConstraint, domain = { S | (S, P, N) in D }, the set of subjects.

The above definitions factor out the first aspect of constraints. The second aspect is the assertions that we want to make about the domain. A given constraint node will contain zero or more assertions. For example, consider the following shape:

    ex:PersonShape a sh:Shape ;
        sh:property [
            sh:predicate ex:name ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
            sh:nodeKind sh:Literal
        ] .

ex:PersonShape has one constraint node and three assertions about the domain consisting of the objects of ex:name. Let D be a data graph, let N be the focus node, and let E be the domain (objects of ex:name at N). The assertions are:

    minCount 1: #E >= 1
    maxCount 1: #E <= 1
    nodeKind Literal: E subsetOf nodesOfKind(Literal)

An assertion can be regarded as a set of domains.

    minCount(M) = { E | #E >= M }
    maxCount(M) = { E | #E <= M }
    nodeKind(K) = { E | E subsetOf nodesOfKind(K) }
    ...
    hasValue(V) = { E | V in E }
    etc.

A constraint is satisfied when the domain E satisfies each assertion included in the constraint.

Define the class sh:Assertion to be the set of assertions. Each of the assertions defined by SHACL (built-ins) is a subclass of sh:Assertion. Each assertion class must define a set of parameter properties using sh:parameter. An assertion is included in a constraint when all of its parameter properties are present in the constraint node. For example,

    sh:MinCount rdfs:subClassOf sh:Assertion ; sh:parameter sh:minCount .
    sh:MaxCount rdfs:subClassOf sh:Assertion ; sh:parameter sh:maxCount .
    sh:NodeKind rdfs:subClassOf sh:Assertion ; sh:parameter sh:nodeKind .
    sh:QualifiedMinCount rdfs:subClassOf sh:Assertion ; sh:parameter sh:qualifiedMinCount, sh:qualifiedShape .

This mechanism requires that the parameter properties determine the assertion. This holds for most of the currently defined constraints.
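
As a rough illustration of the domain/assertion split described above, here is a minimal Python sketch. It is not part of the proposal and not how any SHACL engine works: the data graph is assumed to be a plain set of (subject, predicate, object) tuples, a constraint node is assumed to be a dict of its properties, and the names Literal, domain, ASSERTIONS, included_assertions and satisfies are all hypothetical. Blank nodes and sh:qualifiedMinCount are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper so literals can be told apart from IRIs (plain str)."""
    value: str


def node_kind(node):
    # Simplification: anything that is not a Literal is treated as an IRI.
    return "sh:Literal" if isinstance(node, Literal) else "sh:IRI"


def domain(kind, data, focus, predicate=None):
    """First aspect: form the set of nodes the assertions are about."""
    if kind == "focus":        # sh:FocusNodeConstraint: { N }
        return {focus}
    if kind == "property":     # sh:PropertyConstraint: { O | (N, P, O) in D }
        return {o for (s, p, o) in data if s == focus and p == predicate}
    if kind == "inverse":      # sh:InversePropertyConstraint: { S | (S, P, N) in D }
        return {s for (s, p, o) in data if o == focus and p == predicate}
    raise ValueError(f"unknown constraint kind: {kind}")


# Second aspect: each assertion is keyed by its parameter properties and
# implemented as a predicate over (domain E, constraint node C).
ASSERTIONS = {
    frozenset({"sh:minCount"}): lambda e, c: len(e) >= c["sh:minCount"],
    frozenset({"sh:maxCount"}): lambda e, c: len(e) <= c["sh:maxCount"],
    frozenset({"sh:nodeKind"}): lambda e, c: all(node_kind(n) == c["sh:nodeKind"] for n in e),
    frozenset({"sh:hasValue"}): lambda e, c: c["sh:hasValue"] in e,
}


def included_assertions(constraint):
    """An assertion is included when all its parameter properties are present."""
    present = set(constraint)
    return [check for params, check in ASSERTIONS.items() if params <= present]


def satisfies(constraint, e):
    """A constraint is satisfied when E satisfies every included assertion."""
    return all(check(e, constraint) for check in included_assertions(constraint))


# The ex:PersonShape property constraint from the example, as a dict.
name_constraint = {
    "sh:predicate": "ex:name",
    "sh:minCount": 1,
    "sh:maxCount": 1,
    "sh:nodeKind": "sh:Literal",
}
```

Because the assertions only ever see the domain E, the same table serves focus node, property, and inverse property constraints unchanged; only the call to domain differs.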
The exceptions, where the parameter properties do not determine the assertion, are the logical combinators for shapes. In the current spec, these constraints are defined using an explicit rdf:type, e.g. sh:OrConstraint, and the parameter properties are shared, e.g. sh:shapes is used in both sh:OrConstraint and sh:AndConstraint. The parameter properties for the logical shape combinators should be renamed as follows so that they unambiguously determine the class of the assertion:

    sh:HasShape rdfs:subClassOf sh:Assertion ; sh:parameter sh:shape .
    sh:NotShape rdfs:subClassOf sh:Assertion ; sh:parameter sh:not .
    sh:AndShapes rdfs:subClassOf sh:Assertion ; sh:parameter sh:and .
    sh:OrShapes rdfs:subClassOf sh:Assertion ; sh:parameter sh:or .

The set of assertions can be extended by anyone. To specify a custom assertion, define a subclass of sh:Assertion and include a property that implements the assertion in some supported extension language. The SHACL specification defines the property sh:sparql and associated language binding rules for SPARQL. For example,

    ex:MyAssertion rdfs:subClassOf sh:Assertion ; sh:parameter ex:myParameter ; sh:sparql "SELECT ... " .

Summary

- this proposal refactors constraints into domain and assertion parts, allowing assertions to be defined just once and reused for different domain types
- the assertions included in a constraint are determined by matching parameter properties against all the instances of sh:Assertion that exist in the shapes graph
- custom assertions are defined by adding at least one implementation in some extension language
- the property sh:sparql and binding rules for extensions implemented in SPARQL are defined by SHACL.

-- Arthur
Received on Thursday, 19 November 2015 15:27:36 UTC