Comments on Shapes Constraint Language (SHACL) W3C Editor's Draft 22 February 2017 from Peter F. Patel-Schneider on 2017-02-25 (public-rdf-shapes@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Sat, 25 Feb 2017 07:03:48 -0800
To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <584f6f67-ca59-5dd9-de11-efc81db65ad4@gmail.com>

Here are some comments on this document.  In summary, there are still lots
of significant problems.  Addressing these problems will result in
substantial changes to this document.

I have not examined the portions of the document labelled as non-normative
as closely as the normative sections.  I have treated all portions of the
document labelled as non-normative as if they do not contain any normative
content.

Some of the comments below are about problems that occur in multiple places
in the document.  I have not always listed all of these places.

It was hard to decipher the odd wording in many places to get to the
underlying meaning.  The document needs a rewrite to state the definition of
SHACL in a consistent manner.  Part of the problem here, but only part, is
the mixing of different words and phrases that can carry definitional import,
including "if ... then", "declared", "specified", and "is".  Another part of
the problem is the use of "MUST" in the definitions of constraint
components.

This set of comments is separate from my previous comments on SHACL.


** Problems in early definitions:

"A property path is a possible route in a graph between two graph nodes."
Route is not defined.  Possible route is not defined.  (This wording is used
once in the SPARQL document but is not defined there.)

"A binding is a pair (variable, RDF term), consistent with the term's use in
SPARQL."
If binding is taken from SPARQL, just link to its definition as is done for
the other terms taken from SPARQL.

"A solution is a set of bindings, one row in the body of the result table of
a SPARQL query."
In SPARQL a solution is not one row in the body of the result table of a
SPARQL query.  That is just how it is shown in some places.  The actual
correct term is solution mapping.

"The results table is a SolutionSequence, a list of solutions, possibly
unordered."
Results table is not used in SHACL.

"A node in an RDF graph is a SHACL instance of a SHACL class in the graph if
one of its SHACL types is the class."
SHACL types require reference to a graph.

"true denotes the RDF term "true"^^xsd:boolean. false denotes the RDF term
"false"^^xsd:boolean."
There is no notion of what denotes means here.

"Target declarations are values of certain properties (such as
sh:targetClass) for a shape in a shapes graph."
More than just a value is needed.

"All bindings of the variable this from the solution become focus nodes."
SPARQL queries do not return a solution as defined in this document.


** Unclear wording:

"(In this document, the verbs specify or declare are sometimes used to
express the fact that an RDF term has property values in a graph.)"
As opposed to an RDF term not having any values for any property in a graph?

"A constraint component is an IRI."
I don't see how every IRI is a constraint component.

"The IRI is used, among others, in validation reports."
I never would have imagined that validation reports could only use IRIs that
are constraint components.

"SHACL-SPARQL can be used to declare additional constraint components based
on SPARQL."
What part of SHACL are these additional constraint components in?


** Strange links:

The definition of member has a link back to itself.


** Normative wording in non-normative portions of the document:

Section 1.6 is labelled as non-normative but discusses how to treat other
portions of the document and states conformance requirements for SHACL
implementations.

In many places there are what are labelled as SPARQL definitions of SHACL
Core constraint components.  These are in non-normative portions of the
document.  The document should not give the impression that these bits of
SPARQL are definitions of portions of the SHACL Core.

There are probably other places where definitional wording occurs in
non-normative parts of the document.  These should be removed or changed to
not give any impression that they are normative.


** SHACL Vocabulary:

What is the status of the SHACL vocabulary?  SHACL is not an ontology that
needs a RDF graph interpreted using the RDFS semantics to provide its
vocabulary.  All that SHACL needs is a set of IRIs that are used in its RDF
syntax.  What then is the status of the mentioned RDF graph?  How is this
graph to be interpreted?  Does information entailed by the graph using the RDF
or RDFS semantics have any effect on SHACL?  Is any information in the graph
part of the definition of SHACL?  Of SHACL Core?  Is all of the information
in the graph part of the definition of SHACL?  Of SHACL Core?


** Shapes:

If any node in a shapes graph has a sh:shape link back to itself then the
shapes graph is recursive and behaviour of SHACL processors on the graph is
undefined even if this node is completely disconnected from the rest of the
graph.  It would be better to have the behaviour of SHACL processors defined
in cases like this.
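
As an illustration, here is a minimal Python sketch of how a single stray
node changes the status of an otherwise ordinary shapes graph.  It assumes
that a shapes graph counts as recursive as soon as some node reaches itself
through sh:shape links.  The tuple encoding of triples, the function name
is_recursive, and the ex: names are assumptions made for this sketch, not
anything taken from the document.

```python
def is_recursive(graph, link="sh:shape"):
    """True if some node can reach itself by following `link` triples."""
    edges = {}
    for s, p, o in graph:
        if p == link:
            edges.setdefault(s, set()).add(o)
    for start in edges:
        stack, seen = list(edges[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
    return False

# A shapes graph with no sh:shape links at all (hypothetical example data).
useful = {("ex:PersonShape", "sh:targetClass", "ex:Person"),
          ("ex:PersonShape", "sh:property", "_:p1"),
          ("_:p1", "sh:path", "ex:name")}
# One node, disconnected from everything else, that links to itself.
stray = {("ex:Unused", "sh:shape", "ex:Unused")}

before = is_recursive(useful)          # False
after = is_recursive(useful | stray)   # True
```

Under this reading the stray triple alone moves the whole graph into the
undefined case, even though no target ever leads to ex:Unused.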

It used to be that top-level shapes were conventionally indicated by an
rdf:type link to sh:Shape.  This has apparently changed to sh:NodeShape.
There is no apparent reason for this change, which should be reverted.

sh:Shape does not appear to have any effect at all in SHACL.  If this is the
case then it should be removed.

The syntax rule for shapes doesn't appear to be a syntax rule at all.
Instead it is defining what a shape is.

"A shape in a shapes graph declares a constraint of kind c if c is a
constraint component and the shape has values for all mandatory parameters
of c. The constraint declaration consists of the values that the shape has
for all mandatory and optional parameters of that component."
The word "declares" is only harmful here.
For constraint components that have more than one parameter this definition
loses which value is connected to which parameter.  For shapes that have two
values for a single-parameter constraint component there is only one
resultant constraint and that constraint has both of the values of the
parameter.

"Note that the definition above does not include all of the syntax rules of
well-formed shapes."
There is no notion of well-formed shapes introduced in the document, even
though it is used in several places.  Similarly there is no notion of
well-formed property shape or well-formed node shape introduced in the
document even though both of these are used in the document.

"Note that the definitions of well-formed property shapes and node shapes
make these two sets of nodes disjoint."
There is no definition of either well-formed property shape or well-formed
node shape.


** Property Paths:

"A node in an RDF graph is a well-formed SHACL property path p if it
satisfies exactly one of the syntax rules in the following sub-sections. A
node p is not a well-formed SHACL property path if p is a blank node and any
path mappings of p directly or transitively reference p."
It is possible that a node could both satisfy exactly one of the syntax
rules and also refer back to itself.  What happens then?
Every path mapping of p references p, so no blank node is a well-formed
SHACL property path.

"A sequence path is a blank node that is a SHACL list with at least two
members and each member is a well-formed SHACL property path."
Sequence paths can have extra information associated with them.

"An alternative path is a blank node that is the subject of exactly one
triple in G."  "An inverse path is a blank node that is the subject of
exactly one triple in G."  And so on.
These paths can't have extra information associated with them.  Any blank
node that is the subject of exactly one triple is lots of kinds of paths.
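
A small Python sketch of the overlap, assuming the quoted sentences are
taken on their own as the membership test for each kind of path.  The tuple
encoding of triples, the "_:" prefix for blank nodes, the function names,
and the ex: names are illustrative assumptions.

```python
def is_blank(node):
    return node.startswith("_:")

def subject_of_exactly_one_triple(node, graph):
    return sum(1 for (s, _p, _o) in graph if s == node) == 1

# Each quoted sentence, read on its own, gives the same test.
def is_alternative_path(node, graph):
    return is_blank(node) and subject_of_exactly_one_triple(node, graph)

def is_inverse_path(node, graph):
    return is_blank(node) and subject_of_exactly_one_triple(node, graph)

graph = {("_:b1", "sh:inversePath", "ex:parent")}
tests = (("alternative", is_alternative_path), ("inverse", is_inverse_path))
kinds = [name for name, test in tests if test("_:b1", graph)]

# One extra triple (say, a comment) and the node no longer passes the test.
annotated = graph | {("_:b1", "rdfs:comment", "parent of")}
still_path = is_inverse_path("_:b1", annotated)
```

Here kinds contains both "alternative" and "inverse" for the same node,
and still_path is False once the comment triple is added.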


** Non-validating information:

What happens if the requirements in the non-normative 2.3.2 are violated?


** Targets:

"If s is a SHACL instance of sh:NodeShape or sh:PropertyShape in a shapes
graph SG and s is also a SHACL instance of rdfs:Class in SG then the set of
SHACL instances of s in a data graph DG is a target from DG for s in SG."
So a node that is a SHACL instance of sh:Shape and a SHACL instance of
rdfs:Class will not produce an implicit class target.  This is going to
trip up a lot of people and needs to be changed.
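
A minimal Python sketch of the quoted rule, assuming that the SHACL types
of a node are its rdf:type values plus their rdfs:subClassOf ancestors in
the same graph.  The tuple encoding of triples, the function names, and
ex:Person are assumptions made for this sketch.

```python
def shacl_types(node, graph):
    """rdf:type values of node plus their rdfs:subClassOf ancestors in graph."""
    types = {o for s, p, o in graph if s == node and p == "rdf:type"}
    frontier = list(types)
    while frontier:
        cls = frontier.pop()
        for s, p, o in graph:
            if s == cls and p == "rdfs:subClassOf" and o not in types:
                types.add(o)
                frontier.append(o)
    return types

def has_implicit_class_target(node, graph):
    """The quoted rule: instance of rdfs:Class and of sh:NodeShape or sh:PropertyShape."""
    types = shacl_types(node, graph)
    return "rdfs:Class" in types and bool(types & {"sh:NodeShape", "sh:PropertyShape"})

def shapes_graph(shape_type):
    # Hypothetical shapes graph: ex:Person is both a class and a shape.
    return {("ex:Person", "rdf:type", "rdfs:Class"),
            ("ex:Person", "rdf:type", shape_type)}

as_shape = has_implicit_class_target("ex:Person", shapes_graph("sh:Shape"))
as_node_shape = has_implicit_class_target("ex:Person", shapes_graph("sh:NodeShape"))
```

In this sketch as_shape is False and as_node_shape is True.  Adding a
triple that makes sh:NodeShape a subclass of sh:Shape would not help,
because the traversal only moves from a class to its superclasses.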


** Validation:

"Conformance checking is a simplified version of validation, producing a
boolean result."
There is no definition of which boolean value conformance checking produces.

"the validation process"
What counts as part of the validation process?  Does checking for
ill-formedness?  Does entailment?  Does piecing together the shapes graph or
the data graph?

"For example, SHACL processors may support recursion scenarios or produce an
error when they detect recursion."
I expect that this should be failure instead of error.

"A shapes graph is an RDF graph containing zero or more shapes that is
passed into a SHACL validation process so that a data graph can be validated
against the shapes."
Is the graph that is used when a node in a graph is validated against a
shape in it a shapes graph?

"Every value of sh:shapesGraph is an IRI representing a graph that should be
included into the shapes graph used to validate the data graph."
Shouldn't this be SHOULD?  Or should it be MAY as is used in the next
sentence?

"Validating an RDF term against a shape involves validating the term against
each of the components for which the shape has values for all mandatory
parameters, using the validators associated with the respective component."
This incorrectly uses constraint components instead of constraints.

"The validation of a focus node in the data graph against a constraint in
the shapes graph produces the top-level validation results that are produced
by the validator of the constraint component, using as input the focus node,
the specific values of the parameters in the constraint, and the value nodes
of the shape in the data graph."
SHACL constraint components have multiple validators.
Validation needs more than just the information listed above.
Shapes don't have value nodes at all.
The output of a validation process is not completely defined, so there is no
notion of *the* results of validation.

"A focus node conforms to a shape if and only if the validation of the shape
does not produce any validation result or a failure."
So no focus node will conform to a node shape that has a sh:not parameter.


** Conformance:

"All SHACL implementations MUST at least cover the Core."
Covering is not defined.

"This specification describes conformance criteria for: SHACL Core [...],
SHACL-SPARQL [...], SHACL Shapes Graphs [...], Validation of a data graph
against a shapes graph [...], Validation of an RDF term from a data graph
against a shape from the shapes graph [...], SHACL Core processors [...],
SHACL-SPARQL processors [...]."
Conformance is something that is required of implementations.  It doesn't
make sense for special kinds of graphs or processes to conform.  Instead
these are defined as something.

An RDF graph is a (mathematical) set of RDF triples.  There are no
operations that can modify an RDF graph defined in RDF.


** Core Constraint Components:

"This section defines the built-in SHACL Core constraint components that
MUST be supported by all SHACL Core processors."
Are SHACL-SPARQL processors required to support these components as well?

"The SPARQL definitions in this section represent potential validators."
The SPARQL queries are in informative parts of the document and can't be
considered to be definitions.  As many of the SPARQL definitions currently
have problems, it would be better to just remove all the SPARQL query stuff
from this section.

"The following constraint components represent restrictions on the number of
value nodes."
Presumably the number of value nodes for a particular focus node, not the
number of value nodes overall.

"If this parameter is omitted then there is no limit on the number of
triples."
Which triples?  How can this parameter be omitted and there still be a
constraint set up for it?

"The values of sh:pattern are literals with datatype xsd:string that are
valid pattern arguments for the SPARQL REGEX function."
Having this as a syntax condition means that anyone checking for correct
operation of SHACL processors will need to know whether a value for
sh:pattern is a valid pattern argument for the SPARQL REGEX function.

"If $flags has a value then it MUST be interpreted according to the third
argument of the SPARQL REGEX function."
What does this mean?  I'm guessing that the constraint component is supposed
to act as if this value is the third argument, not that this value is
interpreted as anything.

"For each pair of value nodes and the values of the property $lessThan at
the given focus node where the first value is not less than the second value
(based on SPARQL's < operator) or where the two values cannot be compared, a
validation result MUST be produced with the value node as sh:value."
How many different validation results need to be produced if the set of
value nodes is the set { 1, 2 } and the set of property values is the set
{ "a", "b" }?
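
The two obvious answers can be counted with a short Python sketch.  It
assumes that Python's refusal to order an integer and a string is an
acceptable stand-in for SPARQL's < operator failing to compare the two
values.  The function and variable names are illustrative only.

```python
def sparql_less_than(a, b):
    """Stand-in for SPARQL's < operator: None means the values cannot be compared."""
    try:
        return a < b
    except TypeError:
        return None

value_nodes = [1, 2]
less_than_values = ["a", "b"]

# Pairs for which the quoted text calls for a validation result.
failing_pairs = [(v, w) for v in value_nodes for w in less_than_values
                 if sparql_less_than(v, w) is not True]

# Reading 1: one validation result per failing pair.
results_per_pair = len(failing_pairs)
# Reading 2: results are told apart only by their sh:value.
results_per_value_node = len({v for v, _w in failing_pairs})
```

The first reading gives 4 validation results and the second gives 2, and
the quoted wording does not say which is intended.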

"For each value node that produces no validation results against the shape
$not"
So if there is a conjunction under the not, a validation result will be
produced here if just one of the conjuncts produces a validation result.

"For each value node where the validation of the value node against any of
the members of $and produces a validation result and no failure, a
validation result MUST be produced with the value node as sh:value."
So if there is a negation under any of the conjunctions a validation result
will always be produced.

"If a value node is violating the constraint, sh:shape will produce only a
single validation result, with sh:ShapeConstraintComponent as its
sh:sourceConstraintComponent."
Not correct.  The sh:shape could produce other validation results for other
value nodes.

"On the other hand side, sh:property may produce any number of validation
results, and these will have the individual constraint components of the
property shape as their values of sh:sourceConstraintComponent."
Not correct.  The validation results produced here may have other values for
sh:sourceConstraintComponent.


** Specify and Declare:

Specify and declare are used throughout the document in strange ways.  It
would be better to replace these with words that do not have so much
baggage, e.g., "For example, shapes can state" or "A shape in a shapes graph
has a constraint of".


** SHACL-SPARQL:

All aspects of SHACL-SPARQL depend heavily on pre-binding.  As pre-binding
has never had a workable definition in SHACL there is no purpose in closely
reviewing this part of the document at this time.  Either this part of the
document needs to be removed or a workable definition of pre-binding provided and
this phase of the W3C process repeated.

It is not clearly stated what SHACL Core processors need to do when they
encounter constructs from SHACL-SPARQL.  It might be the case that SHACL
Core processors can completely ignore SHACL-SPARQL constructs, because SHACL
Core processors only "cover" SHACL Core, but it might also be the case that
SHACL Core processors need to examine SHACL-SPARQL constructs, because these
constructs might cause some nodes to be ill-formed.

Both sh:prefixes and sh:prefix are used.  I expect that only one is needed.

Peter F. Patel-Schneider
Nuance Communications
Received on Saturday, 25 February 2017 15:04:26 UTC