Shapes Constraint Language (SHACL) Working Draft of 2017-02-02 from Peter F. Patel-Schneider on 2017-02-04 (public-rdf-shapes@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 3 Feb 2017 20:10:13 -0800
To: public-rdf-shapes@w3.org
Message-ID: <bc4b6448-a025-64fa-7266-8ef4c88a5829@gmail.com>

I took a quick look at the recent working draft of
https://www.w3.org/TR/shacl/ dated 02 February 2017.

The document says that the next version of the document is planned to be a
Candidate Recommendation but does not provide a schedule for comments for
this version of the document.  Nor does the document state a schedule for
responses to comments on previous working drafts of this document that have
not yet received substantive responses from the working group.

In this quick look I examined the document to see if some of the major
problems with the document have been solved.  What I found is that the three
major problems I first looked at remain unsolved.  Each of them still needs
significant work.  Each of them prevents reviewers of the document from
providing fully informed reviews of the definition of SHACL.  Given that
there are at least these three major, pervasive problems in the document, I
don't see that detailed comments on the rest of the document will be very
worthwhile at this time.


Pre-binding:

There has never been a definition of pre-binding that meets the needs of
SHACL.  The definition of pre-binding in this version of the document is no
different.  Pre-binding is only defined for a solution mapping and a graph
pattern.  However, all uses of pre-binding in SHACL are for a solution
mapping and a query so, in effect, there is no definition of pre-binding at
all in this document.

As well, there is no demonstration that the current definition of
pre-binding is well-behaved even where it is defined.

The document that is stated to be the source of the definition of
pre-binding for SHACL is a document that has not been accepted by anyone
other than the author of the document as far as I can tell.  Saying that it
is the draft of a WG CG report is giving a false impression of its effective
status.

The unsuitability of this definition of pre-binding has already been reported
in https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jan/0010.html
but there is no indication in the working draft that there are any problems
with pre-binding.  The lack of such an indication in the document means that
reviewers may miss the fact that much of the document has fundamental
problems.

As pre-binding is a central part of SPARQL-SHACL and is also used to
describe much of SHACL Core, it is not possible for reviewers to provide
fully informed comments on large parts of SHACL at this time.  As there is
as yet no suitable definition provided for pre-binding, even though the
problems with it have been known since at least June of 2015, it will be
better at this late stage to simply remove all parts of SHACL and the SHACL
document that depend on pre-binding.


Shapes:

The way that shapes are formed and used in SHACL remains a severe problem.

There are shapes, node shapes, and property shapes.  There are also three
RDF terms that are related to shapes: sh:Shape, sh:NodeShape, and
sh:PropertyShape.

There is much confusing wording on how these all work together.

First, there is "sh:NodeShape and sh:PropertyShape can be used to represent
node and property shapes".  How do these RDF terms represent anything?

Second, there are what appear to be the main definitions of node shapes and
property shapes.
"A node shape is a shape in the shapes graph that is not the subject of a
triple with sh:path as its predicate."
"A property shape is a shape in the shapes graph that is the subject of a
triple that has sh:path as its predicate."
What is the role of sh:NodeShape and sh:PropertyShape if the definition
of node shapes and property shapes doesn't even refer to them?
This is only reinforced by
"However, the presence of any rdf:type triple does not determine whether a
node is treated as a node shape or not."
"However, the presence of any rdf:type triple does not determine whether a
node is treated as a property shape or not."

Third, there are what appear to be alternative definitions of node shapes and
property shapes.
"sh:NodeShape is the class of node shapes and should be declared as a type
for shapes that are IRIs."
"sh:PropertyShape is the class of property shapes and should be declared as a
type for shapes that are IRIs."
There are multiple problems with these alternative definitions.  For
starters, there is no description in SHACL of what it means to be the class
of anything.  Next, there is no description in SHACL of how to declare a
type for anything.  Further, there is the strong suggestion here that shapes
that are IRIs should somehow have both sh:NodeShape and sh:PropertyShape
declared as their type, which doesn't make sense at all.

Fourth, the conditions to be a shape include being a SHACL instance of
sh:NodeShape or sh:PropertyShape, but not sh:Shape.  This contradicts the
normative statements that rdf:type triples are irrelevant for determining
whether a node is a node or property shape.  It is also exceedingly weird as
sh:Shape is previously indicated to be somehow related to shapes, but being
a SHACL instance of sh:Shape in an RDF graph doesn't make a node a shape in
the graph.  As sh:Shape is the natural RDF term for the type of shapes,
users will use it over sh:NodeShape and sh:PropertyShape.

Aside from these problems with node shapes and property shapes, there are
problems with the definitions that shapes depend on.  For example, shapes
graphs are defined too narrowly.  SHACL validation processes don't always
validate a data graph against the shapes in another graph, but shapes graphs
are not defined for these other situations.

All this ends up with a big mess.  It appears that it is possible to use
sh:NodeShape and sh:PropertyShape in ways counter to what appears to be
their intended meaning.  For example,
  ex:s1 rdf:type sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:path ex:child ;
    sh:nodeKind sh:IRI .
appears to form a constraint on the children of people even though the
type of the shape is sh:NodeShape.
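
The mismatch can be made concrete with a minimal Python sketch that applies
the quoted sh:path-based definitions to ex:s1.  This is only an
illustration: triples are modeled as plain tuples of prefixed-name strings,
classify_shape and declared_types are hypothetical helpers rather than
anything defined by SHACL, and the node is assumed to already count as a
shape.

```python
# Triples from the ex:s1 example, as (subject, predicate, object) tuples.
SHAPES_GRAPH = {
    ("ex:s1", "rdf:type", "sh:NodeShape"),
    ("ex:s1", "sh:targetClass", "ex:Person"),
    ("ex:s1", "sh:path", "ex:child"),
    ("ex:s1", "sh:nodeKind", "sh:IRI"),
}


def classify_shape(graph, node):
    """Hypothetical helper applying the quoted definitions: a shape that is
    the subject of an sh:path triple is a property shape, otherwise it is a
    node shape.  rdf:type triples are deliberately ignored."""
    has_path = any(s == node and p == "sh:path" for s, p, _ in graph)
    return "property shape" if has_path else "node shape"


def declared_types(graph, node):
    """Hypothetical helper collecting the rdf:type values of the node."""
    return {o for s, p, o in graph if s == node and p == "rdf:type"}


kind = classify_shape(SHAPES_GRAPH, "ex:s1")
types = declared_types(SHAPES_GRAPH, "ex:s1")
print(kind, types)
```

Under these assumptions ex:s1 is classified as a property shape while its
only declared type is sh:NodeShape.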

What needs to be done is to get rid of sh:NodeShape and sh:PropertyShape.
They serve no useful purpose.  They will only produce confusion.  Then the
definitions underlying shapes need to be corrected.  Because of these
significant and pervasive problems with shapes in SHACL, reviewers cannot
provide fully informed comments on the SHACL document at this time.


Validation results and reports:

A validation report is the result of validation.  It is an RDF graph where
some nodes are validation results reporting on constraints that were not
satisfied.  There are serious problems in how validation reports are
generated and the form of validation reports.

The first problem is the generation of validation results.  Throughout the
definitions of SHACL Core constraint components there is wording like "For
each value node [...], a validation result MUST be produced with the value
node as sh:value." and "If [...], a validation result MUST be produced."
This means that each SHACL processor must produce these validation results
to be a conforming implementation of SHACL.

The processor must produce these validation results no matter whether they
are going to show up in the final validation report or not.  The processor
must produce these validation results even if it is not going to return a
validation report at all.  This mixing of conformance requirements into the
definition of validation introduces an unnecessary and problematic
procedural aspect into the underlying definitions of SHACL.

Although it is mandated that a SHACL processor must produce these validation
results it is completely unclear how many must be produced.  A SHACL
processor may end up checking whether a particular node satisfies a
particular constraint numerous times.  Must it produce a validation result
for each of these times?  Must it only produce one validation result for all
of these times?  Or is the number of times it produces a validation result
undetermined?  This multiplicity problem can show up at top-level due to
converging sh:property chains.
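
The following Python sketch illustrates the multiplicity question under
heavily simplified assumptions.  The data, the shape names, the dictionary
encoding of shapes, and the naive recursive validate function are all
hypothetical; this is not how the document defines validation.  It only
shows how two sh:property chains that converge on the same property shape
can lead a processor to check the same node against the same constraint
more than once.

```python
from collections import Counter

# Hypothetical data graph: ex:bob is reachable from ex:alice by two paths.
DATA = {
    ("ex:alice", "ex:friend", "ex:bob"),
    ("ex:alice", "ex:colleague", "ex:bob"),
}

# Hypothetical shapes: two property shapes both nest ex:NameShape via
# sh:property, so the two chains converge on the same property shape.
SHAPES = {
    "ex:PersonShape": {"path": None,
                       "property": ["ex:FriendShape", "ex:ColleagueShape"]},
    "ex:FriendShape": {"path": "ex:friend", "property": ["ex:NameShape"]},
    "ex:ColleagueShape": {"path": "ex:colleague", "property": ["ex:NameShape"]},
    "ex:NameShape": {"path": "ex:name", "min_count": 1, "property": []},
}


def values(node, path):
    """Objects of the triples in DATA with the given subject and predicate."""
    return sorted(o for s, p, o in DATA if s == node and p == path)


def validate(focus, shape_id, results):
    """Naive recursive check: records one (focus, shape) entry every time a
    minimum-count violation is encountered, with no de-duplication."""
    shape = SHAPES[shape_id]
    value_nodes = [focus] if shape["path"] is None else values(focus, shape["path"])
    if len(value_nodes) < shape.get("min_count", 0):
        results.append((focus, shape_id))
    for nested in shape["property"]:
        for value_node in value_nodes:
            validate(value_node, nested, results)


results = []
validate("ex:alice", "ex:PersonShape", results)
counts = Counter(results)
print(counts)
```

In this sketch the single violation (ex:bob lacking a value for ex:name) is
recorded twice, once per chain.  Whether a conforming processor must
produce two validation results, one, or some other number is exactly what
is left open.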

The second problem is the form of a validation report.  There is
insufficient guidance on how multiple validation results are to be
produced.  For example, can a single validation result have multiple values
for sh:value, making it a validation result for multiple violations?
Similarly, if a shape has two sh:ClassConstraintComponent constraints, can
a single validation result be used for violations from both of them?
Without better guidance on these issues it will be very difficult to
determine from a validation report just which violations occurred.

The third problem is just what validation results are to be included in a
validation report and which of these are to be values of sh:result for the
single node in the graph that is a SHACL instance of sh:ValidationReport.
There is "Only the validation results that are not object of any sh:details
triple in the results graph are top-level results." and "The property
sh:detail may link a (parent) result with one or more other (child) results
that provide further details about the cause of the (parent) result."
So a validation process has to produce validation results which then end up
in the validation report if they are not values for sh:details triples.
What happens if a validation result comes from violation of a constraint
that is both directly at top level (e.g., from a property shape that is a value of
sh:property for a shape that has targets) and not at top level (e.g., from
the same property shape as before that is linked to the shape with targets
via a combination of sh:node and sh:property triples)?  Can a SHACL
processor use sh:detail to collect what otherwise might be top-level
validation results?
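
A minimal Python sketch of the quoted top-level rule makes the question
concrete.  The results graph below is hypothetical, triples are modeled as
plain tuples, top_level_results is an illustrative helper, and the sketch
uses the property name sh:detail from the second quotation.

```python
# Hypothetical results graph as (subject, predicate, object) tuples.
# Assume _:r2 and _:r3 report the same violation: _:r2 was found while
# checking a nested shape, _:r3 was found directly at top level.
RESULTS_GRAPH = {
    ("_:report", "rdf:type", "sh:ValidationReport"),
    ("_:r1", "rdf:type", "sh:ValidationResult"),
    ("_:r2", "rdf:type", "sh:ValidationResult"),
    ("_:r3", "rdf:type", "sh:ValidationResult"),
    ("_:r1", "sh:detail", "_:r2"),
}


def top_level_results(graph):
    """Validation results that are not the object of any sh:detail triple."""
    results = {s for s, p, o in graph
               if p == "rdf:type" and o == "sh:ValidationResult"}
    children = {o for s, p, o in graph if p == "sh:detail"}
    return results - children


top_before = top_level_results(RESULTS_GRAPH)

# If a processor also attaches _:r3 as a detail of _:r1, the quoted rule
# no longer treats _:r3 as a top-level result.
top_after = top_level_results(RESULTS_GRAPH | {("_:r1", "sh:detail", "_:r3")})
print(sorted(top_before), sorted(top_after))
```

Under the quoted rule, adding one sh:detail triple is enough to move _:r3
out of the top-level results, and nothing in the quoted text says whether a
processor may do that.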

There are also some other minor problems with validation reports.  For
example, there is the requirement that "A validation report has exactly one
value for the property sh:conforms that is of datatype xsd:boolean."
However, the result of validation is an RDF graph, and RDF graphs do not
have values for properties, so this
requirement doesn't make sense.  The definitions underlying validation
reports need to be carefully examined to eliminate problems like these.

Much of the description of how validation reports are generated and what
they contain needs to be rewritten to remove any procedural aspects and to
suitably describe the contents of validation reports.  As this will change
large portions of the document, reviewers cannot provide fully informed
comments on it at this time.
Received on Saturday, 4 February 2017 04:10:53 UTC