review of SHACL document from Peter F. Patel-Schneider on 2015-09-02 (public-data-shapes-wg@w3.org from September 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 1 Sep 2015 17:40:08 -0700
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <55E64568.5060800@gmail.com>
 Review of
 Shapes Constraint Language (SHACL)
 W3C Editor's Draft 10 August 2015


Summary:

There are many problems with this document.  It has multiple technical
flaws.  It has undefined terms.  It is misleading.  It does not accurately
reflect the state of working group deliberations.

I deem it to not be suitable for publication as a first public work draft of
the working group.  Changing the document to make it suitable would probably
touch every subsection and almost every paragraph of the initial sections.


Overall Comments:

This document is likely to be the first document about SHACL that interested
parties will read, so I am reviewing it with this in mind.  I'm assuming
that the reader is quite familiar with RDF and somewhat familiar with
SPARQL.

I find it very difficult to get a notion of what SHACL is from reading the
first portions of the document.  Not only are there many incorrect or
misleading statements, there is no overall description of what SHACL is for
and the example inappropriately centrally includes shapes as classes.  The
initial section of the document appears to be trying to be like a
mini-primer but it doesn't work for this purpose because of the lack of an
overall description.  Compare this document with Core SHACL Semantics at
http://w3c.github.io/data-shapes/semantics/ which provides a short but clear
notion of what that version of SHACL is for.

I have detailed comments on the Abstract and Section 1 below.  I have
somewhat less detailed comments on later sections.

I fear that releasing the current document as a FPWD will create a large
amount of confusion.  In my opinion someone needs to do a full rewrite of
the document before it is suitable for publication.


Abstract

"SHACL (Shapes Constraint Language) is an RDF vocabulary for describing RDF
graph structures."
This is not the purpose of SHACL, at least as far as I am concerned.
Further, SHACL is not an RDF vocabulary.
Something along the lines of 'SHACL (...) is a language for checking
constraints against RDF graphs.'  Subsequent sentences in the abstract
should be changed to match this.

"Additional constraints can be associated with shapes using SPARQL and
similar executable languages. These executable languages can also be used to
define new high-level vocabulary terms."
Given that there are no provisions for languages other than SPARQL at the
present time, this feature should not be mentioned in the abstract.
Instead just use 'Additional constraints can be associated with shapes using
SPARQL.'

1. Introduction

"The SHACL vocabulary includes high-level concepts to represent restrictions
on predicates used in triples."
SHACL does not work this way, but instead can be used to check whether
constraints on the information in an RDF graph are satisfied or not.
SHACL does not have high-level concepts.
Replace with "The SHACL language includes high-level constructs to represent
constraints on the information in RDF graphs."

"Some users and implementors will be content with using the high-level shapes
language only"
The idea of partial implementations does not belong here.
Replace with "Some users will be content with using the high-level language
only".

"express other restrictions in an executable language such as SPARQL"
There is no need to say that SPARQL is executable.
Replace with 'express other conditions as SPARQL queries' and adjust the
wording in the rest of the paragraph accordingly.

"SHACL definitions are represented in RDF and can be serialized in multiple
RDF formats. The example snippets in this document use Turtle [turtle] and
(not written yet:) JSON-LD [json-ld] notation."
SHACL does not have definitions.  If there is no JSON-LD yet, then don't
mention it.
Replace with 'SHACL is written in RDF and can be serialized in multiple
RDF formats. The examples in this document use Turtle [turtle]."

1.2 Overview and Terminology of Core Features

This initial example centrally uses a controversial feature - shapes as
classes.  It needs to be rewritten to not use this feature.  I have not
further commented on problems in this section related to shapes as classes.

"Each ex:Issue must point to exactly one user via the property
ex:submittedBy and may have one value for ex:assignedTo."
SHACL doesn't do ``must''.  Instead in SHACL one writes constraints and then
checks to see whether the constraints are satisfied.
Replace with 'One shape requires that each instance of ex:Issue
has exactly one value for ex:submittedBy, that that value is an instance of
schema:Person that satisfies the constraints on submitters, has at most one
value for ex:assignedTo, and that that value is an instance of schema:Person.'

"Users that have submitted an issue must also have a schema:email address,
so that they can be notified when the issue has been updated."
SHACL does not have an facilities for email notification.
Replace with 'The submitter shape requires at least one value for schema:email.'

"Each constraint defines a condition that can be validated against a graph."
Constraints do not work this way.
There needs to be some high-level description of how SHACL works before
constraints are described.

"One of the operations that SHACL engines should support validates that a given
RDF node matches a given shape."
There should not be discussion of SHACL engines here.  Instead the features
of the SHACL language should be described.
Replace with 'One of the operations of SHACL is validating a given RDF node
against a given shape.'

"This operation can be invoked based on any control logic, ...."
As this section is describing the basic part of SHACL, it
is not appropriate to talk about external control logic here.   Just remove
the entire sentence.

"Another supported operation"
Replace with "Another operation".

I have previously commented on the problems with the description of
validation and violations in this part of the document.

"The following example code"
I don't see any code here.
Replace by "The following RDF graph"

"a SHACL engine may follow"
How does ``may'' come into this at all?  This needs to be replaced by some
description of how SHACL is invoked on the control graph and the data graph.

1.3 Overview and Terminology of Advanced Features

"The validation of each constraint is formalized with one or more execution
languages."
I don't see how anyone who does not already know about SHACL can figure out
what this is saying.
Replace with something like 'Each constraint is SHACL is defined in terms of
a SPARQL 1.1 query.' and then adjust the rest of the paragraph accordingly.


2. Shapes

"A Constraint defines restrictions on the structure of an RDF graph."
I find this rather misleading.
"A Shape is a group of constraints that have the same focus nodes."
There are lots more things that go into a shape.
Replace the entire paragraph with a concise description of what shapes are
for that includes scopes and filters.

This entire section needs to be reordered and rewritten to talk about what
shapes are for and how they work  This description should describe how
shapes use constraints, scopes, and filters.  Talk about rdfs:label and
rdfs:comment, if retained, should be moved to a much less prominent place.

Talk about shapes and classes should be moved to the advanced topics
sections.

"all instances of the class are expected to have the shape"
SHACL doesn't do expectations.  All that it can do is validation.
"shape declarations can specialize the shapes associated with superclasses"
SHACL does not have any notion of specialization of shapes.
These paragraphs need to be rewritten to conform with how SHACL works.

UML diagrams should not be used as readers may not have adequate knowledge
of UML.

There are a number of SHACL vocabulary terms that show up in the section but
that are not discussed.  If a vocabulary term is important enough to show
up here, then it is important enough that it needs to be discussed here.


3. Property Constraints

I find the introduction to this section very confusing.  Focus nodes don't
have properties.  Property constraints and inverse property constraints
aren't on the objects or subjects of triples.

If property constraints have to be instances of sh:PropertyConstraint, then
blank node property constraints also have to be, so they can't be missing
the rdf:type triple.   This special treatment of blank nodes occurs in other
places as well and should be eliminated.

The use of rdfs:label for human-readable properties in context leaves the
notion of context undefined.  When should tools prefer the labels in
property constraints?  Similarly for rdfs:comment.

There is also no guidance on even when sh:defaultValue might be used by user
interface tools.

The sentence including "is assumed to be an exhaustive" is not needed and
only serves to confuse the matter.

The wording "A violation must be reported" appears in several places, but it
is not correct.  Not all constraint violations end up being reported, for
example, those in negated contexts.

What are the instances of rdfs:Datatype?  How is it determined whether
something is an instance of rdfs:Class?

sh:text should be defined as a datatype, so that it does not need any
special treatment.

The treatment of sh:valueClass just says "subclasses" in the description,
and then provides a non-standard version of subclass.   This discrepancy
needs to be prominently mentioned.

In several places properties have default value, e.g., the default value for
sh:minLength is 0.  There is no reason to even talk about such default
values as the default value is the value that has no effect.

In several places properties are mentioned as being optional, e.g.,
sh:minCount.  There is no need for such discussion, as all the properties in
the 3.1.x sections are optional.  Mentioning that some of these properties are
optional begs the question as to whether the other properties are mandatory.

There is no discussion of whether properties can be repeated.  Even if this
information is presented later, it should also be presented here,
particularly if some properties can be repeated and others cannot.

"restrict the string length of triples" needs to be changed to instead talk
about values.

There is no need to create extra classes for syntactic categories.  For
example, sh:nodeKind is unnecessary.

The textual definition for sh:valueClass is written the wrong way around.
Instead of having a mismatched rdf:type value being the triggering condition
it should be not having a matching rdf:type value.   Untyped non-literals do
not need special treatment.

Matching a shape is not a defined notion.  [I took a quick look at the
newest version of the document, and the changes do not help.]

There is no discussion on whether constraint violations inside embedded
shapes are to be handled specially.   Similar problems occur in several
other places.

Having a simple definition giving access to the correct value sets would
permit the property constraint bits to be used in both directions.


3.1.12 from version current on 28 August

"error-level constraint violations" is not defined.  The textual definition
is reversed - counting the number of values that do not validate instead of
those that do validate.

The syntax is problematic, as there might be multiple qualified value shapes.


4. Other Core Constraints on Shapes

These are not constraints on shapes, but are instead just constraints.

There is no indication that violations reported from within sh:NotConstraint
constraints are to be treated in any special way.  There is no defined
notion of a node matching a shape.

Evaluation of and and or is in order and short-circuiting.  A note should be
added that this does not conform to a recent working group resolution.

There is no definition of "error-level constraint violations".

There should be some discussion on the difference between top-level
constraints and constraints inside an and.

The general definition of XOR with multiple arguments is that it is true
precisely when an odd number of the arguments are true.

4.5 from version current on 28 August

What happens if a SHACL control file adds ignored properties to
sh:ClosedShape?


5. Scopes and Filter Shapes

The wording at the beginning of Section 5 reads like it is optional for
SHACL processors to use scopes.  There needs to be some introductory
material that discusses how SHACL validation works with scopes
and filters.

In my opinion individual scope links should point from the shape to the
node.  This would align them with class-based scopes.  It would also mean
that the property would not need to be excluded when closing shapes.

The discussion of class-based scopes needs to be clear that the definition
of instance is different from that in RDFS or OWL.

There should be a mention in 5.1.2 that there is an issue related to the
interaction of classes and shapes.

The discussion of rdf:type is incorrect.   Even for classes that are shapes
there might not be an rdf:type triple linking a resource with its shapes.
In fact, there is no real notion of the shapes of a node defined in SHACL at
all.  This wording is another attempt to make SHACL a modelling language.

Are multiple scopes conjunctive, disjunctive, or independent?

The discussion of filter shapes on shapes ("applies to all constraints")
does not match the discussion at the beginning of the section.


6. Constraint Violations Vocabulary

There is no definition of constraint validation operation or even of
constraint validation.   [The glossary in the version current as of 28
August does not define constraint validation or constraint validation
operation in any useful fashion.]

What happens if a constraint violation node has an rdf:type link to
rdfs:Resource?  According to the document it will not be a constraint
violation, but that's crazy.

There is no discussion of how constraint severity works.


[From here on in, I have done a less thorough examination.]

7. General Shape Constraints

How does "MAY reference a focus node" interact with being a constraint?

The discussion of general shape constraints based on a template does not
make any sense, as templates have not yet been adequately introduced.

In many cases there will not be a node that can be used to represent the
entire graph.   How then can graph-level constraints be defined?

8. Templates

There is no notion of SHACL instantiation defined.

If the entire core profile is templates, then say so.  If not, say so.
The wording concerning the relationship between templates and the core
profile is extremely confusing as it stands.

How are templates accessed?

SHACL doesn't have rules or stored queries.

Is rdfs:subClassOf important for templates?  If so, how?  If not, why the
restriction related to template superclasses?

[By this point, I was just trying to get to the end, and may have missed
some problems.]

11. Supported Operations

I find this section extremely difficult to understand.   Some of the
information given here needs to be mentioned much earlier.

SHACL engines MUST support SHACL operations.

All of the operations in this section are missing the control graph as an
argument.

The operations should have the data graph as an explicit argument.

There is no need to have templates arranged in a class hierarchy.

sh:NativeConstraint is not adequately defined.

There are numerous parts of the pseudo-code that don't make sense, e.g., "is
at least ?minSeverity".

All of the interface arguments are missing the data and control graphs as
arguments.  There is nothing on the programming langauge types that are to
be used.

12. Functions

There is no method provided for calling a SHACL function from within SPARQL
code.

14. SPARQL-based Execution

There is an assumption that the SHACL control graph is accessible to the
SPARQL code.  There is a working group issue related to this, which needs to
be referenced.

There is no indication that SPARQL-based execution cannot be done using a
standard SPARQL engine.

The values of sh:sparql are not strings that are syntactically valid SPARQL
queries.  (See the beginning of 14.2 - they can be fragments.  They also can
be missing prefix declarations.)

There is no notion of the defining graph in SHACL.

It is not true that sh:sparql values need not declare any prefixes.  There
could be prefixes there that do not occur elsewhere, and such prefixes would
have to be declared in the sh:sparql value.

There is no indication that executing the SPARQL queries cannot be done in a
standard SPARQL implementation.

The execution of function (and other template, I expect) bodies is not a
SHOULD relationship.  Instead it is a MUST unless the alternative produces
the same results.



Wording problems that exist in multiple places:

"SHACL RDF vocabulary"
See above for why this is wrong.
Replace by 'SHACL Language'.
Variations of this also exist, e.g., "an RDF vocabulary", "SHACL
vocabulary".

"restriction"
See above for why this is wrong.
Each occurence needs to be examined for a suitable replacement.

"subclass" and "instance"
These words are used loosely and differently in different places.  Each
place they are used needs to be examined to ensure either that a standard
meaning is being used or that the deviation from the standard is prominently
described.
Received on Wednesday, 2 September 2015 00:40:43 UTC