Re: Comments on Shapes Constraint Language (SHACL) W3C Editor's Draft 22 February 2017 from Holger Knublauch on 2017-03-03 (public-rdf-shapes@w3.org from March 2017)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 3 Mar 2017 16:24:49 +1000
To: public-rdf-shapes@w3.org
Message-ID: <bdc0a1f7-32c6-dbf9-4fb1-37c20012a1b9@topquadrant.com>
Hi Peter,

thank you for your continued efforts and level of detail in reviewing 
the SHACL drafts. Here is our response.


 > ** Problems in early definitions:

1)
 > "A property path is a possible route in a graph between two graph nodes."
 > Route is not defined.  Possible route is not defined.  (This wording is used
 > once in the SPARQL document but is not defined there.)

Yes, the SPARQL document, in section 9 entitled "Property Paths", defines
property paths as follows:

"A property path is a possible route through a graph between two graph
nodes."

While it is true that the SPARQL document does not further explain or
define the term "possible route", the potential definitional deficiency
you are concerned about did not prevent SPARQL from reaching Rec status,
nor has it caused any implementation or usage issues. We believe it to be
clear enough for SHACL as well.


2)
 > "A binding is a pair (variable, RDF term), consistent with the term's use in
 > SPARQL."
 > If binding is taken from SPARQL just link to its definition as is done for
 > the other terms taken from SPARQL

This is purely an editorial preference. Here we have included both the 
link to the SPARQL document
and a local definition so that readers are not forced to follow the link 
to understand the meaning.


3)
 > "A solution is a set of bindings, one row in the body of the result table of
 > a SPARQL query."
 > In SPARQL a solution is not one row in the body of the result table of a
 > SPARQL query.  That is just how it is shown in some places. The actual
 > correct term is solution mapping.

The fact that it's a row in the result table is indeed not definitional 
but has been added to make
it easier to understand. I have injected the term "informally" to mark 
this sub-sentence off:

"... informally often understood as one row in the body of the result 
table of a SPARQL query."

The SPARQL spec itself almost always uses "solution" instead of 
"solution mapping", stating
"We use the term 'solution' where it is clear." (18.1.8). In SHACL we do 
the same.
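
To make the informal reading concrete, here is a minimal Python sketch of a solution as a set of (variable, RDF term) bindings and of a result "table" as a list of such solutions. The representation (plain strings for RDF terms, the names `bindings` and `focus_nodes`) is an assumption for illustration only and is not taken from SHACL or SPARQL.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified representation: RDF terms are plain strings here.
Solution = Dict[str, str]  # variable name -> RDF term

# A result "table" with variables ?this and ?value, shown as a list of solutions.
results: List[Solution] = [
    {"this": "ex:Alice", "value": '"42"^^xsd:integer'},
    {"this": "ex:Bob"},  # ?value is unbound in this solution
]


def bindings(solution: Solution) -> Set[Tuple[str, str]]:
    """Return the solution as a set of (variable, term) pairs."""
    return set(solution.items())


def focus_nodes(solutions: List[Solution]) -> Set[str]:
    """Collect every binding of the variable 'this' across all solutions."""
    return {s["this"] for s in solutions if "this" in s}
```

Each dictionary corresponds to one row of the table, and an unbound variable is simply absent from the mapping.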


4)
 > "The results table is a SolutionSequence, a list of solutions, possibly
 > unordered."
 > Results table is not used in SHACL.

Ok, redundant sentence deleted.


5)
 > "A node in an RDF graph is a SHACL instance of a SHACL class in the graph if
 > one of its SHACL types is the class."
 > SHACL types require reference to a graph.

Ok, reference to G added.


6)
 > "true denotes the RDF term "true"^^xsd:boolean. false denotes the RDF term
 > "false"^^xsd:boolean."
 > There is no notion of what denotes means here.

"denotes" is used in its common English meaning. Even without a formal 
definition,
I cannot see how it could possibly be misunderstood. Should I delete 
these two sentences to move on?


7)
 > "Target declarations are values of certain properties (such as
 > sh:targetClass) for a shape in a shapes graph."
 > More than just a value is needed.

Editorial adjustments made to clarify that such values (of course) also
need a subject and a predicate (although this already follows from the
definition of "value").


8)
 > "All bindings of the variable this from the solution become focus nodes."
 > SPARQL queries do not return a solution as defined in this document.

This looks OK to me although I have switched from singular "solution" to 
"solutions" in 3 of
the 4 definitions. Could you clarify where you see a contradiction?


 > ** Unclear wording:

9)
 > "(In this document, the verbs specify or declare are sometimes used to
 > express the fact that an RDF term has property values in a graph.)"
 > As opposed to an RDF term not having any values for any property in a graph?

Clarified to mean "sometimes used to express the fact that an RDF term 
has values
for a given predicate in a graph"


10)
 > "A constraint component is an IRI."
 > I don't see how every IRI is a constraint component.

Why would that follow from the above sentence? It just says that if
something is a constraint component then it is an IRI, not vice versa.
"An apple is a fruit" also doesn't say that every fruit is an apple.


11)
 > "The IRI is used, among others, in validation reports."
 > I never would have imagined that validation reports could only use IRIs that
 > are constraint components.

See 10. I don't understand this comment. In any case, this sentence is 
not needed
so I have taken it out.


12)
 > "SHACL-SPARQL can be used to declare additional constraint components based
 > on SPARQL."
 > What part of SHACL are these additional constraint components in?

Any constraint component defined with SHACL-SPARQL and not Core is not 
in Core.
The sentence above includes a link to section 6.


 > ** Strange links:

13)
 > The definition of member has a link back to itself.

Ok, link removed.


 > ** Normative wording in non-normative portions of the document:

14)
 > Section 1.6 is labelled as non-normative but discusses how to treat other
 > portions of the document and states conformance requirements for SHACL
 > implementations.

I see nothing in 1.6 that is not repeated in normative sections 5 or 6. 
Do you?


15)
 > In many places there are what are labelled as SPARQL definitions of SHACL
 > Core constraint components.  These are in non-normative portions of the
 > document.  The document needs to not make the impression that these bits of
 > SPARQL are definitions of portions of the SHACL Core.

Section 1.6 states:
For SHACL Core this specification uses parts of SPARQL 1.1 in 
non-normative alternative definitions of the semantics of constraint 
components and targets. While these may help some implementers, SPARQL 
is not required for the implementation of the SHACL Core language.

I think this makes it clear that the SPARQL definitions are non-normative
already, but anyway:
Changed from "SPARQL DEFINITION" to "POTENTIAL DEFINITION IN SPARQL".


16)
 > There are probably other places where definitional wording occurs in
 > non-normative parts of the document.  These should be removed or changed to
 > not give any impression that they are normative.

Comment not actionable.


 > ** SHACL Vocabulary:

17)
 > What is the status of the SHACL vocabulary?  SHACL is not an ontology that
 > needs a RDF graph interpreted using the RDFS semantics to provide its
 > vocabulary.  All that SHACL needs is a set of IRIs that are used in its RDF
 > syntax.  What then is the status of the mentioned RDF graph? How is this
 > graph to be interpreted?  Is information entailed by the graph using the RDF
 > or RDFS semantics have any effect on SHACL?  Is any information in the graph
 > part of the definition of SHACL?  Of SHACL Core?  Is all of the information
 > in the graph part of the definition of SHACL?  Of SHACL Core?

This topic has been discussed multiple times, see archives. It has no 
normative status whatsoever.

The role of RDFS has also been extensively discussed within the WG,
especially during the period that you were still a WG member.


 > ** Shapes:

18)
 > If any node in a shapes graph has a sh:shape link back to itself then the
 > shapes graph is recursive and behaviour of SHACL processors on the graph is
 > undefined even if this node is completely disconnected from the rest of the
 > graph.  It would be better to have the behaviour of SHACL processors defined
 > in cases like this.

Such an optimization could be handled by future working groups or 
community groups.
Since the treatment of recursion is undefined, implementations already 
have the option
to ignore such cases.

BTW, sh:shape has been renamed to sh:node.
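
As an illustration of how an implementation could at least detect such cases before deciding whether to ignore them or report a failure, here is a minimal Python sketch. It assumes the shape-to-shape references (for example the sh:node links) have already been extracted into a hypothetical mapping `refs`; the mapping, the function name and the example IRIs are not taken from the spec, and a real shapes graph has more kinds of references than this sketch follows.

```python
from typing import Dict, Set


def recursive_shapes(refs: Dict[str, Set[str]]) -> Set[str]:
    """Return the shapes that can reach themselves through shape references."""

    def reaches_itself(start: str) -> bool:
        stack = list(refs.get(start, ()))
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(refs.get(node, ()))
        return False

    return {shape for shape in refs if reaches_itself(shape)}


# Hypothetical example: ex:Loop is disconnected from the rest but refers to itself.
refs = {
    "ex:A": {"ex:B"},
    "ex:B": set(),
    "ex:Loop": {"ex:Loop"},
}
```

With this input only `ex:Loop` is reported, so a processor could choose to treat the remaining shapes normally.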


19)
 > It used to be that top-level shapes were conventionally indicated by an
 > rdf:type link to sh:Shape.  This has apparently changed to sh:NodeShape.
 > There is no apparent reason for this change, which should be changed back.

This was a topic of separate emails and discussions and already 
responded to.


20)
 > sh:Shape does not appear to have any effect at all in SHACL. If this is the
 > case then it should be removed.

This was a topic of separate emails and discussions and already 
responded to.


21)
 > The syntax rule for shapes doesn't appear to be a syntax rule at all.
 > Instead it is defining what a shape is.

They are marked as syntax rules because other syntax rules depend on them.


22)
 > "A shape in a shapes graph declares a constraint of kind c if c is a
 > constraint component and the shape has values for all mandatory parameters
 > of c. The constraint declaration consists of the values that the shape has
 > for all mandatory and optional parameters of that component."
 > The word "declares" is only harmful here.
 > For constraint components that have more than one parameter this definition
 > loses which value is connected to which parameter.  For shapes that have two
 > values for a single-parameter constraint component there is only one
 > resultant constraint and that constraint has both of the values of the
 > parameter.

The sentences that you quote above cannot be taken out of context. Other
sentences explain what happens in the multiple-values case. I don't see how
to keep the definition readable while cramming all this information into it.
If you have specific suggestions, feel free to send them along.


23)
 > "Note that the definition above does not include all of the syntax rules of
 > well-formed shapes."
 > There is no notion of well-formed shapes introduced in the document, even
 > though it is used in several places.  Similarly there is no notion of
 > well-formed property shape or well-formed node shape introduced in the
 > document even though both of these are used in the document.

Shapes are nodes, therefore the definition of well-formed nodes applies to
shapes too.


24)
 > "Note that the definitions of well-formed property shapes and node shapes
 > make these two sets of nodes disjoint."
 > There is no definition of either well-formed property shape or well-formed
 > node shape.

Shapes are nodes, therefore the definition of well-formed nodes applies to
shapes too.


 > ** Property Paths:

25)
 > "A node in an RDF graph is a well-formed SHACL property path p if it
 > satisfies exactly one of the syntax rules in the following sub-sections. A
 > node p is not a well-formed SHACL property path if p is a blank node and any
 > path mappings of p directly or transitively reference p."
 > It is possible that a node could both satisfy exactly one of the syntax
 > rules and also refer back to itself.  What happens then?
 > Every path mapping of p references p so all blank nodes are not well-formed
 > SHACL property paths.

The second sentence

"A node p is not a well-formed SHACL property path if p is a blank node and
any path mappings of p directly or transitively reference p."

This IMHO excludes the scenario of self-reference.

I don't understand your last sentence. An example would help.


26)
 > "A sequence path is a blank node that is a SHACL list with at least two
 > members and each member is a well-formed SHACL property path."
 > Sequence paths can have extra information associated them.

I don't understand this comment. An example would help.
Note that in the paragraph above the individual path syntax sections, we 
state
"... if it satisfies exactly one of the syntax rules in the following 
sub-sections."
and this IMHO excludes the case that a sequence path may have other 
triples such as
a sh:inversePath.


27)
 > "An alternative path is a blank node that is the subject of exactly one
 > triple in G."  "An inverse path is a blank node that is the subject of
 > exactly one triple in G."  And so on.
 > These paths can't have extra information associated with them. Any blank
 > node that is the subject of exactly one triple is lots of kinds of paths.

I don't understand this comment. An example would help.


 > ** Non-validating information

28)
 > What happens if the requirements in the non-normative 2.3.2 are violated?

This section does not contain formal syntax rules so there is no impact.


 > ** Targets:

29)
 > "If s is a SHACL instance of sh:NodeShape or sh:PropertyShape in a shapes
 > graph SG and s is also a SHACL instance of rdfs:Class in SG then the set of
 > SHACL instances of s in a data graph DG is a target from DG for s in SG."
 > So a node that is a SHACL instance of sh:Shape and a SHACL instance of
 > rdfs:Class will not produce an implicit class target.  This is going to
 > trip up a lot of people and needs to be changed.

I disagree that it "needs to be changed". This may be your (speculative) 
personal opinion.
sh:Shape will barely show up in any SHACL file and does not show up in 
our own examples.
While earlier versions of SHACL used the IRI sh:Shape to mean something
else, this doesn't imply that future readers will use it for its old
meaning. People will get used to sh:NodeShape just like they got used to
our renaming of sh:PropertyConstraint to sh:PropertyShape.


 > ** Validation:

30)
 > "Conformance checking is a simplified version of validation, producing a
 > boolean result."
 > There is no definition of which boolean value conformance checking produces.

The sentence you quote above has a hyperlink to the exact definition of 
which values to produce.


31)
 > "the validation process"
 > What counts as part of the validation process?  Does checking for
 > ill-formedness?  Does entailment?  Does piecing together the shapes graph or
 > the data graph?

The document already clarifies that checking for ill-formedness is not 
required,
neither is entailment. Also the document already clarifies that shapes 
and data graphs are
immutable, i.e. they are assembled before the validation process.


32)
 > "For example, SHACL processors may support recursion scenarios or produce an
 > error when they detect recursion."
 > I expect that this should be failure instead of error.

Ok, replaced "error" with "failure".


33)
 > "A shapes graph is an RDF graph containing zero or more shapes that is
 > passed into a SHACL validation process so that a data graph can be validated
 > against the shapes."
 > Is the graph that is used when a node in a graph is validated against a
 > shape in it a shapes graph?

Yes, "shapes graph" is a role.


34)
 > "Every value of sh:shapesGraph is an IRI representing a graph that should be
 > included into the shapes graph used to validate the data graph."
 > Shouldn't this be SHOULD?  Or should it be MAY as is used in the next
 > sentence?

Ok, changed from should to SHOULD. "may" changed to SHOULD, too.


35)
 > "Validating an RDF term against a shape involves validating the term against
 > each of the components for which the shape has values for all mandatory
 > parameters, using the validators associated with the respective component."
 > This incorrectly uses constraint components instead of constraints.

Ok, clarified to refer to the constraint and then the component of the 
constraint.


36)
 > "The validation of a focus node in the data graph against a constraint in
 > the shapes graph produces the top-level validation results that are produced
 > by the validator of the constraint component, using as input the focus node,
 > the specific values of the parameters in the constraint, and the value nodes
 > of the shape in the data graph."
 > SHACL constraint components have multiple validators.
 > Validation needs more than just the information listed above.
 > Shapes don't have value nodes at all.
 > The output of a validation process is not completely defined, so there is no
 > notion of *the* results of validation.

Has been changed to:

Validation of a focus node against a shape: Given a focus node in the 
data graph and a shape in the shapes graph, the validation results are 
the union of the results of the validation of the focus node against all 
constraints declared by the shape, unless the shape has been 
deactivated, in which case the validation results are empty.

Validation of a focus node against a constraint: Given a focus node in 
the data graph and a constraint of kind C in the shapes graph, the 
validation results are defined by the validators of the constraint 
component C. These validators typically take as input the focus node, 
the specific values of the parameters of C of the constraint in the 
shapes graph, and the value nodes of the shape that declares the constraint.

Value nodes are defined as follows:

The validators of most constraint components use the concept of value 
nodes, which is defined as follows:

For node shapes the value nodes are the individual focus nodes, forming 
a set with exactly one member.
For property shapes with a value for sh:path p the value nodes are the 
set of nodes in the data graph that can be reached from the focus node 
with the path mapping of p. Unless stated otherwise, the value of 
sh:resultPath of each validation result is a SHACL property path that 
represents an equivalent path to the one provided in the shape.
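
As a rough illustration of the definitions above (validation against a shape, and value nodes), here is a minimal Python sketch. All names (`Shape`, `value_nodes`, `validate`, `max_count_1`) are hypothetical, constraints are modelled as plain functions, results are plain strings, and only simple predicate paths are handled. It is a sketch under those assumptions, not a description of how any SHACL processor is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A constraint is modelled as a function from (focus node, value nodes) to results.
Constraint = Callable[[str, Set[str]], List[str]]


@dataclass
class Shape:
    path: Optional[str] = None  # None: node shape; otherwise a predicate IRI
    constraints: List[Constraint] = field(default_factory=list)
    deactivated: bool = False


def value_nodes(shape: Shape, focus: str, data: Set[Triple]) -> Set[str]:
    if shape.path is None:
        return {focus}  # node shape: a set with exactly one member
    # Property shape: this sketch only handles a single predicate as the path.
    return {o for (s, p, o) in data if s == focus and p == shape.path}


def validate(shape: Shape, focus: str, data: Set[Triple]) -> List[str]:
    if shape.deactivated:
        return []  # a deactivated shape yields no validation results
    values = value_nodes(shape, focus, data)
    results: List[str] = []
    for constraint in shape.constraints:
        results.extend(constraint(focus, values))  # union over all constraints
    return results


def max_count_1(focus: str, values: Set[str]) -> List[str]:
    """Hypothetical constraint: at most one value node per focus node."""
    return [f"{focus}: more than 1 value"] if len(values) > 1 else []


data = {("ex:a", "ex:p", "1"), ("ex:a", "ex:p", "2"), ("ex:b", "ex:p", "1")}
shape = Shape(path="ex:p", constraints=[max_count_1])
```

Calling `validate(shape, "ex:a", data)` yields one result because `ex:a` has two value nodes, while `ex:b` yields none.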


37)
 > "A focus node conforms to a shape if and only if the validation of the shape
 > does not produce any validation result or a failure."
 > So no focus node will conform to a node shape that has a sh:not parameter.

The definition of validation has been modified to be a "mapping" instead
of a process that "produces" something. All usages of "a validation
result MUST be produced" have been replaced with "there is a validation
result". The usages of "nested" validation have been replaced with
"conformance checking". Prose has been added to make it clear that the
results of conformance checking do not end up in the surrounding report.


 > ** Conformance:

38)
 > "All SHACL implementations MUST at least cover the Core."
 > Covering is not defined.

Ok, replaced with "implement".


39)
 > "This specification describes conformance criteria for: SHACL Core [...],
 > SHACL-SPARQL [...], SHACL Shapes Graphs [...], Validation of a data graph
 > against a shapes graph [...], Validation of an RDF term from a data graph
 > against a shape from the shapes graph [...], SHACL Core processors [...],
 > SHACL-SPARQL processors [...]."
 > Conformance is something that is required of implementations. It doesn't
 > make sense for special kinds of graphs or processes to conform.  Instead
 > these are defined as something.

Ok, I have reorganized this by moving the first two bullet items into the
introductory paragraph and removing the 3 bullet items on processes. This
leaves only the two items about "processors" under conformance.


40)
 > An RDF graph is a (mathematical) set of RDF triples.  There are no
 > operations that can modify an RDF graph defined in RDF.

Which part of the document does this comment apply to?


 > ** Core Constraint Components:

41)
 > "This section defines the built-in SHACL Core constraint components that
 > MUST be supported by all SHACL Core processors."
 > Are SPARQL-SHACL processors required to support these components as well?

Yes, we already state that all SHACL processors must support Core.


42)
 > "The SPARQL definitions in this section represent potential validators."
 > The SPARQL queries are in informative parts of the document and can't be
 > considered to be definitions.  As many of the SPARQL
 > definitions currently have problems, it would be better to just remove all
 > the SPARQL query stuff from this section.

See 15)

You state that "many of the SPARQL definitions currently have problems". 
Which?


43)
 > "The following constraint components represent restrictions on the number of
 > value nodes."
 > Presumably the number of value nodes for a particular focus node, not the
 > number of value nodes overall.

Sure, I assume readers can understand that. Anyway, added "for the given 
focus node".


44)
 > "If this parameter is omitted then there is no limit on the number of
 > triples."
 > Which triples?  How can this parameter be omitted and there still be a
 > constraint set up for it?

You may remember that this sentence was deemed important by certain WG 
members
(I think Karen). In any case, I agree it doesn't add formal value
so I have taken it out to proceed.


45)
 > "The values of sh:pattern are literals with datatype xsd:string that are
 > valid pattern arguments for the SPARQL REGEX function."
 > Having this as a syntax condition means that
 > checking for correct operation of SHACL processors will need to be aware of
 > whether a value for sh:pattern is a valid pattern argument for the SPARQL
 > REGEX function.

SHACL Core implementations require a regular expression engine anyway.
In the worst case this test can thus be implemented by sending a dummy
regex to the processor and catching exceptions. Usually the underlying
libraries have better syntax checking APIs built in.  Leaving this rule out
would raise follow-up issues such as what happens if the pattern has invalid
syntax. Overall I think this is currently acceptable, although I agree that
it's not a rule that could be expressed, say, in SHACL Core itself.
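
The "worst case" approach could look like the following minimal Python sketch, which uses the standard `re` module as a stand-in regex engine. The function name `is_valid_pattern` is hypothetical, and Python's regex syntax is not identical to the syntax accepted by the SPARQL REGEX function, so this only illustrates the technique of compiling a pattern and catching the error.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Try to compile the pattern; a compile error marks it as invalid."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

A processor built on an engine that matches the SPARQL REGEX syntax would apply the same check to each value of sh:pattern before validation starts.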


46)
 > "If $flags has a value then it MUST be interpreted according to the third
 > argument of the SPARQL REGEX function."
 > What does the mean?  I'm guessing that the constraint component is supposed
 > to act as if this value is the third argument, not that this value is
 > interpreted as anything.

I have changed the wording, although I believe there was not really a 
chance to
misinterpret the old wording either.


47)
 > "For each pair of value nodes and the values of the property $lessThan at
 > the given focus node where the first value is not less than the second value
 > (based on SPARQL's < operator) or where the two values cannot be compared, a
 > validation result MUST be produced with the value node as sh:value."
 > How many different validation results need to be produced if the set of
 > value nodes is the set { 1, 2 } and the set of property values is the set
 > { "a", "b" }?

Four (one for each combination). While adding a test case for this, I 
noticed a glitch
in the informal, potential SPARQL definition, which is now fixed.
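
The count of four can be reproduced with a minimal Python sketch. Python's `<` operator stands in for SPARQL's `<` here, and a `TypeError` stands in for "the two values cannot be compared"; both are assumptions of this illustration, as is the function name `less_than_violations`.

```python
from itertools import product


def less_than_violations(value_nodes, other_values):
    """Return the (value node, other value) pairs that violate value < other."""
    violations = []
    for value, other in product(value_nodes, other_values):
        try:
            ok = value < other
        except TypeError:  # the two values cannot be compared
            ok = False
        if not ok:
            violations.append((value, other))
    return violations
```

For value nodes `[1, 2]` and property values `["a", "b"]` every one of the four combinations is incomparable, so four pairs are returned.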


48)
 > "For each value node that produces no validation results against the shape
 > $not"
 > So if there is a conjunction under the not, a validation result will be
 > produced here if just one of the conjuncts produces a validation result.

Switched to using the term "conforms", which IMHO clarifies this issue.


49)
 > "For each value node where the validation of the value node against any of
 > the members of $and produces a validation result and no failure, a
 > validation result MUST be produced with the value node as sh:value."
 > So if there is a negation under any of the conjunctions a validation result
 > will always be produced.

Also switched to "conforms", and similarly for sh:or, sh:xone, sh:node and
sh:qualifiedValueShape.

I believe this improves the situation with regard to your parallel email
thread on whether nested validation processes MUST create validation
results: it is now more explicit that we just need conformance checking,
not the full report.
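
The difference between producing results and checking conformance can be sketched in Python as follows. Shapes are modelled as functions returning a list of results, failures are ignored, and all names (`conforms`, `not_shape`, `and_shape`, `positive`, `small`) are hypothetical; this is an illustration of the idea, not the spec's definition.

```python
from typing import Callable, List

# A shape is modelled as a function returning the validation results for a node.
Shape = Callable[[int], List[str]]


def conforms(shape: Shape, node: int) -> bool:
    """Conformance checking: True if and only if there are no validation results."""
    return len(shape(node)) == 0


def not_shape(inner: Shape) -> Shape:
    # One result when the node conforms to the negated shape; the inner
    # shape's own results are never copied into the outer report.
    return lambda node: (
        [f"{node} conforms to negated shape"] if conforms(inner, node) else []
    )


def and_shape(*members: Shape) -> Shape:
    # One result when the node does not conform to at least one member.
    return lambda node: (
        [] if all(conforms(m, node) for m in members) else [f"{node} fails sh:and"]
    )


positive: Shape = lambda n: [] if n > 0 else [f"{n} is not positive"]
small: Shape = lambda n: [] if n < 10 else [f"{n} is not small"]
```

With these definitions the node 20 conforms to `and_shape(positive, not_shape(small))`, even though validating 20 against `small` yields a result inside the negation.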


50)
 > "If a value node is violating the constraint, sh:shape will produce only a
 > single validation result, with sh:ShapeConstraintComponent as its
 > sh:sourceConstraintComponent."
 > Not correct.  The sh:shape could produce other validation results for other
 > value nodes.

Clarified to be "for this value node".


51)
 > "On the other hand side, sh:property may produce any number of validation
 > results, and these will have the individual constraint components of the
 > property shape as their values of sh:sourceConstraintComponent."
 > Not correct.  The validation results produced here may have other values for
 > sh:sourceConstraintComponent.

Clarified to be about "the constraints in the property shape".


 > ** Specify and  Declare:


52)
 > Specify and declare are use throughout the document in strange ways.  It
 > would be better to replace these with words that do not have so much
 > baggage, e.g., "For example, shapes can state" or "A shape in a shapes graph
 > has a constraint of".

I remember long discussions about this terminology within the WG last year.
I don't think it would be fruitful to reopen these discussions as we are 
trying
to stabilize the document.  Making such general changes may introduce 
new issues
while making previous reviews invalid.  We have to mind the March CR 
deadline.


 > ** SHACL-SPARQL:

53)
 > All aspects of SHACL-SPARQL depend heavily on pre-binding.  As pre-binding
 > has never had a workable definition in SHACL there is no purpose in closely
 > reviewing this part of the document at this time.  Either this part of the
 > document needs to be removed or a workable definition of SHACL provided and
 > this phase of the W3C process repeated.

At its meeting on March 2, 2017 the RDF Data Shapes Working Group
decided to mark SHACL-SPARQL (i.e. sections 5 and 6) as features at risk.
The Working Group would like to collect implementation feedback on the
current definition of pre-binding, which is central to SHACL-SPARQL.
Marking this as a feature at risk provides the WG with more flexibility
with regard to the recommendation track process in case technical
issues with the current definition of pre-binding are reported.


54)
 > It is not clearly stated what SHACL Core processors need to do when they
 > encounter constructs from SHACL-SPARQL.  It might be the case that SHACL
 > Core processors can completely ignore SHACL-SPARQL constructs, because SHACL
 > Core processors only "cover" SHACL Core, but it might also be the case that
 > SHACL Core processors need to examine SHACL-SPARQL constructs, because these
 > constructs might cause some nodes to be ill-formed.

What consequence would that have? Syntax checking is not required either 
way.


55)
 > Both sh:prefixes and sh:prefix are used.  I expect that only one is needed.

They fulfill very different roles. sh:prefix is for an individual prefix 
(pair),
while sh:prefixes links a query with a set of prefixes.
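
The two roles can be pictured with a minimal Python sketch: one object per individual prefix pair, and a query that is linked to a whole set of them. The class `PrefixDeclaration`, its `namespace` field and the function `with_prefixes` are hypothetical names chosen for this illustration, and the way a processor actually turns the set into query text is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrefixDeclaration:
    prefix: str     # the individual prefix, e.g. "ex"
    namespace: str  # the namespace IRI paired with that prefix


def with_prefixes(query: str, declarations: List[PrefixDeclaration]) -> str:
    """Prepend one PREFIX line per declaration to the query text."""
    header = "\n".join(
        f"PREFIX {d.prefix}: <{d.namespace}>" for d in declarations
    )
    return f"{header}\n{query}" if header else query
```

Here each `PrefixDeclaration` plays the role of a single prefix pair, while the list passed to `with_prefixes` plays the role of the set of prefixes linked to one query.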


Commits addressing these are onwards from

https://github.com/w3c/data-shapes/commit/28f40d46efe714ebe9f1909f82e3fd84172dc447

Regards,
Holger


On 27/02/2017 12:13, Holger Knublauch wrote:
> I have raised https://www.w3.org/2014/data-shapes/track/issues/234 to 
> track our response.
>
> We will likely respond after the next WG meeting this week.
>
> Holger
>
Received on Friday, 3 March 2017 07:26:10 UTC