Re: Quick Comments on https://www.w3.org/TR/2016/WD-shacl-20160814/ from Peter F. Patel-Schneider on 2016-09-23 (public-rdf-shapes@w3.org from September 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 22 Sep 2016 18:36:42 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-shapes@w3.org
Message-ID: <f0ea90d7-7633-5ebe-03f0-9dc6a95518a2@gmail.com>
Responses in line.

peter



> pre-binding
>
> SPARQL does not evaluate variables that occur in basic graph patterns.
This means that the definition of pre-binding has unusual behaviour. For
example, the normative SPARQL definition of sh:class will return validation
results for every pair of nodes in the graph such that there is an
rdf:type/rdfs:subClassOf* path from the first to the second.
>
> This problem affects many parts of the definition of SHACL. It means that
the normative definition of many SHACL constructs is counter to intuitions.
This problem is not ameliorated by the caution box in Appendix B.
>
> Comment (HK): WG is waiting for input from the SPARQL EXISTS CG on this
topic.

The current definition of pre-binding in the 22 September 2016 Editors'
Draft is broken.  Following the description of pre-binding in
http://w3c.github.io/data-shapes/shacl/#pre-binding results in something
that does not serve any useful purpose.  The working group needs to address
this problem and cannot count on the SPARQL Maintenance (EXISTS) Community
Group producing anything relevant, as pre-binding is not part of their
charter.

> syntax of SPARQL variables
>
> SPARQL treats $ and ? as equivalent so $PATH and ?PATH both refer to the
PATH variable. SHACL uses $ as a special marker and includes $ and ? as part
of the variable.
>
> Would ?PATH be substituted as $PATH is? If a SPARQL query for a SHACL
constraint only used ?this would the variable this be pre-bound?
>
>     Comment (HK): I have tried to address this here
(https://github.com/w3c/data-shapes/commit/4871ced946aa03cd2bd91d808d8e4a1b33e64ef6)
so that the text no longer refers to things like $PATH as a variable, but
instead to PATH.

It looks as if the commit has largely addressed this editorial issue.  I
have not checked that all vestiges of the problem have been eliminated.
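
To illustrate the point about $ and ? being interchangeable markers, here is a minimal Python sketch. It is not taken from SHACL or from any SPARQL implementation; the function name variable_names and the simplified ASCII-only pattern are assumptions made for illustration, and the regular expression ignores string literals, IRIs, and comments.

```python
import re

# Simplified assumption: variable names are ASCII letters, digits, and
# underscores. The real SPARQL VARNAME production allows more characters.
_VARIABLE = re.compile(r"[?$]([A-Za-z0-9_]+)")


def variable_names(query: str) -> set:
    """Return the variable names in a query, with the ?/$ marker dropped.

    SPARQL treats ?x and $x as the same variable x, so the marker is
    not part of the name.
    """
    return {match.group(1) for match in _VARIABLE.finditer(query)}


if __name__ == "__main__":
    dollar_form = "SELECT $this WHERE { $this $PATH ?value }"
    question_form = "SELECT ?this WHERE { ?this ?PATH ?value }"
    # Both spellings mention exactly the same three variables.
    print(variable_names(dollar_form) == variable_names(question_form))
```

Under this reading, a rule that pre-binds "the variable this" has to apply equally to $this and ?this, which is why describing the variable as PATH rather than $PATH matters.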

> pre-binding optional?
>
> "SPARQL variables using the $ marker represent external values that must
be pre-bound or substituted in the SPARQL query before execution." "When
SPARQL constraints are executed, the validation engine should pre-bind
values for these variables." Are some $-marked variables not necessarily
pre-bound, counter to the earlier requirement?
>
>     Comment (HK): The "should" was indeed a mistake, it's not optional.
Removed:
https://github.com/w3c/data-shapes/commit/ecdad602d5d4bfeb3a2a876298349fe69d0c4e60

The commit has addressed this editorial issue.

> $PATH vs other $-prefixed variables
>
> The variable PATH is treated specially in SHACL. However, the general
description of $ does not specially call out PATH: "SPARQL variables using
the $ marker represent external values that must be pre-bound or substituted
in the SPARQL query before execution."
>
>     Comment (HK): Addressed here, pointing out the special treatment of
PATH:
https://github.com/w3c/data-shapes/commit/a5db1204433b19a0da099a8a89af76186d865f6c

The commit introduces the distinction.  The current version of Appendix C
describes the special treatment of PATH.

> $value
>
> $value is used in many ASK queries. However the definition of ASK
validators does not appear to pre-bind value.
>
>     Comment (HK): 4.1 states "These queries are interpreted against each
value node, bound to the variable value." A similar statement exists in
section 6.4.2. So I am not sure what is missing here.

Nothing.  My mistake.

> aggregation
>
> The prohibition "Furthermore, any query that uses the variable $this in an
aggregation is invalid." is vague. It appears to disallow the use of $this
in any part of the SPARQL 1.1 aggregation machinery, as the pointer in the
sentence is to Section 11 of the SPARQL specification. This would rule out
all of the examples of aggregation in the SHACL document.
>
>     Comment (HK): I have tried to clarify that this is only about the use
of ?this in expressions. This is allowing its use in GROUP BY, in case you
were referring to this. Apart from that I don't see uses of ?this in
aggregations in the SHACL document.
https://github.com/w3c/data-shapes/commit/0c6939ba95ffd6c7fee2285a3638c144a97f8528

GROUP BY is part of aggregation.  There were four examples of GROUP BY ?this
which the mentioned wording appears to prohibit.  The current wording is no
better.  "[T]he expression used in an aggregation" is an incorrect
description as there can be multiple expressions in the aggregation portion
of a query.  The argument to GROUP BY itself is just as much an expression
as the argument to HAVING.

This situation is indicative of the sloppiness of the current specification.
SPARQL has a complicated grammar.  The argument to GROUP BY is a sequence of
GroupCondition; the argument to HAVING is a sequence of HavingCondition.
Using general words, like "expression", to describe bits of the SPARQL
grammar is generally incorrect.  Where specific bits of SPARQL are used in
the SHACL definition they need to be described as they are in the SPARQL
definition.

> ASK validators syntax
>
> The syntax for ASK queries in SPARQL 1.1 is
>
>  "ASK" DatasetClause* WhereClause SolutionModifier
>
> The syntax for WhereClause is
>
>  'WHERE'? GroupGraphPattern
>
> The syntax for EXISTS constructs in SPARQL 1.1 is
>
>  'EXISTS' GroupGraphPattern
>
> Stripping the ASK from the beginning of an ASK query does not generally
end up with a GroupGraphPattern that can be used as the argument for EXISTS.
>
>     Comment (HK): Thanks for pointing out this detail. I have tried to
address this with:
https://github.com/w3c/data-shapes/commit/d820e0bac287944fb13edc86040995927f02e20d

> It appears that the values of sh:ask are never used as ASK queries by
SHACL processors. Why then are these of the form of ASK queries?
>
>     Comment (HK): While in theory we could have stated GroupGraphPattern,
I think ASK is more intuitive to explain and allows stand-alone execution
with copy and paste. Furthermore they align with the use of functions, which
can also have ASK queries as their bodies.

The syntax of ASK queries doesn't match the syntax required for EXISTS and
is thus unsuitable here.  The conniptions required to get them to sort of
match show just how unsuitable this is.
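
The mismatch can be made concrete with a rough Python sketch. It checks whether what remains after removing the leading ASK is a single brace-delimited group, which is the shape EXISTS needs. The function name is hypothetical and the brace matching is deliberately crude: it assumes no braces occur inside string literals, IRIs, or comments, and it does not parse SPARQL.

```python
def ask_remainder_is_bare_group(query: str) -> bool:
    """Return True if removing the leading ASK leaves only '{ ... }'.

    Crude check, not a SPARQL parser: braces are counted without regard
    to string literals, IRIs, or comments.
    """
    text = query.strip()
    if text[:3].upper() != "ASK":
        raise ValueError("not an ASK query")
    rest = text[3:].strip()
    if not rest.startswith("{"):
        # A WHERE keyword or a dataset clause precedes the group.
        return False
    depth = 0
    for index, char in enumerate(rest):
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                # The outermost group must close at the very end;
                # anything after it is a solution modifier.
                return index == len(rest) - 1
    return False


if __name__ == "__main__":
    for q in (
        "ASK { ?s ?p ?o }",
        "ASK WHERE { ?s ?p ?o }",
        "ASK FROM <http://example.org/g> { ?s ?p ?o }",
        "ASK { ?s ?p ?o } LIMIT 1",
    ):
        print(q, "->", ask_remainder_is_bare_group(q))
```

Only the first of the four sample queries leaves a bare group. The other three are syntactically valid ASK queries whose remainder carries a WHERE keyword, a dataset clause, or a solution modifier, none of which may follow EXISTS.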

> different levels of SHACL implementation
>
> There are several different kinds of SHACL implementations that are hinted
at in the document.
>
> "SHACL implementations may, but are not required to, support entailment
regimes." "Access to the shapes graph is not a requirement for supporting
the SHACL Core language." "This sections [sic] defines the built-in SHACL
constraint components that MUST be supported by all SHACL validation
engines." "Not all SHACL validation engines need to support this variable."
"The same support policies as for $shapesGraph apply for this variable."
"SPARQL engines with full SHACL support can install a new SPARQL function
based on the SPARQL 1.1 Extensible Value Testing mechanism." "SHACL
validation engines are not required to support any entailment regimes."
"SHACL implementations with full support of the SHACL SPARQL extension
mechanism must implement a function sh:hasShape, ...." "A SHACL validation
engine MUST implement all constructs in the Core of SHACL (Sections 2, 3,
4). A SHACL engine MAY not implement the other parts of SHACL."
"Implementations that cover only the the SHACL Core features are not
required to implement these mechanisms or the sh:hasShape function." "SHACL
validation engines MAY pre-bind the variable $shapesGraph to provide access
to the shapes graph." "A SHACL validation engine MAY use such suggestions to
determine which shapes graph to use for validating a data graph." "A SHACL
validation engine MAY take this information into account to determine which
shapes graph to use for validating a data graph that uses that ontology or
vocabulary."
>
> There needs to be a section that explicitly defines the different levels
of implementation.
>
>     Comment (HK): Not sure what to do about this. There is an almost
infinite amount of combinations of these above, so one could define many
dialects. But only one of them is the full SHACL. I would prefer all SHACL
engines to support all these features but there was too much resistance,
e.g. from those favoring a single-query-code-generation approach or working
against SPARQL end points. The resulting mess is reflecting the
heterogeneous nature of the SPARQL universe, whether we want it or not.
>     Comment (DK): What if we created a section at the end of part II
called "Optional features of the SHACL SPARQL extension mechanism" (or
something similar) where we list all optional features
>     Comment (HK): Ok, I have added an appendix with the goal of
enumerating all optional features. Could you double check this:
https://github.com/w3c/data-shapes/commit/e198bc9689c95e89e8caeb8c3c787b9efa579856

This does not appear to address my concerns.  How many different levels of
SHACL implementation are there?  For example, can a SHACL implementation
implement SPARQL-based constraints but not access to the shapes graph, or
some other random set of the optional parts of SHACL?

> order of processing for filters
>
> The discussion of how filters are processed appears to be contradictory.
First there is: "SHACL validation engines MAY alter the order of the
depicted steps as long as the returned validation results are correct."
Later there is: "Filter shapes MUST be evaluated before validating the
associated shapes or constraints."
>
>     Comment (HK): Yes, the first sentence is IMHO incorrect and I have
taken it out
(https://github.com/w3c/data-shapes/commit/3777e8e80aec9f9c1ba1bbb0dfdfce2b2acb9a12).
The problem is that if an engine does filtering after validation, it may run
into a failure that is otherwise not reached. I don't remember why we added
that statement in the first place, do you @Dimitris?
>     Comment (DK): This was changed to address a comment from Peter on
March 7th and resulted in this commit

This appears to be two different responses.  What is the situation?

> $shapesGraph
>
> The status of $shapesGraph is unclear: "SPARQL variables using the $
marker represent external values that must be pre-bound or substituted in
the SPARQL query before execution." "SHACL validation engines MAY pre-bind
the variable $shapesGraph to provide access to the shapes graph."
>
>     Comment (HK): The MAY is clarified in the following sentence (Access
to the shapes graph is not a requirement etc). I believe it would be
confusing to soften up the must in the first sentence because of this
exception.

It remains that there are two controlling wordings for how to handle
$shapesGraph, one with a must (which probably should be MUST) and one with
MAY.  These appear to be contradictory.

> circular filters
>
> What happens if a shape is one of its own filters?
>
>     Comment (HK): The same as with other recursive scenarios - it's
undefined.

OK.

> EXISTS and blank nodes
>
> The definition of ASK binds the value variable and then uses it inside an
EXISTS. The definition of SPARQL provides a counter-intuitive result if this
variable is bound to a blank node, resulting in, for example, a sh:class
constraint with class ex:C returning no violation for _:d in any data graph
containing the triple
>
>  ex:c rdf:type ex:C .
>
>     Comment (HK): We are awaiting input from the SPARQL Maintenance
(EXISTS) community group.

The document needs to mention where the problems with EXISTS currently affect
SHACL.
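
The blank-node behaviour quoted above can be reproduced with a toy Python sketch. This is an illustration under stated assumptions, not the SPARQL algebra: triples are plain tuples of strings, the pattern is a single triple, and the helper names are hypothetical. The one SPARQL fact it relies on is that a blank node in a graph pattern acts like a variable, so substituting a blank node for value leaves a pattern that matches any subject.

```python
def term_matches(pattern_term: str, data_term: str) -> bool:
    """Variables and blank nodes in a pattern match any data term."""
    if pattern_term.startswith("?") or pattern_term.startswith("_:"):
        return True
    return pattern_term == data_term


def pattern_matches(pattern: tuple, graph: set) -> bool:
    """True if the single triple pattern matches some triple in the graph."""
    return any(
        all(term_matches(p, d) for p, d in zip(pattern, triple))
        for triple in graph
    )


def substitute(pattern: tuple, variable: str, value: str) -> tuple:
    """Replace every occurrence of the variable by the given term."""
    return tuple(value if term == variable else term for term in pattern)


if __name__ == "__main__":
    data_graph = {("ex:c", "rdf:type", "ex:C")}
    class_pattern = ("?value", "rdf:type", "ex:C")

    # Substituting the blank node _:d yields a pattern that still matches
    # ex:c, so no violation would be reported for _:d.
    print(pattern_matches(substitute(class_pattern, "?value", "_:d"), data_graph))

    # Substituting an IRI that is not typed as ex:C does not match.
    print(pattern_matches(substitute(class_pattern, "?value", "ex:e"), data_graph))
```

The sketch handles only one triple pattern and does not enforce consistent bindings across several patterns, which is enough to show the effect described for sh:class.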

> union operations on data graphs and shapes graphs
>
> It is unclear just what the data graph and the shapes graph are. There is
wording that both of these cannot be changed. However, there is also wording
that various kinds of union operations are to be performed on shapes and
data graphs.
>
>     Comment (HK): The only place I could find "union" was about handling
of owl:imports, which states that the result of this union is used as shapes
graph. This looks OK to me. Could you clarify what you mean?
>     Comment (DK): I tried to make the wording clearer here:
https://github.com/w3c/data-shapes/commit/b6fd2db5719cc9c9bdec464acdd2aefc8d0b5b68

I don't find this much better.  If the shapes graph and the data graph
cannot be changed then there should not be wording about unioning,
extending, or otherwise modifying the shapes graph or the data graph.

> $targetNode
>
> It is unclear what is meant by: "The variable $targetNode is assumed to be
pre-bound to the given value of sh:targetNode." Is this something that SHACL
implementations have to do? There are several occurrences of this kind of
wording.
>
>     Comment (HK): I don't see anything wrong here. "is assumed to" is IMHO
OK because this section is merely describing the formal semantics without
prescribing an implementation. Implementations will (almost certainly) not
use a SPARQL query.

The use of words like "assumed", particularly with no modifiers, is
generally problematic in specifications.  It certainly is problematic here.
Instead of assuming that something is a particular way, definitions should
require it.

> MAY
>
> MAY is used in 1.5 but defined in 1.6
>
>     Comment (HK): Ok, moved higher up
https://github.com/w3c/data-shapes/commit/bda4e2c4781494ac0e26eb132c7e7dae15932739

OK

> MAY 2
>
> "A SHACL engine MAY not implement the other parts of SHACL." reads as if
no SHACL engine is allowed to implement any non-core part of SHACL.
>
>     Comment (HK): See
https://github.com/w3c/data-shapes/commit/2ba049e6e39096bf47355b03d1de02c2e0e84f59

Better.

> Graphs SHOULD
>
> "The data graph SHOULD include all the ontology axioms related to the data
and especially all the rdfs:subClassOf triples in order for SHACL to
correctly identify class targets and validate Core SHACL constraints." Data
graphs are just graphs. How thus can SHOULD be applied to them?
>
>     Comment (HK): I have replaced the SHOULD with "is expected to":
https://github.com/w3c/data-shapes/commit/fd3fbeac7826f9df87111af878e65e34a502331c

Better.

> Suggestions
>
> "A SHACL validation engine MAY use such suggestions to determine which
shapes graph to use for validating a data graph." Can this be done even when
an explicit shapes graph is provided to the engine?
>
>     Comment (HK): Attempted to clarify at
https://github.com/w3c/data-shapes/commit/601631a5f4b965fa79f7b44a5a348702326ef315

Better, but retains the issue of changing the unchangeable.

> Different shapes graph
>
> "The same mechanism applies for ontologies or vocabularies included in the
shapes graph. The ontology or the vocabulary IRI can point to one or more
shapes graphs with the predicate sh:shapesGraph. A SHACL validation engine
MAY take this information into account to determine which shapes graph to
use for validating a data graph that uses that ontology or vocabulary." If
there already is a shapes graph in play, why is there any need for a
different shapes graph to be used?
>
>     Comment (HK): I have changed the prose to clarify that sh:shapesGraph
only points at graphs, not shape graphs:
https://github.com/w3c/data-shapes/commit/c88df2cf50cbc5f31feaabf610a0143d3ebcf0fb
>     Comment (DK): I removed the "in the shapes graph" here. This was meant
as a general property for ontology design not only when it is used in one of
the shapes/data graph

But MAY SHACL implementations do this when they are explicitly given a
shapes graph?

> Deep copy
>
> "a deep copy of sh:path as its sh:path" What is "deep copy" in this
context?
>
>     Comment (HK): I have attempted to clarify this here:
https://github.com/w3c/data-shapes/commit/d3f8f858f95b010d1f2a0e4681da203bcbfbc217
>     Comment (kc): Unless "deep copy" has some pre-defined meaning that I
am unaware of, I would suggest dropping it and saying: The value of sh:path
of each validation result must copy all triples that are required by the <a
href="#path-syntax">SHACL well-formed path syntax rules</a>from the
<a>shapes graph</a> into the graph containing the validation results.
>     Comment (HK): The first google match of "deep copy" is pretty close to
what I wanted to express, so I believe the term should be familiar to many
people and may be helpful for implementers. Also I had surrounded the term
with "...". Anyway, I have no strong opinion and let others decide.

The extra wording is helpful.  However, "deep copy" in
https://en.wikipedia.org/wiki/Object_copying#Deep_copy is different.  Either
drop "deep copy" or point to an appropriate definition.

> Filter role
>
> "A filter is a shape in a shapes graph that can be used to limit the nodes
that are validated against a given constraint or shape." Are there some
filters that cannot be used in this way? Which ones?
>
>     Comment (HK): I don't understand this comment. The current statement
does not exclude any filters from being used this way.
>     Comment (DK): This commit should fix this issue.

Better.

> Incomplete table
>
> "The following table enumerates variables that have special meaning in
SPARQL constraints. When SPARQL constraints are executed, the validation
engine should pre-bind values for these variables." However, many other
variables also need to be pre-bound, such as the variables corresponding to
parameters.
>
>     Comment (HK): First, the statement above does not exclude other
variables from being pre-bound. It doesn't claim that the table contains
"all" variables. Second, this is in a chapter about SPARQL Constraints,
where parameters have no meaning. So I don't think anything is wrong here.
>     Comment (DK): I think this commit helps more with this issue. I am not
sure if we should move that table in the prebinding section since it affects
prebinding as a whole, not only SPARQL constraints

Reading Section 5.3 still gives me the feeling that there is an implicit
completeness consideration here.   There are many other pre-bound variables.
There are many other variables with special meaning.  There should be clear
wording to the effect that these are only three of the special pre-bound
variables in SHACL.
Received on Friday, 23 September 2016 01:37:24 UTC