Re: Comments on Shapes Constraint Language (SHACL) W3C Editor's Draft 22 February 2017 from Holger Knublauch on 2017-02-27 (public-rdf-shapes@w3.org from February 2017)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 27 Feb 2017 12:13:31 +1000
To: public-rdf-shapes@w3.org
Message-ID: <7ee15f3a-6318-e00f-fbe6-d8c3aaffc32e@topquadrant.com>

I have raised https://www.w3.org/2014/data-shapes/track/issues/234 to 
track our response.

We will likely respond after the next WG meeting this week.

Holger


On 26/02/2017 1:03, Peter F. Patel-Schneider wrote:
> Here are some comments on this document.  In summary, there are still lots
> of significant problems.  Addressing these problems will result in
> substantial changes to this document.
>
> I have not examined the portions of the document labelled as non-normative
> as closely as the normative sections.  I have treated all portions of the
> document labelled as non-normative as if they do not contain any normative
> content.
>
> Some of the comments below are about problems that occur in multiple places
> in the document.  I have not always listed all of these places.
>
> It was hard to decipher the odd wording in many places to get to the
> underlying meaning.  The document needs a rewrite to state the definition of
> SHACL in a consistent manner.  Part of the problem here, but only part, is
> the mixing of different words and phrases that can carry definitional import,
> including "if ... then", "declared", "specified", and "is".  Another part of
> the problem is the use of "MUST" in the definitions of constraint
> components.
>
> This set of comments is separate from my previous comments on SHACL.
>
>
> ** Problems in early definitions:
>
> "A property path is a possible route in a graph between two graph nodes."
> Route is not defined.  Possible route is not defined.  (This wording is used
> once in the SPARQL document but is not defined there.)
>
> "A binding is a pair (variable, RDF term), consistent with the term's use in
> SPARQL."
> If binding is taken from SPARQL, just link to its definition, as is done for
> the other terms taken from SPARQL.
>
> "A solution is a set of bindings, one row in the body of the result table of
> a SPARQL query."
> In SPARQL a solution is not one row in the body of the result table of a
> SPARQL query.  That is just how it is shown in some places.  The actual
> correct term is solution mapping.
>
> "The results table is a SolutionSequence, a list of solutions, possibly
> unordered."
> Results table is not used in SHACL.
>
> "A node in an RDF graph is a SHACL instance of a SHACL class in the graph if
> one of its SHACL types is the class."
> SHACL types require reference to a graph.
>
> "true denotes the RDF term "true"^^xsd:boolean. false denotes the RDF term
> "false"^^xsd:boolean."
> There is no notion of what denotes means here.
>
> "Target declarations are values of certain properties (such as
> sh:targetClass) for a shape in a shapes graph."
> More than just a value is needed.
>
> "All bindings of the variable this from the solution become focus nodes."
> SPARQL queries do not return a solution as defined in this document.
>
>
> ** Unclear wording:
>
> "(In this document, the verbs specify or declare are sometimes used to
> express the fact that an RDF term has property values in a graph.)"
> As opposed to an RDF term not having any values for any property in a graph?
>
> "A constraint component is an IRI."
> I don't see how every IRI is a constraint component.
>
> "The IRI is used, among others, in validation reports."
> I never would have imagined that validation reports could only use IRIs that
> are constraint components.
>
> "SHACL-SPARQL can be used to declare additional constraint components based
> on SPARQL."
> What part of SHACL are these additional constraint components in?
>
>
> ** Strange links:
>
> The definition of member has a link back to itself.
>
>
> ** Normative wording in non-normative portions of the document:
>
> Section 1.6 is labelled as non-normative but discusses how to treat other
> portions of the document and states conformance requirements for SHACL
> implementations.
>
> In many places there are what are labelled as SPARQL definitions of SHACL
> Core constraint components.  These are in non-normative portions of the
> document.  The document needs to avoid giving the impression that these
> bits of SPARQL are definitions of portions of the SHACL Core.
>
> There are probably other places where definitional wording occurs in
> non-normative parts of the document.  These should be removed or changed to
> not give any impression that they are normative.
>
>
> ** SHACL Vocabulary:
>
> What is the status of the SHACL vocabulary?  SHACL is not an ontology that
> needs a RDF graph interpreted using the RDFS semantics to provide its
> vocabulary.  All that SHACL needs is a set of IRIs that are used in its RDF
> syntax.  What then is the status of the mentioned RDF graph?  How is this
> graph to be interpreted?  Does information entailed by the graph using the RDF
> or RDFS semantics have any effect on SHACL?  Is any information in the graph
> part of the definition of SHACL?  Of SHACL Core?  Is all of the information
> in the graph part of the definition of SHACL?  Of SHACL Core?
>
>
> ** Shapes:
>
> If any node in a shapes graph has a sh:shape link back to itself then the
> shapes graph is recursive and behaviour of SHACL processors on the graph is
> undefined even if this node is completely disconnected from the rest of the
> graph.  It would be better to have the behaviour of SHACL processors defined
> in cases like this.
>
> It used to be that top-level shapes were conventionally indicated by an
> rdf:type link to sh:Shape.  This has apparently changed to sh:NodeShape.
> There is no apparent reason for this change, which should be reverted.
>
> sh:Shape does not appear to have any effect at all in SHACL.  If this is the
> case then it should be removed.
>
> The syntax rule for shapes doesn't appear to be a syntax rule at all.
> Instead it is defining what a shape is.
>
> "A shape in a shapes graph declares a constraint of kind c if c is a
> constraint component and the shape has values for all mandatory parameters
> of c. The constraint declaration consists of the values that the shape has
> for all mandatory and optional parameters of that component."
> The word "declares" is only harmful here.
> For constraint components that have more than one parameter this definition
> loses which value is connected to which parameter.  For shapes that have two
> values for a single-parameter constraint component there is only one
> resultant constraint and that constraint has both of the values of the
> parameter.
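
A minimal sketch of the concern above, assuming a shape is modelled as a plain Python dictionary from parameter IRIs to sets of values. The helper names `declaration_as_values` and `declaration_as_mapping` are hypothetical and appear nowhere in the draft. The example uses a shape whose sh:pattern and sh:flags values happen to be the same string.

```python
# Hypothetical model: a shape is a dict from parameter IRI to its set of values.
shape = {"sh:pattern": {"i"}, "sh:flags": {"i"}}
parameters = ["sh:pattern", "sh:flags"]


def declaration_as_values(shape, parameters):
    """Literal reading of the quoted text: the declaration is just the values."""
    return {value for param in parameters for value in shape.get(param, set())}


def declaration_as_mapping(shape, parameters):
    """Alternative reading: keep track of which parameter each value belongs to."""
    return {param: set(shape[param]) for param in parameters if param in shape}


values_only = declaration_as_values(shape, parameters)
with_parameters = declaration_as_mapping(shape, parameters)
print(values_only)
print(with_parameters)
```

Under the first reading the two values collapse into a single element and nothing records which parameter it came from; the mapping keeps both associations.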
>
> "Note that the definition above does not include all of the syntax rules of
> well-formed shapes."
> There is no notion of well-formed shapes introduced in the document, even
> though it is used in several places.  Similarly there is no notion of
> well-formed property shape or well-formed node shape introduced in the
> document even though both of these are used in the document.
>
> "Note that the definitions of well-formed property shapes and node shapes
> make these two sets of nodes disjoint."
> There is no definition of either well-formed property shape or well-formed
> node shape.
>
>
> ** Property Paths:
>
> "A node in an RDF graph is a well-formed SHACL property path p if it
> satisfies exactly one of the syntax rules in the following sub-sections. A
> node p is not a well-formed SHACL property path if p is a blank node and any
> path mappings of p directly or transitively reference p."
> It is possible that a node could both satisfy exactly one of the syntax
> rules and also refer back to itself.  What happens then?
> Every path mapping of p references p so all blank nodes are not well-formed
> SHACL property paths.
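
The first case raised above can be sketched as follows, assuming a graph is a set of (subject, predicate, object) string triples and blank nodes are written with a `_:` prefix. The check `looks_like_inverse_path` is a deliberately simplified, hypothetical stand-in for one syntax rule (it does not recurse into the object), so this is an illustration rather than the draft's definition.

```python
def reachable(graph, start):
    """Collect every node reachable from start by following triples subject to object."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for s, _p, o in graph:
            if s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def references_itself(graph, node):
    """True if node can be reached again from itself, directly or transitively."""
    return node in reachable(graph, node)


def looks_like_inverse_path(graph, node):
    """Simplified rule (assumption): blank node, subject of exactly one triple,
    and that triple uses sh:inversePath."""
    triples = [t for t in graph if t[0] == node]
    return (node.startswith("_:") and len(triples) == 1
            and triples[0][1] == "sh:inversePath")


cyclic = {("_:p", "sh:inversePath", "_:p")}
acyclic = {("_:q", "sh:inversePath", "ex:parent")}

print(looks_like_inverse_path(cyclic, "_:p"), references_itself(cyclic, "_:p"))
print(looks_like_inverse_path(acyclic, "_:q"), references_itself(acyclic, "_:q"))
```

In the cyclic graph the node passes the simplified rule and also refers back to itself, which is the situation where the quoted text does not say which sentence takes precedence.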
>
> "A sequence path is a blank node that is a SHACL list with at least two
> members and each member is a well-formed SHACL property path."
> Sequence paths can have extra information associated with them.
>
> "An alternative path is a blank node that is the subject of exactly one
> triple in G."  "An inverse path is a blank node that is the subject of
> exactly one triple in G."  And so on.
> These paths can't have extra information associated with them.  Any blank
> node that is the subject of exactly one triple is lots of kinds of paths.
>
>
> ** Non-validating information
>
> What happens if the requirements in the non-normative 2.3.2 are violated?
>
>
> ** Targets:
>
> "If s is a SHACL instance of sh:NodeShape or sh:PropertyShape in a shapes
> graph SG and s is also a SHACL instance of rdfs:Class in SG then the set of
> SHACL instances of s in a data graph DG is a target from DG for s in SG."
> So a node that is a SHACL instance of sh:Shape and a SHACL instance of
> rdfs:Class will not produce an implicit class target.  This is going to
> trip up a lot of people and needs to be changed.
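
A minimal sketch of the literal reading of the quoted rule, assuming a graph is a set of (subject, predicate, object) string triples and that a SHACL instance is found by following rdf:type and then rdfs:subClassOf within the same graph. The function names and the `ex:` nodes are hypothetical.

```python
def superclasses(graph, cls):
    """cls plus everything reachable from it via rdfs:subClassOf in graph."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_shacl_instance(graph, node, cls):
    """True if one of node's rdf:type values is cls or a subclass of cls."""
    return any(cls in superclasses(graph, o)
               for s, p, o in graph if s == node and p == "rdf:type")


def has_implicit_class_target(shapes_graph, shape):
    """Literal reading of the quoted rule: only sh:NodeShape and
    sh:PropertyShape instances qualify."""
    is_shape = (is_shacl_instance(shapes_graph, shape, "sh:NodeShape")
                or is_shacl_instance(shapes_graph, shape, "sh:PropertyShape"))
    return is_shape and is_shacl_instance(shapes_graph, shape, "rdfs:Class")


typed_as_shape = {("ex:Person", "rdf:type", "sh:Shape"),
                  ("ex:Person", "rdf:type", "rdfs:Class")}
typed_as_node_shape = {("ex:Person", "rdf:type", "sh:NodeShape"),
                       ("ex:Person", "rdf:type", "rdfs:Class")}

print(has_implicit_class_target(typed_as_shape, "ex:Person"))
print(has_implicit_class_target(typed_as_node_shape, "ex:Person"))
```

Only the second shapes graph yields an implicit class target under this reading, even though both nodes are also typed as rdfs:Class.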
>
>
> ** Validation:
>
> "Conformance checking is a simplified version of validation, producing a
> boolean result."
> There is no definition of which boolean value conformance checking produces.
>
> "the validation process"
> What counts as part of the validation process?  Does checking for
> ill-formedness?  Does entailment?  Does piecing together the shapes graph or
> the data graph?
>
> "For example, SHACL processors may support recursion scenarios or produce an
> error when they detect recursion."
> I expect that this should be failure instead of error.
>
> "A shapes graph is an RDF graph containing zero or more shapes that is
> passed into a SHACL validation process so that a data graph can be validated
> against the shapes."
> Is the graph that is used when a node in a graph is validated against a
> shape in it a shapes graph?
>
> "Every value of sh:shapesGraph is an IRI representing a graph that should be
> included into the shapes graph used to validate the data graph."
> Shouldn't this be SHOULD?  Or should it be MAY as is used in the next
> sentence?
>
> "Validating an RDF term against a shape involves validating the term against
> each of the components for which the shape has values for all mandatory
> parameters, using the validators associated with the respective component."
> This incorrectly uses constraint components instead of constraints.
>
> "The validation of a focus node in the data graph against a constraint in
> the shapes graph produces the top-level validation results that are produced
> by the validator of the constraint component, using as input the focus node,
> the specific values of the parameters in the constraint, and the value nodes
> of the shape in the data graph."
> SHACL constraint components have multiple validators.
> Validation needs more than just the information listed above.
> Shapes don't have value nodes at all.
> The output of a validation process is not completely defined, so there is no
> notion of *the* results of validation.
>
> "A focus node conforms to a shape if and only if the validation of the shape
> does not produce any validation result or a failure."
> So no focus node will conform to a node shape that has a sh:not parameter.
>
>
> ** Conformance:
>
> "All SHACL implementations MUST at least cover the Core."
> Covering is not defined.
>
> "This specification describes conformance criteria for: SHACL Core [...],
> SHACL-SPARQL [...], SHACL Shapes Graphs [...], Validation of a data graph
> against a shapes graph [...], Validation of an RDF term from a data graph
> against a shape from the shapes graph [...], SHACL Core processors [...],
> SHACL-SPARQL processors [...]."
> Conformance is something that is required of implementations.  It doesn't
> make sense for special kinds of graphs or processes to conform.  Instead
> these are defined as something.
>
> An RDF graph is a (mathematical) set of RDF triples.  There are no
> operations that can modify an RDF graph defined in RDF.
>
>
> ** Core Constraint Components:
>
> "This section defines the built-in SHACL Core constraint components that
> MUST be supported by all SHACL Core processors."
> Are SHACL-SPARQL processors required to support these components as well?
>
> "The SPARQL definitions in this section represent potential validators."
> The SPARQL queries are in informative parts of the document and can't be
> considered to be definitions.  As many of the SPARQL
> definitions currently have problems, it would be better to just remove all
> the SPARQL query stuff from this section.
>
> "The following constraint components represent restrictions on the number of
> value nodes."
> Presumably the number of value nodes for a particular focus node, not the
> number of value nodes overall.
>
> "If this parameter is omitted then there is no limit on the number of
> triples."
> Which triples?  How can this parameter be omitted and there still be a
> constraint set up for it?
>
> "The values of sh:pattern are literals with datatype xsd:string that are
> valid pattern arguments for the SPARQL REGEX function."
> Having this as a syntax condition means that
> checking for correct operation of SHACL processors will need to be aware of
> whether a value for sh:pattern is a valid pattern argument for the SPARQL
> REGEX function.
>
> "If $flags has a value then it MUST be interpreted according to the third
> argument of the SPARQL REGEX function."
> What does this mean?  I'm guessing that the constraint component is supposed
> to act as if this value is the third argument, not that this value is
> interpreted as anything.
>
> "For each pair of value nodes and the values of the property $lessThan at
> the given focus node where the first value is not less than the second value
> (based on SPARQL's < operator) or where the two values cannot be compared, a
> validation result MUST be produced with the value node as sh:value."
> How many different validation results need to be produced if the set of
> value nodes is the set { 1, 2 } and the set of property values is the set
> { "a", "b" }?
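
The counting question above can be made concrete with a small sketch. It assumes, as a simplification of SPARQL's `<` operator, that numbers compare only with numbers and strings only with strings, and that every other combination counts as "cannot be compared". The function names are hypothetical.

```python
from itertools import product


def comparable(a, b):
    """Assumption: only number/number and string/string pairs can be compared."""
    both_numbers = isinstance(a, (int, float)) and isinstance(b, (int, float))
    both_strings = isinstance(a, str) and isinstance(b, str)
    return both_numbers or both_strings


def failing_pairs(value_nodes, property_values):
    """Pairs where the value node is not less than the property value,
    or where the two cannot be compared."""
    return [(v, p)
            for v, p in product(sorted(value_nodes), sorted(property_values))
            if not comparable(v, p) or not v < p]


value_nodes = {1, 2}
property_values = {"a", "b"}

pairs = failing_pairs(value_nodes, property_values)
per_pair = len(pairs)                       # one result per failing pair
per_value_node = len({v for v, _ in pairs})  # one result per distinct sh:value
print(per_pair, per_value_node)
```

Reading the quoted sentence as "one result per pair" gives four results, while counting distinct sh:value nodes gives two; the text does not say which is intended.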
>
> "For each value node that produces no validation results against the shape
> $not"
> So if there is a conjunction under the not, a validation result will be
> produced here if just one of the conjuncts produces a validation result.
>
> "For each value node where the validation of the value node against any of
> the members of $and produces a validation result and no failure, a
> validation result MUST be produced with the value node as sh:value."
> So if there is a negation under any of the conjunctions a validation result
> will always be produced.
>
> "If a value node is violating the constraint, sh:shape will produce only a
> single validation result, with sh:ShapeConstraintComponent as its
> sh:sourceConstraintComponent."
> Not correct.  The sh:shape could produce other validation results for other
> value nodes.
>
> "On the other hand side, sh:property may produce any number of validation
> results, and these will have the individual constraint components of the
> property shape as their values of sh:sourceConstraintComponent."
> Not correct.  The validation results produced here may have other values for
> sh:sourceConstraintComponent.
>
>
> ** Specify and Declare:
>
> Specify and declare are used throughout the document in strange ways.  It
> would be better to replace these with words that do not have so much
> baggage, e.g., "For example, shapes can state" or "A shape in a shapes graph
> has a constraint of".
>
>
> ** SHACL-SPARQL:
>
> All aspects of SHACL-SPARQL depend heavily on pre-binding.  As pre-binding
> has never had a workable definition in SHACL there is no purpose in closely
> reviewing this part of the document at this time.  Either this part of the
> document needs to be removed or a workable definition of SHACL provided and
> this phase of the W3C process repeated.
>
> It is not clearly stated what SHACL Core processors need to do when they
> encounter constructs from SHACL-SPARQL.  It might be the case that SHACL
> Core processors can completely ignore SHACL-SPARQL constructs, because SHACL
> Core processors only "cover" SHACL Core, but it might also be the case that
> SHACL Core processors need to examine SHACL-SPARQL constructs, because these
> constructs might cause some nodes to be ill-formed.
>
> Both sh:prefixes and sh:prefix are used.  I expect that only one is needed.
>
> Peter F. Patel-Schneider
> Nuance Communications
>
Received on Monday, 27 February 2017 02:36:20 UTC