- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Sat, 25 Feb 2017 07:03:48 -0800
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>

Here are some comments on this document. In summary, there are still lots of significant problems. Addressing these problems will result in substantial changes to this document.

I have not examined the portions of the document labelled as non-normative as closely as the normative sections. I have treated all portions of the document labelled as non-normative as if they do not contain any normative content.

Some of the comments below are about problems that occur in multiple places in the document. I have not always listed all of these places.

It was hard to decipher the odd wording in many places to get to the underlying meaning. The document needs a rewrite to state the definition of SHACL in a consistent manner. Part of the problem here, but only part, is the mixing of different words and phrases that can carry definitional import, including "if ... then", "declared", "specified", and "is". Another part of the problem is the use of "MUST" in the definitions of constraint components.

This set of comments is separate from my previous comments on SHACL.

** Problems in early definitions:

"A property path is a possible route in a graph between two graph nodes."
Route is not defined. Possible route is not defined. (This wording is used once in the SPARQL document but is not defined there.)

"A binding is a pair (variable, RDF term), consistent with the term's use in SPARQL."
If binding is taken from SPARQL, just link to its definition, as is done for the other terms taken from SPARQL.

"A solution is a set of bindings, one row in the body of the result table of a SPARQL query."
In SPARQL a solution is not one row in the body of the result table of a SPARQL query. That is just how it is shown in some places. The actual correct term is solution mapping.

"The results table is a SolutionSequence, a list of solutions, possibly unordered."
Results table is not used in SHACL.

"A node in an RDF graph is a SHACL instance of a SHACL class in the graph if one of its SHACL types is the class."
SHACL types require reference to a graph.

"true denotes the RDF term "true"^^xsd:boolean. false denotes the RDF term "false"^^xsd:boolean."
There is no notion of what denotes means here.

"Target declarations are values of certain properties (such as sh:targetClass) for a shape in a shapes graph."
More than just a value is needed.

"All bindings of the variable this from the solution become focus nodes."
SPARQL queries do not return a solution as defined in this document.

** Unclear wording:

"(In this document, the verbs specify or declare are sometimes used to express the fact that an RDF term has property values in a graph.)"
As opposed to an RDF term not having any values for any property in a graph?

"A constraint component is an IRI."
I don't see how every IRI is a constraint component.

"The IRI is used, among others, in validation reports."
I never would have imagined that validation reports could only use IRIs that are constraint components.

"SHACL-SPARQL can be used to declare additional constraint components based on SPARQL."
What part of SHACL are these additional constraint components in?

** Strange links:

The definition of member has a link back to itself.

** Normative wording in non-normative portions of the document:

Section 1.6 is labelled as non-normative but discusses how to treat other portions of the document and states conformance requirements for SHACL implementations.

In many places there are what are labelled as SPARQL definitions of SHACL Core constraint components. These are in non-normative portions of the document. The document must not give the impression that these bits of SPARQL are definitions of portions of the SHACL Core.

There are probably other places where definitional wording occurs in non-normative parts of the document. These should be removed or changed so that they do not give any impression of being normative.

** SHACL Vocabulary:

What is the status of the SHACL vocabulary? SHACL is not an ontology that needs an RDF graph interpreted using the RDFS semantics to provide its vocabulary. All that SHACL needs is a set of IRIs that are used in its RDF syntax. What then is the status of the mentioned RDF graph? How is this graph to be interpreted? Does information entailed by the graph using the RDF or RDFS semantics have any effect on SHACL? Is any information in the graph part of the definition of SHACL? Of SHACL Core? Is all of the information in the graph part of the definition of SHACL? Of SHACL Core?

** Shapes:

If any node in a shapes graph has a sh:shape link back to itself then the shapes graph is recursive and behaviour of SHACL processors on the graph is undefined even if this node is completely disconnected from the rest of the graph. It would be better to have the behaviour of SHACL processors defined in cases like this.

It used to be that top-level shapes were conventionally indicated by an rdf:type link to sh:Shape. This has apparently changed to sh:NodeShape. There is no apparent reason for this change, which should be reverted.

sh:Shape does not appear to have any effect at all in SHACL. If this is the case then it should be removed.

The syntax rule for shapes doesn't appear to be a syntax rule at all. Instead it is defining what a shape is.

"A shape in a shapes graph declares a constraint of kind c if c is a constraint component and the shape has values for all mandatory parameters of c. The constraint declaration consists of the values that the shape has for all mandatory and optional parameters of that component."
The word "declares" is only harmful here. For constraint components that have more than one parameter this definition loses which value is connected to which parameter. For shapes that have two values for a single-parameter constraint component there is only one resultant constraint and that constraint has both of the values of the parameter.
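
The two problems with the constraint declaration wording can be illustrated with a toy model. The following is only a sketch under stated assumptions: a shape is modelled as a list of (parameter, value) pairs, and the COMPONENTS table and both function names are hypothetical, chosen for illustration rather than taken from the SHACL document. The first function reads the quoted wording literally; the second shows one possible alternative that keeps parameter names attached to values.

```python
from itertools import product

# Hypothetical component table (an assumption for this sketch, not the SHACL vocabulary).
COMPONENTS = {
    "sh:PatternConstraintComponent": {"mandatory": ["sh:pattern"], "optional": ["sh:flags"]},
    "sh:MinLengthConstraintComponent": {"mandatory": ["sh:minLength"], "optional": []},
}


def declaration_as_worded(shape, component):
    """Literal reading of the quoted text: the declaration is just a set of values."""
    spec = COMPONENTS[component]
    present = {p for p, _ in shape}
    if not all(m in present for m in spec["mandatory"]):
        return None  # the shape has no constraint of this kind
    params = spec["mandatory"] + spec["optional"]
    return {v for p, v in shape if p in params}


def declarations_keyed(shape, component):
    """One alternative reading: keep parameter names, one constraint per value combination."""
    spec = COMPONENTS[component]
    present = {p for p, _ in shape}
    if not all(m in present for m in spec["mandatory"]):
        return []
    params = [p for p in spec["mandatory"] + spec["optional"] if p in present]
    values = [[v for q, v in shape if q == p] for p in params]
    return [dict(zip(params, combo)) for combo in product(*values)]


# Two parameters: the literal reading no longer says which value is the pattern.
pattern_shape = [("sh:pattern", "^a"), ("sh:flags", "i")]
# Two values for one parameter: the literal reading yields a single constraint.
min_shape = [("sh:minLength", 3), ("sh:minLength", 5)]

print(declaration_as_worded(pattern_shape, "sh:PatternConstraintComponent"))
print(declarations_keyed(pattern_shape, "sh:PatternConstraintComponent"))
print(declaration_as_worded(min_shape, "sh:MinLengthConstraintComponent"))
print(declarations_keyed(min_shape, "sh:MinLengthConstraintComponent"))
```

Whether the keyed reading is what the document intends is not something this sketch can settle; it only shows that the two readings differ.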

"Note that the definition above does not include all of the syntax rules of well-formed shapes."
There is no notion of well-formed shapes introduced in the document, even though it is used in several places. Similarly there is no notion of well-formed property shape or well-formed node shape introduced in the document even though both of these are used in the document.

"Note that the definitions of well-formed property shapes and node shapes make these two sets of nodes disjoint."
There is no definition of either well-formed property shape or well-formed node shape.

** Property Paths:

"A node in an RDF graph is a well-formed SHACL property path p if it satisfies exactly one of the syntax rules in the following sub-sections. A node p is not a well-formed SHACL property path if p is a blank node and any path mappings of p directly or transitively reference p."
It is possible that a node could both satisfy exactly one of the syntax rules and also refer back to itself. What happens then? Every path mapping of p references p so all blank nodes are not well-formed SHACL property paths.

"A sequence path is a blank node that is a SHACL list with at least two members and each member is a well-formed SHACL property path."
Sequence paths can have extra information associated with them.

"An alternative path is a blank node that is the subject of exactly one triple in G."
"An inverse path is a blank node that is the subject of exactly one triple in G."
And so on. These paths can't have extra information associated with them. Any blank node that is the subject of exactly one triple is lots of kinds of paths.

** Non-validating information:

What happens if the requirements in the non-normative 2.3.2 are violated?

** Targets:

"If s is a SHACL instance of sh:NodeShape or sh:PropertyShape in a shapes graph SG and s is also a SHACL instance of rdfs:Class in SG then the set of SHACL instances of s in a data graph DG is a target from DG for s in SG."
So a node that is a SHACL instance of sh:Shape and a SHACL instance of rdfs:Class will not produce an implicit class target. This is going to trip up a lot of people and needs to be changed.

** Validation:

"Conformance checking is a simplified version of validation, producing a boolean result."
There is no definition of which boolean value conformance checking produces.

"the validation process"
What counts as part of the validation process? Does checking for ill-formedness? Does entailment? Does piecing together the shapes graph or the data graph?

"For example, SHACL processors may support recursion scenarios or produce an error when they detect recursion."
I expect that this should be failure instead of error.

"A shapes graph is an RDF graph containing zero or more shapes that is passed into a SHACL validation process so that a data graph can be validated against the shapes."
Is the graph that is used when a node in a graph is validated against a shape in it a shapes graph?

"Every value of sh:shapesGraph is an IRI representing a graph that should be included into the shapes graph used to validate the data graph."
Shouldn't this be SHOULD? Or should it be MAY as is used in the next sentence?

"Validating an RDF term against a shape involves validating the term against each of the components for which the shape has values for all mandatory parameters, using the validators associated with the respective component."
This incorrectly uses constraint components instead of constraints.

"The validation of a focus node in the data graph against a constraint in the shapes graph produces the top-level validation results that are produced by the validator of the constraint component, using as input the focus node, the specific values of the parameters in the constraint, and the value nodes of the shape in the data graph."
SHACL constraint components have multiple validators. Validation needs more than just the information listed above. Shapes don't have value nodes at all.
The output of a validation process is not completely defined, so there is no notion of *the* results of validation.

"A focus node conforms to a shape if and only if the validation of the shape does not produce any validation result or a failure."
So no focus node will conform to a node shape that has a sh:not parameter.

** Conformance:

"All SHACL implementations MUST at least cover the Core."
Covering is not defined.

"This specification describes conformance criteria for: SHACL Core [...], SHACL-SPARQL [...], SHACL Shapes Graphs [...], Validation of a data graph against a shapes graph [...], Validation of an RDF term from a data graph against a shape from the shapes graph [...], SHACL Core processors [...], SHACL-SPARQL processors [...]."
Conformance is something that is required of implementations. It doesn't make sense for special kinds of graphs or processes to conform. Instead these are defined as something.

An RDF graph is a (mathematical) set of RDF triples. There are no operations defined in RDF that can modify an RDF graph.

** Core Constraint Components:

"This section defines the built-in SHACL Core constraint components that MUST be supported by all SHACL Core processors."
Are SHACL-SPARQL processors required to support these components as well?

"The SPARQL definitions in this section represent potential validators."
The SPARQL queries are in informative parts of the document and can't be considered to be definitions. As many of the SPARQL definitions currently have problems, it would be better to just remove all the SPARQL query stuff from this section.

"The following constraint components represent restrictions on the number of value nodes."
Presumably the number of value nodes for a particular focus node, not the number of value nodes overall.

"If this parameter is omitted then there is no limit on the number of triples."
Which triples? How can this parameter be omitted and there still be a constraint set up for it?

"The values of sh:pattern are literals with datatype xsd:string that are valid pattern arguments for the SPARQL REGEX function."
Having this as a syntax condition means that checking for correct operation of SHACL processors will need to be aware of whether a value for sh:pattern is a valid pattern argument for the SPARQL REGEX function.

"If $flags has a value then it MUST be interpreted according to the third argument of the SPARQL REGEX function."
What does this mean? I'm guessing that the constraint component is supposed to act as if this value is the third argument, not that this value is interpreted as anything.

"For each pair of value nodes and the values of the property $lessThan at the given focus node where the first value is not less than the second value (based on SPARQL's < operator) or where the two values cannot be compared, a validation result MUST be produced with the value node as sh:value."
How many different validation results need to be produced if the set of value nodes is the set { 1, 2 } and the set of property values is the set { "a", "b" }?

"For each value node that produces no validation results against the shape $not"
So if there is a conjunction under the not, a validation result will be produced here if just one of the conjuncts produces a validation result.

"For each value node where the validation of the value node against any of the members of $and produces a validation result and no failure, a validation result MUST be produced with the value node as sh:value."
So if there is a negation under any of the conjunctions a validation result will always be produced.

"If a value node is violating the constraint, sh:shape will produce only a single validation result, with sh:ShapeConstraintComponent as its sh:sourceConstraintComponent."
Not correct. The sh:shape could produce other validation results for other value nodes.
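
The counting question raised above for sh:lessThan can be made concrete with a small sketch. It is only an illustration under stated assumptions: Python's < operator stands in for SPARQL's < operator, a TypeError stands in for "cannot be compared", and the helper name not_less_than is hypothetical. The sketch counts the failing pairs and the distinct value nodes among them, which are two candidate readings of how many validation results the quoted sentence requires.

```python
from itertools import product


def not_less_than(a, b):
    """True if a < b does not hold or the two values cannot be compared.

    Assumption: Python's < is a stand-in for SPARQL's < operator, and a
    TypeError models two values that cannot be compared.
    """
    try:
        return not (a < b)
    except TypeError:
        return True


value_nodes = [1, 2]           # the value nodes from the question
property_values = ["a", "b"]   # the values of the $lessThan property

# Every (value node, property value) pair that violates the constraint.
failing_pairs = [
    (v, p) for v, p in product(value_nodes, property_values) if not_less_than(v, p)
]

# Reading 1: one validation result per failing pair.
per_pair = len(failing_pairs)
# Reading 2: one validation result per distinct value node (results differ only in sh:value).
per_value_node = len({v for v, _ in failing_pairs})

print(per_pair, per_value_node)
```

Under these assumptions the two readings give different counts, which is the ambiguity the question points at.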

"On the other hand side, sh:property may produce any number of validation results, and these will have the individual constraint components of the property shape as their values of sh:sourceConstraintComponent."
Not correct. The validation results produced here may have other values for sh:sourceConstraintComponent.

** Specify and Declare:

Specify and declare are used throughout the document in strange ways. It would be better to replace these with words that do not have so much baggage, e.g., "For example, shapes can state" or "A shape in a shapes graph has a constraint of".

** SHACL-SPARQL:

All aspects of SHACL-SPARQL depend heavily on pre-binding. As pre-binding has never had a workable definition in SHACL there is no purpose in closely reviewing this part of the document at this time. Either this part of the document needs to be removed or a workable definition of SHACL provided and this phase of the W3C process repeated.

It is not clearly stated what SHACL Core processors need to do when they encounter constructs from SHACL-SPARQL. It might be the case that SHACL Core processors can completely ignore SHACL-SPARQL constructs, because SHACL Core processors only "cover" SHACL Core, but it might also be the case that SHACL Core processors need to examine SHACL-SPARQL constructs, because these constructs might cause some nodes to be ill-formed.

Both sh:prefixes and sh:prefix are used. I expect that only one is needed.

Peter F. Patel-Schneider
Nuance Communications
Received on Saturday, 25 February 2017 15:04:26 UTC