- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 14 Mar 2017 06:29:38 -0700
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
I was able to spend a bit of time looking at how validation results and validation reports are specified and found a number of issues.

It is unclear just what a validation result is. Several occurrences of "validation result" link to "The validation report is the result of the validation process that reports the conformance and the set of all validation results." which isn't much help. Later on there is "SHACL defines sh:ValidationResult as a subclass of sh:AbstractResult to report individual SHACL validation results." but this doesn't help much either.

So, is a validation result a node in a validation report that is a SHACL instance of sh:ValidationResult, i.e., is the "reports" above "contain"? Is instead a validation result a node in any RDF graph that is a SHACL instance of sh:ValidationResult? Or is instead a validation result some abstract object that is later represented by a SHACL instance of sh:ValidationResult, i.e., is the "reports" above something like "denotes"?

The difference between these matters for the generation of validation results and in the definition of validation.

If a validation result is something in an RDF graph then much of the current definition of validation in SHACL still reads as a statement of absolute truth, i.e., that whenever a SHACL implementation validates a node against a shape the implementation has to arrange for an RDF graph to be created that contains the validation results. The relevant wording occurs for constraint components in SHACL Core and is generally of the form "For each value node that ..., there is a validation result with the value node as sh:value." or "If ..., there is a validation result." Other relevant wording is repeated for each SHACL Core constraint component that uses other shapes and is generally of the form "... if v does not conform ..., there is a validation result with v as sh:value."
If an implementation does not arrange things so that such things exist in some RDF graph then it is not conforming to the definition of SHACL, even if these things will not show up in validation reports. However, particular SHACL implementations may not be capable of creating these graphs, even if their validation reports otherwise conform to the requirements for SHACL. Thus considering validation results as something in an RDF graph prevents certain kinds of SHACL implementations and needs to be changed. As well, it is unclear whether the union of validation results is supposed to be RDF graph union or some other operation on RDF graphs.

If, on the other hand, a validation result is some abstract object then just what is it? How do operations like equality and union work? There is not adequate information in the SHACL document to answer these questions. So considering validation results as abstract objects exposes a lack of sufficient information to implement SHACL.

Fixing this problem requires being clear as to what a validation result is and then stating what validation results come from a validation in a way that doesn't require that they are actually created. The major change would be to change the Textual Definitions to read something like "The result of validating a constraint of kind sh:ClassConstraintComponent is any RDF graph containing different validation results for the constraint for each value node that is either a literal, or a non-literal that is not a SHACL instance of $class in the data graph, and no other validation results." This makes it clear what particular validations produce without implying that a SHACL implementation actually has to create the result in all situations.
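
As a rough illustration of that suggested wording, the value nodes that call for a validation result can be described without building any result graph. The sketch below is only a toy model under stated assumptions: the graph is a plain set of string triples, and `Literal`, `superclasses`, `is_shacl_instance` and `class_violations` are hypothetical names, not part of SHACL or of any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal; IRIs are plain strings."""
    lexical: str


def superclasses(cls, graph):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(o for (s, p, o) in graph
                        if s == c and p == "rdfs:subClassOf")
    return seen


def is_shacl_instance(node, cls, graph):
    """True if node has an rdf:type that is cls or a subclass of cls."""
    types = [o for (s, p, o) in graph if s == node and p == "rdf:type"]
    return any(cls in superclasses(t, graph) for t in types)


def class_violations(value_nodes, cls, graph):
    """Value nodes for which the quoted wording calls for a validation result:
    literals, and non-literals that are not SHACL instances of cls."""
    return {v for v in value_nodes
            if isinstance(v, Literal) or not is_shacl_instance(v, cls, graph)}


# Illustrative data only; these triples are assumptions, not from the SHACL document.
graph = {
    ("ex:a", "rdf:type", "ex:D"),
    ("ex:D", "rdfs:subClassOf", "ex:C"),
    ("ex:b", "rdf:type", "ex:E"),
}
offending = class_violations({"ex:a", "ex:b", Literal("42")}, "ex:C", graph)
```

Here `offending` contains `ex:b` and the literal but not `ex:a`. Whether and how an implementation turns that set into nodes of an RDF graph is left open, which is the point of the suggested wording.
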
Changes are also needed where validation results are combined to be explicit that the combination method is RDF graph union for any of the possible results and to eliminate wording implying that SHACL instances of sh:ValidationResult are not themselves validation results but are just somehow related to actual validation results.

SHACL does not precisely specify which validation results arise from validation even for validation results that end up in a validation report. This uncertainty has a number of causes.

A shape may use a separate shape multiple times. Validation results from this multiply-used shape may show up in a final validation report just once or might show up multiple times depending on how a SHACL implementation works. For example a SHACL implementation might have one validation result in its validation report for validating

  ex:i ex:p ex:j ;
       ex:q ex:j .
  ex:j ex:p ex:j .

against

  ex:s1 sh:targetNode ex:i ;
        sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
        sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
  ex:s2 sh:path ex:p ;
        sh:class ex:C .

but other SHACL implementations might have two.

Further, a shape might be used both as a top-level shape and as an embedded shape. Validation results from both of these uses may show up in a final validation report but the number that show up may be undetermined. For example, a SHACL implementation might have one validation result in its validation report for validating

  ex:k ex:p ex:l .
  ex:l ex:p ex:l .

against

  ex:s3 sh:targetNode ex:k ;
        sh:path ex:p ;
        sh:property ex:s4 .
  ex:s4 sh:targetNode ex:l ;
        sh:path ex:p ;
        sh:class ex:C .

but other SHACL implementations might have two.

This issue is not overcome by the statements about distinctness of results, like "Furthermore, the validators always produce new result nodes", as different SHACL implementations can perform particular validations different numbers of times.
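
The one-versus-two count in the second example above (ex:s3 and ex:s4) can be reproduced with a toy sketch. It is not a SHACL implementation: it models only the sh:class check of ex:s4, assumes that results from both uses of ex:s4 are reported, and the names `values`, `validate_s4` and `run` are hypothetical.

```python
# Data graph of the second example, as a set of string triples.
DATA = {("ex:k", "ex:p", "ex:l"), ("ex:l", "ex:p", "ex:l")}


def values(node, path):
    """Objects of the triples in DATA with the given subject and predicate."""
    return [o for (s, p, o) in DATA if s == node and p == path]


def is_instance(node, cls):
    """Direct rdf:type check only; the example data has no type triples."""
    return (node, "rdf:type", cls) in DATA


def validate_s4(focus):
    """ex:s4: every ex:p value of the focus node must be an instance of ex:C."""
    return [{"focusNode": focus, "value": v, "sourceShape": "ex:s4"}
            for v in values(focus, "ex:p")
            if not is_instance(v, "ex:C")]


def run(optimize):
    """Validate against ex:s4 once per use, or once per distinct focus node."""
    # ex:s4 is used as a top-level shape (sh:targetNode ex:l) and as an
    # embedded shape for each ex:p value of ex:k (through ex:s3).
    uses = ["ex:l"] + values("ex:k", "ex:p")
    results, seen = [], set()
    for focus in uses:
        if optimize and focus in seen:
            continue  # an optimizing validator skips the repeated validation
        seen.add(focus)
        results.extend(validate_s4(focus))
    return results
```

With this model `len(run(optimize=True))` is 1 and `len(run(optimize=False))` is 2, although both runs check the same data against the same shapes.
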
For example, an optimizing SHACL implementation might only validate ex:j against ex:s4 once but a non-optimizing SHACL implementation would probably perform this validation twice.

This issue isn't exactly a problem - there is no good reason to forbid implementations differing from other implementations on the number of validation results that show up in validation reports. However, as reported earlier, the testing methodology is then inadequate to check compliance.

There are also several minor issues with the wording related to validation reports and results that I found. There still needs to be a close examination of the SHACL document to check for more of these issues.

Some of the wording in the SHACL document still talks about producing validation results and needs to be adjusted.

Validation of a focus node against a shape depends on "the validation of the focus node against all constraints declared by the shape". "A shape in a shapes graph declares a constraint of kind c if c is a constraint component and the shape has values for all mandatory parameters of c." However, this ignores situations where a shape has multiple values for parameters of constraint components that have a single parameter. This is correctly overridden in the next paragraph, but the incorrect statement should be corrected.

Then there is "The interpretation of such declarations is conjunction, i.e. all constraints apply." which is redundant.

Peter F. Patel-Schneider
Nuance Communications
Received on Tuesday, 14 March 2017 13:30:13 UTC