validation results and validation reports from Peter F. Patel-Schneider on 2017-03-14 (public-rdf-shapes@w3.org from March 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 14 Mar 2017 06:29:38 -0700
To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <864b3d93-6045-3f2a-4b3e-6b9f686078f0@gmail.com>
I was able to spend a bit of time looking at how validation results and
validation reports are specified and found a number of issues.


It is unclear just what a validation result is.  Several occurrences of
"validation result" link to "The validation report is the result of the
validation process that reports the conformance and the set of all
validation results." which isn't much help.  Later on there is "SHACL
defines sh:ValidationResult as a subclass of sh:AbstractResult to report
individual SHACL validation results." but this doesn't help much either.

So, is a validation result a node in a validation report that is a SHACL
instance of sh:ValidationResult, i.e., does "reports" above mean "contains"?
Or is a validation result instead a node in any RDF graph that is a SHACL
instance of sh:ValidationResult?  Or is a validation result instead some
abstract object that is later represented by a SHACL instance of
sh:ValidationResult, i.e., does "reports" above mean something like "denotes"?
The difference between these matters for the generation of validation
results and in the definition of validation.

If a validation result is something in an RDF graph, then much of the current
definition of validation in SHACL still reads as a statement of absolute
truth, i.e., that whenever a SHACL implementation validates a node against a
shape the implementation has to arrange for an RDF graph to be created that
contains the validation results.  The relevant wording occurs for constraint
components in SHACL Core and is generally of the form "For each value node
that ..., there is a validation result with the value node as
sh:value." or "If ..., there is a validation result."  Other relevant
wording is repeated for each SHACL Core constraint component that uses other
shapes and is generally of the form "... if v does not conform ..., there is
a validation result with v as sh:value."  If an implementation does not
arrange for such validation results to exist in some RDF graph, then it does
not conform to the definition of SHACL, even if these results will not show
up in validation reports.

However, particular SHACL implementations may not be capable of
creating these graphs, even if their validation reports otherwise conform to
the requirements for SHACL.  Thus treating validation results as
things in an RDF graph rules out certain kinds of SHACL implementations,
and this needs to be changed.  As well, it is unclear whether the union of
validation results is supposed to be RDF graph union or some other operation
on RDF graphs.
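
To make the union question concrete, here is a toy Python model (an
illustration only, not taken from the SHACL document or any RDF library) in
which a graph is a set of (subject, predicate, object) string tuples and
terms starting with "_:" are blank nodes.  It assumes, hypothetically, that
two separate validations describe their results with the same blank node
label.  Plain RDF graph union then collapses the two results into one, while
an RDF merge, which renames blank nodes apart first, keeps two.  All helper
names are made up for this sketch.

```python
# Toy model: a graph is a set of (subject, predicate, object) string tuples;
# terms starting with "_:" are blank nodes. Not an RDF library.

def graph_union(g1, g2):
    # RDF graph union: plain set union, so a blank node label used in both
    # graphs denotes one and the same node in the result.
    return g1 | g2

def standardize_apart(graph, tag):
    # Rename every blank node "_:x" to "_:<tag>.x".
    def rename(term):
        return "_:" + tag + "." + term[2:] if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for (s, p, o) in graph}

def graph_merge(g1, g2):
    # RDF merge: standardize blank nodes apart, then take the union.
    return standardize_apart(g1, "g1") | standardize_apart(g2, "g2")

def count_results(graph):
    return sum(1 for (s, p, o) in graph
               if p == "rdf:type" and o == "sh:ValidationResult")

# Hypothetical result description produced by two separate validations
# that happen to use the same blank node label.
result = {
    ("_:r", "rdf:type", "sh:ValidationResult"),
    ("_:r", "sh:focusNode", "ex:j"),
    ("_:r", "sh:value", "ex:j"),
}

print(count_results(graph_union(result, result)))  # 1
print(count_results(graph_merge(result, result)))  # 2
```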

If, on the other hand, a validation result is some abstract object, then just
what is it?  How do operations like equality and union work?  There is not
adequate information in the SHACL document to answer these questions.  So
considering validation results as abstract objects exposes a lack of
sufficient information to implement SHACL.

Fixing this problem requires being clear about what a validation result is
and then stating which validation results come from a validation in a way
that doesn't require them to actually be created.  The major change would
be to reword the Textual Definitions to read something like "The result of
validating a constraint of kind sh:ClassConstraintComponent is any RDF graph
containing different validation results for the constraint for each value
node that is either a literal, or a non-literal that is not a SHACL instance
of $class in the data graph, and no other validation results."  This makes
it clear what particular validations produce without implying that a SHACL
implementation actually has to create the result in all situations.
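
As a sketch of how such wording pins down which results exist without
requiring anything to be built, the following Python function only computes
the set of value nodes for which a validation result must exist under the
proposed sh:class definition.  The node representation, the helper names,
and passing rdf:type and rdfs:subClassOf information in as dictionaries are
all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str

def is_shacl_instance(node, cls, types, superclasses):
    # node is a SHACL instance of cls if one of its rdf:type values is cls
    # or has cls as a transitive rdfs:subClassOf superclass.
    # types: node -> set of classes; superclasses: class -> direct superclasses.
    seen = set()
    stack = list(types.get(node, ()))
    while stack:
        c = stack.pop()
        if c == cls:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(superclasses.get(c, ()))
    return False

def class_result_nodes(value_nodes, cls, types, superclasses):
    # The value nodes for which the proposed wording requires a validation
    # result: literals, and non-literals that are not SHACL instances of cls.
    # Nothing is materialized here; this only says which results exist.
    return {v for v in value_nodes
            if isinstance(v, Literal)
            or not is_shacl_instance(v, cls, types, superclasses)}

types = {IRI("ex:a"): {IRI("ex:D")}}
superclasses = {IRI("ex:D"): {IRI("ex:C")}}
values = {IRI("ex:a"), IRI("ex:j"), Literal("42")}
print(class_result_nodes(values, IRI("ex:C"), types, superclasses))
```

Here ex:a escapes a result because its type ex:D is a subclass of ex:C,
while ex:j (no type) and the literal each require one.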

Changes are also needed where validation results are combined, to make
explicit that the combination method is RDF graph union for any of the
possible results, and to eliminate wording implying that SHACL instances of
sh:ValidationResult are only somehow related to actual validation results
rather than being validation results themselves.


SHACL does not precisely specify which validation results arise from
validation even for validation results that end up in a validation report.
This uncertainty has a number of causes.  A shape may use a separate shape
multiple times.  Validation results from this multiply-used shape may show
up in a final validation report just once or might show up multiple times
depending on how a SHACL implementation works.  For example, a SHACL
implementation might have one validation result in its validation report for
validating
    ex:i ex:p ex:j ; ex:q ex:j . ex:j ex:p ex:j .
against
    ex:s1 sh:targetNode ex:i ;
      sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
      sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
    ex:s2 sh:path ex:p ; sh:class ex:C .
but other SHACL implementations might have two.  Further, a shape might be
used both as a top-level shape and as an embedded shape.  Validation results
from both of these uses may show up in a final validation report, but the
number that show up is not determined.  For example, a SHACL implementation
might have one validation result in its validation report for validating
    ex:k ex:p ex:l . ex:l ex:p ex:l .
against
    ex:s3 sh:targetNode ex:k ;
      sh:path ex:p ;
      sh:property ex:s4 .
    ex:s4 sh:targetNode ex:l ;
      sh:path ex:p ; sh:class ex:C .
but other SHACL implementations might have two.
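
The difference can be reproduced with a toy validator.  The Python below is
hypothetical and covers only sh:targetNode, predicate paths, sh:class
(checked against direct rdf:type values only, no subclass reasoning) and
sh:property; results are (focus node, value, source shape) tuples, and the
two blank-node property shapes of ex:s1 are given the made-up names _:b1 and
_:b2.  With optimize=False every validation is performed each time it is
reached; with optimize=True a (focus node, shape) pair is validated at most
once per run, which is just one possible reading of "optimizing".

```python
class ToyValidator:
    def __init__(self, data, shapes, optimize):
        self.data = data          # set of (s, p, o) triples
        self.shapes = shapes      # shape name -> dict of parameters
        self.optimize = optimize
        self.done = set()         # (focus, shape) pairs already validated

    def objects(self, node, pred):
        return sorted(o for (s, p, o) in self.data if s == node and p == pred)

    def validate(self, focus, name):
        if self.optimize:
            if (focus, name) in self.done:
                return []         # already validated in this run
            self.done.add((focus, name))
        shape = self.shapes[name]
        path = shape.get("path")
        values = [focus] if path is None else self.objects(focus, path)
        results = []
        for v in values:
            cls = shape.get("class")
            if cls is not None and cls not in self.objects(v, "rdf:type"):
                results.append((focus, v, name))
            for sub in shape.get("property", []):
                results.extend(self.validate(v, sub))
        return results

    def report(self):
        results = []
        for name, shape in self.shapes.items():
            for target in shape.get("targetNode", []):
                results.extend(self.validate(target, name))
        return results

data1 = {("ex:i", "ex:p", "ex:j"), ("ex:i", "ex:q", "ex:j"),
         ("ex:j", "ex:p", "ex:j")}
shapes1 = {
    "ex:s1": {"targetNode": ["ex:i"], "property": ["_:b1", "_:b2"]},
    "_:b1": {"path": "ex:p", "property": ["ex:s2"]},
    "_:b2": {"path": "ex:q", "property": ["ex:s2"]},
    "ex:s2": {"path": "ex:p", "class": "ex:C"},
}
data2 = {("ex:k", "ex:p", "ex:l"), ("ex:l", "ex:p", "ex:l")}
shapes2 = {
    "ex:s3": {"targetNode": ["ex:k"], "path": "ex:p", "property": ["ex:s4"]},
    "ex:s4": {"targetNode": ["ex:l"], "path": "ex:p", "class": "ex:C"},
}

for label, data, shapes in [("first", data1, shapes1), ("second", data2, shapes2)]:
    for optimize in (False, True):
        n = len(ToyValidator(data, shapes, optimize).report())
        print(label, "optimize" if optimize else "naive", n)
```

Under these assumptions both examples yield two results without the
optimization and one with it, and nothing in the current wording seems to
pick one behaviour over the other.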

This issue is not overcome by the statements about distinctness of results,
like "Furthermore, the validators always produce new result nodes", as
different SHACL implementations can perform particular validations different
numbers of times.  For example, an optimizing SHACL implementation might
only validate ex:l against ex:s4 once, but a non-optimizing SHACL
implementation would probably perform this validation twice.  This issue
isn't exactly a problem; there is no good reason to forbid implementations
from differing from other implementations in the number of validation
results that show up in validation reports.  However, as reported earlier,
the testing methodology is then inadequate to check compliance.
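
A small, hypothetical illustration of why count-sensitive testing is a
problem: reduce each result in two reports for the first example to a
(focusNode, value, sourceShape) tuple (an assumption; real reports are RDF
graphs in which each result is a distinct node).  A comparison that counts
results, which is roughly what checking two report graphs for isomorphism
amounts to, rejects one of the two implementations, while a comparison that
ignores multiplicity accepts both.

```python
from collections import Counter

# Each result reduced to a (focusNode, value, sourceShape) tuple; these two
# hypothetical reports come from a naive and an optimizing implementation.
naive_report = [("ex:j", "ex:j", "ex:s2"), ("ex:j", "ex:j", "ex:s2")]
optimizing_report = [("ex:j", "ex:j", "ex:s2")]

def same_counting_duplicates(a, b):
    # Multiset equality: the number of occurrences of each result matters.
    return Counter(a) == Counter(b)

def same_ignoring_duplicates(a, b):
    # Set equality: only which results occur matters, not how often.
    return set(a) == set(b)

print(same_counting_duplicates(naive_report, optimizing_report))  # False
print(same_ignoring_duplicates(naive_report, optimizing_report))  # True
```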


There are also several minor issues with the wording related to validation
reports and results that I found.  There still needs to be a close
examination of the SHACL document to check for more of these issues.

Some of the wording in the SHACL document still talks about producing
validation results and needs to be adjusted.

Validation of a focus node against a shape depends on "the validation of the
focus node against all constraints declared by the shape".  "A shape in a
shapes graph declares a constraint of kind c if c is a constraint component
and the shape has values for all mandatory parameters of c."  However, this
ignores situations where a shape has multiple values for the parameter of a
constraint component that has a single parameter.  This is correctly
overridden in the next paragraph, but the incorrect definition should be
corrected.
Then there is "The interpretation of such declarations is conjunction,
i.e. all constraints apply." which is redundant.
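
The reading that the next paragraph appears to intend can be sketched as
follows: for a constraint component with a single parameter, each value of
that parameter gives its own constraint, and a focus node conforms only if
all of them hold, so conjunction already follows from "all constraints".
This Python is a simplified, hypothetical model (only sh:class, checked
against direct types; names such as declared_constraints are made up), not
the SHACL definition itself.

```python
# Hypothetical model: a shape maps parameter names to lists of values.
# Only one single-parameter component is modelled here.
SINGLE_PARAMETER_COMPONENTS = {"sh:ClassConstraintComponent": "sh:class"}

def declared_constraints(shape):
    # One constraint per value of the component's only parameter; a component
    # whose parameter has no value declares no constraint.
    return [(component, value)
            for component, param in SINGLE_PARAMETER_COMPONENTS.items()
            for value in shape.get(param, [])]

def conforms(node_types, constraints):
    # Conjunction: every declared sh:class constraint must hold, checked
    # against the node's direct types.
    return all(cls in node_types for _component, cls in constraints)

shape = {"sh:class": ["ex:C", "ex:D"]}
constraints = declared_constraints(shape)
print(len(constraints))                         # 2
print(conforms({"ex:C"}, constraints))          # False
print(conforms({"ex:C", "ex:D"}, constraints))  # True
```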


Peter F. Patel-Schneider
Nuance Communications
Received on Tuesday, 14 March 2017 13:30:13 UTC