testing methodology

I got a bit of time to take a slightly deeper look at the information on the
SHACL testing methodology (as of 12 March 2017).  I uncovered a number of
problems, some of which I have touched on earlier.

Some of these problems will cause tests to produce incorrect results.
However, even if these problems are fixed there is a core problem in the
testing methodology---RDF graph isomorphism is inadequate to determine
whether a SHACL implementation is producing conformant results.  Something
will have to be done to address this fundamental problem.


Some of the information on what happens to support SHACL testing is unclear
so I had to do some work to figure out just what needs to happen.

First, this required setting up some preliminary definitions.

Preliminaries:
Given a node n in an RDF graph G the list nodes of n in G LN(n,G) is the set
of bindings of the variable o in those solution mappings in the result of
SELECT ?s ?o WHERE { ?s rdf:rest* ?o } on G that bind the variable s to n.
Given a node n in an RDF graph G the list triples of n in G LT(n,G) is the
set of triples in G whose subject is in LN(n,G) and whose predicate is
either rdf:first or rdf:rest.
Given a node n in an RDF graph G the path nodes of n in G PN(n,G) is the set
of bindings of the variable o in solution mappings in the result of
  SELECT ?s ?o WHERE { ?s ( rdf:rest*/rdf:first |
                            sh:alternativePath/(rdf:rest*/rdf:first) |
                            sh:inversePath | sh:zeroOrMorePath |
                            sh:oneOrMorePath | sh:zeroOrOnePath ) ?o }
on G that bind the variable s to n.
Given a node n in an RDF graph G the path triples of n in G PT(n,G) is the
union of the set of triples in G whose subject is in PN(n,G) and LT(n',G)
for each n' in PN(n,G) and LT(n'',G) for each <n',sh:alternativePath,n''> in
G with n' in PN(n,G).
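
The first two definitions can be sketched in a few lines of Python.  This is
only an illustration under my own assumptions: the graph is a plain set of
(subject, predicate, object) string tuples rather than a real RDF store, and
the names list_nodes and list_triples are made up for this sketch.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_nodes(n, graph):
    """LN(n,G): nodes reachable from n by zero or more rdf:rest steps."""
    seen, frontier = {n}, [n]
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s == node and p == RDF_REST and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def list_triples(n, graph):
    """LT(n,G): rdf:first and rdf:rest triples whose subject is in LN(n,G)."""
    nodes = list_nodes(n, graph)
    return {(s, p, o) for s, p, o in graph
            if s in nodes and p in (RDF_FIRST, RDF_REST)}


# A two-element list ( ex:p ex:q ) headed by the blank node _:l1,
# plus one unrelated triple that points at the list head.
G = {
    ("_:l1", RDF_FIRST, "ex:p"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "ex:q"), ("_:l2", RDF_REST, RDF_NIL),
    ("ex:other", "ex:r", "_:l1"),
}

print(sorted(list_nodes("_:l1", G)))
print(len(list_triples("_:l1", G)))
```

Here LN(_:l1,G) contains _:l1, _:l2, and rdf:nil, and LT(_:l1,G) contains the
four rdf:first and rdf:rest triples but not the triple from ex:other.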

I then had to clarify the testing methodology.

Testing methodology:
1/ Start with the blank node r that is the mf:result value of the test
   description in manifest graph M.
2/ Create an RDF graph E containing the triples <r,p,o> in M and the triples
   <r',p,o> in M for each triple <r,sh:result,r'> in M and the triples in
   PT(r'',M) for each r'' where <r,sh:result,r'> and <r',sh:resultPath,r''>
   are in M for some r'.
3/ Take the result of validation
  a) It can't have nested results. Note: UNCLEAR what to do
  b) It has to have direct type links to sh:ValidationReport and
    sh:ValidationResult. Note: UNCLEAR what to do
  c) Replace all nodes that are SHACL instances of sh:ValidationResult and
     sh:ValidationReport and not already blank nodes with distinct blank
     nodes not occurring in the result of validation
  d) Remove triples whose predicate is not rdf:type, sh:focusNode,
     sh:resultPath, sh:resultSeverity, sh:sourceConstraintComponent, or
     sh:value. Note: REMOVES FAR TOO MUCH.
  e) Remove triples whose predicate is rdf:type and whose object is not
     sh:ValidationResult.  Note: STILL REMOVES TOO MUCH
4/ Check whether the modified result of validation is RDF graph isomorphic
   to E.
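
To see how much the two removal steps in 3/ throw away, here is a rough
Python sketch.  It assumes a validation report held as a set of (subject,
predicate, object) string tuples; the report itself, the node names, and the
function name prune are invented for illustration and are not taken from any
test in the suite.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SH = "http://www.w3.org/ns/shacl#"

# Predicates that survive the first removal step.
KEPT = {RDF_TYPE, SH + "focusNode", SH + "resultPath", SH + "resultSeverity",
        SH + "sourceConstraintComponent", SH + "value"}


def prune(report):
    """Apply the two removal steps of 3/ to a set of triples."""
    # Drop every triple whose predicate is not in the kept list.
    kept = {(s, p, o) for s, p, o in report if p in KEPT}
    # Drop rdf:type triples whose object is not sh:ValidationResult.
    return {(s, p, o) for s, p, o in kept
            if p != RDF_TYPE or o == SH + "ValidationResult"}


# Hypothetical report with one result whose path is an inverse path.
report = {
    ("_:rep", RDF_TYPE, SH + "ValidationReport"),
    ("_:rep", SH + "conforms", '"false"^^xsd:boolean'),
    ("_:rep", SH + "result", "_:res"),
    ("_:res", RDF_TYPE, SH + "ValidationResult"),
    ("_:res", SH + "focusNode", "ex:i"),
    ("_:res", SH + "resultPath", "_:path"),
    ("_:path", SH + "inversePath", "ex:p"),
}

pruned = prune(report)
for triple in sorted(pruned):
    print(triple)
```

Only the three triples with subject _:res survive.  The sh:result link, the
sh:conforms value, the typing of the report node, and the sh:inversePath
triple that gives the path its structure are all gone, whereas the graph E
built in 2/ still contains the corresponding triples.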

I have indicated several problems above.  I was unable to determine what
should be done to remove nested results or to fix up typing.  The removal
parts of the process remove far too much information, including information
about result paths.


I then took a quick look at the form of validation reports (but not how they
are generated).  I extracted the requirements on validation reports, coming
up with the following description.

Even if the problems mentioned above are fixed there are multiple
requirements on validation reports that cannot be checked using RDF graph
isomorphism.

Validation report
- has exactly one SHACL instance of sh:ValidationReport
 Issue: RDF graph isomorphism can't directly check SHACL instance
- conditions on the SHACL instance of sh:ValidationReport
  - one value for sh:conforms - xsd:boolean
      - "true"^^xsd:boolean iff no results of validation
 Problem: RDF graph isomorphism looks at RDF literals, not their values
   so can't check for differing non-conformance values
 Potential Problem: xsd:boolean literals can be ill-formed
  - value for sh:result for each result of validation - SHACL instance of
    sh:ValidationResult
 Issue: RDF graph isomorphism can't directly check SHACL instance
  - optional value for sh:shapesGraphWellFormed
    - "true"^^xsd:boolean if known no syntax problems
 Problem: RDF graph isomorphism can't check correctness of optional stuff
- conditions on validation results - Replace with: SHACL instances of
  sh:ValidationResult
 Problem: no definition of what a validation result is in an RDF graph
  - exactly one value for sh:focusNode
    -- focus node that caused the result
 Issue: not always the case, e.g. for sh:property
  - at most one value for sh:resultPath - well-formed property path
 Note: Different validation results can share paths
     Problem: RDF graph isomorphism can't check structure sharing
    -- for property shapes, equivalent to sh:path of the shape
 Issue: Not always true, e.g., sh:closed and sh:property
 Note: For node shapes this could be any value.
  - at most one value for sh:value - Addition: as specified by validator
    -- something that caused the result - depends on constraint component
  - at most one value for sh:sourceShape
     Problem: RDF graph isomorphism can't check optional stuff
    -- shape that the sh:focusNode was validated against
  - exactly one value for sh:sourceConstraintComponent
    -- constraint component that caused the result
  - zero or more values for sh:detail - SHACL instances of sh:AbstractResult
 Issue: RDF graph isomorphism can't directly check SHACL instance
    -- more information about non-conformance - depends on implementation
 Problem: RDF graph isomorphism can't check requirements on optional stuff
  - zero or more values for sh:resultMessage
 Issue: no normative information on how to determine these values
     -- implementations may augment
     Problem: RDF graph isomorphism can't check optional stuff
  - exactly one value for sh:resultSeverity -
    -- derived from shapes graph
 Issue: no normative information on how to do derivation
- the number of top-level validation results is not fixed, as in validating
    ex:s1 rdf:type sh:PropertyShape ;
      sh:targetNode ex:i ;
      sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
      sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
    ex:s2 sh:path ex:p ; sh:class ex:C .
  on the graph
    ex:i ex:p ex:j ; ex:q ex:j . ex:j ex:p ex:j .
  some implementations might produce one top-level validation result in
  the validation report and others might produce two
 Problem: RDF graph isomorphism can't check multiplicity variations
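
The literal-versus-value problem noted for sh:conforms can be made concrete
with a small Python sketch.  It assumes literals are modelled as (lexical
form, datatype IRI) pairs, and it imagines a hypothetical implementation that
emits "0"^^xsd:boolean where the expected report has "false"^^xsd:boolean.
The function names are made up for this sketch.

```python
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# The four lexical forms that xsd:boolean allows, with their values.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def same_term(a, b):
    """Term equality as used by graph isomorphism: same lexical form and
    same datatype IRI."""
    return a == b


def boolean_value(literal):
    """Value of an xsd:boolean literal, or None if it is ill-formed."""
    lexical, datatype = literal
    if datatype != XSD_BOOLEAN:
        raise ValueError("not an xsd:boolean literal")
    return BOOLEAN_LEXICAL.get(lexical)


expected = ("false", XSD_BOOLEAN)   # as written in the expected report
produced = ("0", XSD_BOOLEAN)       # hypothetical implementation output
ill_formed = ("no", XSD_BOOLEAN)    # not in the lexical space

print(same_term(expected, produced))
print(boolean_value(expected) == boolean_value(produced))
print(boolean_value(ill_formed))
```

The two literals denote the same value but are different RDF terms, so a
comparison by RDF graph isomorphism would treat the two reports as different.
The third literal has no value at all, which isomorphism cannot detect either.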

During my quick look at validation reports I ran across two situations where
there was no normative information on how something should be done.  This
appears to be a result of a recent edit to the SHACL document when a large
amount of the document was labelled as non-normative.  The working group
should go through all these changes to determine whether any other normative
information has been mislabelled.


Peter F. Patel-Schneider
Nuance Communications

Received on Tuesday, 14 March 2017 13:17:30 UTC