Re: testing methodology

Hi Peter,

this is just to acknowledge that the WG has received your input. The 
testing methodology is not regarded as a show stopper for moving to 
CR, so we will look into this topic in depth after that step.

BTW, I have meanwhile changed the test suite to also include 
sh:sourceShape triples. As a side effect, I have updated most 
tests to use URIs for all nested shapes, to make the graph 
comparison more precise.

Holger


On 14/03/2017 23:16, Peter F. Patel-Schneider wrote:
> I got a bit of time to take a slightly deeper look at the information on the
> SHACL testing methodology (as of 12 March 2017).  I uncovered a number of
> problems, some of which I have touched on earlier.
>
> Some of these problems will cause tests to produce incorrect results.
> However, even if these problems are fixed there is a core problem in the
> testing methodology---RDF graph isomorphism is inadequate to determine
> whether a SHACL implementation is producing conformant results.  Something
> will have to be done to address this fundamental problem.
>
>
> Some of the information on what happens to support SHACL testing is unclear,
> so I had to do some work to figure out just what needs to happen.
>
> First, this required setting up some preliminary definitions.
>
> Preliminaries:
> Given a node n in an RDF graph G the list nodes of n in G LN(n,G) is the set
> of bindings of the variable o in those solution mappings in the result of
> SELECT ?s ?o WHERE { ?s rdf:rest* ?o } on G that bind the variable s to n.
> Given a node n in an RDF graph G the list triples of n in G LT(n,G) is the
> set of triples in G whose subject is in LN(n,G) and whose predicate is
> either rdf:first or rdf:rest.
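A minimal Python sketch of LN and LT, for illustration only. It assumes a hypothetical representation in which a graph is a set of (subject, predicate, object) string tuples with prefixed names and "_:" blank-node labels; list_nodes and list_triples are illustrative names, not part of any SHACL tooling. Because a SPARQL zero-length path only matches terms that occur in the graph, LN(n,G) contains n itself when n appears as a subject or object in G.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names
Graph = Set[Triple]


def list_nodes(n: str, g: Graph) -> Set[str]:
    """LN(n,G): bindings of ?o for ?s = n in SELECT ?s ?o WHERE { ?s rdf:rest* ?o }."""
    # A zero-length path only matches terms occurring as subject or object in G.
    terms = {t for (s, _, o) in g for t in (s, o)}
    if n not in terms:
        return set()
    seen = {n}
    frontier = [n]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in g:
            if s == current and p == "rdf:rest" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def list_triples(n: str, g: Graph) -> Set[Triple]:
    """LT(n,G): rdf:first / rdf:rest triples whose subject is in LN(n,G)."""
    nodes = list_nodes(n, g)
    return {(s, p, o) for (s, p, o) in g
            if s in nodes and p in ("rdf:first", "rdf:rest")}


# A two-element list (ex:x ex:y) hanging off ex:s.
g = {
    ("ex:s", "ex:p", "_:l1"),
    ("_:l1", "rdf:first", "ex:x"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:y"), ("_:l2", "rdf:rest", "rdf:nil"),
}
print(sorted(list_nodes("_:l1", g)))   # ['_:l1', '_:l2', 'rdf:nil']
print(len(list_triples("_:l1", g)))    # 4
```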
> Given a node n in an RDF graph G the path nodes of n in G PN(n,G) is the set
> of bindings of the variable o in solution mappings in the result of
>    SELECT ?s ?o WHERE { ?s ( rdf:rest*/rdf:first |
>                              sh:alternativePath/(rdf:rest*/rdf:first) |
>                              sh:inversePath | sh:zeroOrMorePath |
>                              sh:oneOrMorePath | sh:zeroOrOnePath ) ?o }
> that bind the variable s to n.
> Given a node n in an RDF graph G the path triples of n in G PT(n,G) is the
> union of (i) the set of triples in G whose subject is in PN(n,G), (ii)
> LT(n',G) for each n' in PN(n,G), and (iii) LT(n'',G) for each triple
> <n',sh:alternativePath,n''> in G with n' in PN(n,G).
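The path definitions can be sketched the same way, under the same assumed tuple representation (all function names hypothetical). LN and LT are repeated so the block runs on its own; path_nodes follows the alternative property path in the SELECT query literally, without recursing into nested paths.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names
Graph = Set[Triple]

# Predicates that PN(n,G) follows exactly one step from n.
ONE_STEP = ("sh:inversePath", "sh:zeroOrMorePath",
            "sh:oneOrMorePath", "sh:zeroOrOnePath")


def objects(g: Graph, s: str, p: str) -> Set[str]:
    """All o with <s,p,o> in g."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}


def list_nodes(n: str, g: Graph) -> Set[str]:
    """LN(n,G): n (if it occurs in g) plus everything reachable via rdf:rest."""
    if not any(n in (s, o) for (s, _, o) in g):
        return set()
    seen, frontier = {n}, [n]
    while frontier:
        for nxt in objects(g, frontier.pop(), "rdf:rest"):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def list_triples(n: str, g: Graph) -> Set[Triple]:
    """LT(n,G)."""
    nodes = list_nodes(n, g)
    return {t for t in g if t[0] in nodes and t[1] in ("rdf:first", "rdf:rest")}


def list_members(n: str, g: Graph) -> Set[str]:
    """Values reached from n via rdf:rest*/rdf:first."""
    return {m for node in list_nodes(n, g) for m in objects(g, node, "rdf:first")}


def path_nodes(n: str, g: Graph) -> Set[str]:
    """PN(n,G): list members, alternative-path members, and one-step values."""
    result = set(list_members(n, g))
    for alt in objects(g, n, "sh:alternativePath"):
        result |= list_members(alt, g)
    for p in ONE_STEP:
        result |= objects(g, n, p)
    return result


def path_triples(n: str, g: Graph) -> Set[Triple]:
    """PT(n,G): triples with subject in PN(n,G), plus the list triples of
    each n' in PN(n,G) and of each sh:alternativePath value of such an n'."""
    pn = path_nodes(n, g)
    result = {t for t in g if t[0] in pn}
    for n1 in pn:
        result |= list_triples(n1, g)
        for n2 in objects(g, n1, "sh:alternativePath"):
            result |= list_triples(n2, g)
    return result


# Path [ sh:inversePath [ sh:alternativePath ( ex:a ex:b ) ] ]
g = {
    ("_:p", "sh:inversePath", "_:q"),
    ("_:q", "sh:alternativePath", "_:l1"),
    ("_:l1", "rdf:first", "ex:a"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:b"), ("_:l2", "rdf:rest", "rdf:nil"),
}
print(path_nodes("_:p", g))         # {'_:q'}
print(len(path_triples("_:p", g)))  # 5
```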
>
> I then had to clarify the testing methodology.
>
> Testing methodology:
> 1/ Start with the blank node r that is the mf:result value of the test
>     description in manifest graph M.
> 2/ Create an RDF graph E containing the triples <r,p,o> in M, the triples
>     <r',p,o> in M for each triple <r,sh:result,r'> in M, and the triples in
>     PT(r'',M) for each r'' such that <r,sh:result,r'> and
>     <r',sh:resultPath,r''> are in M for some r'.
> 3/ Take the result of validation
>    a) It can't have nested results. Note: UNCLEAR what to do
>    b) It has to have direct type links to sh:ValidationReport and
>       sh:ValidationResult. Note: UNCLEAR what to do
>    c) Replace all nodes that are SHACL instances of sh:ValidationResult or
>       sh:ValidationReport and are not already blank nodes with distinct blank
>       nodes not occurring in the result of validation
>    d) Remove triples whose predicate is not rdf:type, sh:focusNode,
>       sh:resultPath, sh:resultSeverity, sh:sourceConstraintComponent, or
>       sh:value. Note: REMOVES FAR TOO MUCH.
>    e) Remove triples whose predicate is rdf:type and whose object is not
>       sh:ValidationResult.  Note: STILL REMOVES TOO MUCH
> 4/ Check whether modified result of validation is RDF graph isomorphic to E.
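A rough sketch of the two predicate-filtering steps and the isomorphism check of step 4, again assuming graphs are sets of string triples (hypothetical names throughout). The isomorphism check is a brute-force search over blank-node bijections, adequate only for tiny test graphs; a real harness would use an RDF library's isomorphism routine. The example shows two reports that differ only in blank-node labels and an sh:resultMessage becoming isomorphic after filtering, while report-level triples such as sh:conforms and sh:result disappear.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); blank nodes start with "_:"
Graph = Set[Triple]

# Predicates retained by the predicate-filtering step.
KEPT = {"rdf:type", "sh:focusNode", "sh:resultPath", "sh:resultSeverity",
        "sh:sourceConstraintComponent", "sh:value"}


def filter_result(g: Graph) -> Graph:
    """Drop triples whose predicate is not in KEPT, then drop rdf:type
    triples whose object is not sh:ValidationResult."""
    return {(s, p, o) for (s, p, o) in g
            if p in KEPT and (p != "rdf:type" or o == "sh:ValidationResult")}


def blank_nodes(g: Graph) -> List[str]:
    """Blank nodes occurring in subject or object position."""
    return sorted({t for (s, _, o) in g for t in (s, o) if t.startswith("_:")})


def isomorphic(g1: Graph, g2: Graph) -> bool:
    """Naive RDF graph isomorphism: search for a bijection between the blank
    nodes of g1 and g2 that maps g1 onto g2. Factorial time; tiny graphs only."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m: Dict[str, str] = dict(zip(b1, image))
        if {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in g1} == g2:
            return True
    return False


report_a = {
    ("_:rep", "rdf:type", "sh:ValidationReport"),
    ("_:rep", "sh:conforms", '"false"^^xsd:boolean'),
    ("_:rep", "sh:result", "_:x"),
    ("_:x", "rdf:type", "sh:ValidationResult"),
    ("_:x", "sh:focusNode", "ex:i"),
    ("_:x", "sh:sourceConstraintComponent", "sh:ClassConstraintComponent"),
    ("_:x", "sh:resultMessage", '"ex:i is not an ex:C"'),
}
report_b = {
    ("_:a", "rdf:type", "sh:ValidationReport"),
    ("_:a", "sh:conforms", '"false"^^xsd:boolean'),
    ("_:a", "sh:result", "_:b"),
    ("_:b", "rdf:type", "sh:ValidationResult"),
    ("_:b", "sh:focusNode", "ex:i"),
    ("_:b", "sh:sourceConstraintComponent", "sh:ClassConstraintComponent"),
}
print(isomorphic(filter_result(report_a), filter_result(report_b)))  # True
print(isomorphic(report_a, report_b))                                # False
```

sh:resultPath triples survive this filter, but any triples describing a complex path (for instance sh:inversePath) do not, which is one way the filtering loses result-path information.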
>
> I have indicated several problems above.  I was unable to determine what
> should be done to remove nested results or to fix up typing.  The removal
> parts of the process remove far too much information, including information
> about result paths.
>
>
> I then took a quick look at the form of validation reports (but not at how
> they are generated).  I extracted the requirements on validation reports,
> coming up with the following description.
>
> Even if the problems mentioned above are fixed there are multiple
> requirements on validation reports that cannot be checked using RDF graph
> isomorphism.
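One concrete instance is the sh:conforms value. The Python sketch below uses a simplified, hypothetical literal representation of (lexical form, datatype) pairs to show that two xsd:boolean literals can be distinct RDF terms, and so break an isomorphism check, while denoting the same value; it also flags ill-formed lexical forms.

```python
from typing import Optional, Tuple

Lit = Tuple[str, str]  # (lexical form, datatype); a simplified stand-in for an RDF literal


def boolean_value(lit: Lit) -> Optional[bool]:
    """Value of an xsd:boolean literal, or None if it is not a well-formed one.
    The xsd:boolean lexical space is exactly "true", "false", "1" and "0"."""
    lexical, datatype = lit
    if datatype != "xsd:boolean":
        return None
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    return None


expected = ("false", "xsd:boolean")   # sh:conforms value in a test
produced = ("0", "xsd:boolean")       # an equally valid value from an implementation

print(expected == produced)                                # False: different RDF terms
print(boolean_value(expected) == boolean_value(produced))  # True: same value
print(boolean_value(("no", "xsd:boolean")))                # None: ill-formed
```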
>
> Validation report
> - has exactly one SHACL instance of sh:ValidationReport
>  Issue: RDF graph isomorphism can't directly check SHACL instance
> - conditions on the SHACL instance of sh:ValidationReport
>    - one value for sh:conforms - xsd:boolean
>        - "true"^^xsd:boolean iff no results of validation
>  Problem: RDF graph isomorphism compares RDF literals, not their values, so
>    it can't accept differing lexical forms of the same non-conformance
>    value (e.g., "false" and "0")
>  Potential Problem: xsd:boolean literals can be ill-formed
>    - value for sh:result for each result of validation - SHACL instance of
> sh:ValidationResult
>  Issue: RDF graph isomorphism can't directly check SHACL instance
>    - optional value for sh:shapesGraphWellFormed
>      - "true"^^xsd:boolean if known no syntax problems
>  Problem: RDF graph isomorphism can't check correctness of optional stuff
> - conditions on validation results - Replace with: SHACL instances of
> sh:ValidationResult
>  Problem: no definition of what a validation result is in an RDF graph
>    - exactly one value for sh:focusNode
>      -- focus node that caused the result
>  Issue: not always the case, e.g. for sh:property
>    - at most one value for sh:resultPath - well-formed property path
>  Note: Different validation results can share paths
>       Problem: RDF graph isomorphism can't check structure sharing
>      -- for property shapes, equivalent to sh:path of the shape
>  Issue: Not always true, e.g., sh:closed and sh:property
>  Note: For node shapes this could be any value.
>    - at most one value for sh:value - Addition: as specified by validator
>      -- something that caused the result - depends on constraint component
>    - at most one value for sh:sourceShape
>       Problem: RDF graph isomorphism can't check optional stuff
>      -- shape that the sh:focusNode was validated against
>    - exactly one value for sh:sourceConstraintComponent
>      -- constraint component that caused the result
>    - zero or more values for sh:detail - SHACL instances of sh:AbstractResult
>  Issue: RDF graph isomorphism can't directly check SHACL instance
>      -- more information about non-conformance - depends on implementation
>  Problem: RDF graph isomorphism can't check requirements on optional stuff
>    - zero or more values for sh:resultMessage
>  Issue: no normative information on how to determine these values
>       -- implementations may augment
>       Problem: RDF graph isomorphism can't check optional stuff
>    - exactly one value for sh:resultSeverity
>      -- derived from shapes graph
>  Issue: no normative information on how to do derivation
> - the number of top-level validation results is not fixed, as in validating
>      ex:s1 rdf:type sh:PropertyShape ;
>        sh:targetNode ex:i ;
>        sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>        sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>      ex:s2 sh:path ex:p ; sh:class ex:C .
>    on the graph
>      ex:i ex:p ex:j ; ex:q ex:j . ex:j ex:p ex:j .
>    some implementations might produce one top-level validation result in
>    the validation report and others might produce two
>  Problem: RDF graph isomorphism can't check multiplicity variations
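Several of the requirements above concern SHACL instances and literal values rather than exact triples, so one alternative is to check them directly. The sketch below is a hypothetical checker, not something taken from the test suite; it assumes the same string-tuple representation as a stand-in for an RDF library, and that any relevant rdfs:subClassOf triples appear in the report graph itself. It covers only report-level conditions: exactly one SHACL instance of sh:ValidationReport, sh:result values that are SHACL instances of sh:ValidationResult, and a well-formed sh:conforms that agrees with whether there are results.

```python
import re
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); literals as '"lex"^^dt'
Graph = Set[Triple]


def objects(g: Graph, s: str, p: str) -> Set[str]:
    """All o with <s,p,o> in g."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}


def shacl_instances(cls: str, g: Graph) -> Set[str]:
    """Nodes x with an rdf:type/rdfs:subClassOf* path to cls in g."""
    classes, frontier = {cls}, [cls]
    while frontier:
        c = frontier.pop()
        for (s, p, o) in g:
            if p == "rdfs:subClassOf" and o == c and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for (s, p, o) in g if p == "rdf:type" and o in classes}


def boolean_value(term: str) -> Optional[bool]:
    """Value of a '"lex"^^xsd:boolean' term, or None if not a well-formed boolean."""
    m = re.fullmatch(r'"(true|false|1|0)"\^\^xsd:boolean', term)
    return None if m is None else m.group(1) in ("true", "1")


def check_report(g: Graph) -> List[str]:
    """Check a few report-level requirements by value, not by graph shape."""
    reports = shacl_instances("sh:ValidationReport", g)
    if len(reports) != 1:
        return [f"expected one SHACL instance of sh:ValidationReport, found {len(reports)}"]
    rep = next(iter(reports))
    problems = []
    results = objects(g, rep, "sh:result")
    stray = results - shacl_instances("sh:ValidationResult", g)
    if stray:
        problems.append(f"sh:result values that are not validation results: {sorted(stray)}")
    conforms = objects(g, rep, "sh:conforms")
    if len(conforms) != 1:
        problems.append("sh:conforms must have exactly one value")
    else:
        value = boolean_value(next(iter(conforms)))
        if value is None:
            problems.append("sh:conforms is not a well-formed xsd:boolean")
        elif value != (not results):
            problems.append("sh:conforms disagrees with the presence of results")
    return problems


ok = {
    ("ex:MyReport", "rdfs:subClassOf", "sh:ValidationReport"),
    ("_:rep", "rdf:type", "ex:MyReport"),
    ("_:rep", "sh:conforms", '"1"^^xsd:boolean'),
}
bad = ok | {("_:rep", "sh:result", "_:x"), ("_:x", "rdf:type", "sh:ValidationResult")}
print(check_report(ok))   # []
print(check_report(bad))  # ['sh:conforms disagrees with the presence of results']
```

The graph ok passes even though its report node is typed only with a subclass of sh:ValidationReport, so it would not be isomorphic to an otherwise identical report typed directly with sh:ValidationReport. Per-result conditions and the multiplicity variation in the example above would need further checks.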
>
> During my quick look at validation reports I ran across two situations where
> there was no normative information on how something should be done.  This
> appears to be a result of a recent edit to the SHACL document, when a large
> amount of the document was labelled as non-normative.  The working group
> should go through all these changes to determine whether any other normative
> information has been mislabelled.
>
>
> Peter F. Patel-Schneider
> Nuance Communications
>

Received on Wednesday, 22 March 2017 04:08:31 UTC