- From: Holger Knublauch <holger@topquadrant.com>
- Date: Wed, 22 Mar 2017 14:07:54 +1000
- To: public-rdf-shapes@w3.org
Hi Peter,
this is just to acknowledge that the WG has received your input. The
testing methodology is not regarded as a show stopper for moving to
CR, so we will look into this topic in depth after the transition to CR.
BTW, I have meanwhile changed the test suite to also include
sh:sourceShape triples. As a side effect of this, I have updated most
tests to use URIs for all nested shapes, to add precision to the graph
comparison.
Holger
On 14/03/2017 23:16, Peter F. Patel-Schneider wrote:
> I got a bit of time to take a slightly deeper look at the information on the
> SHACL testing methodology (as of 12 March 2017). I uncovered a number of
> problems, some of which I have touched on earlier.
>
> Some of these problems will cause tests to produce incorrect results.
> However, even if these problems are fixed, there is a core problem in the
> testing methodology---RDF graph isomorphism is inadequate to determine
> whether a SHACL implementation is producing conformant results. Something
> will have to be done to address this fundamental problem.
>
>
> Some of the information on what happens to support SHACL testing is unclear,
> so I had to do some work to figure out just what needs to happen.
>
> First, this required setting up some preliminary definitions.
>
> Preliminaries:
> Given a node n in an RDF graph G the list nodes of n in G LN(n,G) is the set
> of bindings of the variable o in those solution mappings in the result of
> SELECT ?s ?o WHERE { ?s rdf:rest* ?o } on G that bind the variable s to n.
> Given a node n in an RDF graph G the list triples of n in G LT(n,G) is the
> set of triples in G whose subject is in LN(n,G) and whose predicate is
> either rdf:first or rdf:rest.
> Given a node n in an RDF graph G the path nodes of n in G PN(n,G) is the set
> of bindings of the variable o in solution mappings in the result of
> SELECT ?s ?o WHERE { ?s ( rdf:rest*/rdf:first |
> sh:alternativePath/(rdf:rest*/rdf:first) |
> sh:inversePath | sh:zeroOrMorePath |
> sh:oneOrMorePath | sh:zeroOrOnePath ) ?o }
> that bind the variable s to n.
> Given a node n in an RDF graph G the path triples of n in G PT(n,G) is the
> union of (i) the set of triples in G whose subject is in PN(n,G), (ii)
> LT(n',G) for each n' in PN(n,G), and (iii) LT(n'',G) for each
> <n',sh:alternativePath,n''> in G with n' in PN(n,G).
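The four preliminary definitions can be sketched in Python as below. This is a minimal illustration, not code from any SHACL test harness: it assumes a graph is a Python set of (subject, predicate, object) string tuples using prefixed names, and the helper names (`objects`, `members`, `list_nodes`, and so on) are hypothetical. PN(n,G) is implemented literally as a single step of the path expression, with no outer `*`.

```python
# Hypothetical sketch of LN, LT, PN and PT. Assumption: a graph is a set of
# (subject, predicate, object) string tuples using prefixed names.
RDF_FIRST, RDF_REST = "rdf:first", "rdf:rest"
PATH_PREDICATES = ("sh:inversePath", "sh:zeroOrMorePath",
                   "sh:oneOrMorePath", "sh:zeroOrOnePath")

def objects(g, s, p):
    """All o such that <s, p, o> is in g."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}

def list_nodes(n, g):
    """LN(n,G): n plus every node reachable from n via one or more rdf:rest."""
    seen, todo = {n}, [n]
    while todo:
        for o in objects(g, todo.pop(), RDF_REST):
            if o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

def list_triples(n, g):
    """LT(n,G): rdf:first/rdf:rest triples whose subject is in LN(n,G)."""
    ln = list_nodes(n, g)
    return {t for t in g if t[0] in ln and t[1] in (RDF_FIRST, RDF_REST)}

def members(n, g):
    """Objects reached from n by rdf:rest*/rdf:first."""
    return {o for m in list_nodes(n, g) for o in objects(g, m, RDF_FIRST)}

def path_nodes(n, g):
    """PN(n,G), read literally as one step of the path expression."""
    result = members(n, g)
    for alt in objects(g, n, "sh:alternativePath"):
        result |= members(alt, g)
    for p in PATH_PREDICATES:
        result |= objects(g, n, p)
    return result

def path_triples(n, g):
    """PT(n,G): triples about PN(n,G) plus the list triples they lead to."""
    pn = path_nodes(n, g)
    result = {t for t in g if t[0] in pn}
    for n1 in pn:
        result |= list_triples(n1, g)
        for n2 in objects(g, n1, "sh:alternativePath"):
            result |= list_triples(n2, g)
    return result

# Example: an inverse path whose target is an alternative path over (ex:a ex:b).
g = {
    ("_:p", "sh:inversePath", "_:q"),
    ("_:q", "sh:alternativePath", "_:l1"),
    ("_:l1", "rdf:first", "ex:a"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:b"), ("_:l2", "rdf:rest", "rdf:nil"),
}
print(sorted(path_nodes("_:p", g)))  # ['_:q']
print(len(path_triples("_:p", g)))   # 5
```

In the example, PN(_:p,G) is {_:q}, so PT(_:p,G) contains the sh:alternativePath triple of _:q and the four list triples, but not the triple `_:p sh:inversePath _:q` itself, because PT as defined only collects triples whose subjects are in PN(n,G).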
>
> I then had to clarify the testing methodology.
>
> Testing methodology:
> 1/ Start with the blank node r that is the mf:result value of the test
> description in manifest graph M.
> 2/ Create an RDF graph E containing the triples <r,p,o> in M, the triples
> <r',p,o> in M for each triple <r,sh:result,r'> in M, and the triples in
> PT(r'',M) for each r'' such that <r,sh:result,r'> and <r',sh:resultPath,r''>
> are in M for some r'.
> 3/ Take the result of validation
> a) It can't have nested results. Note: UNCLEAR what to do
> b) It has to have direct type links to sh:ValidationReport and
> sh:ValidationResult. Note: UNCLEAR what to do
> c) Replace all nodes that are SHACL instances of sh:ValidationResult or
> sh:ValidationReport and are not already blank nodes with distinct blank
> nodes not occurring in the result of validation.
> d) Remove triples whose predicate is not rdf:type, sh:focusNode,
> sh:resultPath, sh:resultSeverity, sh:sourceConstraintComponent, or
> sh:value. Note: REMOVES FAR TOO MUCH.
> e) Remove triples whose predicate is rdf:type and whose object is not
> sh:ValidationResult. Note: STILL REMOVES TOO MUCH
> 4/ Check whether modified result of validation is RDF graph isomorphic to E.
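The two removal steps of step 3 and the isomorphism check of step 4 can be sketched as follows, to make concrete how the removals lose result-path information. This is a hypothetical illustration: graphs are sets of string triples, blank nodes are strings starting with `_:`, the earlier blank-node renaming is assumed already done, and `isomorphic` is a brute-force search over blank-node bijections that is only practical for tiny graphs.

```python
from itertools import permutations

# Predicates that survive the first removal step; everything else is dropped.
KEPT = {"rdf:type", "sh:focusNode", "sh:resultPath", "sh:resultSeverity",
        "sh:sourceConstraintComponent", "sh:value"}

def normalize(report):
    """Apply the two removal steps of step 3 to a validation report."""
    # Drop triples whose predicate is not in KEPT.
    kept = {t for t in report if t[1] in KEPT}
    # Drop rdf:type triples whose object is not sh:ValidationResult.
    return {t for t in kept
            if t[1] != "rdf:type" or t[2] == "sh:ValidationResult"}

def bnodes(g):
    """Sorted blank nodes occurring in subject or object position."""
    return sorted({x for (s, _, o) in g for x in (s, o) if x.startswith("_:")})

def isomorphic(g1, g2):
    """Brute-force RDF graph isomorphism: try every blank-node bijection."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))
        if {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in g1} == g2:
            return True
    return False

def report_with_path(path_predicate):
    """Hypothetical one-result report whose sh:resultPath is a blank node."""
    return {
        ("_:rep", "rdf:type", "sh:ValidationReport"),
        ("_:rep", "sh:conforms", '"false"^^xsd:boolean'),
        ("_:rep", "sh:result", "_:r1"),
        ("_:r1", "rdf:type", "sh:ValidationResult"),
        ("_:r1", "sh:focusNode", "ex:i"),
        ("_:r1", "sh:resultPath", "_:path"),
        ("_:path", path_predicate, "ex:p"),
        ("_:r1", "sh:sourceConstraintComponent", "sh:ClassConstraintComponent"),
    }

inverse = report_with_path("sh:inversePath")
star = report_with_path("sh:zeroOrMorePath")
print(isomorphic(inverse, star))                        # False
print(isomorphic(normalize(inverse), normalize(star)))  # True
```

The two reports differ in their result paths (sh:inversePath versus sh:zeroOrMorePath) and are not isomorphic, but after the removal steps both the path triple and the sh:result link are gone, so the check can no longer tell them apart.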
>
> I have indicated several problems above. I was unable to determine what
> should be done to remove nested results or to fix up typing. The removal
> parts of the process remove far too much information, including information
> about result paths.
>
>
> I then took a quick look at the form of validation reports (but not how they
> are generated). I extracted the requirements on validation reports, coming
> up with the following description.
>
> Even if the problems mentioned above are fixed, there are multiple
> requirements on validation reports that cannot be checked using RDF graph
> isomorphism.
>
> Validation report
> - has exactly one SHACL instance of sh:ValidationReport
> Issue: RDF graph isomorphism can't directly check SHACL instance
> - conditions on the SHACL instance of sh:ValidationReport
> - one value for sh:conforms - xsd:boolean
> - "true"^^xsd:boolean iff no results of validation
> Problem: RDF graph isomorphism looks at RDF literals, not their values
> so can't check for differing non-conformance values
> Potential Problem: xsd:boolean literals can be ill-formed
> - value for sh:result for each result of validation - SHACL instance of
> sh:ValidationResult
> Issue: RDF graph isomorphism can't directly check SHACL instance
> - optional value for sh:shapesGraphWellFormed
> - "true"^^xsd:boolean if known no syntax problems
> Problem: RDF graph isomorphism can't check correctness of optional stuff
> - conditions on validation results - Replace with: SHACL instances of
> sh:ValidationResult
> Problem: no definition of what a validation result is in an RDF graph
> - exactly one value for sh:focusNode
> -- focus node that caused the result
> Issue: not always the case, e.g. for sh:property
> - at most one value for sh:resultPath - well-formed property path
> Note: Different validation results can share paths
> Problem: RDF graph isomorphism can't check structure sharing
> -- for property shapes, equivalent to sh:path of the shape
> Issue: Not always true, e.g., sh:closed and sh:property
> Note: For node shapes this could be any value.
> - at most one value for sh:value - Addition: as specified by validator
> -- something that caused the result - depends on constraint component
> - at most one value for sh:sourceShape
> Problem: RDF graph isomorphism can't check optional stuff
> -- shape that the sh:focusNode was validated against
> - exactly one value for sh:sourceConstraintComponent
> -- constraint component that caused the result
> - zero or more values for sh:detail - SHACL instances of sh:AbstractResult
> Issue: RDF graph isomorphism can't directly check SHACL instance
> -- more information about non-conformance - depends on implementation
> Problem: RDF graph isomorphism can't check requirements on optional stuff
> - zero or more values for sh:resultMessage
> Issue: no normative information on how to determine these values
> -- implementations may augment
> Problem: RDF graph isomorphism can't check optional stuff
> - exactly one value for sh:resultSeverity -
> -- derived from shapes graph
> Issue: no normative information on how to do derivation
> - the number of top-level validation results is not fixed, e.g., in validating
> ex:s1 rdf:type sh:PropertyShape ;
> sh:targetNode ex:i ;
> sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
> sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
> ex:s2 sh:path ex:p ; sh:class ex:C .
> on the graph
> ex:i ex:p ex:j ; ex:q ex:j . ex:j ex:p ex:j .
> some implementations might produce one top-level validation result in
> the validation report and others might produce two
> Problem: RDF graph isomorphism can't check multiplicity variations
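Several of the requirements above concern literal values and SHACL instances rather than RDF terms. A value-aware check of the first few requirements (exactly one SHACL instance of sh:ValidationReport, one sh:conforms value, sh:result values that are SHACL instances of sh:ValidationResult, and sh:conforms true iff there are no results) might look like the sketch below. It is illustrative only: literals are assumed to be (lexical form, datatype) tuples, the function names are hypothetical, and only the four xsd:boolean lexical forms "true", "false", "1" and "0" are accepted.

```python
# Hypothetical value-aware report check. Assumptions: triples are
# (s, p, o) tuples, IRIs are prefixed-name strings, and literals are
# (lexical_form, datatype) tuples.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}

def boolean_value(lit):
    """Map an xsd:boolean literal to its value; reject ill-formed literals."""
    if not (isinstance(lit, tuple) and lit[1] == "xsd:boolean"
            and lit[0] in BOOLEAN_LEXICAL):
        raise ValueError(f"ill-formed xsd:boolean literal: {lit!r}")
    return BOOLEAN_LEXICAL[lit[0]]

def objects(g, s, p):
    """All o such that <s, p, o> is in g."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}

def shacl_types(n, g):
    """SHACL types of n: objects reached by rdf:type/rdfs:subClassOf*."""
    types = set(objects(g, n, "rdf:type"))
    todo = list(types)
    while todo:
        for sup in objects(g, todo.pop(), "rdfs:subClassOf"):
            if sup not in types:
                types.add(sup)
                todo.append(sup)
    return types

def instances(cls, g):
    """Subjects of g that are SHACL instances of cls."""
    return {s for (s, _, _) in g if cls in shacl_types(s, g)}

def check_report(g):
    """Check the sh:ValidationReport and sh:conforms requirements by value."""
    reports = instances("sh:ValidationReport", g)
    if len(reports) != 1:
        return False
    (rep,) = reports
    conforms = objects(g, rep, "sh:conforms")
    if len(conforms) != 1:
        return False
    results = objects(g, rep, "sh:result")
    if not all("sh:ValidationResult" in shacl_types(r, g) for r in results):
        return False
    try:
        value = boolean_value(next(iter(conforms)))
    except ValueError:
        return False  # ill-formed xsd:boolean literal
    return value == (len(results) == 0)

report = {
    ("ex:rep", "rdf:type", "ex:MyReport"),
    ("ex:MyReport", "rdfs:subClassOf", "sh:ValidationReport"),
    ("ex:rep", "sh:conforms", ("1", "xsd:boolean")),
}
print(check_report(report))                             # True
print(("1", "xsd:boolean") == ("true", "xsd:boolean"))  # False
```

The example report is typed only indirectly, via a subclass of sh:ValidationReport, and uses "1"^^xsd:boolean. It satisfies these requirements, yet it would not be RDF graph isomorphic to an expected graph that uses a direct rdf:type link and "true"^^xsd:boolean.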
>
> During my quick look at validation reports I ran across two situations where
> there was no normative information on how something should be done. This
> appears to be a result of a recent edit to the SHACL document when a large
> amount of the document was labelled as non-normative. The working group
> should go through all these changes to determine whether any other normative
> information has been mislabelled.
>
>
> Peter F. Patel-Schneider
> Nuance Communications
>
Received on Wednesday, 22 March 2017 04:08:31 UTC