- From: Holger Knublauch <holger@topquadrant.com>
- Date: Wed, 22 Mar 2017 14:07:54 +1000
- To: public-rdf-shapes@w3.org
Hi Peter,

this is just to acknowledge that the WG has received your input. The testing methodology is not regarded as a show stopper for moving to CR, so we will look into this topic in depth after that date.

BTW I have meanwhile changed the test suite to also include sh:sourceShape triples. As a side effect of this, I have updated most tests to use URIs for all nested shapes, to add precision to the graph comparison.

Holger

On 14/03/2017 23:16, Peter F. Patel-Schneider wrote:
> I got a bit of time to take a slightly deeper look at the information on the
> SHACL testing methodology (as of 12 March 2017).  I uncovered a number of
> problems, some of which I have touched on earlier.
>
> Some of these problems will cause tests to produce incorrect results.
> However, even if these problems are fixed there is a core problem in the
> testing methodology---RDF graph isomorphism is inadequate to determine
> whether a SHACL implementation is producing conformant results.  Something
> will have to be done to address this fundamental problem.
>
>
> Some of the information on what happens to support SHACL testing is unclear
> so I had to do some work to figure out just what needs to happen.
>
> First, this required setting up some preliminary definitions.
>
> Preliminaries:
> Given a node n in an RDF graph G, the list nodes of n in G, LN(n,G), is the
> set of bindings of the variable o in those solution mappings in the result of
>   SELECT ?s ?o WHERE { ?s rdf:rest* ?o }
> on G that bind the variable s to n.
> Given a node n in an RDF graph G, the list triples of n in G, LT(n,G), is the
> set of triples in G whose subject is in LN(n,G) and whose predicate is
> either rdf:first or rdf:rest.
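The two list definitions above can be read as a small reachability computation. What follows is only a sketch of that reading, not code from the test suite: it assumes a graph is a plain Python set of (subject, predicate, object) string tuples, and the names list_nodes, list_triples and example_graph are made up for illustration.

```python
# Sketch only: a graph is modelled as a set of (s, p, o) string tuples.
# The prefixed names below are stand-ins for the full rdf: IRIs.
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"


def list_nodes(n, graph):
    """LN(n,G): every node reachable from n by zero or more rdf:rest steps."""
    seen = {n}          # zero steps: n itself is always included
    stack = [n]
    while stack:
        node = stack.pop()
        for s, p, o in graph:
            if s == node and p == RDF_REST and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def list_triples(n, graph):
    """LT(n,G): triples with a subject in LN(n,G) and predicate rdf:first or rdf:rest."""
    nodes = list_nodes(n, graph)
    return {(s, p, o) for (s, p, o) in graph
            if s in nodes and p in (RDF_FIRST, RDF_REST)}


# Hypothetical two-element list ( ex:a ex:b ) hanging off ex:x via ex:p.
example_graph = {
    ("ex:x", "ex:p", "_:l1"),
    ("_:l1", RDF_FIRST, "ex:a"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "ex:b"),
    ("_:l2", RDF_REST, "rdf:nil"),
}
```

On example_graph, list_nodes("_:l1", example_graph) collects the two list cells plus rdf:nil, and list_triples picks out the four rdf:first/rdf:rest triples while leaving the ex:p triple behind.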
> Given a node n in an RDF graph G, the path nodes of n in G, PN(n,G), is the
> set of bindings of the variable o in solution mappings in the result of
>   SELECT ?s ?o WHERE { ?s ( rdf:rest*/rdf:first |
>                             sh:alternativePath/(rdf:rest*/rdf:first) |
>                             sh:inversePath | sh:zeroOrMorePath |
>                             sh:oneOrMorePath | sh:zeroOrOnePath ) ?o }
> that bind the variable s to n.
> Given a node n in an RDF graph G, the path triples of n in G, PT(n,G), is the
> union of the set of triples in G whose subject is in PN(n,G) and LT(n',G)
> for each n' in PN(n,G) and LT(n'',G) for each <n',sh:alternativePath,n''> in
> G with n' in PN(n,G).
>
> I then had to clarify the testing methodology.
>
> Testing methodology:
> 1/ Start with the blank node r that is the mf:result value of the test
>    description in manifest graph M.
> 2/ Create an RDF graph E containing the triples <r,s,o> in G and the triples
>    <r',s,o> in G for each triple <r,sh:result,r'> in G and the triples in
>    PT(r'',G) for each r'' where <r,sh:result,r'> and <r',sh:resultPath,r''>
>    in G for some r'
> 3/ Take the result of validation
>    a) It can't have nested results.  Note: UNCLEAR what to do
>    b) It has to have direct type links to sh:ValidationReport and
>       sh:ValidationResult.  Note: UNCLEAR what to do
>    c) Replace all nodes that are SHACL instances of sh:ValidationResult and
>       sh:ValidationReport and not already blank nodes with distinct blank
>       nodes not occurring in the result of validation
>    d) Remove triples whose predicate is not rdf:type, sh:focusNode,
>       sh:resultPath, sh:resultSeverity, sh:sourceConstraintComponent, or
>       sh:value.  Note: REMOVES FAR TOO MUCH.
>    e) Remove triples whose predicate is rdf:type and whose object is not
>       sh:ValidationResult.  Note: STILL REMOVES TOO MUCH
> 4/ Check whether the modified result of validation is RDF graph isomorphic to E.
>
> I have indicated several problems above.  I was unable to determine what
> should be done to remove nested results or to fix up typing.
> The removal parts of the process remove far too much information, including
> information about result paths.
>
>
> I then took a quick look at the form of validation reports (but not how they
> are generated).  I extracted the requirements on validation reports, coming
> up with the following description.
>
> Even if the problems mentioned above are fixed there are multiple
> requirements on validation reports that cannot be checked using RDF graph
> isomorphism.
>
> Validation report
> - has exactly one SHACL instance of sh:ValidationReport
>   Issue: RDF graph isomorphism can't directly check SHACL instance
> - conditions on the SHACL instance of sh:ValidationReport
>   - one value for sh:conforms - xsd:boolean
>     - "true"^^xsd:boolean iff no results of validation
>     Problem: RDF graph isomorphism looks at RDF literals, not their values
>       so can't check for differing non-conformance values
>     Potential Problem: xsd:boolean literals can be ill-formed
>   - value for sh:result for each result of validation - SHACL instance of
>     sh:ValidationResult
>     Issue: RDF graph isomorphism can't directly check SHACL instance
>   - optional value for sh:shapesGraphWellFormed
>     - "true"^^xsd:boolean if known no syntax problems
>     Problem: RDF graph isomorphism can't check correctness of optional stuff
> - conditions on validation results - Replace with: SHACL instances of
>   sh:ValidationResult
>   Problem: no definition of what a validation result is in an RDF graph
>   - exactly one value for sh:focusNode
>     -- focus node that caused the result
>     Issue: not always the case, e.g. for sh:property
>   - at most one value for sh:resultPath - well-formed property path
>     Note: Different validation results can share paths
>     Problem: RDF graph isomorphism can't check structure sharing
>     -- for property shapes, equivalent to sh:path of the shape
>     Issue: Not always true, e.g., sh:closed and sh:property
>     Note: For node shapes this could be any value.
>   - at most one value for sh:value - Addition: as specified by validator
>     -- something that caused the result - depends on constraint component
>   - at most one value for sh:sourceShape
>     Problem: RDF graph isomorphism can't check optional stuff
>     -- shape that the sh:focusNode was validated against
>   - exactly one value for sh:sourceConstraintComponent
>     -- constraint component that caused the result
>   - zero or more values for sh:detail - SHACL instances of sh:AbstractResult
>     Issue: RDF graph isomorphism can't directly check SHACL instance
>     -- more information about non-conformance - depends on implementation
>     Problem: RDF graph isomorphism can't check requirements on optional stuff
>   - zero or more values for sh:resultMessage
>     Issue: no normative information on how to determine these values
>     -- implementations may augment
>     Problem: RDF graph isomorphism can't check optional stuff
>   - exactly one value for sh:resultSeverity -
>     -- derived from shapes graph
>     Issue: no normative information on how to do derivation
> - the number of top-level validation results is not fixed as in validating
>     ex:s1 rdf:type sh:PropertyShape ;
>       sh:targetNode ex:i ;
>       sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>       sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>     ex:s2 sh:path ex:p ; sh:class ex:C .
>   on the graph
>     ex:i ex:p ex:j ; ex:q ex:j .  ex:j ex:p ex:j .
>   some implementations might produce one top-level validation result in
>   the validation report and others might produce two
>   Problem: RDF graph isomorphism can't check multiplicity variations
>
> During my quick look at validation reports I ran across two situations where
> there was no normative information on how something should be done.  This
> appears to be a result of a recent edit to the SHACL document when a large
> amount of the document was labelled as non-normative.  The working group
> should go through all these changes to determine whether any other normative
> information has been mislabelled.
>
>
> Peter F. Patel-Schneider
> Nuance Communications
>
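To make the sh:conforms point in the quoted message concrete, here is a small sketch of the difference between comparing literals as RDF terms, which is what graph isomorphism does, and comparing them by value. It is an illustration under assumptions, not part of the test harness: a literal is modelled as a (lexical form, datatype IRI) pair, and same_term, same_value and boolean_value are hypothetical helper names.

```python
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


def boolean_value(lexical):
    """Map an xsd:boolean lexical form to its value, or None if ill-formed."""
    return {"true": True, "1": True, "false": False, "0": False}.get(lexical)


def same_term(a, b):
    """Term equality, as used by graph isomorphism: lexical form and datatype must match."""
    return a == b


def same_value(a, b):
    """Value equality for xsd:boolean literals; falls back to term equality otherwise."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a == dt_b == XSD_BOOLEAN:
        val_a, val_b = boolean_value(lex_a), boolean_value(lex_b)
        # Ill-formed literals have no value, so they never compare equal by value.
        return val_a is not None and val_a == val_b
    return a == b


expected = ("true", XSD_BOOLEAN)   # as written in a hypothetical test manifest
reported = ("1", XSD_BOOLEAN)      # as emitted by a hypothetical implementation
```

Here same_term(expected, reported) is False while same_value(expected, reported) is True, so a comparison based purely on isomorphism would reject a report that states the same conformance value with a different lexical form. An ill-formed literal such as ("yes", XSD_BOOLEAN) is term-equal to itself but has no value to compare.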
Received on Wednesday, 22 March 2017 04:08:31 UTC