- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Tue, 14 Mar 2017 06:16:56 -0700
- To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
I got a bit of time to take a slightly deeper look at the information on the SHACL testing methodology (as of 12 March 2017). I uncovered a number of problems, some of which I have touched on earlier. Some of these problems will cause tests to produce incorrect results. However, even if these problems are fixed there is a core problem in the testing methodology: RDF graph isomorphism is inadequate to determine whether a SHACL implementation is producing conformant results. Something will have to be done to address this fundamental problem.

Some of the information on what happens to support SHACL testing is unclear so I had to do some work to figure out just what needs to happen. First, this required setting up some preliminary definitions.

Preliminaries:

Given a node n in an RDF graph G the list nodes of n in G, LN(n,G), is the set of bindings of the variable o in those solution mappings in the result of
    SELECT ?s ?o WHERE { ?s rdf:rest* ?o }
on G that bind the variable s to n.

Given a node n in an RDF graph G the list triples of n in G, LT(n,G), is the set of triples in G whose subject is in LN(n,G) and whose predicate is either rdf:first or rdf:rest.

Given a node n in an RDF graph G the path nodes of n in G, PN(n,G), is the set of bindings of the variable o in those solution mappings in the result of
    SELECT ?s ?o WHERE { ?s ( rdf:rest*/rdf:first |
                              sh:alternativePath/(rdf:rest*/rdf:first) |
                              sh:inversePath | sh:zeroOrMorePath |
                              sh:oneOrMorePath | sh:zeroOrOnePath ) ?o }
on G that bind the variable s to n.

Given a node n in an RDF graph G the path triples of n in G, PT(n,G), is the union of
- the set of triples in G whose subject is in PN(n,G),
- LT(n',G) for each n' in PN(n,G), and
- LT(n'',G) for each <n',sh:alternativePath,n''> in G with n' in PN(n,G).

I then had to clarify the testing methodology.

Testing methodology:

1/ Start with the blank node r that is the mf:result value of the test description in manifest graph M.
2/ Create an RDF graph E containing
   - the triples <r,s,o> in G,
   - the triples <r',s,o> in G for each triple <r,sh:result,r'> in G, and
   - the triples in PT(r'',G) for each r'' where <r,sh:result,r'> and <r',sh:resultPath,r''> are in G for some r'.
3/ Take the result of validation
   a) It can't have nested results.
      Note: UNCLEAR what to do
   b) It has to have direct type links to sh:ValidationReport and sh:ValidationResult.
      Note: UNCLEAR what to do
   c) Replace all nodes that are SHACL instances of sh:ValidationResult and sh:ValidationReport and not already blank nodes with distinct blank nodes not occurring in the result of validation
   d) Remove triples whose predicate is not rdf:type, sh:focusNode, sh:resultPath, sh:resultSeverity, sh:sourceConstraintComponent, or sh:value.
      Note: REMOVES FAR TOO MUCH.
   e) Remove triples whose predicate is rdf:type and whose object is not sh:ValidationResult.
      Note: STILL REMOVES TOO MUCH
4/ Check whether the modified result of validation is RDF graph isomorphic to E.

I have indicated several problems above. I was unable to determine what should be done to remove nested results or to fix up typing. The removal parts of the process remove far too much information, including information about result paths.

I then took a quick look at the form of validation reports (but not how they are generated). I extracted the requirements on validation reports, coming up with the following description. Even if the problems mentioned above are fixed there are multiple requirements on validation reports that cannot be checked using RDF graph isomorphism.
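
As one illustration of a requirement that isomorphism cannot check, here is a minimal sketch of the difference between comparing literals as terms and comparing their values. The encoding of a literal as a (lexical form, datatype IRI) pair and the function names are assumptions made for this example; they are not taken from the SHACL document or from any implementation. The lexical forms accepted for xsd:boolean (true, false, 1, 0) are those of XML Schema.

```python
# Hypothetical encoding, for illustration only: a literal is a
# (lexical form, datatype IRI) pair, mirroring RDF term equality.
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Lexical-to-value mapping for xsd:boolean as defined by XML Schema.
BOOLEAN_VALUES = {"true": True, "1": True, "false": False, "0": False}


def same_term(a, b):
    """Term equality, which is what RDF graph isomorphism compares."""
    return a == b


def boolean_value(literal):
    """Return the value of an xsd:boolean literal, or None if the literal
    has another datatype or an ill-formed lexical form."""
    lexical, datatype = literal
    if datatype != XSD_BOOLEAN:
        return None
    return BOOLEAN_VALUES.get(lexical)


def same_value(a, b):
    """Value equality; an ill-formed literal is equal to nothing."""
    va, vb = boolean_value(a), boolean_value(b)
    return va is not None and va == vb


expected = ("true", XSD_BOOLEAN)   # what a test's expected graph might hold
reported = ("1", XSD_BOOLEAN)      # same value, different lexical form
ill_formed = ("yes", XSD_BOOLEAN)  # not in the xsd:boolean lexical space

print(same_term(expected, reported))   # False
print(same_value(expected, reported))  # True
print(boolean_value(ill_formed))       # None
```

A comparison based purely on graph isomorphism behaves like same_term here, so two reports that agree on the value of sh:conforms could still be judged different.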

Validation report
- has exactly one SHACL instance of sh:ValidationReport
  Issue: RDF graph isomorphism can't directly check SHACL instance
- conditions on the SHACL instance of sh:ValidationReport
  - one value for sh:conforms
    - xsd:boolean
    - "true"^^xsd:boolean iff no results of validation
      Problem: RDF graph isomorphism looks at RDF literals, not their values so can't check for differing non-conformance values
      Potential Problem: xsd:boolean literals can be ill-formed
  - value for sh:result for each result of validation
    - SHACL instance of sh:ValidationResult
      Issue: RDF graph isomorphism can't directly check SHACL instance
  - optional value for sh:shapesGraphWellFormed
    - "true"^^xsd:boolean if known no syntax problems
      Problem: RDF graph isomorphism can't check correctness of optional stuff
- conditions on validation results
  - Replace with: SHACL instances of sh:ValidationResult
    Problem: no definition of what a validation result is in an RDF graph
  - exactly one value for sh:focusNode
    -- focus node that caused the result
    Issue: not always the case, e.g. for sh:property
  - at most one value for sh:resultPath
    - well-formed property path
      Note: Different validation results can share paths
      Problem: RDF graph isomorphism can't check structure sharing
    -- for property shapes, equivalent to sh:path of the shape
    Issue: Not always true, e.g., sh:closed and sh:property
    Note: For node shapes this could be any value.
  - at most one value for sh:value
    - Addition: as specified by validator
    -- something that caused the result
    - depends on constraint component
  - at most one value for sh:sourceShape
    Problem: RDF graph isomorphism can't check optional stuff
    -- shape that the sh:focusNode was validated against
  - exactly one value for sh:sourceConstraintComponent
    -- constraint component that caused the result
  - zero or more values for sh:detail
    - SHACL instances of sh:AbstractResult
      Issue: RDF graph isomorphism can't directly check SHACL instance
    -- more information about non-conformance
    - depends on implementation
      Problem: RDF graph isomorphism can't check requirements on optional stuff
  - zero or more values for sh:resultMessage
    Issue: no normative information on how to determine these values
    -- implementations may augment
    Problem: RDF graph isomorphism can't check optional stuff
  - exactly one value for sh:resultSeverity
    -- derived from shapes graph
    Issue: no normative information on how to do derivation
- the number of top-level validation results is not fixed
  as in validating

    ex:s1 rdf:type sh:PropertyShape ;
      sh:targetNode ex:i ;
      sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
      sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
    ex:s2 sh:path ex:p ;
      sh:class ex:C .

  on the graph

    ex:i ex:p ex:j ;
      ex:q ex:j .
    ex:j ex:p ex:j .

  some implementations might produce one top-level validation result in the validation report and others might produce two
  Problem: RDF graph isomorphism can't check multiplicity variations

During my quick look at validation reports I ran across two situations where there was no normative information on how something should be done. This appears to be a result of a recent edit to the SHACL document when a large amount of the document was labelled as non-normative. The working group should go through all these changes to determine whether any other normative information has been mislabelled.

Peter F. Patel-Schneider
Nuance Communications
Received on Tuesday, 14 March 2017 13:17:30 UTC