Re: Shapes Constraint Language (SHACL) Working Draft of 2017-02-02 from Holger Knublauch on 2017-02-22 (public-rdf-shapes@w3.org from February 2017)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 23 Feb 2017 09:19:44 +1000
To: public-rdf-shapes@w3.org
Message-ID: <ffeadafb-ae73-8785-93e4-ca0299b94666@topquadrant.com>
Hi Peter,

this is the WG response on the 3rd part of your message (I have pruned 
the other parts). We had opened and resolved ISSUES-225, 228 and 229 to 
prepare this response.

On 4/02/2017 14:10, Peter F. Patel-Schneider wrote:
> Validation results and reports:
>
> A validation report is the result of validation.  It is an RDF graph where
> some nodes are validation results reporting on constraints that were not
> satisifed.  There are serious problems in how validation reports are
> generated and the form of validation reports.
>
> The first problem is the generation of validation results.  Throughout the
> definitions of SHACL Core constraint components there is wording like "For
> each value node [...], a validation result MUST be produced with the value
> node as sh:value." and "If [...], a validation result MUST be produced."
> This means that each SHACL processor must produce these validation results
> to be a conforming implementation of SHACL.
>
> The processor must produce these validation results no matter whether they
> are going to show up in the final validation report or not.  The processor
> must produce these validation results even if it not going to return a
> validation report at all.

In 3.6 we state that a SHACL-compliant processor must be *capable* of
returning all these results. However, when executed with certain
parameters, specific implementations may prune the results, for example
to exclude results that have severity sh:Warning or sh:Info. Likewise,
an engine is not required to produce nested results - these can go into
a temporary graph (which is how I am implementing it too). However, the
formal description assumes that all results are reported.
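
A minimal sketch of the kind of pruning described above, assuming results are held as simple Python records rather than RDF nodes. The names `Result` and `prune` are hypothetical and not part of any SHACL API; the engine still computes every result and only filters what it returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical, simplified stand-in for a validation result node."""
    focus_node: str
    severity: str  # "sh:Violation", "sh:Warning" or "sh:Info"


def prune(results, keep=frozenset({"sh:Violation"})):
    """Return only the results whose severity is in `keep`."""
    return [r for r in results if r.severity in keep]


# The engine is capable of producing all three results...
all_results = [
    Result("ex:Alice", "sh:Violation"),
    Result("ex:Bob", "sh:Warning"),
    Result("ex:Carol", "sh:Info"),
]

# ...but, run with this (assumed) parameter, it reports only violations.
pruned = prune(all_results)
```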

>    This mixing of conformance requirements into the
> definition of validation introduces an unnecessary and problematic
> procedural aspect into the underlying definitions of SHACL.

We don't see a problem and believe this is largely a matter of "taste". A
procedural description is very easy for users and implementers to
understand, and they are among the main target audiences of this topic.

>
> Although it is mandated that a SHACL processor much produce these validation
> results it is completely unclear how many must be produced.  A SHACL
> processor may end up checking whether a particular node satisfies a
> particular constraint numerous times.  Must it produce a validation result
> for each of these times?  Must it only produce one validation result for all
> of these times?  Or is the number of times it produce a validation result
> undetermined?  This multiplicity problem can show up at top-level due to
> converging sh:property chains.

I have meanwhile added a sentence to the introduction of section 4:

---
Furthermore, the validators always produce *new* result nodes, i.e. when
the textual definition states that "...a validation result *must* be
produced..." then this refers to a distinct new node in a results graph.
---

which I believe clarifies the three options above - it's the first.
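
To illustrate the first option, here is a rough sketch in which every check mints a fresh result node, so checking the same node against the same constraint twice yields two distinct results. Triples are plain Python tuples, and `new_result` and the `_:rN` labels are hypothetical, not taken from the spec or from any implementation.

```python
import itertools

# Counter used to mint fresh blank-node labels (illustrative only).
_ids = itertools.count()


def new_result(focus_node, value, component):
    """Mint a distinct new result node and return it with its triples."""
    node = f"_:r{next(_ids)}"
    triples = [
        (node, "rdf:type", "sh:ValidationResult"),
        (node, "sh:focusNode", focus_node),
        (node, "sh:value", value),
        (node, "sh:sourceConstraintComponent", component),
    ]
    return node, triples


# The same violation reached twice (e.g. via converging sh:property chains)
# produces two separate result nodes, each with a single sh:value.
first, first_triples = new_result(
    "ex:Alice", "ex:Rex", "sh:ClassConstraintComponent")
second, second_triples = new_result(
    "ex:Alice", "ex:Rex", "sh:ClassConstraintComponent")
```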

>
> The second problem is the form of a validation report.  There is
> insufficient guidance on how multiple validation results are to be
> produced.  For example, can a single validation result have multiple values
> for sh:value, making it a validation report for multiple violations?

I have meanwhile added a clarification that sh:value (along with all
other relevant result properties) can have at most one value. I have
also added this new sentence (as mentioned above):

---
Furthermore, the validators always produce *new* result nodes, i.e. when
the textual definition states that "...a validation result *must* be
produced..." then this refers to a distinct new node in a results graph.
---

which excludes the case of sharing sh:value among result nodes.

> Similarly, if a shape has two sh:ClassConstraintComponent constraints, can
> a single validation report be used for violations from both of them?

No, this case is excluded from the current definitions.

> Without better guidance on these issues it will be very difficult to
> determine just violations occured from a validation report.
>
> The third problem is just what validation results are to be included in a
> validation report and which of these are to be values of sh:result for the
> single node in the graph that is a SHACL instance of sh:ValidationReport.
> There is "Only the validation results that are not object of any sh:details
> triple in the results graph are top-level results." and "The property
> sh:detail may link a (parent) result with one or more other (child) results
> that provide further details about the cause of the (parent) result."
> So a validation process has to produce validation results which then end up
> in the validation report if they are not values for sh:details triples.

Not exactly: only those results that are not values of sh:detail are
*top-level* results. Yet nested results may also become part of the
result graph.
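
The distinction can be sketched as follows, assuming the results graph is given as a list of plain (subject, predicate, object) tuples. The function name `top_level_results` is hypothetical. Nested results stay in the graph; they are simply not top-level.

```python
def top_level_results(triples):
    """Return the result nodes that are not the object of any sh:detail triple."""
    results = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "sh:ValidationResult"
    }
    nested = {o for s, p, o in triples if p == "sh:detail"}
    return results - nested


# _:r2 is a child of _:r1 via sh:detail, so only _:r1 is top-level,
# although both nodes are part of the results graph.
graph = [
    ("_:r1", "rdf:type", "sh:ValidationResult"),
    ("_:r2", "rdf:type", "sh:ValidationResult"),
    ("_:r1", "sh:detail", "_:r2"),
]
top = top_level_results(graph)
```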

> What happens if a validation result comes from violation of a constraint
> that is both directly at top level (e.g., from a property shape that is value of
> sh:property for a shape that has targets) and not at top level (e.g., from
> the same property shape as before that is linked to the shape with targets
> via a combination of sh:node and sh:property triples)?

In this case it will produce two results, once for the direct invocation
of the property shape via its target and once for the indirect
invocation. However, I don't expect this case to ever happen in practice
because there is no need to assign a target to a property shape that is
already linked from another shape that also has a target.


>    Can a SHACL
> processor use sh:detail to collect that otherwise might be top-level
> validation results?

(There is a word missing above, I guess you mean "...to collect results
that..."?)

No, they would be distinct result nodes.

>
> There are also some other minor problems with validation reports.  For
> example, there is the requirement that "A validation report has exactly one
> value for the property sh:conforms that is of datatype xsd:boolean."
> However, the result of validation is an RDF graph and RDF graphs so this
> requirement doesn't make sense.  The definitions underlying validation
> reports need to be carefully examined to eliminate problems like these.

I have clarified the wording so that it now refers to "Each SHACL
instance of sh:ValidationReport in the result graph", both for
sh:conforms and sh:result. I also reviewed the rest of section 3.6. If
anyone finds other specific cases of imprecision, please let us know.
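
As an illustration of the reworded requirement, a minimal sketch that checks each sh:ValidationReport node in a results graph for exactly one sh:conforms value. It makes several simplifying assumptions: triples are plain tuples, a Python `bool` stands in for an xsd:boolean literal, and rdfs:subClassOf is ignored when deciding what counts as a SHACL instance. `conforms_problems` is a hypothetical name.

```python
def conforms_problems(triples):
    """List (report, message) pairs for reports violating the sh:conforms rule."""
    reports = [
        s for s, p, o in triples
        if p == "rdf:type" and o == "sh:ValidationReport"
    ]
    problems = []
    for report in reports:
        values = [
            o for s, p, o in triples
            if s == report and p == "sh:conforms"
        ]
        if len(values) != 1:
            problems.append(
                (report, f"expected 1 sh:conforms value, found {len(values)}"))
        elif not isinstance(values[0], bool):
            # A Python bool stands in for an xsd:boolean literal here.
            problems.append((report, "sh:conforms value is not a boolean"))
    return problems


good = [
    ("_:report", "rdf:type", "sh:ValidationReport"),
    ("_:report", "sh:conforms", False),
]
# A second sh:conforms value on the same report breaks the requirement.
bad = good + [("_:report", "sh:conforms", True)]
```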

>
> Much of the description of how validation reports are generated and what
> they contain need to be rewritten to remove any procedural aspects and to
> suitably describe the contents of validation resports.  As this will change
> large portions of the document, reviewers cannot provide fully informed
> commments on it at this time.
>

This is hopefully clarified now.

Holger
Received on Wednesday, 22 February 2017 23:20:23 UTC