Re: Shapes Constraint Language (SHACL) Working Draft of 2017-02-02 from Holger Knublauch on 2017-02-23 (public-rdf-shapes@w3.org from February 2017)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 23 Feb 2017 13:14:45 +1000
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-rdf-shapes@w3.org
Message-ID: <b43814b4-5f8f-3e28-8dcb-8523283778ea@topquadrant.com>

On 23/02/2017 11:00, Peter F. Patel-Schneider wrote:
> First, on the mandate concerning producing validation results.
>
>  From Section 4.1, and similar wording for every constraint component:
>
> For each value node that is either a literal, or a non-literal that is not a
> SHACL instance of $class in the data graph, a validation result MUST be
> produced with the value node as sh:value.
>
>  From Section 1.3
>
> The key words MAY, MUST, MUST NOT, and SHOULD are to be interpreted as
> described in [RFC2119].
>
>  From RFC2119
>
> 1. MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
>     definition is an absolute requirement of the specification.
>
> So it is an absolute requirement of SHACL that implementations are required to
> produce validation results for every focus node or value node (depending on
> the constraint component) that does not satisfy the requirements of a
> constraint.  This is an absolute requirement.  It cannot be overridden by
> wording elsewhere.  Implementations cannot fail to produce these results even
> if they will not show up in validation reports.  Implementations cannot fail
> to produce these results even if they are irrelevant to the top-level
> validation results.  Implementations cannot fail to produce these results
> even if the only value being requested is whether the data graph conforms to
> the shapes graph.
>
> This use of such strong procedural language is problematic on at least three
> counts.  First, it introduces a procedural component to validation and to
> validation results.  Because of this unnecessary procedural wording SHACL has
> to state how often a validation result is produced, and this statement is
> missing.  Second, the strong language makes it impossible to optimize
> validation.  Because the validation results must be produced, it is not
> possible to skip checking focus or value nodes whose conformance will not
> affect top-level validation results.  Third, the strong language forbids
> implementation strategies that do not produce any non-top-level validation
> results.
>
> So the mixing of strong procedural aspects causes severe problems for SHACL,
> mandating inefficient operation and forbidding useful implementation
> strategies.  These procedural aspects of SHACL validation have to be removed.
>
> The response from the working group on this fundamental part of SHACL is
> completely inadequate.  It completely misses the main problem with the "MUST
> be produced" wording that I laid out.

Where in the spec do we state that the temporary results (e.g. of 
sh:node) must be produced in the same results graph as the top-level 
ones? Of course the intention is that implementers can choose to either 
produce them in the same graph (and then link them via sh:detail) or in 
another graph that is only visible for the duration of the validation 
process. Since sh:detail is optional and there is neither a way nor a 
need for an outsider to verify that the engine has in fact produced the 
temporary results, it is perfectly possible for engines to apply the 
optimizations that you are asking for. In my own implementation, calling 
sh:node creates a temporary recursive engine with its own results graph, 
and I expect most people to use the same technique.

If this is not clear from the current prose, then I guess it can easily 
be fixed by adding a sentence or two.
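
As an illustration only, here is a minimal Python sketch of the 
"temporary results graph" technique. It is not code from any actual 
SHACL engine: the Shape and Result classes, the check callback and the 
validate function are all hypothetical names, and a plain list stands in 
for a results graph.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, List, Optional

_node_ids = count(1)  # source of fresh identifiers for result nodes


@dataclass
class Result:
    focus: str
    message: str
    # Every result gets a fresh id, i.e. it is always a distinct new node.
    node_id: int = field(default_factory=lambda: next(_node_ids))


@dataclass
class Shape:
    name: str
    check: Optional[Callable[[str], bool]] = None  # hypothetical constraint test
    node: Optional["Shape"] = None                 # nested shape, like sh:node


def validate(shape: Shape, focus: str, results: List[Result]) -> List[Result]:
    """Validate focus against shape, appending results to the given list."""
    if shape.check is not None and not shape.check(focus):
        results.append(Result(focus, f"{shape.name}: constraint violated"))
    if shape.node is not None:
        temporary: List[Result] = []  # private results "graph" for the nested run
        validate(shape.node, focus, temporary)
        if temporary:
            # Only one result reaches the caller; the nested ones are discarded.
            results.append(
                Result(focus, f"{shape.name}: does not conform to {shape.node.name}")
            )
    return results


inner = Shape("InnerShape", check=lambda n: n.startswith("ex:ok"))
outer = Shape("OuterShape", node=inner)
report = validate(outer, "ex:bad", [])
print([r.message for r in report])
```

Because the caller only looks at whether the temporary list is empty, 
an engine written this way could stop the nested run at the first 
violation without any externally visible difference.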

>
>
> Second, on specifications providing procedural definitions.
>
> It may be that procedural definitions are of use.  However, procedural
> definitions that mandate particular implementation strategies are
> counterproductive.  They prevent implementors from making useful
> optimizations.  They may forbid entire implementation strategies.  The
> procedural wording in question here does both and thus needs to be removed.

See above.

>
>
> Third, on non-top level validation results.
>
> Implementations are mandated to produce validation results.  Some of these
> would not be considered to be top-level validation results.  Implementations
> may or may not link from top-level validation results to these validation
> results using sh:detail.  However, implementations may not do this and in this
> case it appears that these validation results become top-level validation
> results because they are not the object of any sh:detail triple.
>
> So it appears that there will be too many top-level validation results unless
> an implementation does link to them using sh:detail triples.

We do have a sentence in 3.6.2.6 that is supposed to exclude this case:

Any validation results produced by the processing of shapes as values 
of shape-expecting constraint parameters except sh:property (such as 
sh:node) are temporary, i.e. they are not top-level results in the 
results graph of the surrounding validation process.

Is this not clear enough?
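
For engines that do keep nested results in the same graph, the 
top-level rule (results that are not the object of any sh:detail triple) 
can be sketched as follows. This is only an illustration in plain 
Python: the tuple encoding of triples, the blank node labels and the 
function name top_level_results are assumptions, not part of any SHACL 
API.

```python
SH_DETAIL = "sh:detail"


def top_level_results(result_nodes, triples):
    """Return the result nodes that are not the object of any sh:detail triple."""
    nested = {o for (_s, p, o) in triples if p == SH_DETAIL}
    return [r for r in result_nodes if r not in nested]


# Three result nodes; _:r2 is linked from _:r1 as a nested detail.
results = ["_:r1", "_:r2", "_:r3"]
triples = [("_:r1", SH_DETAIL, "_:r2")]
print(top_level_results(results, triples))
```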

>
>
> Fourth, on shapes that are both at top level and subsidiary to other shapes.
>
> It is quite reasonable to reuse top-level shapes as subsidiary shapes.  In any
> case, if a situation is allowable according to the syntax of SHACL, the
> definition of SHACL has to do something reasonable for it.
>
> I do not see any wording in the document that would require two validation
> results in this case.  Under the current definition of SHACL an implementation
> is free to optimize this situation and only validate the shape once against
> any particular value node or shape node.  How is the single resultant
> validation result to show up in the validation report?

We had recently added the sentence that any produced result nodes must 
be *new* nodes. I believe this excludes that scenario.

Holger


>
>
> Peter F. Patel-Schneider
> Nuance Communications
>
>
> On 02/22/2017 03:19 PM, Holger Knublauch wrote:
>> Hi Peter,
>>
>> this is the WG response on the 3rd part of your message (I have pruned the
>> other parts). We had opened and resolved ISSUES-225, 228 and 229 to prepare
>> this response.
>>
>> On 4/02/2017 14:10, Peter F. Patel-Schneider wrote:
>>> Validation results and reports:
>>>
>>> A validation report is the result of validation.  It is an RDF graph where
>>> some nodes are validation results reporting on constraints that were not
>>> satisfied.  There are serious problems in how validation reports are
>>> generated and the form of validation reports.
>>>
>>> The first problem is the generation of validation results.  Throughout the
>>> definitions of SHACL Core constraint components there is wording like "For
>>> each value node [...], a validation result MUST be produced with the value
>>> node as sh:value." and "If [...], a validation result MUST be produced."
>>> This means that each SHACL processor must produce these validation results
>>> to be a conforming implementation of SHACL.
>>>
>>> The processor must produce these validation results no matter whether they
>>> are going to show up in the final validation report or not.  The processor
>>> must produce these validation results even if it is not going to return a
>>> validation report at all.
>> In 3.6 we state that a SHACL-compliant processor must be *capable* of
>> returning all these results. However, when executed with certain
>> parameters, specific implementations may prune the results, for example
>> to exclude results that have severity sh:Warning or sh:Info. Likewise,
>> an engine is not required to produce nested results - these can go into
>> a temporary graph (which is how I am implementing it too). However, the
>> formal description is assuming that all results are reported.
>>
>>>     This mixing of conformance requirements into the
>>> definition of validation introduces an unnecessary and problematic
>>> procedural aspect into the underlying definitions of SHACL.
>> We don't see a problem and believe this is largely a matter of "taste". A
>> procedural description is very easy to understand for users and implementers,
>> and these are among the main target audience of this topic.
>>
>>> Although it is mandated that a SHACL processor must produce these validation
>>> results it is completely unclear how many must be produced.  A SHACL
>>> processor may end up checking whether a particular node satisfies a
>>> particular constraint numerous times.  Must it produce a validation result
>>> for each of these times?  Must it only produce one validation result for all
>>> of these times?  Or is the number of times it produces a validation result
>>> undetermined?  This multiplicity problem can show up at top-level due to
>>> converging sh:property chains.
>> I have meanwhile added a sentence to the introduction of section 4:
>>
>> ---
>> Furthermore, the validators always produce /new/ result nodes, i.e. when
>> the textual definition states that "...a validation result /must/ be
>> produced..." then this refers to a distinct new node in a results graph.
>> ---
>>
>> which I believe clarifies the three options above - it's the first.
>>
>>> The second problem is the form of a validation report.  There is
>>> insufficient guidance on how multiple validation results are to be
>>> produced.  For example, can a single validation result have multiple values
>>> for sh:value, making it a validation report for multiple violations?
>> I have meanwhile added clarification that sh:value (with all other
>> relevant result properties) can have at most one value. I have also
>> added this new sentence (as mentioned above):
>>
>> ---
>> Furthermore, the validators always produce /new/ result nodes, i.e. when
>> the textual definition states that "...a validation result /must/ be
>> produced..." then this refers to a distinct new node in a results graph.
>> ---
>>
>> which excludes the case of sharing sh:value among result nodes.
>>
>>> Similarly, if a shape has two sh:ClassConstraintComponent constraints, can
>>> a single validation report be used for violations from both of them?
>> No, this case is excluded from the current definitions.
>>
>>> Without better guidance on these issues it will be very difficult to
>>> determine just what violations occurred from a validation report.
>>>
>>> The third problem is just what validation results are to be included in a
>>> validation report and which of these are to be values of sh:result for the
>>> single node in the graph that is a SHACL instance of sh:ValidationReport.
>>> There is "Only the validation results that are not object of any sh:details
>>> triple in the results graph are top-level results." and "The property
>>> sh:detail may link a (parent) result with one or more other (child) results
>>> that provide further details about the cause of the (parent) result."
>>> So a validation process has to produce validation results which then end up
>>> in the validation report if they are not values for sh:details triples.
>> Not exactly: only those results that are not values of sh:detail are
>> *top-level* results. Yet nested results may also become part of the
>> result graph.
>>
>>> What happens if a validation result comes from violation of a constraint
>>> that is both directly at top level (e.g., from a property shape that is
>>> value of
>>> sh:property for a shape that has targets) and not at top level (e.g., from
>>> the same property shape as before that is linked to the shape with targets
>>> via a combination of sh:node and sh:property triples)?
>> In this case it will produce two results, once for the direct invocation
>> of the property shape via its target and once for the indirect
>> invocation. However, I don't expect this case to ever happen in practice
>> because there is no need to assign a target to a property shape that is
>> already linked from another shape that also has a target.
>>
>>
>>>     Can a SHACL
>>> processor use sh:detail to collect that otherwise might be top-level
>>> validation results?
>> (There is a word missing above, I guess you mean "...to collect results
>> that..."?)
>>
>> No, they would be distinct result nodes.
>>
>>> There are also some other minor problems with validation reports.  For
>>> example, there is the requirement that "A validation report has exactly one
>>> value for the property sh:conforms that is of datatype xsd:boolean."
>>> However, the result of validation is an RDF graph and RDF graphs so this
>>> requirement doesn't make sense.  The definitions underlying validation
>>> reports need to be carefully examined to eliminate problems like these.
>> I have clarified the wording so that it now refers to "Each SHACL
>> instance of sh:ValidationReport in the result graph", both for
>> sh:conforms and sh:result. I also reviewed the rest of section 3.6. If
>> anyone finds other specific cases of imprecision, please let us know.
>>
>>> Much of the description of how validation reports are generated and what
>>> they contain needs to be rewritten to remove any procedural aspects and to
>>> suitably describe the contents of validation reports.  As this will change
>>> large portions of the document, reviewers cannot provide fully informed
>>> comments on it at this time.
>>>
>> This is hopefully clarified now.
>>
>> Holger
>>
>>
Received on Thursday, 23 February 2017 03:20:53 UTC