Re: Shapes Constraint Language (SHACL) Working Draft of 2017-02-02 from Peter F. Patel-Schneider on 2017-02-23 (public-rdf-shapes@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 22 Feb 2017 17:00:14 -0800
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-shapes@w3.org
Message-ID: <794f9afb-0cbe-9e4a-c954-3f04c15f74ec@gmail.com>
First, on the mandate concerning producing validation results.

From Section 4.1, and similar wording for every constraint component:

For each value node that is either a literal, or a non-literal that is not a
SHACL instance of $class in the data graph, a validation result MUST be
produced with the value node as sh:value.

From Section 1.3

The key words MAY, MUST, MUST NOT, and SHOULD are to be interpreted as
described in [RFC2119].

From RFC2119

1. MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
   definition is an absolute requirement of the specification.

So it is an absolute requirement of SHACL that implementations are required to
produce validation results for every focus node or value node (depending on
the constraint component) that does not satisfy the requirements of a
constraint.  This is an absolute requirement.  It cannot be overridden by
wording elsewhere.  Implementations cannot fail to produce these results even
if they will not show up in validation reports.  Implementations cannot fail
to produce these results even if they are irrelevant to the top-level
validation results.  Implementations cannot fail to produce these results
even if the only value being requested is whether the data graph conforms to
the shapes graph.

This use of such strong procedural language is problematic on at least three
counts.  First, it introduces a procedural component to validation and to
validation results.  Because of this unnecessary procedural wording SHACL has
to state how often a validation result is produced, and this statement is
missing.  Second, the strong language makes it impossible to optimize
validation.  Because the validation results must be produced, it is not
possible to skip checking focus or value nodes whose conformance will not
affect top-level validation results.  Third, the strong language forbids
implementation strategies that do not produce any non-top-level validation
results.
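
A minimal sketch of the second point, assuming a toy check function in place
of a real constraint component.  The names conforms_only, all_results, and
is_person are hypothetical and are not taken from the SHACL draft or from any
implementation.  It contrasts a processor that only answers whether the data
conforms with one that produces a result for every violating value node:

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a constraint check: returns True when the
# value node satisfies the constraint.  Not part of any SHACL API.
Check = Callable[[str], bool]


def conforms_only(value_nodes: Iterable[str], check: Check) -> Tuple[bool, int]:
    """Answer only "does the data conform?", stopping at the first violation."""
    checked = 0
    for node in value_nodes:
        checked += 1
        if not check(node):
            return False, checked  # remaining nodes are never examined
    return True, checked


def all_results(value_nodes: Iterable[str], check: Check) -> Tuple[List[str], int]:
    """Produce one result per violating node, as the MUST wording demands."""
    results: List[str] = []
    checked = 0
    for node in value_nodes:
        checked += 1
        if not check(node):
            results.append(node)
    return results, checked


nodes = ["ex:a", "ex:b", "ex:c", "ex:d"]


def is_person(node: str) -> bool:
    # Toy stand-in for "is a SHACL instance of $class".
    return node == "ex:a"


print(conforms_only(nodes, is_person))  # (False, 2)
print(all_results(nodes, is_person))    # (['ex:b', 'ex:c', 'ex:d'], 4)
```

Both functions agree on conformance, but only the first may stop early.  Read
literally, the "MUST be produced" wording rules the first one out.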

So the mixing of strong procedural aspects causes severe problems for SHACL,
mandating inefficient operation and forbidding useful implementation
strategies.  These procedural aspects of SHACL validation have to be removed.

The response from the working group on this fundamental part of SHACL is
completely inadequate.  It completely misses the main problem with the "MUST
be produced" wording that I laid out.


Second, on specifications providing procedural definitions.

It may be that procedural definitions are of use.  However, procedural
definitions that mandate particular implementation strategies are
counterproductive.  They prevent implementors from making useful
optimizations.  They may forbid entire implementation strategies.  The
procedural wording in question here does both and thus needs to be removed.


Third, on non-top level validation results.

Implementations are mandated to produce validation results.  Some of these
would not be considered to be top-level validation results.  Implementations
may or may not link from top-level validation results to these validation
results using sh:detail.  However, implementations might not do this, and in
that case it appears that these validation results become top-level validation
results because they are not the object of any sh:detail triple.

So it appears that there will be too many top-level validation results unless
an implementation does link to them using sh:detail triples.
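
A minimal sketch of this reading, assuming the results graph is modelled as a
plain list of (subject, predicate, object) tuples.  The function
top_level_results and the blank node labels are hypothetical; the rule it
applies is the quoted one, that a result is top-level exactly when it is not
the object of an sh:detail triple:

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def top_level_results(results: Iterable[str], triples: Iterable[Triple]) -> List[str]:
    """Return the results that are not the object of any sh:detail triple."""
    nested = {o for (_, p, o) in triples if p == "sh:detail"}
    return [r for r in results if r not in nested]


# _:r2 was produced while checking a shape nested under the one behind _:r1.
results = ["_:r1", "_:r2"]

linked = [("_:r1", "sh:detail", "_:r2")]  # implementation links the child
unlinked: List[Triple] = []               # implementation omits sh:detail

print(top_level_results(results, linked))    # ['_:r1']
print(top_level_results(results, unlinked))  # ['_:r1', '_:r2']
```

With the optional sh:detail link omitted, the nested result counts as
top-level under this rule.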


Fourth, on shapes that are both at top level and subsidiary to other shapes.

It is quite reasonable to reuse top-level shapes as subsidiary shapes.  In any
case, if a situation is allowable according to the syntax of SHACL, the
definition of SHACL has to do something reasonable for it.

I do not see any wording in the document that would require two validation
results in this case.  Under the current definition of SHACL an implementation
is free to optimize this situation and only validate the shape once against
any particular value node or shape node.  How is the single resultant
validation result to show up in the validation report?
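
A minimal sketch of the situation, assuming an implementation that memoizes
validation per (shape, node) pair and that links nested results with
sh:detail.  All names here (validate, ex:PropShape, ex:Outer, ex:n) are
hypothetical, and every check is assumed to fail so that a result exists:

```python
import itertools
from typing import Dict, Tuple

_ids = itertools.count(1)
_cache: Dict[Tuple[str, str], str] = {}


def validate(shape: str, node: str) -> str:
    """Return the single result node for a failing shape/node pair (memoized)."""
    key = (shape, node)
    if key not in _cache:
        _cache[key] = f"_:r{next(_ids)}"
    return _cache[key]


# Direct invocation: ex:PropShape has its own target.
direct = validate("ex:PropShape", "ex:n")
# Indirect invocation: ex:Outer reaches ex:PropShape via sh:node / sh:property.
outer = validate("ex:Outer", "ex:n")
indirect = validate("ex:PropShape", "ex:n")  # cache hit, same result node

triples = [(outer, "sh:detail", indirect)]
nested = {o for (_, p, o) in triples if p == "sh:detail"}
top_level = [r for r in (direct, outer) if r not in nested]

print(direct == indirect)  # True
print(top_level)           # ['_:r2']
```

Under these assumptions the one shared result is the object of an sh:detail
triple, so it is no longer top-level even though the shape was also validated
directly against its own target.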


Peter F. Patel-Schneider
Nuance Communications


On 02/22/2017 03:19 PM, Holger Knublauch wrote:
> Hi Peter,
> 
> this is the WG response on the 3rd part of your message (I have pruned the
> other parts). We had opened and resolved ISSUES-225, 228 and 229 to prepare
> this response.
> 
> On 4/02/2017 14:10, Peter F. Patel-Schneider wrote:
>> Validation results and reports:
>>
>> A validation report is the result of validation.  It is an RDF graph where
>> some nodes are validation results reporting on constraints that were not
>> satisfied.  There are serious problems in how validation reports are
>> generated and the form of validation reports.
>>
>> The first problem is the generation of validation results.  Throughout the
>> definitions of SHACL Core constraint components there is wording like "For
>> each value node [...], a validation result MUST be produced with the value
>> node as sh:value." and "If [...], a validation result MUST be produced."
>> This means that each SHACL processor must produce these validation results
>> to be a conforming implementation of SHACL.
>>
>> The processor must produce these validation results no matter whether they
>> are going to show up in the final validation report or not.  The processor
>> must produce these validation results even if it is not going to return a
>> validation report at all.
> 
> In 3.6 we state that a SHACL-compliant processor must be *capable* of
> returning all these results. However, when executed with certain
> parameters, specific implementations may prune the results, for example
> to exclude results that have severity sh:Warning or sh:Info. Likewise,
> an engine is not required to produce nested results - these can go into
> a temporary graph (which is how I am implementing it too). However, the
> formal description is assuming that all results are reported.
> 
>>    This mixing of conformance requirements into the
>> definition of validation introduces an unnecessary and problematic
>> procedural aspect into the underlying definitions of SHACL.
> 
> We don't see a problem and believe this is largely a matter of "taste". A
> procedural description is very easy to understand for users and implementers,
> and these are among the main target audience of this topic.
> 
>>
>> Although it is mandated that a SHACL processor must produce these validation
>> results it is completely unclear how many must be produced.  A SHACL
>> processor may end up checking whether a particular node satisfies a
>> particular constraint numerous times.  Must it produce a validation result
>> for each of these times?  Must it only produce one validation result for all
>> of these times?  Or is the number of times it produces a validation result
>> undetermined?  This multiplicity problem can show up at top-level due to
>> converging sh:property chains.
> 
> I have meanwhile added a sentence to the introduction of section 4:
> 
> ---
> Furthermore, the validators always produce *new* result nodes, i.e. when
> the textual definition states that "...a validation result *must* be
> produced..." then this refers to a distinct new node in a results graph.
> ---
> 
> which I believe clarifies the three options above - it's the first.
> 
>>
>> The second problem is the form of a validation report.  There is
>> insufficient guidance on how multiple validation results are to be
>> produced.  For example, can a single validation result have multiple values
>> for sh:value, making it a validation report for multiple violations?
> 
> I have meanwhile added clarification that sh:value (with all other
> relevant result properties) can have at most one value. I have also
> added this new sentence (as mentioned above):
> 
> ---
> Furthermore, the validators always produce *new* result nodes, i.e. when
> the textual definition states that "...a validation result *must* be
> produced..." then this refers to a distinct new node in a results graph.
> ---
> 
> which excludes the case of sharing sh:value among result nodes.
> 
>> Similarly, if a shape has two sh:ClassConstraintComponent constraints, can
>> a single validation report be used for violations from both of them?
> 
> No, this case is excluded from the current definitions.
> 
>> Without better guidance on these issues it will be very difficult to
>> determine just what violations occurred from a validation report.
>>
>> The third problem is just what validation results are to be included in a
>> validation report and which of these are to be values of sh:result for the
>> single node in the graph that is a SHACL instance of sh:ValidationReport.
>> There is "Only the validation results that are not object of any sh:details
>> triple in the results graph are top-level results." and "The property
>> sh:detail may link a (parent) result with one or more other (child) results
>> that provide further details about the cause of the (parent) result."
>> So a validation process has to produce validation results which then end up
>> in the validation report if they are not values for sh:details triples.
> 
> Not exactly: only those results that are not values of sh:detail are
> *top-level* results. Yet nested results may also become part of the
> result graph.
> 
>> What happens if a validation result comes from violation of a constraint
>> that is both directly at top level (e.g., from a property shape that is
>> value of
>> sh:property for a shape that has targets) and not at top level (e.g., from
>> the same property shape as before that is linked to the shape with targets
>> via a combination of sh:node and sh:property triples)?
> 
> In this case it will produce two results, once for the direct invocation
> of the property shape via its target and once for the indirect
> invocation. However, I don't expect this case to ever happen in practice
> because there is no need to assign a target to a property shape that is
> already linked from another shape that also has a target.
> 
> 
>>    Can a SHACL
>> processor use sh:detail to collect that otherwise might be top-level
>> validation results?
> 
> (There is a word missing above, I guess you mean "...to collect results
> that..."?)
> 
> No, they would be distinct result nodes.
> 
>>
>> There are also some other minor problems with validation reports.  For
>> example, there is the requirement that "A validation report has exactly one
>> value for the property sh:conforms that is of datatype xsd:boolean."
>> However, the result of validation is an RDF graph and RDF graphs so this
>> requirement doesn't make sense.  The definitions underlying validation
>> reports need to be carefully examined to eliminate problems like these.
> 
> I have clarified the wording so that it now refers to "Each SHACL
> instance of sh:ValidationReport in the result graph", both for
> sh:conforms and sh:result. I also reviewed the rest of section 3.6. If
> anyone finds other specific cases of imprecision, please let us know.
> 
>>
>> Much of the description of how validation reports are generated and what
>> they contain needs to be rewritten to remove any procedural aspects and to
>> suitably describe the contents of validation reports.  As this will change
>> large portions of the document, reviewers cannot provide fully informed
>> comments on it at this time.
>>
> 
> This is hopefully clarified now.
> 
> Holger
> 
>
Received on Thursday, 23 February 2017 01:00:53 UTC