Re: central problem with test suite from Peter F. Patel-Schneider on 2017-03-14 (public-rdf-shapes@w3.org from March 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 14 Mar 2017 04:58:26 -0700
To: Holger Knublauch <holger@topquadrant.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <172303bb-8cec-a12e-2eae-ded0a9fc108c@gmail.com>
On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>
>
> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>> The test suite document is work in progress and I have basically just started
>>> to take a deeper look. I welcome any help on this and really don't want to
>>> "own" this document.
>>>
>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>> It's going to be hard.  It's not possible to just remove the parts of the
>>>> validation report that can vary because some of these parts have conditions
>>>> on them.  For example, removing type and subclass triples will prevent
>>>> checking the SHACL instance requirements.
>>> Ok, the fact that reports allow for instances of subclasses of
>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>> pre-processing step. I have now added this step, normalizing these to their
>>> direct rdf:type.
>>>
>>>>    There is also the problem that there are
>>>> different RDF literals with the same value.
>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>
>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>> Just following the second link here shows that RDF term equality looks at
>> the syntactic form of RDF literals, not their value.  There is even a very
>> illustrative example provided.
>>
>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>> *******************
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character. Thus, two
>> literals can have the same value without being the same RDF term. For
>> example:
>>        "1"^^xs:integer
>>        "01"^^xs:integer
>> denote the same value, but are not the same literal RDF terms and are not
>> term-equal because their lexical form differs.
>> *******************
>
> That's understood, but I believe term equality is what we want, not value
> equality. AFAICS all of the properties in the results vocabulary (e.g.
> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
> values. The only times where they can be literals such as "1" vs "01" is if
> they point at values from the data graph via sh:value, and in those cases we
> are doing term equality too. So I don't see the problem that you seem to see
> right now.

Yes, you are right.  The SHACL document defines true in red as a particular
RDF term and uses true in red throughout where it talks about validation
results.  My fault for not looking closely enough and assuming that true in
SHACL validation reports could be any RDF term whose RDF value is true.
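
To make the distinction concrete, here is a minimal sketch of term equality
versus value equality for the "1" / "01" example quoted above.  The Literal
class and both helper functions are hypothetical names invented for this
illustration, not part of any RDF library, and value_equal only handles
xsd:integer.

```python
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Term equality: lexical form, datatype IRI and language tag (if any)
    # must all compare equal, character by character.
    return (a.lexical_form, a.datatype, a.language) == (
        b.lexical_form, b.datatype, b.language)


def value_equal(a: Literal, b: Literal) -> bool:
    # Simplified value comparison: only xsd:integer is mapped to a value;
    # everything else falls back to term equality (an assumption of this sketch).
    if a.datatype == XSD_INTEGER and b.datatype == XSD_INTEGER:
        return int(a.lexical_form) == int(b.lexical_form)
    return term_equal(a, b)


one = Literal("1", XSD_INTEGER)
zero_one = Literal("01", XSD_INTEGER)

print(term_equal(one, zero_one))   # different lexical forms
print(value_equal(one, zero_one))  # same integer value
```

A graph-isomorphism comparison of validation reports would behave like
term_equal here, which is fine as long as the report only contains terms
copied from the shapes or data graph.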

>>>>    Probably the biggest problem is
>>>> that the number of values for sh:result can vary between SHACL Core
>>>> implementations for the same validation.
>>> This is not the intention of the spec. The spec states that each validator
>>> must have a mode in which it always produces all results.
>>>
>>> SHACL-compliant processors /must/ be capable of returning a validation report
>>> with all required validation results
>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described in
>>> this specification.
>> Consider validating the data graph
>>    ex:i ex:p ex:j ; ex:q ex:j .
>>    ex:j ex:p ex:j .
>> against the shapes graph
>>    ex:s1 rdf:type sh:PropertyShape ;
>>      sh:targetNode ex:i ;
>>      sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>      sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>    ex:s2 sh:path ex:p ; sh:class ex:C .
>> It is reasonable and acceptable to have one top-level validation result
>> here.  It seems to me that there is an argument that it is also reasonable
>> and acceptable to have two top-level validation results here.
>
> The intention, and what I believe the current spec states, is that two results
> must be produced in this case - the intro to section 4 states that it always
> has to produce new result nodes and these cannot be shared. Also the
> validation is defined per-focus-node and not for a group of focus nodes (which
> may indeed cause duplicate value nodes to be swallowed up). So if ex:s2 for
> ex:j is reached by two property shapes, it will produce one result for each
> original focus node.

Not so.  Even with the wording about producing new results, a SPARQL
implementation is free to optimize its performance.  For example, the
implementation may decide to cache results of validation.  This can result
in fewer validations being performed.  The results of these validations can
then be used multiple times and then show up in the validation report.
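
As a rough illustration of the caching argument, the following toy model
(not a SHACL processor) uses the data graph from the example above and
hard-codes the check made by ex:s2.  The function names, the dictionary
layout of a result, and the blank-node style identifiers are all assumptions
made for this sketch.

```python
from itertools import count

# Data graph from the example, as (subject, predicate, object) triples.
DATA = {
    ("ex:i", "ex:p", "ex:j"),
    ("ex:i", "ex:q", "ex:j"),
    ("ex:j", "ex:p", "ex:j"),
}

_ids = count(1)  # source of fresh result identifiers


def check_s2(focus):
    """ex:s2: every ex:p value of the focus node must be an instance of ex:C."""
    results = []
    for s, p, o in sorted(DATA):
        if s == focus and p == "ex:p" and (o, "rdf:type", "ex:C") not in DATA:
            # Each call that finds a violation creates a new result node.
            results.append({"id": f"_:r{next(_ids)}", "focusNode": focus, "value": o})
    return results


def validate_s1(use_cache):
    """ex:s1 on ex:i: apply ex:s2 to the values of ex:p, then to those of ex:q."""
    cache = {}
    report = []
    for path in ("ex:p", "ex:q"):
        for s, p, o in sorted(DATA):
            if s == "ex:i" and p == path:
                if use_cache:
                    # Validate (ex:s2, value node) once and reuse the outcome.
                    if o not in cache:
                        cache[o] = check_s2(o)
                    report.extend(cache[o])
                else:
                    report.extend(check_s2(o))
    return report


def distinct_results(report):
    """Set of distinct result node identifiers in a report."""
    return {r["id"] for r in report}


print(len(distinct_results(validate_s1(use_cache=False))))
print(len(distinct_results(validate_s1(use_cache=True))))
```

In this sketch the uncached run validates ex:j against ex:s2 twice and gets
two distinct result nodes, while the cached run validates once and the same
result node is reached from both property shapes.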

> Holger

peter
Received on Tuesday, 14 March 2017 11:59:01 UTC