Re: central problem with test suite from Peter F. Patel-Schneider on 2017-03-15 (public-rdf-shapes@w3.org from March 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 15 Mar 2017 04:42:00 -0700
To: Holger Knublauch <holger@topquadrant.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <6d200dd5-a01d-3619-3aa6-df4b2c45315d@gmail.com>
Sure, the inadequacy of RDF graph isomorphism to check validation reports that
differ structurally can be covered by my other comment.

peter


On 03/14/2017 08:28 PM, Holger Knublauch wrote:
> 
> 
> On 14/03/2017 21:58, Peter F. Patel-Schneider wrote:
>> On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>>>
>>> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>>>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>>>> The test suite document is work in progress and I have basically just started
>>>>> to take a deeper look. I welcome any help on this and really don't want to
>>>>> "own" this document.
>>>>>
>>>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>>>> It's going to be hard.  It's not possible to just remove the parts of the
>>>>>> validation report that can vary because some of these parts have
>>>>>> conditions on them.  For example, removing type and subclass triples will
>>>>>> prevent checking the SHACL instance requirements.
>>>>> Ok, the fact that reports allow for instances of subclasses of
>>>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>>>> pre-processing step. I have now added this step, normalizing these to their
>>>>> direct rdf:type.
>>>>>
>>>>>>     There is also the problem that there are
>>>>>> different RDF literals with the same value.
>>>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>>>
>>>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>> Just following the second link here shows that RDF term equality looks at
>>>> the syntactic form of RDF literals, not their value.  There is even a very
>>>> illustrative example provided.
>>>>
>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>> *******************
>>>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>>>> and only if the two lexical forms, the two datatype IRIs, and the two
>>>> language tags (if any) compare equal, character by character. Thus, two
>>>> literals can have the same value without being the same RDF term. For
>>>> example:
>>>>         "1"^^xs:integer
>>>>         "01"^^xs:integer
>>>> denote the same value, but are not the same literal RDF terms and are not
>>>> term-equal because their lexical form differs.
>>>> *******************
>>> That's understood, but I believe term equality is what we want, not value
>>> equality. AFAICS all of the properties in the results vocabulary (e.g.
>>> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
>>> values. The only times where they can be literals such as "1" vs "01" is if
>>> they point at values from the data graph via sh:value, and in those cases we
>>> are doing term equality too. So I don't see the problem that you seem to see
>>> right now.
>> Yes, you are right.  The SHACL document defines true in red as a particular
>> RDF term and uses true in red throughout where it talks about validation
>> results.  My fault for not looking closely enough and assuming that true in
>> SHACL validation reports could be any RDF term whose RDF value is true.
> 
> I just responded to this.
> 
>>
>>>>>>     Probably the biggest problem is
>>>>>> that the number of values for sh:result can vary between SHACL Core
>>>>>> implementations for the same validation.
>>>>> This is not the intention of the spec. The spec states that each validator
>>>>> must have a mode in which it always produces all results.
>>>>>
>>>>> SHACL-compliant processors /must/ be capable of returning a validation
>>>>> report with all required validation results
>>>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described
>>>>> in this specification.
>>>> Consider validating the data graph
>>>>     ex:i ex:p ex:j ; ex:q ex:j .
>>>>     ex:j ex:p ex:j .
>>>> against the shapes graph
>>>>     ex:s1 rdf:type sh:PropertyShape ;
>>>>       sh:targetNode ex:i ;
>>>>       sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>>>       sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>>>     ex:s2 sh:path ex:p ; sh:class ex:C .
>>>> It is reasonable and acceptable to have one top-level validation result
>>>> here.  It seems to me that there is an argument that it is also reasonable
>>>> and acceptable to have two top-level validation results here.
>>> The intention, and what I believe the current spec states, is that two results
>>> must be produced in this case - the intro to section 4 states that it always
>>> has to produce new result nodes and these cannot be shared. Also the
>>> validation is defined per-focus-node and not for a group of focus nodes (which
>>> may indeed cause duplicate value nodes to be swallowed up). So if ex:s2 for
>>> ex:j is reached by two property shapes, it will produce one result for each
>>> original focus node.
>> Not so.  Even with the wording about producing new results, a SPARQL
>> implementation is free to optimize its performance.  For example, the
>> implementation may decide to cache results of validation.  This can result
>> in fewer validations being performed.  The results of these validations can
>> then be used multiple times and then show up in the validation report.
> 
> I believe this aspect of your email is now superseded by your "testing
> methodology" email, so we can hopefully close this thread here.
> 
> Holger
> 
> 
>>
>>> Holger
>> peter
> 
>
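
The distinction between term equality and value equality discussed in the
quoted thread can be illustrated with a minimal sketch. It is not taken from
any SHACL or RDF library: the Literal class and the helpers term_equal and
value_equal_integers are hypothetical names introduced here, and the value
comparison only handles xsd:integer. Term equality follows the definition
quoted above (lexical form, datatype IRI, and language tag compared character
by character).

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI for xsd:integer (the quoted spec excerpt abbreviates it as xs:integer).
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal; not a real library class."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Term equality: lexical form, datatype IRI and language tag must all
    # match exactly, as in the definition quoted in the thread.
    return (a.lexical, a.datatype, a.lang) == (b.lexical, b.datatype, b.lang)


def value_equal_integers(a: Literal, b: Literal) -> bool:
    # Value equality, restricted to xsd:integer for this sketch: compare the
    # integers that the lexical forms denote.
    if a.datatype != XSD_INTEGER or b.datatype != XSD_INTEGER:
        raise ValueError("this sketch only compares xsd:integer literals")
    return int(a.lexical) == int(b.lexical)


one = Literal("1", XSD_INTEGER)
padded_one = Literal("01", XSD_INTEGER)

print(term_equal(one, padded_one))            # False: lexical forms differ
print(value_equal_integers(one, padded_one))  # True: both denote the integer 1
```

A graph isomorphism check built on term_equal would treat "1" and "01" as
different nodes, which matches the behaviour Holger describes as intended for
comparing validation reports.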
Received on Wednesday, 15 March 2017 11:42:34 UTC