Re: central problem with test suite from Peter F. Patel-Schneider on 2017-03-14 (public-rdf-shapes@w3.org from March 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 14 Mar 2017 05:49:14 -0700
To: Holger Knublauch <holger@topquadrant.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <b85a15ab-f7fe-b36a-554e-a9a4a112c13e@gmail.com>

I was too quick in agreeing that term equality is adequate for testing the
boolean values that show up in validation reports.  If a data graph does not
validate against a shapes graph the value for sh:conforms can be any boolean
value except "true"^^xsd:boolean.  So it can be "false"^^xsd:boolean or
"0"^^xsd:boolean.  This causes problems for graph isomorphism.

The value can also be "1"^^xsd:boolean if the data graph does not conform to
the shapes graph, which seems odd.  The value can also be "a"^^xsd:boolean,
which also seems odd.  This appears to be a problem not with testing but with
the definition of validation reports.
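
To make the distinction concrete, here is a minimal sketch in plain Python (no RDF library) of term equality versus value equality for xsd:boolean literals.  The Literal type and the helpers term_equal and boolean_value are hypothetical names introduced only for this illustration; they are not part of SHACL or of any existing API.

```python
from typing import NamedTuple, Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: lexical form, datatype IRI, language tag."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # RDF term equality: lexical form, datatype IRI and language tag
    # must all match character by character.
    return a == b


# The lexical space of xsd:boolean: "true"/"1" map to true, "false"/"0" to false.
_BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def boolean_value(lit: Literal) -> Optional[bool]:
    # Returns the value of an xsd:boolean literal, or None when the literal
    # has another datatype or is ill-typed (for example "a"^^xsd:boolean).
    if lit.datatype != XSD_BOOLEAN:
        return None
    return _BOOLEAN_LEXICAL.get(lit.lexical)


FALSE_WORD = Literal("false", XSD_BOOLEAN)
FALSE_DIGIT = Literal("0", XSD_BOOLEAN)

# Same value, different RDF terms: a graph isomorphism check that relies on
# term equality treats two reports using these literals as different.
same_value = boolean_value(FALSE_WORD) == boolean_value(FALSE_DIGIT)  # True
same_term = term_equal(FALSE_WORD, FALSE_DIGIT)                       # False
```

Under these assumptions, a test harness that compares reports by graph isomorphism would reject a report whose sh:conforms value is "0"^^xsd:boolean when the expected report uses "false"^^xsd:boolean, even though both literals denote the same value.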

Similar problems occur in other places.  For example, a value of
"1"^^xsd:boolean for sh:uniqueLang or sh:qualifiedValueShapesDisjoint or
sh:closed does not enable the feature.

The SHACL document still needs a close critical examination to detect these
kinds of problems.

peter


On 03/14/2017 04:58 AM, Peter F. Patel-Schneider wrote:
> On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>>
>>
>> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>>> The test suite document is work in progress and I have basically just started
>>>> to take a deeper look. I welcome any help on this and really don't want to
>>>> "own" this document.
>>>>
>>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>>> It's going to be hard.  It's not possible to just remove the parts of the
>>>>> validation report that can vary because some of these parts have
>>>>> conditions on them.  For example, removing type and subclass triples
>>>>> will prevent checking the SHACL instance requirements.
>>>> Ok, the fact that reports allow for instances of subclasses of
>>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>>> pre-processing step. I have now added this step, normalizing these to their
>>>> direct rdf:type.
>>>>
>>>>>    There is also the problem that there are
>>>>> different RDF literals with the same value.
>>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>>
>>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>> Just following the second link here shows that RDF term equality looks at
>>> the syntactic form of RDF literals, not their value.  There is even a very
>>> illustrative example provided.
>>>
>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>> *******************
>>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>>> and only if the two lexical forms, the two datatype IRIs, and the two
>>> language tags (if any) compare equal, character by character. Thus, two
>>> literals can have the same value without being the same RDF term. For
>>> example:
>>>        "1"^^xs:integer
>>>        "01"^^xs:integer
>>> denote the same value, but are not the same literal RDF terms and are not
>>> term-equal because their lexical form differs.
>>> *******************
>>
>> That's understood, but I believe term equality is what we want, not value
>> equality. AFAICS all of the properties in the results vocabulary (e.g.
>> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
>> values. The only times where they can be literals such as "1" vs "01" is if
>> they point at values from the data graph via sh:value, and in those cases we
>> are doing term equality too. So I don't see the problem that you seem to see
>> right now.
> 
> Yes, you are right.  The SHACL document defines true in red as a particular
> RDF term and uses true in red throughout where it talks about validation
> results.  My fault for not looking closely enough and assuming that true in
> SHACL validation reports could be any RDF term whose RDF value is true.
> 
>>>>>    Probably the biggest problem is
>>>>> that the number of values for sh:result can vary between SHACL Core
>>>>> implementations for the same validation.
>>>> This is not the intention of the spec. The spec states that each validator
>>>> must have a mode in which it always produces all results.
>>>>
>>>> SHACL-compliant processors /must/ be capable of returning a validation report
>>>> with all required validation results
>>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described in
>>>> this specification.
>>> Consider validating the data graph
>>>    ex:i ex:p ex:j ; ex:q ex:j .
>>>    ex:j ex:p ex:j .
>>> against the shapes graph
>>>    ex:s1 rdf:type sh:PropertyShape ;
>>>      sh:targetNode ex:i ;
>>>      sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>>      sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>>    ex:s2 sh:path ex:p ; sh:class ex:C .
>>> It is reasonable and acceptable to have one top-level validation result
>>> here.  It seems to me that there is an argument that it is also reasonable
>>> and acceptable to have two top-level validation results here.
>>
>> The intention, and what I believe the current spec states, is that two results
>> must be produced in this case - the intro to section 4 states that it always
>> has to produce new result nodes and these cannot be shared. Also the
>> validation is defined per-focus-node and not for a group of focus nodes (which
>> may indeed cause duplicate value nodes to be swallowed up). So if ex:s2 for
>> ex:j is reached by two property shapes, it will produce one result for each
>> original focus node.
> 
> Not so.  Even with the wording about producing new results a SPARQL
> implementation is free to optimize its performance.  For example, the
> implementation may decide to cache results of validation.  This can result
> in fewer validations being performed.  The results of these validations can
> then be used multiple times and then show up in the validation report.
> 
>> Holger
> 
> peter
>
Received on Tuesday, 14 March 2017 12:49:49 UTC