Re: central problem with test suite from Peter F. Patel-Schneider on 2017-03-15 (public-rdf-shapes@w3.org from March 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 15 Mar 2017 04:40:29 -0700
To: Holger Knublauch <holger@topquadrant.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <0a713be0-21be-f4d9-d798-ebfa0146f83e@gmail.com>
On 03/14/2017 08:25 PM, Holger Knublauch wrote:
> 
> 
> On 14/03/2017 22:49, Peter F. Patel-Schneider wrote:
>> I was too quick in agreeing that term equality is adequate for testing the
>> boolean values that show up in validation reports.  If a data graph does not
>> validate against a shapes graph, the value for sh:conforms can be any boolean
>> value except "true"^^xsd:boolean.  So it can be "false"^^xsd:boolean or
>> "0"^^xsd:boolean.  This causes problems for graph isomorphism.
>>
>> The value can also be "1"^^xsd:boolean if the data graph does not conform to
>> the shapes graph, which seems odd.  The value can also be "a"^^xsd:boolean,
>> which also seems odd.  This appears to be a problem not with testing but with
>> the definition of validation reports.
> 
> I have tightened the definition of sh:conforms to be always either true or
> false. This also resolves this part of the graph comparison problem.
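To make the graph-comparison issue concrete, here is a minimal Python sketch. It assumes a ground report graph (no blank nodes), so that set equality of triples can stand in for isomorphism. The helpers `report` and `term_equal_graphs` and the IRI `ex:report` are hypothetical. Under term equality, a report using "0"^^xsd:boolean for sh:conforms does not match one using "false"^^xsd:boolean, which is why restricting sh:conforms to exactly true or false makes term-based comparison workable.

```python
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

def report(conforms_lexical: str) -> frozenset:
    """Build a tiny, hypothetical ground report graph (no blank nodes)."""
    return frozenset({
        ("ex:report", "rdf:type", "sh:ValidationReport"),
        # Literal objects are modelled as (lexical form, datatype IRI) pairs.
        ("ex:report", "sh:conforms", (conforms_lexical, XSD_BOOLEAN)),
    })

def term_equal_graphs(g1: frozenset, g2: frozenset) -> bool:
    """For ground graphs, isomorphism reduces to set equality of term-equal triples."""
    return g1 == g2

expected = report("false")
print(term_equal_graphs(expected, report("false")))  # True
print(term_equal_graphs(expected, report("0")))      # False, although both denote false
```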
> 
>>
>> Similar problems occur in other places.  For example, a value of
>> "1"^^xsd:boolean for sh:uniqueLang or sh:qualifiedValueShapesDisjoint or
>> sh:closed does not enable the feature.
> 
> It is IMHO unfortunate that RDF even allows 0 or 1 for booleans. Luckily this
> fact is barely known and hardly ever used in practice (although I confess I
> did bump into it recently with a JavaScript library, probably the only time
> ever in the last 10 years).

This aspect of RDF comes from XML Schema Datatypes.
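In XML Schema Datatypes the lexical space of xsd:boolean is {"true", "false", "1", "0"}, with "true" and "false" as the canonical forms; any other string, such as "a", is ill-typed. A minimal Python sketch of that lexical-to-value mapping follows; the function names are illustrative only.

```python
from typing import Optional

# XML Schema Datatypes: lexical space of xsd:boolean is {"true", "false", "1", "0"};
# "true" and "false" are canonical. Any other lexical form (e.g. "a") is ill-typed.
_LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}

def boolean_value(lexical_form: str) -> Optional[bool]:
    """Value of an xsd:boolean literal, or None if the lexical form is ill-typed."""
    return _LEXICAL_TO_VALUE.get(lexical_form)

def canonical_form(lexical_form: str) -> Optional[str]:
    """Canonical lexical form ("true"/"false") for the same value, or None if ill-typed."""
    value = boolean_value(lexical_form)
    if value is None:
        return None
    return "true" if value else "false"

for form in ["true", "1", "false", "0", "a"]:
    print(form, boolean_value(form), canonical_form(form))
```

Two literals such as "0"^^xsd:boolean and "false"^^xsd:boolean therefore have the same value but, as different RDF terms, are not term-equal.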

> Since I don't want to unnecessarily complicate the language and add to
> implementation costs, I believe the definitions of sh:uniqueLang and
> sh:qualifiedValueShapesDisjoint are OK as they are right now. IMHO we
> shouldn't encourage the use of "1"^^xsd:boolean further. If anyone has strong
> feelings otherwise, please file a ticket to bring it in front of the WG.

There will have to be tests showing that "1"^^xsd:boolean does not trigger
these constraints.

I would prefer these constraints to trigger on values, not on terms, but it's
not particularly important to me.  I do note that most other uses of
literals in SHACL syntax do use the value of the literal, so the usages above
are unusual within SHACL and actually add a conceptual complication to the
language.
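The difference between the two readings, and what the needed tests would check, can be sketched in Python. `enabled_by_term` and `enabled_by_value` are hypothetical helpers standing in for how a processor might read a flag such as sh:uniqueLang, sh:qualifiedValueShapesDisjoint or sh:closed; under the term-based reading "1"^^xsd:boolean leaves the feature off, under the value-based reading it turns it on.

```python
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"
# xsd:boolean lexical space per XML Schema Datatypes; other forms are ill-typed.
_BOOLEAN_VALUES = {"true": True, "1": True, "false": False, "0": False}

def enabled_by_term(lexical: str, datatype: str) -> bool:
    """Term-based reading: only the exact term "true"^^xsd:boolean enables the feature."""
    return (lexical, datatype) == ("true", XSD_BOOLEAN)

def enabled_by_value(lexical: str, datatype: str) -> bool:
    """Value-based reading: any well-typed xsd:boolean literal with value true enables it."""
    return datatype == XSD_BOOLEAN and _BOOLEAN_VALUES.get(lexical) is True

for lexical in ["true", "1", "false", "0", "a"]:
    print(lexical, enabled_by_term(lexical, XSD_BOOLEAN), enabled_by_value(lexical, XSD_BOOLEAN))
```

A test for the term-based definition would assert that `enabled_by_term("1", XSD_BOOLEAN)` is false.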

> Holger

peter

> 
>>
>> The SHACL document still needs a close critical examination to detect these
>> kinds of problems.
>>
>> peter
>>
>>
>> On 03/14/2017 04:58 AM, Peter F. Patel-Schneider wrote:
>>> On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>>>>
>>>> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>>>>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>>>>> The test suite document is work in progress and I have basically just
>>>>>> started to take a deeper look. I welcome any help on this and really don't
>>>>>> want to "own" this document.
>>>>>>
>>>>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>>>>> It's going to be hard.  It's not possible to just remove the parts of the
>>>>>>> validation report that can vary because some of these parts have
>>>>>>> conditions on them.  For example, removing type and subclass triples will
>>>>>>> prevent checking the SHACL instance requirements.
>>>>>> Ok, the fact that reports allow for instances of subclasses of
>>>>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>>>>> pre-processing step. I have now added this step, normalizing these to their
>>>>>> direct rdf:type.
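The thread does not show the actual pre-processing step, so the following is only a guess at its shape: a Python sketch that retypes instances of (transitive) subclasses of sh:ValidationReport and sh:ValidationResult to those SHACL classes before comparison. It assumes each class has a single parent in a child-to-parent map, and the names `shacl_class_for`, `normalize_types`, `ex:MyReport` and `ex:MyResult` are hypothetical. It keeps the subclass information outside the report graph, which bears on the concern above about losing the instance requirements.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SHACL_REPORT_CLASSES = {"sh:ValidationReport", "sh:ValidationResult"}

def shacl_class_for(cls: str, subclass_of: Dict[str, str]) -> Optional[str]:
    """Follow rdfs:subClassOf links (child -> single parent) up to a SHACL report class."""
    seen = set()
    while cls not in SHACL_REPORT_CLASSES:
        if cls in seen or cls not in subclass_of:
            return None  # not a (transitive) subclass, or a cycle
        seen.add(cls)
        cls = subclass_of[cls]
    return cls

def normalize_types(triples: Set[Triple], subclass_of: Dict[str, str]) -> Set[Triple]:
    """Retype instances of subclasses of the SHACL report classes to those classes."""
    normalized = set()
    for s, p, o in triples:
        if p == "rdf:type":
            target = shacl_class_for(o, subclass_of)
            if target is not None:
                o = target
        normalized.add((s, p, o))
    return normalized

report = {("_:r", "rdf:type", "ex:MyReport"), ("_:r", "sh:result", "_:x"),
          ("_:x", "rdf:type", "ex:MyResult")}
hierarchy = {"ex:MyReport": "sh:ValidationReport", "ex:MyResult": "sh:ValidationResult"}
print(sorted(normalize_types(report, hierarchy)))
```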
>>>>>>
>>>>>>>     There is also the problem that there are
>>>>>>> different RDF literals with the same value.
>>>>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>>>>
>>>>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>>> Just following the second link here shows that RDF term equality looks at
>>>>> the syntactic form of RDF literals, not their value.  There is even a very
>>>>> illustrative example provided.
>>>>>
>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>>> *******************
>>>>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>>>>> and only if the two lexical forms, the two datatype IRIs, and the two
>>>>> language tags (if any) compare equal, character by character. Thus, two
>>>>> literals can have the same value without being the same RDF term. For
>>>>> example:
>>>>>         "1"^^xs:integer
>>>>>         "01"^^xs:integer
>>>>> denote the same value, but are not the same literal RDF terms and are not
>>>>> term-equal because their lexical form differs.
>>>>> *******************
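The quoted definition can be illustrated with a small Python sketch. The `Literal` class and the helpers `term_equal` and `integer_value` are hypothetical, and the integer check covers only xsd:integer. "1"^^xsd:integer and "01"^^xsd:integer fail term equality but have the same value.

```python
import re
from dataclasses import dataclass
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str
    language: Optional[str] = None

def term_equal(a: Literal, b: Literal) -> bool:
    """RDF 1.1 literal term equality: lexical form, datatype IRI and language tag all match."""
    return (a.lexical, a.datatype, a.language) == (b.lexical, b.datatype, b.language)

def integer_value(lit: Literal) -> Optional[int]:
    """Value of an xsd:integer literal, or None if it is not a well-typed integer."""
    # Regex guards against Python-only forms that int() would accept, e.g. "1_0".
    if lit.datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        return None
    return int(lit.lexical)

one = Literal("1", XSD_INTEGER)
zero_one = Literal("01", XSD_INTEGER)
print(term_equal(one, zero_one))                      # False
print(integer_value(one) == integer_value(zero_one))  # True
```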
>>>> That's understood, but I believe term equality is what we want, not value
>>>> equality. AFAICS all of the properties in the results vocabulary (e.g.
>>>> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
>>>> values. The only times where they can be literals such as "1" vs "01" is if
>>>> they point at values from the data graph via sh:value, and in those cases we
>>>> are doing term equality too. So I don't see the problem that you seem to see
>>>> right now.
>>> Yes, you are right.  The SHACL document defines true in red as a particular
>>> RDF term and uses true in red throughout where it talks about validation
>>> results.  My fault for not looking closely enough and assuming that true in
>>> SHACL validation reports could be any RDF term whose RDF value is true.
>>>
>>>>>>>     Probably the biggest problem is
>>>>>>> that the number of values for sh:result can vary between SHACL Core
>>>>>>> implementations for the same validation.
>>>>>> This is not the intention of the spec. The spec states that each validator
>>>>>> must have a mode in which it always produces all results.
>>>>>>
>>>>>> SHACL-compliant processors /must/ be capable of returning a validation
>>>>>> report with all required validation results
>>>>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described
>>>>>> in this specification.
>>>>> Consider validating the data graph
>>>>>     ex:i ex:p ex:j ; ex:q ex:j .
>>>>>     ex:j ex:p ex:j .
>>>>> against the shapes graph
>>>>>     ex:s1 rdf:type sh:NodeShape ;
>>>>>       sh:targetNode ex:i ;
>>>>>       sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>>>>       sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>>>>     ex:s2 sh:path ex:p ; sh:class ex:C .
>>>>> It is reasonable and acceptable to have one top-level validation result
>>>>> here.  It seems to me that there is an argument that it is also reasonable
>>>>> and acceptable to have two top-level validation results here.
>>>> The intention, and what I believe the current spec states, is that two
>>>> results must be produced in this case - the intro to section 4 states that
>>>> it always has to produce new result nodes and these cannot be shared. Also
>>>> the validation is defined per-focus-node and not for a group of focus nodes
>>>> (which may indeed cause duplicate value nodes to be swallowed up). So if
>>>> ex:s2 for ex:j is reached by two property shapes, it will produce one result
>>>> for each original focus node.
>>> Not so.  Even with the wording about producing new results, a SHACL
>>> implementation is free to optimize its performance.  For example, the
>>> implementation may decide to cache results of validation.  This can result
>>> in fewer validations being performed.  The results of these validations can
>>> then be reused multiple times and show up in the validation report.
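The effect of caching on the example above can be shown with a toy Python model. It is not a SHACL engine: it hard-codes the two property shapes of ex:s1 that route the ex:p and ex:q values of ex:i to ex:s2, checks ex:C membership by a direct rdf:type triple only, and treats the report's sh:result values as a set, as an RDF graph would. Fresh result nodes per evaluation give two results; a hypothetical cache that hands back the same result node gives one.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

# Data graph from the example: ex:i ex:p ex:j ; ex:q ex:j . ex:j ex:p ex:j .
DATA: Set[Tuple[str, str, str]] = {
    ("ex:i", "ex:p", "ex:j"), ("ex:i", "ex:q", "ex:j"), ("ex:j", "ex:p", "ex:j"),
}

def objects(subject: str, predicate: str) -> List[str]:
    return sorted(o for s, p, o in DATA if s == subject and p == predicate)

def check_s2(node: str, fresh: Iterator[str]) -> List[str]:
    """ex:s2: every ex:p value of node must have rdf:type ex:C (no subclass reasoning).
    Returns one fresh result node per violating value node."""
    return [next(fresh) for v in objects(node, "ex:p")
            if (v, "rdf:type", "ex:C") not in DATA]

def report_results(use_cache: bool) -> Set[str]:
    fresh = (f"_:r{n}" for n in itertools.count())
    cache: Dict[str, List[str]] = {}
    results: Set[str] = set()  # objects of sh:result in the report graph
    for path in ("ex:p", "ex:q"):  # the two property shapes of ex:s1
        for value in objects("ex:i", path):
            if use_cache and value in cache:
                found = cache[value]  # reuse the earlier result nodes
            else:
                found = check_s2(value, fresh)
                cache[value] = found
            results.update(found)
    return results

print(len(report_results(use_cache=False)))  # 2
print(len(report_results(use_cache=True)))   # 1
```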
>>>
>>>> Holger
>>> peter
>>>
> 
>
Received on Wednesday, 15 March 2017 11:41:05 UTC