Re: central problem with test suite from Holger Knublauch on 2017-03-15 (public-rdf-shapes@w3.org from March 2017)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 15 Mar 2017 13:25:29 +1000
To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <933dfc6d-b514-1669-a396-deceafc63caf@topquadrant.com>
On 14/03/2017 22:49, Peter F. Patel-Schneider wrote:
> I was too quick in agreeing that term equality is adequate for testing the
> boolean values that show up in validation reports.  If a data graph does not
> validate against a shapes graph the value for sh:conforms can be any boolean
> value except "true"^^xsd:boolean.  So it can be "false"^^xsd:boolean or
> "0"^^xsd:boolean.  This causes problems for graph isomorphism.
>
> The value can also be "1"^^xsd:boolean if the data graph does not conform to
> the shapes graph, which seems odd.  The value can also be "a"^^xsd:boolean,
> which also seems odd.  This appears to be a problem not with testing but with
> the definition of validation reports.

I have tightened the definition of sh:conforms to be always either true 
or false. This also resolves this part of the graph comparison problem.
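
To make the distinction concrete, here is a minimal sketch in plain Python. It is only an illustration, not taken from the spec or from any implementation. The Literal type and the helper names (boolean_value, term_equal, value_equal, is_valid_conforms) are made up for this example. Term equality compares lexical form and datatype character by character, while value equality maps the lexical form into the xsd:boolean value space first.

```python
from typing import NamedTuple, Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def boolean_value(lit: Literal) -> Optional[bool]:
    """Return the xsd:boolean value of a literal, or None if it is ill-typed."""
    if lit.datatype != XSD_BOOLEAN:
        return None
    # The xsd:boolean lexical space is exactly these four strings.
    return {"true": True, "1": True, "false": False, "0": False}.get(lit.lexical)


def term_equal(a: Literal, b: Literal) -> bool:
    """RDF term equality: same lexical form and same datatype IRI."""
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    """Value equality: both literals are well-typed and denote the same boolean."""
    va, vb = boolean_value(a), boolean_value(b)
    return va is not None and va == vb


TRUE = Literal("true", XSD_BOOLEAN)
FALSE = Literal("false", XSD_BOOLEAN)
ZERO = Literal("0", XSD_BOOLEAN)
ONE = Literal("1", XSD_BOOLEAN)


def is_valid_conforms(lit: Literal) -> bool:
    """Tightened rule (as assumed here): sh:conforms is exactly one of two terms."""
    return lit in (TRUE, FALSE)
```

With this model, FALSE and ZERO are value-equal but not term-equal, which is why a report using "0"^^xsd:boolean would not be isomorphic to one using "false"^^xsd:boolean. Restricting sh:conforms to the two terms TRUE and FALSE removes that ambiguity.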

>
> Similar problems occur in other places.  For example, a value of
> "1"^^xsd:boolean for sh:uniqueLang or sh:qualifiedValueShapesDisjoint or
> sh:closed does not enable the feature.

It is IMHO unfortunate that RDF even allows 0 or 1 for booleans. Luckily 
this fact is barely known and hardly ever used in practice (although I 
confess I did bump into it recently with a JavaScript library, probably 
the only time ever in the last 10 years).

Since I don't want to unnecessarily complicate the language and add to 
implementation costs, I believe the definitions of sh:uniqueLang and 
sh:qualifiedValueShapesDisjoint are OK as they are right now. IMHO we 
shouldn't encourage the use of "1"^^xsd:boolean further. If anyone has 
strong feelings otherwise, please file a ticket to bring it in front of 
the WG.

Holger



>
> The SHACL document still needs a close critical examination to detect these
> kinds of problems.
>
> peter
>
>
> On 03/14/2017 04:58 AM, Peter F. Patel-Schneider wrote:
>> On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>>>
>>> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>>>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>>>> The test suite document is work in progress and I have basically just started
>>>>> to take a deeper look. I welcome any help on this and really don't want to
>>>>> "own" this document.
>>>>>
>>>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>>>> It's going to be hard.  It's not possible to just remove the parts of the
>>>>>> validation report that can vary because some of these parts have
>>>>>> conditions on them.  For example, removing type and subclass triples will
>>>>>> prevent checking the SHACL instance requirements.
>>>>> Ok, the fact that reports allow for instances of subclasses of
>>>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>>>> pre-processing step. I have now added this step, normalizing these to their
>>>>> direct rdf:type.
>>>>>
>>>>>>     There is also the problem that there are
>>>>>> different RDF literals with the same value.
>>>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>>>
>>>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>> Just following the second link here shows that RDF term equality looks at
>>>> the syntactic form of RDF literals, not their value.  There is even a very
>>>> illustrative example provided.
>>>>
>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>> *******************
>>>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>>>> and only if the two lexical forms, the two datatype IRIs, and the two
>>>> language tags (if any) compare equal, character by character. Thus, two
>>>> literals can have the same value without being the same RDF term. For
>>>> example:
>>>>         "1"^^xs:integer
>>>>         "01"^^xs:integer
>>>> denote the same value, but are not the same literal RDF terms and are not
>>>> term-equal because their lexical form differs.
>>>> *******************
>>> That's understood, but I believe term equality is what we want, not value
>>> equality. AFAICS all of the properties in the results vocabulary (e.g.
>>> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
>>> values. The only times where they can be literals such as "1" vs "01" is if
>>> they point at values from the data graph via sh:value, and in those cases we
>>> are doing term equality too. So I don't see the problem that you seem to see
>>> right now.
>> Yes, you are right.  The SHACL document defines true in red as a particular
>> RDF term and uses true in red throughout where it talks about validation
>> results.  My fault for not looking closely enough and assuming that true in
>> SHACL validation reports could be any RDF term whose RDF value is true.
>>
>>>>>>     Probably the biggest problem is
>>>>>> that the number of values for sh:result can vary between SHACL Core
>>>>>> implementations for the same validation.
>>>>> This is not the intention of the spec. The spec states that each validator
>>>>> must have a mode in which it always produces all results.
>>>>>
>>>>> SHACL-compliant processors /must/ be capable of returning a validation report
>>>>> with all required validation results
>>>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described in
>>>>> this specification.
>>>> Consider validating the data graph
>>>>     ex:i ex:p ex:j ; ex:q ex:j .
>>>>     ex:j ex:p ex:j .
>>>> against the shapes graph
>>>>     ex:s1 rdf:type sh:PropertyShape ;
>>>>       sh:targetNode ex:i ;
>>>>       sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>>>       sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>>>     ex:s2 sh:path ex:p ; sh:class ex:C .
>>>> It is reasonable and acceptable to have one top-level validation result
>>>> here.  It seems to me that there is an argument that it is also reasonable
>>>> and acceptable to have two top-level validation results here.
>>> The intention, and what I believe the current spec states, is that two results
>>> must be produced in this case - the intro to section 4 states that it always
>>> has to produce new result nodes and these cannot be shared. Also the
>>> validation is defined per-focus-node and not for a group of focus nodes (which
>>> may indeed cause duplicate value nodes to be swallowed up). So if ex:s2 for
>>> ex:j is reached by two property shapes, it will produce one result for each
>>> original focus node.
>> Not so.  Even with the wording about producing new results a SPARQL
>> implementation is free to optimize its performance.  For example, the
>> implementation may decide to cache results of validation.  This can result
>> in fewer validations being performed.  The results of these validations can
>> then be used multiple times and then show up in the validation report.
>>
>>> Holger
>> peter
>>
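
The disagreement quoted above about how many results show up in a report can be sketched with a toy model. This is a rough illustration under stated assumptions, not SHACL semantics in full and not any real validator: the Result class and the validate_s1 / validate_s2 functions are invented for this example, and a "result node" is modelled as a Python object whose identity stands in for a fresh node. Since the data graph contains no rdf:type triples, every value checked against sh:class ex:C is assumed to fail.

```python
from dataclasses import dataclass
from functools import lru_cache

# Data graph from the quoted example, as (subject, predicate, object) triples.
DATA = {
    ("ex:i", "ex:p", "ex:j"),
    ("ex:i", "ex:q", "ex:j"),
    ("ex:j", "ex:p", "ex:j"),
}


@dataclass(eq=False)  # identity-based equality: each instance is a distinct node
class Result:
    focus: str
    value: str


def values(focus: str, path: str) -> list:
    """Value nodes reached from focus via path."""
    return sorted(o for s, p, o in DATA if s == focus and p == path)


def validate_s2(focus: str) -> tuple:
    """Toy ex:s2: each ex:p value must be an ex:C; none is, so each yields a result."""
    return tuple(Result(focus, v) for v in values(focus, "ex:p"))


# A caching variant: the same validation is performed once and its results reused.
cached_validate_s2 = lru_cache(maxsize=None)(validate_s2)


def validate_s1(check) -> int:
    """Toy ex:s1: apply ex:s2 to the ex:p and ex:q values of ex:i.

    Returns the number of distinct result nodes that end up in the report.
    """
    results = []
    for path in ("ex:p", "ex:q"):
        for value in values("ex:i", path):
            results.extend(check(value))
    return len(set(results))
```

In this model validate_s1(validate_s2) counts two distinct result nodes, because ex:j is validated once per property shape, while validate_s1(cached_validate_s2) counts one, because the cached result is reused. Whether the second behaviour is permitted is exactly the point under discussion in the thread.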
Received on Wednesday, 15 March 2017 03:26:04 UTC