- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Wed, 15 Mar 2017 04:40:29 -0700
- To: Holger Knublauch <holger@topquadrant.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
On 03/14/2017 08:25 PM, Holger Knublauch wrote:
>
>
> On 14/03/2017 22:49, Peter F. Patel-Schneider wrote:
>> I was too quick in agreeing that term equality is adequate for testing the
>> boolean values that show up in validation reports. If a data graph does not
>> validate against a shapes graph the value for sh:conforms can be any boolean
>> value except "true"^^xsd:boolean. So it can be "false"^^xsd:boolean or
>> "0"^^xsd:boolean. This causes problems for graph isomorphism.
>>
>> The value can also be "1"^^xsd:boolean if the data graph does not conform to
>> the shapes graph, which seems odd. The value can also be "a"^^xsd:boolean,
>> which also seems odd. This appears to be a problem not with testing but with
>> the definition of validation reports.
>
> I have tightened the definition of sh:conforms to be always either true or
> false. This also resolves this part of the graph comparison problem.
>
>>
>> Similar problems occur in other places. For example, a value of
>> "1"^^xsd:boolean for sh:uniqueLang or sh:qualifiedValueShapesDisjoint or
>> sh:closed does not enable the feature.
>
> It is IMHO unfortunate that RDF even allows 0 or 1 for booleans. Luckily this
> fact is barely known and hardly ever used in practice (although I confess I
> did bump into it recently with a JavaScript library, probably the only time
> ever in the last 10 years).

This aspect of RDF comes from XML Schema Datatypes.

> Since I don't want to unnecessarily complicate the language and add to
> implementation costs, I believe the definitions of sh:uniqueLang and
> sh:qualifiedValueShapesDisjoint are OK as they are right now. IMHO we
> shouldn't encourage the use of "1"^^xsd:boolean further. If anyone has strong
> feelings otherwise, please file a ticket to bring it in front of the WG.

There will have to be tests showing that "1"^^xsd:boolean does not trigger
these constraints.
I would prefer these constraints triggering on value not on terms, but it's
not particularly important for me. I do note that most of the other usage of
literals in SHACL syntax does use the value of literals, so the usages above
are unusual within SHACL and actually produce a conceptual complication of the
language of SHACL.

> Holger

peter

>
>>
>> The SHACL document still needs a close critical examination to detect these
>> kinds of problems.
>>
>> peter
>>
>>
>> On 03/14/2017 04:58 AM, Peter F. Patel-Schneider wrote:
>>> On 03/13/2017 10:39 PM, Holger Knublauch wrote:
>>>>
>>>> On 13/03/2017 21:22, Peter F. Patel-Schneider wrote:
>>>>> On 03/12/2017 04:48 PM, Holger Knublauch wrote:
>>>>>> The test suite document is work in progress and I have basically just
>>>>>> started to take a deeper look. I welcome any help on this and really
>>>>>> don't want to "own" this document.
>>>>>>
>>>>>> On 12/03/2017 6:02, Peter F. Patel-Schneider wrote:
>>>>>>> It's going to be hard. It's not possible to just remove the parts of the
>>>>>>> validation report that can vary because some of these parts have
>>>>>>> conditions on them. For example, removing type and subclass triples
>>>>>>> will prevent checking the SHACL instance requirements.
>>>>>> Ok, the fact that reports allow for instances of subclasses of
>>>>>> sh:ValidationReport and sh:ValidationResult indeed requires an extra
>>>>>> pre-processing step. I have now added this step, normalizing these to their
>>>>>> direct rdf:type.
>>>>>>
>>>>>>> There is also the problem that there are
>>>>>>> different RDF literals with the same value.
>>>>>> Why is this a problem? I believe RDF isomorphism relies on term equality:
>>>>>>
>>>>>> https://www.w3.org/TR/rdf11-concepts/#graph-isomorphism
>>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>>> Just following the second link here shows that RDF term equality looks at
>>>>> the syntactic form of RDF literals, not their value.
>>>>> There is even a very illustrative example provided.
>>>>>
>>>>> https://www.w3.org/TR/rdf11-concepts/#dfn-literal-term-equality
>>>>> *******************
>>>>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>>>>> and only if the two lexical forms, the two datatype IRIs, and the two
>>>>> language tags (if any) compare equal, character by character. Thus, two
>>>>> literals can have the same value without being the same RDF term. For
>>>>> example:
>>>>>    "1"^^xs:integer
>>>>>    "01"^^xs:integer
>>>>> denote the same value, but are not the same literal RDF terms and are not
>>>>> term-equal because their lexical form differs.
>>>>> *******************
>>>> That's understood, but I believe term equality is what we want, not value
>>>> equality. AFAICS all of the properties in the results vocabulary (e.g.
>>>> sh:focusNode, sh:resultPath, sh:sourceShape) can only have precisely matching
>>>> values. The only times where they can be literals such as "1" vs "01" is if
>>>> they point at values from the data graph via sh:value, and in those cases we
>>>> are doing term equality too. So I don't see the problem that you seem to see
>>>> right now.
>>> Yes, you are right. The SHACL document defined true in red as a particular
>>> RDF term and uses true in red throughout where it talks about validation
>>> results. My fault for not looking closely enough and assuming that true in
>>> SHACL validation reports could be any RDF term whose RDF value is true.
>>>
>>>>>>> Probably the biggest problem is
>>>>>>> that the number of values for sh:result can vary between SHACL Core
>>>>>>> implementations for the same validation.
>>>>>> This is not the intention of the spec. The spec states that each validator
>>>>>> must have a mode in which it always produces all results.
>>>>>>
>>>>>> SHACL-compliant processors /must/ be capable of returning a validation
>>>>>> report with all required validation results
>>>>>> <http://w3c.github.io/data-shapes/shacl/#dfn-validation-results> described
>>>>>> in this specification.
>>>>> Consider validating the data graph
>>>>>   ex:i ex:p ex:j ; ex:q ex:j .
>>>>>   ex:j ex:p ex:j .
>>>>> against the shapes graph
>>>>>   ex:s1 rdf:type sh:PropertyShape ;
>>>>>     sh:targetNode ex:i ;
>>>>>     sh:property [ sh:path ex:p ; sh:property ex:s2 ] ;
>>>>>     sh:property [ sh:path ex:q ; sh:property ex:s2 ] .
>>>>>   ex:s2 sh:path ex:p ; sh:class ex:C .
>>>>> It is reasonable and acceptable to have one top-level validation result
>>>>> here. It seems to me that there is an argument that it is also reasonable
>>>>> and acceptable to have two top-level validation results here.
>>>> The intention, and what I believe the current spec states, is that two
>>>> results must be produced in this case - the intro to section 4 states that
>>>> it always has to produce new result nodes and these cannot be shared. Also
>>>> the validation is defined per-focus-node and not for a group of focus nodes
>>>> (which may indeed cause duplicate value nodes to be swallowed up). So if
>>>> ex:s2 for ex:j is reached by two property shapes, it will produce one
>>>> result for each original focus node.
>>> Not so. Even with the wording about producing new results a SPARQL
>>> implementation is free to optimize its performance. For example, the
>>> implementation may decide to cache results of validation. This can result
>>> in fewer validations being performed. The results of these validations can
>>> then be used multiple times and then show up in the validation report.
>>>
>>>> Holger
>>> peter
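
The distinction between term equality and value equality that runs through this thread can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names Literal, term_equal, value_of and value_equal are hypothetical and come from no RDF or SHACL library, only xsd:boolean and xsd:integer are handled, and value comparison is restricted to literals of the same datatype.

```python
import re
from typing import NamedTuple, Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_BOOLEAN = XSD + "boolean"
XSD_INTEGER = XSD + "integer"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal term (not a library class)."""
    lexical: str
    datatype: str
    language: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Term equality: lexical form, datatype IRI and language tag must all
    # compare equal character by character.
    return a == b


def value_of(lit: Literal) -> Union[bool, int, None]:
    # Map a literal to its value. None means ill-typed (e.g. "a"^^xsd:boolean)
    # or a datatype this sketch does not handle.
    if lit.datatype == XSD_BOOLEAN:
        # The xsd:boolean lexical space is exactly: true, false, 1, 0.
        return {"true": True, "1": True, "false": False, "0": False}.get(lit.lexical)
    if lit.datatype == XSD_INTEGER and re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        return int(lit.lexical)
    return None


def value_equal(a: Literal, b: Literal) -> bool:
    # Assumption: only literals of the same datatype are compared by value.
    if a.datatype != b.datatype:
        return False
    va, vb = value_of(a), value_of(b)
    return va is not None and vb is not None and va == vb


one, zero_one = Literal("1", XSD_INTEGER), Literal("01", XSD_INTEGER)
true_word, true_digit = Literal("true", XSD_BOOLEAN), Literal("1", XSD_BOOLEAN)

print(term_equal(one, zero_one), value_equal(one, zero_one))
print(term_equal(true_word, true_digit), value_equal(true_word, true_digit))
```

Under these assumptions, a check based on term_equal treats "1"^^xsd:boolean as different from "true"^^xsd:boolean, which matches the behaviour described above for sh:uniqueLang and sh:closed, while a check based on value_equal would treat them as the same.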
Received on Wednesday, 15 March 2017 11:41:05 UTC