Re: Understanding Node vs Property Shapes and Property Paths

Hello Irene,

Mostly because I hate dangling threads with no resolution...

Again, the SO question (
https://stackoverflow.com/questions/61323857/what-is-the-difference-between-these-shape-graphs-which-use-shor/61372130#61372130
) mentioned earlier provides useful background.

The key blocker I had was understanding how property paths work and keeping
the concept firmly in mind. A path is used to reach values. When using
sh:path [sh:zeroOrMorePath rdf:type] and considering the node hr:Longer, it
will reach three values: (0) hr:Longer, (1) hr:Employee, and (2)
rdfs:Class. With this concept firmly in mind, what is going on in (B) and
why it does not work can be fully explained.

Both (A) and (B) have the same target definition and will return the same
focus nodes. These are:

hr:Another
hr:Employee
hr:Longer
hr:freestanding
hr:missing
hr:name
hr:nosuper
hr:randomtype
hr:typo


Additionally, common to both (A) and (B) is sh:path [sh:zeroOrMorePath
rdf:type] ;. When considering the node hr:Longer, for example, it will emit
three values, each of which may need to be checked. These three values are
(0) hr:Longer, (1) hr:Employee, and (2) rdfs:Class.
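
A minimal Python sketch of the two ideas above may help: the SPARQL target
picks every subject, and the zero-or-more rdf:type path emits the start node
plus everything reachable from it. This is not how a SHACL engine is
implemented. The tuple encoding and the function names focus_nodes and
path_values are assumptions made for illustration; the triples are the ones
from the data graph quoted further down in this thread.

```python
# Data graph as (subject, predicate, object) tuples of prefixed names.
TRIPLES = [
    ("hr:freestanding", "sch:rangeIncludes", "sch:Text"),
    ("hr:freestanding", "rdf:type", "rdf:Property"),
    ("hr:Another", "rdf:type", "rdfs:Class"),
    ("hr:typo", "rdfs:comment", "some comment about typo"),
    ("hr:typo", "rdf:type", "rdfs:Classs"),
    ("hr:typo", "rdfs:label", "some label about typo"),
    ("hr:nosuper", "sch:rangeIncludes", "sch:Text"),
    ("hr:nosuper", "rdf:type", "rdf:Property"),
    ("hr:nosuper", "sch:domainIncludes", "hr:Uncreated"),
    ("hr:Employee", "rdfs:comment", "a good employee"),
    ("hr:Employee", "rdf:type", "rdfs:Class"),
    ("hr:Employee", "rdfs:label", "model"),
    ("hr:randomtype", "rdfs:comment", "some comment about randomtype"),
    ("hr:randomtype", "rdf:type", "hr:invalidtype"),
    ("hr:randomtype", "rdfs:label", "some label about randomtype"),
    ("hr:Longer", "rdfs:comment", "a good employee"),
    ("hr:Longer", "rdf:type", "hr:Employee"),
    ("hr:Longer", "rdfs:label", "model"),
    ("hr:missing", "rdfs:comment", "some comment about missing"),
    ("hr:name", "rdf:type", "rdf:Property"),
    ("hr:name", "sch:domainIncludes", "hr:Employee"),
]


def focus_nodes(triples):
    """Mimic the target SELECT ?this WHERE { ?this ?p ?o }: every subject."""
    return sorted({s for s, _, _ in triples})


def path_values(triples, start, predicate="rdf:type"):
    """Nodes reached from start by zero or more steps along predicate."""
    seen = {start}          # the "zero" step: the start node itself
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


if __name__ == "__main__":
    print(focus_nodes(TRIPLES))
    print(sorted(path_values(TRIPLES, "hr:Longer")))
```

For hr:missing, which has no rdf:type at all, path_values returns only
hr:missing itself, which is why the zero-or-more path still gives the
validator something to check for that node.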

For (B), when it considers hr:Longer and passes the emitted value hr:Longer
to sh:or, it sees that the value is neither rdfs:Class nor rdf:Property. A
validation error is emitted because neither clause of sh:or was satisfied.

To make (B) work, the two clauses in the sh:or need to be changed to
[ sh:path [sh:zeroOrMorePath rdf:type] ; sh:hasValue rdfs:Class ; ] and
[ sh:path [sh:zeroOrMorePath rdf:type] ; sh:hasValue rdf:Property ; ]. In
this case, when hr:Longer is passed into the sh:or, each clause follows the
entire path again, and sh:hasValue only requires that at least one of the
three values emitted by the path matches.
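
The difference between the failing and the working form of (B) can be
sketched in Python as follows. This is only a model of the behaviour as
described above, not a SHACL implementation. TYPE_OF is a hypothetical,
trimmed-down stand-in for the rdf:type triples of the data graph, and the
names original_b and working_b are made up for this example.

```python
# rdf:type edges only (hypothetical compact form of the data graph).
TYPE_OF = {
    "hr:Longer": ["hr:Employee"],
    "hr:Employee": ["rdfs:Class"],
    "hr:name": ["rdf:Property"],
    "hr:typo": ["rdfs:Classs"],
    "hr:randomtype": ["hr:invalidtype"],
    "hr:missing": [],
}
ALLOWED = {"rdfs:Class", "rdf:Property"}


def path_values(start):
    """Values of [sh:zeroOrMorePath rdf:type] starting at start."""
    seen, frontier = {start}, [start]
    while frontier:
        for t in TYPE_OF.get(frontier.pop(), []):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def original_b(focus):
    """Each emitted value must itself be rdfs:Class or rdf:Property."""
    return all(v in ALLOWED for v in path_values(focus))


def working_b(focus):
    """Each emitted value must reach an allowed type along its own path."""
    return all(bool(path_values(v) & ALLOWED) for v in path_values(focus))


if __name__ == "__main__":
    for node in TYPE_OF:
        print(node, original_b(node), working_b(node))
```

Under these assumptions original_b rejects hr:Longer, because the value
hr:Longer is not one of the two allowed types, while working_b accepts it
and still rejects hr:missing, hr:typo, and hr:randomtype.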

(B) - Working

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sch:  <http://schema.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .

ex:ClassOrProperty
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:select   """
                    SELECT ?this
                    WHERE {
                        ?this ?p ?o .
                    }
                    """ ;
    ] ;

    sh:property [
        sh:path     [sh:zeroOrMorePath rdf:type] ;
        sh:nodeKind sh:IRI ;
        sh:or (
            [ sh:path [sh:zeroOrMorePath rdf:type] ; sh:hasValue rdfs:Class ; ]
            [ sh:path [sh:zeroOrMorePath rdf:type] ; sh:hasValue rdf:Property ; ]
        )
    ];
.


Now considering (A), each focus node is checked against ex:PropertyShape and
ex:ClassShape. If it conforms to at least one of the two shapes, it validates.
Both shapes are similar in that they each use the path sh:path [
sh:zeroOrMorePath rdf:type ];. Because they use sh:hasValue, only one of
the emitted values for the path needs to match. Considering hr:Longer
again, because the path will emit the value rdfs:Class, it validates
against ex:ClassShape and no validation error is generated.
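
As a rough Python model of (A), assuming each of the two shapes boils down
to "the zero-or-more rdf:type path has the value X", the check is a plain
"or" over one set of path values. TYPE_OF and validates_a are hypothetical
names for this sketch, and only a few nodes of the data graph are included.

```python
# rdf:type edges only (hypothetical compact form of the data graph).
TYPE_OF = {
    "hr:Longer": ["hr:Employee"],
    "hr:Employee": ["rdfs:Class"],
    "hr:name": ["rdf:Property"],
    "hr:randomtype": ["hr:invalidtype"],
    "hr:missing": [],
}


def path_values(start):
    """Values of [sh:zeroOrMorePath rdf:type] starting at start."""
    seen, frontier = {start}, [start]
    while frontier:
        for t in TYPE_OF.get(frontier.pop(), []):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def validates_a(focus):
    """sh:or over two shapes, each a sh:hasValue test on the same path."""
    values = path_values(focus)
    class_shape = "rdfs:Class" in values        # stands in for ex:ClassShape
    property_shape = "rdf:Property" in values   # stands in for ex:PropertyShape
    return class_shape or property_shape


if __name__ == "__main__":
    for node in TYPE_OF:
        print(node, validates_a(node))
```

The point of the sketch is that sh:hasValue is evaluated once per focus
node over the whole set of values, not once per value.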

Regards,
James

On Tue, Apr 21, 2020 at 4:59 PM James Hudson <jameshudson3010@gmail.com>
wrote:

> Hello Irene,
>
> On Tue, Apr 21, 2020 at 3:59 PM Irene Polikoff <irene@topquadrant.com>
> wrote:
>
>>
>>
>> On Apr 21, 2020, at 3:37 PM, James Hudson <jameshudson3010@gmail.com>
>> wrote:
>>
>> The SPARQL query which does what I want is:
>>
>> SELECT DISTINCT ?s
>> WHERE {
>>     {
>>         ?s ?p ?o .
>>         FILTER NOT EXISTS {
>>             ?s rdf:type* ?c .
>>              FILTER(?c IN (rdfs:Class, rdf:Property) && ?s NOT IN
>> (rdfs:Class, rdf:Property) )
>>         }
>>     }
>> }
>>
>>
>> You can do all sort of things with shapes including complex expressions
>> with sh:or, sh:not, etc.
>>
>
> My question is how...? That is what I am not yet seeing clearly with
> SHACL, where I think I understand what SPARQL is doing and how.
>
>
>
>> This query will return every subject whose rdf:type property path does
>> not terminate in a rdfs:Class or rdf:Property. The subjects the SPARQL
>> query will return are hr:missing, hr:typo, and hr:randomtype.
>>
>> 1. You require every resource that is a subject of any triples to have
>> rdf:type that is either a class or a property. In which case hr:missing is
>> correctly identified as missing a type
>>
>>
>> This is not what I want because hr:Longer would not validate.
>>
>> There is no triple hr:Longer a rdfs:Class. There is hr:Longer a
>> hr:Employee and hr:Employee a rdfs:Class. Therefore, there is a rdf:type
>> property path which terminates in either a rdfs:Class or rdf:Property and
>> hr:Longer is valid.
>>
>>
>> You lost me here. Possibly, this requires more bandwidth than I have to
>> discuss in e-mails
>>
>
> This is my train of thought. If I execute the following SPARQL query to
> extract all of the triples defined in my data graph, I get the following
> set of triples:
>
> SELECT ?s ?p ?o
> WHERE {
>     ?s ?p ?o .
> }
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------
> | s                                                          | p                  | o
> ==============================================================================================================================================
> | <http://learningsparql.com/ns/humanResources#freestanding> | sch:rangeIncludes  | sch:Text
> | <http://learningsparql.com/ns/humanResources#freestanding> | rdf:type           | rdf:Property
> | <http://learningsparql.com/ns/humanResources#Another>      | rdf:type           | rdfs:Class
> | <http://learningsparql.com/ns/humanResources#typo>         | rdfs:comment       | "some comment about typo"
> | <http://learningsparql.com/ns/humanResources#typo>         | rdf:type           | rdfs:Classs
> | <http://learningsparql.com/ns/humanResources#typo>         | rdfs:label         | "some label about typo"
> | <http://learningsparql.com/ns/humanResources#nosuper>      | sch:rangeIncludes  | sch:Text
> | <http://learningsparql.com/ns/humanResources#nosuper>      | rdf:type           | rdf:Property
> | <http://learningsparql.com/ns/humanResources#nosuper>      | sch:domainIncludes | <http://learningsparql.com/ns/humanResources#Uncreated>
> | <http://learningsparql.com/ns/humanResources#Employee>     | rdfs:comment       | "a good employee"
> | <http://learningsparql.com/ns/humanResources#Employee>     | rdf:type           | rdfs:Class
> | <http://learningsparql.com/ns/humanResources#Employee>     | rdfs:label         | "model"
> | <http://learningsparql.com/ns/humanResources#randomtype>   | rdfs:comment       | "some comment about randomtype"
> | <http://learningsparql.com/ns/humanResources#randomtype>   | rdf:type           | <http://learningsparql.com/ns/humanResources#invalidtype>
> | <http://learningsparql.com/ns/humanResources#randomtype>   | rdfs:label         | "some label about randomtype"
> | <http://learningsparql.com/ns/humanResources#Longer>       | rdfs:comment       | "a good employee"
> | <http://learningsparql.com/ns/humanResources#Longer>       | rdf:type           | <http://learningsparql.com/ns/humanResources#Employee>
> | <http://learningsparql.com/ns/humanResources#Longer>       | rdfs:label         | "model"
> | <http://learningsparql.com/ns/humanResources#missing>      | rdfs:comment       | "some comment about missing"
> | <http://learningsparql.com/ns/humanResources#name>         | rdf:type           | rdf:Property
> | <http://learningsparql.com/ns/humanResources#name>         | sch:domainIncludes | <http://learningsparql.com/ns/humanResources#Employee>
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> I do not see the triple hr:Longer a rdfs:Class in that list.
>
> I do see hr:Longer a hr:Employee and hr:Employee a rdfs:Class in the list.
>
> Following the rdf:type property path, hr:Longer -> hr:Employee ->
> rdfs:Class. The rdf:type property path starting with hr:Longer terminates
> in rdfs:Class.
>
> I am not sure how I can explain what I am thinking any better, so I hope
> that was good enough.
>
> Now, following what you said that sh:path [sh:zeroOrMorePath rdf:type]
> emits three values hr:Longer (the zero), hr:Employee (1), rdfs:Class (2),
> what I need is a way is for SHACL to only consider the last value found
> (where the rdf:type property path terminates) and ignore the rest.
>
> Put another way, perhaps (?), sh:path [sh:zeroOrMorePath rdf:type] results
> in three triples to be evaluated:
>
> (0) hr:Longer a hr:Longer (???)
> (1) hr:Longer a hr:Employee
> (2) hr:Longer a rdfs:Class
>
> I need a way to tell SHACL to ignore everything except for the last one
> (2) and to try to validate it.
>
> Regards,
> James
>
>
>> On Tue, Apr 21, 2020 at 3:12 PM Irene Polikoff <irene@topquadrant.com>
>> wrote:
>>
>>>
>>>
>>> On Apr 21, 2020, at 2:37 PM, James Hudson <jameshudson3010@gmail.com>
>>> wrote:
>>>
>>> Hello Irene,
>>>
>>> I neglected to add:
>>>
>>> Yes, the target picks every resource that is a subject of a triple. This
>>> is what I want.
>>>
>>> I also want to make sure that every subject has a property path that
>>> terminates in a rdf:type of either rdfs:Class or rdf:Property.
>>>
>>>
>>> What path? Saying that a path terminates in something does not specify a
>>> path.
>>>
>>>
>>> It is either
>>> 1. You require every resource that is a subject of any triples to have
>>> rdf:type that is either a class or a property. In which case hr:missing is
>>> correctly identified as missing a type OR
>>> 2. You require that every resource that is used as a value of rdf:type
>>> has a type that is either a class or a property. In which case, use
>>> sh:targetObjectsOf rdf:type OR
>>> 3. You want to ensure that some resources have a type that is a class or
>>> a property. It is not clear to me which resources they are. If you know
>>> what your criteria is, what resources you want to exclude and include, then
>>> it can be defined.
>>>
>>>
>>> To account for subjects like hr:missing, I believe that sh:path
>>> [sh:zeroOrMorePath rdf:type] ; is what I should be using. This does
>>> result in a validation error for hr:missing.
>>>
>>>
>>> This means
>>>
>>> ?s rdf:type* ?type
>>>
>>> which when hr:Long is bound to ?s will return hr:Long, hr:Employee and
>>> rdfs:Class. All of these values will be validated against the constraint
>>> and hr:Long gives you an error because its type is not rdfs:Class. It is
>>> hr:Employee.
>>>
>>> SHACL paths are the same as SPARQL paths
>>> https://www.w3.org/TR/sparql11-property-paths/.
>>>
>>>
>>> However, when using sh:path [sh:zeroOrMorePath rdf:type] ;,  hr:Employee
>>> produces a validation error because SHACL does not look at what the
>>> rdf:type of hr:Employee is. I believe this is because of the zero part of
>>> sh:zeroOrMorePath. When looking at hr:Employee, it only checks to see if
>>> hr:Employee is either a rdfs:Class or rdf:Property and, because it is not,
>>> it generates a validation error.
>>>
>>> Using sh:path ( rdf:type rdf:type ) ; or sh:path  ( rdf:type
>>> [sh:zeroOrMorePath rdf:type] ) or sh:path ( rdf:type [sh:oneOrMorePath
>>> rdf:type] ) does not result in a validation error for hr:missing.
>>>
>>> Regards,
>>> James
>>>
>>>
>>> On Tue, Apr 21, 2020 at 2:11 PM Irene Polikoff <irene@topquadrant.com>
>>> wrote:
>>>
>>>> Well, your target picks every resource that is a subject of a triple. I
>>>> thought you wanted to make sure that they all have types. Since hr:missing
>>>> does not have a type, you get a violation. That seems correct to me.
>>>>
>>>> If you simply wanted to say that any object in a triple with rdf:type
>>>> predicate must itself have a type, then you do not need SPARQL based
>>>> target. You could simply use sh:targetObjectsOf rdf:type.
>>>>
>>>> On Apr 21, 2020, at 1:39 PM, James Hudson <jameshudson3010@gmail.com>
>>>> wrote:
>>>>
>>>> Hello Irene,
>>>>
>>>> Unfortunately, sh:path (rdf:type rdf:type); validates:
>>>>
>>>> hr:missing rdfs:comment "some comment about missing" .
>>>>
>>>> which does not have any value of rdf:type. This focus node should
>>>> produce a validation error.
>>>>
>>>> I also believe that I would actually want ( rdf:type [sh:oneOrMorePath
>>>> rdf:type] ) ;  as the chain could be longer than just two. However, this
>>>> does not resolve the problems.
>>>>
>>>> I tried:
>>>>
>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>> @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>> @prefix sch:  <http://schema.org/> .
>>>> @prefix sh:   <http://www.w3.org/ns/shacl#> .
>>>> @prefix ex:  <http://example.org/> .
>>>>
>>>> ex:ClassOrProperty
>>>>     a sh:PropertyShape ;
>>>>     sh:target [
>>>>         a sh:SPARQLTarget ;
>>>>         sh:select   """
>>>>                     SELECT ?this
>>>>                     WHERE {
>>>>                         ?this ?p ?o .
>>>>                     }
>>>>                     """ ;
>>>>     ] ;
>>>>
>>>>
>>>>     sh:path     ( rdf:type [sh:oneOrMorePath rdf:type] ) ;
>>>>     sh:in       ( rdfs:Class rdf:Property ) ;
>>>>     sh:maxCount 1 ;                 # path-maxCount
>>>>     sh:minCount 1 ;                 # PropertyShape-path-minCount
>>>>
>>>> .
>>>>
>>>>
>>>> Hoping that I could say to validate where the property path terminates
>>>> and that it has to contain at least one value found in sh:in, but this
>>>> produced the unwanted validation error:
>>>>
>>>> Constraint Violation in MinCountConstraintComponent (
>>>> http://www.w3.org/ns/shacl#MinCountConstraintComponent):
>>>> Severity: sh:Violation
>>>> Source Shape: ex:ClassOrProperty
>>>> Focus Node: hr:Employee
>>>> Result Path: ( rdf:type rdf:type )
>>>>
>>>>
>>>> The only thing I need to be able to do is to validate where the
>>>> property path terminates and that does not seem possible with SHACL. Based
>>>> on that, I have to believe that my sh:path should be sh:path
>>>> [sh:zeroOrMorePath rdf:type] ; to account for focus nodes which do not
>>>> have a rdf:type defined. Unfortunately, SHACL requires that every node
>>>> along a path be validated with the same test and cannot just validate where
>>>> the property path terminates.
>>>>
>>>> Regards,
>>>> James
>>>>
>>>>
>>>> On Tue, Apr 21, 2020 at 1:18 PM Irene Polikoff <irene@topquadrant.com>
>>>> wrote:
>>>>
>>>>> No, I meant sequence path without any zero or more or one or more.
>>>>> Simply rdf:type/rdf:type as opposed to rdf:type+/rdf:type which doesn’t
>>>>> make much sense.
>>>>>
>>>>> sh:path (rdf:type rdf:type);
>>>>>
>>>>> See https://www.w3.org/TR/shacl/#property-paths
>>>>>
>>>>> On Apr 21, 2020, at 12:56 PM, James Hudson <jameshudson3010@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello Irene,
>>>>>
>>>>> Thank you for your quick reply.
>>>>>
>>>>> If I try:
>>>>>
>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix sch:  <http://schema.org/> .
>>>>> @prefix sh:   <http://www.w3.org/ns/shacl#> .
>>>>> @prefix ex:  <http://example.org/> .
>>>>>
>>>>> ex:ClassOrProperty
>>>>>     a sh:PropertyShape ;
>>>>>     sh:target [
>>>>>         a sh:SPARQLTarget ;
>>>>>         sh:select   """
>>>>>                     SELECT ?this
>>>>>                     WHERE {
>>>>>                         ?this ?p ?o .
>>>>>                     }
>>>>>                     """ ;
>>>>>     ] ;
>>>>>
>>>>>
>>>>>     sh:path     ( [sh:zeroOrMorePath rdf:type] rdf:type ) ;
>>>>>     sh:in       ( rdfs:Class rdf:Property ) ;
>>>>> .
>>>>>
>>>>>
>>>>> which is what I think you mean by "rdf:type/rdf:type as the path", I
>>>>> still get the following unexpected validation error:
>>>>>
>>>>> Constraint Violation in InConstraintComponent (
>>>>> http://www.w3.org/ns/shacl#InConstraintComponent):
>>>>> Severity: sh:Violation
>>>>> Source Shape: ex:ClassOrProperty
>>>>> Focus Node: hr:Longer
>>>>> Value Node: hr:Employee
>>>>> Result Path: ( [ sh:zeroOrMorePath rdf:type ] rdf:type )
>>>>>
>>>>>
>>>>> By unexpected, I mean I do not want it to be considered a validation
>>>>> error because the rdf:type property path terminates at rdfs:Class.
>>>>>
>>>>> When you say "zero or more paths will deliver values hr:Long,
>>>>> hr:Employee, rdfs:Class," does that mean that the sh:in test will be
>>>>> performed on the value of hr:Long (fail), hr:Employee (fail), and
>>>>> rdfs:Class (pass)? Is it possible to have it validate only where the
>>>>> property path terminates?
>>>>>
>>>>> Regards,
>>>>> James
>>>>>
>>>>> On Tue, Apr 21, 2020 at 12:12 PM Irene Polikoff <irene@topquadrant.com>
>>>>> wrote:
>>>>>
>>>>>> This looks correct.
>>>>>>
>>>>>> With data:
>>>>>>
>>>>>> hr:Long a hr:Employee.
>>>>>> hr:Employee a rdfs:Class.
>>>>>>
>>>>>> If your focus node is hr:Long, zero or more paths will deliver values
>>>>>> hr:Long, hr:Employee, rdfs:Class. One or more paths will deliver values
>>>>>>  hr:Employee, rdfs:Class.
>>>>>>
>>>>>> You could try rdf:type/rdf:type as the path. This will get the type
>>>>>> of a resource that is used as a type and ensure that it is rdfs:Class or
>>>>>> rdf:Property.
>>>>>>
>>>>>> On Apr 21, 2020, at 11:39 AM, James Hudson <jameshudson3010@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Since people here have been so helpful in the past, I thought I would
>>>>>> ask a few more questions.
>>>>>>
>>>>>> Background to this is my SO question at
>>>>>> https://stackoverflow.com/questions/61323857/what-is-the-difference-between-these-shape-graphs-which-use-shor
>>>>>>
>>>>>> The SO question has the data graph under consideration.
>>>>>>
>>>>>> In the book Validating RDF, it says:
>>>>>>
>>>>>> Node shapes declare constraints directly on a node. Property shapes
>>>>>> declare constraints on the values associated with a node through a path.
>>>>>>
>>>>>>
>>>>>> Based on this, I believe I want to use a Property Shape because I
>>>>>> want to define a constraint on the value of the rdf:type path on a focus
>>>>>> node. Is this correct?
>>>>>>
>>>>>> If I try the property shape:
>>>>>>
>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>> @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>> @prefix sch:  <http://schema.org/> .
>>>>>> @prefix sh:   <http://www.w3.org/ns/shacl#> .
>>>>>> @prefix ex:   <http://example.org/> .
>>>>>>
>>>>>> ex:ClassOrProperty
>>>>>>     a sh:PropertyShape ;
>>>>>>     sh:target [
>>>>>>         a sh:SPARQLTarget ;
>>>>>>         sh:select   """
>>>>>>                     SELECT ?this
>>>>>>                     WHERE {
>>>>>>                         ?this ?p ?o .
>>>>>>                     }
>>>>>>                     """ ;
>>>>>>     ] ;
>>>>>>
>>>>>>
>>>>>>     sh:path [sh:zeroOrMorePath rdf:type] ;
>>>>>>     sh:in ( rdfs:Class rdf:Property ) ;
>>>>>> .
>>>>>>
>>>>>>
>>>>>> I get the unexpected validation error:
>>>>>> (J)
>>>>>>
>>>>>> Constraint Violation in InConstraintComponent (
>>>>>> http://www.w3.org/ns/shacl#InConstraintComponent):
>>>>>> Severity: sh:Violation
>>>>>> Source Shape: ex:ClassOrProperty
>>>>>> Focus Node: hr:Longer
>>>>>> Value Node: hr:Employee
>>>>>> Result Path: [ sh:zeroOrMorePath rdf:type ]
>>>>>>
>>>>>>
>>>>>> The way I thought [sh:zeroOrMorePath rdf:type] ; would work is that
>>>>>> it would consider the node hr:Longer and follow the rdf:type path through
>>>>>> hr:Employee to where it terminates at rdfs:Class and then validate.
>>>>>> However, it seems to stop one step away, sees that hr:Employee is not a
>>>>>> rdfs:Class or rdf:Property and then generates a validation error.
>>>>>>
>>>>>> I get another unexpected validation error:
>>>>>> (K)
>>>>>>
>>>>>> Constraint Violation in InConstraintComponent (
>>>>>> http://www.w3.org/ns/shacl#InConstraintComponent):
>>>>>> Severity: sh:Violation
>>>>>> Source Shape: ex:ClassOrProperty
>>>>>> Focus Node: hr:Employee
>>>>>> Value Node: hr:Employee
>>>>>> Result Path: [ sh:zeroOrMorePath rdf:type ]
>>>>>>
>>>>>>
>>>>>> I was thinking that the zero in sh:zeroOrMorePath would see hr:Employee
>>>>>> a rdfs:Class ; and validate. Is it the case that the zero in sh:zeroOrMorePath
>>>>>> causes a validation engine to compare a node against itself without
>>>>>> following or looking for the path?
>>>>>>
>>>>>> I did try using sh:oneOrMorePath, but I received the validation
>>>>>> error (J) again, but (K) did not show up. Is the reason why (K) did not
>>>>>> show up because it was forced to see hr:Employee a rdfs:Class ; because
>>>>>> of the one in sh:oneOrMorePath and could validate it?
>>>>>>
>>>>>> Perhaps a validation engine validates every node along the path and
>>>>>> not just where the path terminates? If this is the case, is it possible to
>>>>>> validate where the path terminates only?
>>>>>>
>>>>>> Needless to say, I am rather confused.
>>>>>>
>>>>>> Can anyone clear this up?
>>>>>>
>>>>>> Thank you,
>>>>>> James
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Received on Wednesday, 22 April 2020 18:29:48 UTC