Fwd: [TopQuadrant/shacl] How to properly use RDFS inference with `sh:closed`? (#101) from Thomas Francart on 2020-10-05 (public-rdf-shapes@w3.org from October 2020)

From: Thomas Francart <thomas.francart@sparna.fr>
Date: Mon, 5 Oct 2020 09:37:55 +0200
To: public-rdf-shapes@w3.org
Message-ID: <CAPugn7WMG5Zomq+Wuyi0SvZJrhAgxCpJXfgquKRfs0YZVwUt6A@mail.gmail.com>
---------- Forwarded message ---------
From: Thomas Francart <thomas.francart@sparna.fr>
Date: Mon, 5 Oct 2020 at 09:36
Subject: Re: [TopQuadrant/shacl] How to properly use RDFS inference with
`sh:closed`? (#101)
To: TopQuadrant/shacl <
reply+AAU2H4OG3DKVSVSOGIT3BYF5QHKGVEVBNHHCU6KVPA@reply.github.com>
Cc: TopQuadrant/shacl <shacl@noreply.github.com>, Subscribed <
subscribed@noreply.github.com>


Hello

On Thu, 1 Oct 2020 at 17:19, Irene Polikoff <notifications@github.com>
wrote:

> First, on the terminology. In my experience, “closed world assumption”
> (CWA) refers to the following two items:
>
> 1. Negation as failure: e.g., if we do not have a value for, let’s say, the
> last name of a Person, we assume that this data does not exist. Thus, if we
> say that sh:minCount for lastName is 1, we get a violation. With an
> owl:minCardinality 1 restriction, we would not get a violation.
> 2. Unique names: e.g., we assume that resources with different URIs are
> different resources.
>
> Both of these things are already in SHACL. In other words, SHACL is based
> on the CWA.
>
> sh:closed is something else. It says that resources targeted by a
> shape only have values for the properties described in the shape.
>
> SHACL engines will always do a small bit of RDFS inference using
> rdf:type/rdfs:subClassOf, as described in the spec:
> https://www.w3.org/TR/shacl/#terminology. :instanceB is a SHACL instance
> of :ClassA and, therefore, a target of :ShapeA.
>
> If B is a subclass of A, it is not an extension of A, it is a subset of A
> - all instances of B are instances of A. Therefore, it is correct that all
> instances of B must be valid according to shapes that target instances of
> A. With this, a modeling approach that uses sh:closed while targeting
> members of a set that has subsets with additional properties you want to
> allow, seems peculiar to me.
>

I found myself in the exact same situation. I find the relationship
between sh:closed and that "little bit of inference that SHACL engines do"
confusing.
Let me put this in my own words:

   1. My data graph contains direct instances of A and direct instances of
   B ("x rdf:type A" and "y rdf:type B").
   2. I need to check that the structure of the graph is "closed", that is,
   direct instances of A only have one set of allowed properties, and
   direct instances of B only have another set of allowed properties.
   3. If I define a shape that targets class A and a shape that targets class
   B, I can close each shape and it works fine.
   4. If B happens to be a subclass of A, then, as Jason described, it
   no longer works "as I would expect".


> There may be ways for accomplishing what you want if you insist on using
> sh:closed on the shapes that target classes with subclasses, but they are
> not straightforward.
>
> 1. You can use sh:ignoredProperties in :ShapeA and list :propB there.
> However, it will then decide that any instance of A is valid if it has
> :propB, even if it is not an instance of B. And, of course, you would need
> to do it for all properties that are allowed for instances of subclasses.
> 2. You can use sh:or in defining :ShapeA to say that instances of A
> (target class) either conform to :ShapeB or to whatever you currently have
> defined for :ShapeA. Of course, this would require you to treat all
> subclasses this way and it becomes complex, especially since you already
> have the complexity of separating node shapes and classes. It will be more
> straightforward (but still some maintenance issue as you add new
> subclasses) if you use implicit class targets:
> https://www.w3.org/TR/shacl/#implicit-targetClass
> 3. Possibly, sh:node could be of use. It is an alternative to using target
> statements, but this is specific to specifying what is valid as a value of
> a specific property. See
> https://www.w3.org/TR/shacl/#NodeConstraintComponent. With sh:node, it
> does not matter if a value of a property is a member of multiple classes,
> it will only apply the identified shape.
>
> There may be some other variations.
>

Certainly. The point is that one cannot have both 1/ sh:closed on shapes
that target classes with subclasses and 2/ that "little bit of RDFS
inference". So I had some other ideas:

   1. Don't use sh:targetClass, as it triggers this RDFS inference; instead,
   use a SPARQL target that selects only direct instances of the classes
   (but SPARQL targets are part of the SHACL Advanced Features).
   2. Don't provide the ontology layer with subClassOf relationships to the
   SHACL engine; I did this, but for some reason I can't remember it was
   causing other issues.
   3. Don't use sh:closed; instead, create one shape per property that
   validates the domain of the property (in other words, that validates that
   each property is asserted on an instance of the correct type), using a
   combination of sh:targetSubjectsOf and sh:class:

<http://the.property-convertedToShape>
        a                    sh:NodeShape ;
        sh:class             <http://the.domain.class.of.the.property> ;
        sh:targetSubjectsOf  <http://the.property> .

This is not "closed", but at least it lets me check that every property
in my knowledge domain is asserted on an instance of the correct class. It
does not check whether other properties are asserted as well.
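
For the first idea in the list above (a SPARQL target restricted to direct
instances), a minimal sketch could look like the following. It assumes an
engine that supports the SHACL Advanced Features, and a data graph that
contains only asserted (not inferred) rdf:type triples. The ex: namespace and
the shape name are hypothetical; :ClassA and :propA are borrowed from Jason's
example. The sh:ignoredProperties line is an addition: without it, a closed
shape also reports the rdf:type triple itself.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Targets only nodes that have an asserted "rdf:type ex:ClassA" triple.
# Unlike sh:targetClass, the query does not follow rdfs:subClassOf.
ex:DirectInstancesOfAShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        # Full IRIs are used so that no sh:prefixes declaration is needed.
        sh:select """
            SELECT ?this
            WHERE {
                ?this a <http://example.org/ClassA> .
            }
        """ ;
    ] ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;   # otherwise rdf:type itself is reported
    sh:property [
        sh:path ex:propA ;
        sh:datatype xsd:string ;
    ] .
```

A node typed only as a subclass of ex:ClassA is not selected by this query, so
the closed shape does not apply to it.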


Another thing I find confusing is the (absence of a) relationship between
sh:closed and the use of property paths in sh:path. sh:closed only works
with simple properties given in sh:path. AFAIK this is not something
explicit in the spec. It does not take into account properties that are
part of a property path. So I sometimes find myself expressing similar
things twice: once with a simple property, to be picked up by sh:closed, and
once with a property path, to express the constraint I need.
E.g. I need to verify the following:

   1. Class :C can have property :p1, with instances of :D as values.
   2. :inverse-p1 is the inverse property of :p1.
   3. Class :C needs to have at least one value for either :p1 or its
   inverse.
   4. I want closed shapes.

I'd like to write:

ex:MyNodeShape a sh:NodeShape ;
  sh:targetClass :C ;
  sh:property [
    sh:path [ sh:alternativePath (:p1 [ sh:inversePath crm:inverse-p1 ]) ] ;
    sh:minCount 1 ;
    sh:class :D ;
  ] ;
  sh:closed true .

But sh:closed does not understand that :p1 is "allowed" on :C, because it's
hidden in the path.
So I need to write:

ex:MyNodeShape a sh:NodeShape ;
  sh:targetClass :C ;
  sh:property [
    sh:path [ sh:alternativePath (:p1 [ sh:inversePath crm:inverse-p1 ]) ] ;
    sh:minCount 1 ;
  ] ;
  sh:property [
    sh:path :p1 ;
    sh:class :D ;
  ] ;
  sh:closed true .
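
To make this concrete, here is a self-contained sketch of that second shape
together with a small data graph (normally two separate graphs, shown in one
listing for brevity). The default, ex: and crm: namespaces and the nodes :c1,
:c2, :c3, :d1, :d2 and :other are hypothetical. The sh:ignoredProperties line
is an addition not present in the shape above: without it, I would expect the
rdf:type triple of every :C instance to be reported by sh:closed as well.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :    <http://example.org/> .          # hypothetical namespace
@prefix ex:  <http://example.org/shapes/> .   # hypothetical namespace
@prefix crm: <http://example.org/crm/> .      # placeholder, crm: is not declared in this message

# --- shapes graph ---
ex:MyNodeShape a sh:NodeShape ;
  sh:targetClass :C ;
  # The actual constraint: at least one value via :p1 or the inverse property.
  sh:property [
    sh:path [ sh:alternativePath ( :p1 [ sh:inversePath crm:inverse-p1 ] ) ] ;
    sh:minCount 1 ;
  ] ;
  # Repeated as a simple path so that sh:closed treats :p1 as allowed.
  sh:property [
    sh:path :p1 ;
    sh:class :D ;
  ] ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) .

# --- data graph ---
:d1 a :D .
:d2 a :D ; crm:inverse-p1 :c2 .

:c1 a :C ; :p1 :d1 .               # should conform: :p1 is listed as a simple path
:c2 a :C .                         # should conform: one value via the inverse path
:c3 a :C ; :p1 :d1 ; :other "x" .  # should be reported: :other is not allowed
```

Note that in this form sh:class :D is only checked for values of :p1, not for
values reached through the inverse path.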

Best Regards
Thomas


>
> Hope this helps,
>
> Irene
>
> > On Oct 1, 2020, at 3:55 AM, Jason B. Koh <notifications@github.com>
> wrote:
> >
> >
> > Hi! I'm trying to use SHACL for a couple of my projects. I would like to
> understand the relationship between rdfs:subClassOf and sh:closed better.
> Basically, I would like to use the closed world assumption supported by
> sh:closed but allowing subclasses to specify more than their superclasses'
> shapes. For example,
> >
> > The schema graph:
> >
> > :ClassA a rdfs:Class.
> > :ClassB rdfs:subClassOf :ClassA.
> >
> > The shape graph:
> >
> > :ShapeA a sh:NodeShape;
> > sh:targetClass :ClassA;
> > sh:property [sh:path :propA; sh:datatype xsd:string];
> > sh:closed true.
> >
> > :ShapeB a sh:NodeShape;
> > sh:targetClass :ClassB;
> > sh:property [sh:path :propB; sh:datatype xsd:string];
> > sh:property [sh:path :propA; sh:datatype xsd:string];
> > sh:closed true.
> >
> > The data graph:
> >
> > :instanceA a :ClassA;
> > :propA "valueA".
> >
> > :instanceB a :ClassB;
> > :propA "valueA";
> > :propB "valueB".
> >
> > In the RDFS logic, we can infer the following triple:
> >
> > :instanceB a :ClassA.
> >
> > So now, instanceB is violating ShapeA's closed assumption.
> >
> > I like both 1) the RDFS subclass hierarchy across the concepts I would like
> to model and 2) the sh:closed property to easily verify the entire data set. So
> my desired outcome would be that superclasses' shapes' sh:closed property
> is ignored in SHACL validation. I feel it's a natural modeling
> practice: if this is only an instance of ClassA, it can only have
> propA. If that is an instance of ClassB, which is a subclass or an extension
> of ClassA, it can have both propA and propB.
> >
> > Would there be a solution for this use case?
> >
> > Thanks a lot!
> >
> > —
> > You are receiving this because you are subscribed to this thread.
> > Reply to this email directly, view it on GitHub <
> https://github.com/TopQuadrant/shacl/issues/101>, or unsubscribe <
> https://github.com/notifications/unsubscribe-auth/AAG762C4G4ICUKM3KCND5ELSIQYXXANCNFSM4SACLMDQ
> >.
> >
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/TopQuadrant/shacl/issues/101#issuecomment-702208345>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAU2H4JPSXRPN3JBQUZPLF3SISMWVANCNFSM4SACLMDQ>
> .
>


-- 

*Thomas Francart* -* SPARNA*
Web de *données* | Architecture de l'*information* | Accès aux
*connaissances*
blog : blog.sparna.fr, site : sparna.fr, linkedin :
fr.linkedin.com/in/thomasfrancart
tel :  +33 (0)6.71.11.25.97, skype : francartthomas


Received on Monday, 5 October 2020 07:38:20 UTC