Re: SHACL target extension from James Hudson on 2020-05-22 (public-shacl@w3.org from May 2020)

From: James Hudson <jameshudson3010@gmail.com>
Date: Fri, 22 May 2020 08:57:21 -0400
To: Holger Knublauch <holger@topquadrant.com>
Cc: Håvard M. Ottestad <hmottestad@gmail.com>, Public Shacl W3C <public-shacl@w3.org>
Message-ID: <CAEUVO9FzV_HWMAacXjzdWwXnYE8PKTRBixxB+Y7LNbi7qFkAtg@mail.gmail.com>
Hello Holger,

I'll add my opinion that sh:AllSubjects, etc. would be useful additions.

I am currently using

"sh:select" = "SELECT ?this WHERE { ?this ?p ?o . }"


which is equivalent to an sh:AllSubjects.

I am using it to help me validate schemas, making sure that all subjects
have an rdfs:comment, rdfs:label, etc.
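
A minimal sketch of what the complete shape could look like, assuming a
processor that supports SHACL-AF SPARQL-based targets (sh:SPARQLTarget). The
shape name ex:AllSubjectsShape and the ex: namespace are placeholders for
illustration only:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical shape name; targets every node that occurs as a subject.
ex:AllSubjectsShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:select """
            SELECT DISTINCT ?this
            WHERE { ?this ?p ?o . }
        """ ;
    ] ;
    # Every subject needs at least one label and one comment.
    sh:property [ sh:path rdfs:label ;   sh:minCount 1 ] ;
    sh:property [ sh:path rdfs:comment ; sh:minCount 1 ] .
```

The query uses no prefixed names, so no sh:prefixes declaration is needed
in this particular case.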

While sh:AllSubjects is not necessary because it can be trivially expressed
with a SPARQL query, it would be useful because optimized implementations
would be easier to provide when sh:AllSubjects is needed.

Regards,
James



On Fri, May 22, 2020 at 1:05 AM Holger Knublauch <holger@topquadrant.com>
wrote:

>
> On 21/05/2020 20:02, "Håvard M. Ottestad" wrote:
>
> Hi Holger and everyone else :)
>
> The targets for the TargetShape would be all subjects and all objects in
> the data graph.
>
> In earlier versions of SHACL we had something like sh:target
> sh:AllSubjects and sh:target sh:AllObjects. I have forgotten the details
> but they were taken out soon afterwards and I moved them into the dash: (
> datashapes.org) namespace while sh:target became a SHACL-AF term.
>
> So I guess we could reintroduce something like those and also
> sh:filterShape and would possibly have the most flexible solution? So
>
>     target nodes = targets of sh:targetXY filtered by sh:filterShape
>
> A clever algorithm then has a declarative model to work with and may
> quickly detect patterns like
>
> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>  a sh:NodeShape ;
>  sh:target sh:AllSubjects, sh:AllObjects ;
>  sh:filterShape [
>   sh:property [
>    sh:path foaf:knows;
>    sh:minCount 3 ;
>   ]
>  ] ;
>  sh:property [
>   sh:path foaf:knows;
>   sh:hasValue ex:Steve;
>  ] .
>
> Keeping sh:AllSubjects and sh:AllObjects separate would offer more
> flexibility too. (The case above can only be satisfied by subjects, so why
> even bother about the objects?)
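>
> As a rough sketch only: without sh:filterShape, a similar target set can
> be approximated with a SHACL-AF SPARQL target, assuming the processor
> supports sh:SPARQLTarget. The ex: namespace IRI and the ontology node ex:
> that carries the prefix declaration are placeholders, not part of any spec:
>
> ```turtle
> @prefix sh:   <http://www.w3.org/ns/shacl#> .
> @prefix owl:  <http://www.w3.org/2002/07/owl#> .
> @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> @prefix ex:   <http://example.org/ns#> .
>
> # Prefix declaration so the SPARQL string can use foaf: (placeholder node).
> ex:
>     a owl:Ontology ;
>     sh:declare [
>         sh:prefix "foaf" ;
>         sh:namespace "http://xmlns.com/foaf/0.1/"^^xsd:anyURI ;
>     ] .
>
> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>     a sh:NodeShape ;
>     sh:target [
>         a sh:SPARQLTarget ;
>         sh:prefixes ex: ;
>         # Select only subjects that know at least three distinct people.
>         sh:select """
>             SELECT ?this
>             WHERE { ?this foaf:knows ?person . }
>             GROUP BY ?this
>             HAVING (COUNT(DISTINCT ?person) >= 3)
>         """ ;
>     ] ;
>     sh:property [
>         sh:path foaf:knows ;
>         sh:hasValue ex:Steve ;
>     ] .
> ```
>
> Unlike a declarative filter, the query is opaque to the engine, so it
> offers fewer opportunities for pattern detection.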
>
> Holger
>
>
>
> The TargetShape would produce all subjects or objects in the data graph
> that are considered valid according to the interpretation of the
> TargetShape itself.
>
> As a simple rule, a Shape with a clone of itself as the TargetShape would
> end up validating only targets that are known to be valid and would
> consequently return no violations.
>
> Håvard
>
> On 21 May 2020, at 04:19, Holger Knublauch <holger@topquadrant.com> wrote:
>
> 
>
>
> On 20/05/2020 22:23, Håvard Ottestad wrote:
>
> Hi,
>
> For the RDF4J SHACL implementation we would be able to much better
> optimise for something like filters than we ever could for SPARQL targets.
> Currently our benchmarks show that our custom targeting approach is
> considerably faster than SPARQL targets, milliseconds vs. seconds. This
> wouldn’t necessarily apply to other implementations though.
>
> My idea about using filters as SHACL advanced targets would look something
> like this:
>
> Shape explanation: Anyone who knows three or more people must also know
> Steve.
>
> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>  a sh:Shape ;
>  sh:target [
>   a sh:TargetShape ;
>   sh:property [
>    sh:path foaf:knows;
>    sh:minCount 3 ;
>   ]
>  ] ;
>  sh:property [
>   sh:path foaf:knows;
>   sh:hasValue ex:Steve;
>  ] .
>
>
> Which would essentially have the same results as:
>
> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>  a sh:Shape ;
>  sh:targetSubjectsOf foaf:knows ;
>  sh:or (
>   [
>    sh:path foaf:knows;
>    sh:minCount 3;
>    sh:hasValue ex:Steve;
>   ]
>   [
>    sh:path foaf:knows;
>    sh:maxCount 2;
>   ]
>  ) .
>
>
> Anyone think that this is a good (or maybe a particularly bad) idea?
>
> In general I agree that richer targets are needed. While there might not
> be an official WG to produce such a thing, we as implementers could
> establish a de-facto standard. I had designed sh:target to serve as an
> extension point here, allowing custom systems to plug in their own
> extensions. The use of a single property (sh:target) at least indicates to a
> processor that *some* target exists, so that it can at least print a
> warning if it doesn't know what to do with it.
>
> Down the road, if we agree on something as fundamental as something
> similar to filterShapes then we could introduce a new keyword such as
> sh:targetNodesConforming which would take a shape declaration as its value.
>
> My specific question (and I may be blind right now) is: what would be the
> target nodes of the TargetShape in your example? Formally it would need to
> be the set of all nodes in the universe, which doesn't even exist. Without
> target nodes, most constraints cannot be interpreted because they are
> formulated with a given focus node in mind.
>
> That's why reopening sh:filterShape might be a better approach. It has the
> advantage that filters can be added to any shape including shapes imported
> from a 3rd party, to narrow its targets down for a specific application. I
> don't remember exactly why we dropped that. Dimitris is correct that it was
> due to lack of time - there was quite some panic at the end of the WG. The
> reason was probably the complexity due to recursion. The minutes SHOULD
> have a resolution which may explain more.
>
> Holger
>
>
>
> Håvard
>
>
>
> On 20 May 2020, at 12:48, Varytimou, Natasa (Refinitiv) <
> Natasa.Varytimou@refinitiv.com> wrote:
>
> Hi all
>
> We also had a big performance issue with SHACL SPARQL targets, which are
> incredibly useful.
> Is there anything that can be done to improve performance?
> And the same question for filters (which I agree would be useful to
> include): will we have performance issues there as well?
>
>
> -----Original Message-----
> From: Håvard Ottestad <hmottestad@gmail.com>
> Sent: 20 May 2020 11:25
> To: Andy Seaborne <andy@apache.org>
> Cc: public-shacl@w3.org
> Subject: Re: SHACL target extension
>
> Hi Andy and Dimitris
>
> Filters look like particularly useful constructs. They also look very
> powerful, which is both good and bad.
>
> It’s quite close to what I want. I would want to have the filter run on
> all nodes in the data graph, essentially a sh:targetAllSubjects target. I
> think I saw something along those lines already, but I couldn’t find it now
> while writing this email.
>
> I can see that a natural extension would be to allow filters to be used as
> targets themselves, maybe through the SHACL Advanced sh:target property.
>
> Håvard
>
> On 20 May 2020, at 10:29, Andy Seaborne <andy@apache.org> wrote:
>
> Nice!
>
> That would be a useful addition to SHACL both on targets and on property
> shapes. And for rules.
>
> Were there any other features that got dropped that the community might be
> interested in?
>
>   Andy
>
>
> On 19/05/2020 22:29, Dimitris Kontokostas wrote:
>
> Hi Håvard,
> I think what you are after is something like the filter shape feature
> (https://www.w3.org/TR/2016/WD-shacl-20160814/#filterShape).
> This is something that existed in the first versions of SHACL but was
> dropped due to time restrictions near the end of the WG.
> Best, Dimitris
>
> On Tue, May 19, 2020 at 11:05 PM Håvard Ottestad <hmottestad@gmail.com> wrote:
>
>  Hi James and Irene,
>  Thanks for the replies.
>  This is more a question of the standardisation aspect. Did anyone
>  discuss including more elaborate target building blocks? There is
>  already sh:targetClass for rdf:type, but did anyone consider other
>  class constructs like skos:inScheme?
>  We already have two functional solutions within the current syntax:
>   - use sh:targetNode with sh:inversePath
>   - use SPARQL targets
>  The issues with these solutions are:
>  1. Using sh:targetNode and sh:inversePath is much harder to
>  read than something like the compound target that we are
>  considering introducing.
>  2. SPARQL targets take too long to evaluate for transactional
>  workloads.
>  Håvard
>
>  On 19 May 2020, at 20:37, James Hudson <jameshudson3010@gmail.com> wrote:
>  You may want to check out:
>
>  https://stackoverflow.com/questions/61323857/what-is-the-difference-between-these-shape-graphs-which-use-shor
>  and
>  https://stackoverflow.com/questions/61190422/validating-that-every-subject-has-a-type-of-class
>  and other SHACL questions and answers I have on SO. They may help
>  you out.
>  As Irene already pointed out, SPARQL-based targets will solve
>  your problem.
>  On Tue, May 19, 2020 at 11:39 AM Håvard Ottestad <hmottestad@gmail.com> wrote:
>      Hi,
>      I’m the developer for the RDF4J SHACL implementation and we
>      are looking into extending the targeting options in SHACL and
>      are wondering if this is something that was discussed during
>      the development of the standard or if anyone else has run
>      into similar requirements.
>      Essentially extending the current list of sh:targetNode,
>      sh:targetClass, sh:targetSubjectsOf and sh:targetObjectsOf.
>      Our use case can be summed up as:
>      ex:Håvard ex:nationality ex:Norway;
>          ex:norwegianID "12345612345".
>      Where we would essentially like to be able to add a shape
>      that says that all Norwegian citizens should have a Norwegian
>      ID number.
>      We have been testing out the concept of a compound target.
>      For our current tests we have used our own namespace like this:
>      @prefix rdf4j-sh: <http://rdf4j.org/schema/rdf4j-shacl#> .
>      ex:PersonShape
>             a sh:NodeShape  ;
>             rdf4j-sh:compoundTarget [
>                     rdf4j-sh:targetPredicate ex:nationality;
>                     rdf4j-sh:targetObject ex:Norway
>             ];
>             sh:property [
>                    sh:path ex:norwegianID ;
>                    sh:minCount 1 ;
>                    sh:maxCount 1 ;
>             ] .
>      We have also been thinking about allowing
>      rdf4j-sh:targetObject to have multiple values.
>      I also realise that it’s possible to use inversePath to solve
>      this same problem, but I feel it becomes hard to read and
>      grasp the intent.
>      ex:PersonShape
>             a sh:NodeShape  ;
>             sh:targetNode ex:Norway;
>             sh:property [
>                    sh:path [sh:inversePath ex:nationality ];
>                    sh:property [
>                      sh:path ex:norwegianID ;
>                      sh:minCount 1 ;
>                      sh:maxCount 1 ;
>                    ]
>             ] .
>      Concurrently we have been testing the SHACL Advanced SPARQL
>      targets. These allow us to do the same thing, but we are
>      unable to achieve the same level of performance. In one of
>      our benchmarks we see that SPARQL targets are 450x slower per
>      transaction than compound targets. This is mostly due to our
>      SHACL implementation being able to analyse the transactional
>      changes and run a very minimal validation for compound
>      targets. We do think that SPARQL targets could be
>      considerably faster, but the design choices that allow for
>      minimal transactional validation are currently also limiting
>      our options for speeding up SPARQL targets.
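>      A SPARQL target for the same use case might look roughly like the
>      sketch below. This is an illustration only: the ex: namespace IRI
>      and the ontology node ex: holding the prefix declaration are
>      assumptions, not taken from our test setup.
>
>      ```turtle
>      @prefix sh:  <http://www.w3.org/ns/shacl#> .
>      @prefix owl: <http://www.w3.org/2002/07/owl#> .
>      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>      @prefix ex:  <http://example.org/ns#> .
>
>      # Prefix declaration so the SPARQL string can use ex: (placeholder).
>      ex:
>          a owl:Ontology ;
>          sh:declare [
>              sh:prefix "ex" ;
>              sh:namespace "http://example.org/ns#"^^xsd:anyURI ;
>          ] .
>
>      ex:PersonShape
>          a sh:NodeShape ;
>          sh:target [
>              a sh:SPARQLTarget ;
>              sh:prefixes ex: ;
>              # Target every node whose nationality is Norway.
>              sh:select """
>                  SELECT ?this
>                  WHERE { ?this ex:nationality ex:Norway . }
>              """ ;
>          ] ;
>          sh:property [
>              sh:path ex:norwegianID ;
>              sh:minCount 1 ;
>              sh:maxCount 1 ;
>          ] .
>      ```
>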
>      Does anyone know if this approach to a more flexible
>      targeting has been considered as part of the spec? Or if
>      someone has run into similar needs and is maybe considering
>      implementing something similar.
>      Cheers,
>      Håvard
>
> --
> Kontokostas Dimitris
Received on Friday, 22 May 2020 12:57:47 UTC