Re: SHACL target extension from Håvard M. Ottestad on 2020-05-21 (public-shacl@w3.org from May 2020)

From: Håvard M. Ottestad <hmottestad@gmail.com>
Date: Thu, 21 May 2020 12:02:49 +0200
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-shacl@w3.org
Message-Id: <E28E5736-B835-4320-B35C-321F47BE1CAD@gmail.com>
Hi Holger and everyone else :)

The targets for the TargetShape would be all subjects and all objects in the data graph.

The TargetShape would produce all subjects or objects in the data graph that are considered valid according to the interpretation of the shape of the TargetShape.
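
For comparison, a similar target can be approximated today with a SHACL-AF SPARQL target. This is only a sketch: the ex: namespace is made up for illustration, and the query writes foaf:knows as a full IRI so that no sh:prefixes declaration is needed. A node needs at least three foaf:knows values to qualify, so it is necessarily a subject in the data graph and the candidate set does not have to be enumerated explicitly.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.com/ns#> .   # hypothetical namespace

ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        # Each binding of ?this becomes a target (focus) node of the shape.
        sh:select """
            SELECT ?this
            WHERE { ?this <http://xmlns.com/foaf/0.1/knows> ?person . }
            GROUP BY ?this
            HAVING (COUNT(DISTINCT ?person) >= 3)
            """ ;
    ] ;
    # Constraint applied to every selected target.
    sh:property [
        sh:path foaf:knows ;
        sh:hasValue ex:Steve ;
    ] .
```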

As a simple rule, a Shape with a clone of itself as the TargetShape would end up validating only targets that are known to be valid and would consequently return no violations. 
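
A sketch of that rule, using the sh:TargetShape type proposed in this thread (it is not part of SHACL or SHACL-AF, and the ex: namespace and example data are made up for illustration):

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.com/ns#> .   # hypothetical namespace

# The target shape is a copy of the shape's own constraint.
ex:MustKnowSteve
    a sh:NodeShape ;
    sh:target [
        a sh:TargetShape ;   # proposed type, hypothetical
        sh:property [
            sh:path foaf:knows ;
            sh:hasValue ex:Steve ;
        ]
    ] ;
    sh:property [
        sh:path foaf:knows ;
        sh:hasValue ex:Steve ;
    ] .

# Example data.
# ex:Anna conforms to the target shape, so she is a target and also passes the constraint.
ex:Anna foaf:knows ex:Steve , ex:Ben .
# ex:Ben does not conform to the target shape, so he is never a target and cannot violate.
ex:Ben foaf:knows ex:Anna .
```

Under the proposed semantics, validating this data would report no violations, since only nodes that already satisfy the constraint are ever selected.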

Håvard

> On 21 May 2020, at 04:19, Holger Knublauch <holger@topquadrant.com> wrote:
> 
> 
> 
> 
> On 20/05/2020 22:23, Håvard Ottestad wrote:
>> Hi,
>> 
>> For the RDF4J SHACL implementation we would be able to optimise much better for something like filters than we ever could for SPARQL targets. Currently our benchmarks show that our custom targeting approach is considerably faster than SPARQL targets, milliseconds vs. seconds. This wouldn’t necessarily apply to other implementations though.
>> 
>> My idea about using filters as SHACL advanced targets would look something like this:
>> 
>> Shape explanation: Anyone who knows three or more people must also know Steve.
>> 
>> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>>  a sh:Shape ;
>>  sh:target [
>>   a sh:TargetShape ; 
>>   sh:property [
>>    sh:path foaf:knows;
>>    sh:minCount 3 ;
>>   ]
>>  ] ;
>>  sh:property [
>>   sh:path foaf:knows;
>>   sh:hasValue ex:Steve;
>>  ] .
>> 
>> Which would essentially have the same results as:
>> 
>> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>>  a sh:Shape ;
>>  sh:targetSubjectsOf foaf:knows ;
>>  sh:or (
>>   [
>>    sh:path foaf:knows; 
>>    sh:minCount 3; 
>>    sh:hasValue ex:Steve;
>>   ]
>>   [
>>    sh:path foaf:knows; 
>>    sh:maxCount 2;
>>   ]
>>  ) .
>> 
>> Anyone think that this is a good (or maybe a particularly bad) idea?
> In general I agree that richer targets are needed. While there might not be an official WG to produce such a thing, we as implementers could establish a de-facto standard. I had designed sh:target to serve as an extension point here, allowing custom systems to plug in their own extensions. The use of a single property (sh:target) at least indicates to a processor that *some* target exists, so that it can at least print a warning if it doesn't know what to do with it.
> 
> Down the road, if we agree on something as fundamental as a feature similar to filterShapes, then we could introduce a new keyword such as sh:targetNodesConforming which would take a shape declaration as its value.
> 
> My specific question (and I may be blind right now) is: what would be the target nodes of the TargetShape in your example? Formally it would need to be the set of all nodes in the universe, which doesn't even exist. Without target nodes, most constraints cannot be interpreted because they are formulated with a given focus node in mind.
> 
> That's why reopening sh:filterShape might be a better approach. It has the advantage that filters can be added to any shape including shapes imported from a 3rd party, to narrow its targets down for a specific application. I don't remember exactly why we dropped that. Dimitris is correct that it was due to lack of time - there was quite some panic at the end of the WG. The reason was probably the complexity due to recursion. The minutes SHOULD have a resolution which may explain more.
> 
> Holger
> 
> 
> 
>> 
>> Håvard
>> 
>> 
>> 
>>> On 20 May 2020, at 12:48, Varytimou, Natasa (Refinitiv) <Natasa.Varytimou@refinitiv.com> wrote:
>>> 
>>> Hi all
>>> 
>>> We also had a big performance issue with SHACL SPARQL targets, which are incredibly useful.
>>> Is there anything that can be done to improve performance?
>>> And the same question for filters (which I agree would be useful to include): will we have performance issues there as well?
>>> 
>>> 
>>> -----Original Message-----
>>> From: Håvard Ottestad <hmottestad@gmail.com> 
>>> Sent: 20 May 2020 11:25
>>> To: Andy Seaborne <andy@apache.org>
>>> Cc: public-shacl@w3.org
>>> Subject: Re: SHACL target extension
>>> 
>>> Hi Andy and Dimitris
>>> 
>>> Filters look like particularly useful constructs. They also look very powerful, which is both good and bad.
>>> 
>>> It’s quite close to what I want. I would want to have the filter run on all nodes in the data graph, essentially a sh:targetAllSubjects target. I think I saw something along those lines already, but I couldn’t find it now while writing this email.
>>> 
>>> I can see that a natural extension would be to allow filters to be used as targets themselves, maybe through the SHACL Advanced sh:target property.
>>> 
>>> Håvard
>>> 
>>>> On 20 May 2020, at 10:29, Andy Seaborne <andy@apache.org> wrote:
>>>> 
>>>> Nice!
>>>> 
>>>> That would be a useful addition to SHACL both on targets and on property shapes. And for rules.
>>>> 
>>>> Were there any other features that got dropped that the community might be interested in?
>>>> 
>>>>   Andy
>>>> 
>>>> 
>>>>>> On 19/05/2020 22:29, Dimitris Kontokostas wrote:
>>>>> Hi Håvard,
>>>>> I think what you are after is something like the filter shape feature
>>>>> (https://www.w3.org/TR/2016/WD-shacl-20160814/#filterShape).
>>>>> This is something that existed in the first versions of SHACL but was
>>>>> dropped due to time restrictions near the end of the WG.
>>>>> Best, Dimitris
>>>>>> On Tue, May 19, 2020 at 11:05 PM Håvard Ottestad <hmottestad@gmail.com> wrote:
>>>>>  Hi James and Irene,
>>>>>  Thanks for the replies.
>>>>>  This is more a question of the standardisation aspect. Did anyone
>>>>>  discuss including more elaborate target building blocks? There is
>>>>>  already sh:targetClass for rdf:type, but did anyone consider other
>>>>>  class constructs like skos:inScheme?
>>>>>  We already have two functional solutions within the current syntax:
>>>>>   - use sh:targetNode with sh:inversePath
>>>>>   - use SPARQL targets
>>>>>  The issues with these solutions are:
>>>>>  1. Using sh:targetNode and sh:inversePath is much harder to
>>>>>  read than something like the compound target that we are
>>>>>  considering introducing.
>>>>>  2. SPARQL targets take too long to evaluate for transactional
>>>>>  workloads.
>>>>>  Håvard
>>>>>>  On 19 May 2020, at 20:37, James Hudson <jameshudson3010@gmail.com> wrote:
>>>>>>  You may want to check out:
>>>>>>  https://stackoverflow.com/questions/61323857/what-is-the-difference-between-these-shape-graphs-which-use-shor
>>>>>>  and
>>>>>>  https://stackoverflow.com/questions/61190422/validating-that-every-subject-has-a-type-of-class
>>>>>>  and other SHACL questions and answers I have on SO. They may help
>>>>>>  you out.
>>>>>>  As Irene already pointed out, SPARQL-based targets will solve
>>>>>>  your problem.
>>>>>>  On Tue, May 19, 2020 at 11:39 AM Håvard Ottestad <hmottestad@gmail.com> wrote:
>>>>>>      Hi,
>>>>>>      I’m the developer for the RDF4J SHACL implementation and we
>>>>>>      are looking into extending the targeting options in SHACL and
>>>>>>      are wondering if this is something that was discussed during
>>>>>>      the development of the standard or if anyone else has run
>>>>>>      into similar requirements.
>>>>>>      Essentially extending the current list of sh:targetNode,
>>>>>>      sh:targetClass, sh:targetSubjectsOf and sh:targetObjectsOf.
>>>>>>      Our use case can be summed up as:
>>>>>>      ex:Håvard ex:nationality ex:Norway;
>>>>>>          ex:norwegianID "12345612345".
>>>>>>      Where we would essentially like to be able to add a shape
>>>>>>      that says that all Norwegian citizens should have a Norwegian
>>>>>>      ID number.
>>>>>>      We have been testing out the concept of a compound target.
>>>>>>      For our current tests we have used our own namespace like this:
>>>>>>      @prefix rdf4j-sh: <http://rdf4j.org/schema/rdf4j-shacl#> .
>>>>>>      ex:PersonShape
>>>>>>             a sh:NodeShape  ;
>>>>>>             rdf4j-sh:compoundTarget [
>>>>>>                     rdf4j-sh:targetPredicate ex:nationality;
>>>>>>                     rdf4j-sh:targetObject ex:Norway
>>>>>>             ];
>>>>>>             sh:property [
>>>>>>                    sh:path ex:norwegianID ;
>>>>>>                    sh:minCount 1 ;
>>>>>>                    sh:maxCount 1 ;
>>>>>>             ] .
>>>>>>      We have also been thinking about allowing
>>>>>>      rdf4j-sh:targetObject to have multiple values.
>>>>>>      I also realise that it’s possible to use inversePath to solve
>>>>>>      this same problem, but I feel it becomes hard to read and
>>>>>>      grasp the intent.
>>>>>>      ex:PersonShape
>>>>>>             a sh:NodeShape  ;
>>>>>>             sh:targetNode ex:Norway;
>>>>>>             sh:property [
>>>>>>                    sh:path [sh:inversePath ex:nationality ];
>>>>>>                    sh:property [
>>>>>>                      sh:path ex:norwegianID ;
>>>>>>                      sh:minCount 1 ;
>>>>>>                      sh:maxCount 1 ;
>>>>>>                    ]
>>>>>>             ] .
>>>>>>      Concurrently we have been testing the SHACL Advanced SPARQL
>>>>>>      targets. These allow us to do the same thing, but we are
>>>>>>      unable to achieve the same level of performance. In one of
>>>>>>      our benchmarks we see that SPARQL targets are 450x slower per
>>>>>>      transaction than compound targets. This is mostly due to our
>>>>>>      SHACL implementation being able to analyse the transactional
>>>>>>      changes and run a very minimal validation for compound
>>>>>>      targets. We do think that SPARQL targets could be
>>>>>>      considerably faster, but the design choices that allow for
>>>>>>      minimal transactional validation are currently also limiting
>>>>>>      our options for speeding up SPARQL targets.
>>>>>>      Does anyone know if this approach to a more flexible
>>>>>>      targeting has been considered as part of the spec? Or if
>>>>>>      someone has run into similar needs and is maybe considering
>>>>>>      implementing something similar.
>>>>>>      Cheers,
>>>>>>      Håvard
>>>>> --
>>>>> Kontokostas Dimitris
>>> 
>>> 
>>
Received on Thursday, 21 May 2020 10:03:50 UTC