Re: SHACL target extension from Holger Knublauch on 2020-05-22 (public-shacl@w3.org from May 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 22 May 2020 15:04:54 +1000
To: Håvard M. Ottestad <hmottestad@gmail.com>
Cc: public-shacl@w3.org
Message-ID: <fc1b0bd4-3b2b-7831-3f92-345092ce083b@topquadrant.com>
On 21/05/2020 20:02, "Håvard M. Ottestad" wrote:
> Hi Holger and everyone else :)
>
> The targets for the TargetShape would be all subjects and all objects 
> in the data graph.

In earlier versions of SHACL we had something like sh:target 
sh:AllSubjects and sh:target sh:AllObjects. I have forgotten the details 
but they were taken out soon afterwards and I moved them into the dash: 
(datashapes.org) namespace while sh:target became a SHACL-AF term.

So I guess we could reintroduce something like those and also 
sh:filterShape and would possibly have the most flexible solution? So

     target nodes = targets of sh:targetXY filtered by sh:filterShape

A clever algorithm then has a declarative model to work with and may 
quickly detect patterns like

     ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
         a sh:NodeShape ;
         sh:target sh:AllSubjects, sh:AllObjects ;
         sh:filterShape [
             sh:property [
                 sh:path foaf:knows ;
                 sh:minCount 3 ;
             ]
         ] ;
         sh:property [
             sh:path foaf:knows ;
             sh:hasValue ex:Steve ;
         ] .

Keeping sh:AllSubjects and sh:AllObjects as separate targets would offer 
more flexibility too. (The case above can only be satisfied by subjects, 
so why even bother about the objects?)
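
One way to read "targets filtered by a filter shape" is sketched below in 
Python. This is only an illustration under stated assumptions, not how any 
SHACL engine works: the data graph is a plain set of (subject, predicate, 
object) string triples, and all helper names (all_subjects, all_objects, 
min_count, has_value, target_nodes, violations) are hypothetical.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
# A "shape" here is just a predicate over (graph, focus node).
Shape = Callable[[Graph, str], bool]

KNOWS = "foaf:knows"


def all_subjects(graph: Graph) -> Set[str]:
    """Hypothetical stand-in for an sh:AllSubjects target."""
    return {s for s, _, _ in graph}


def all_objects(graph: Graph) -> Set[str]:
    """Hypothetical stand-in for an sh:AllObjects target."""
    return {o for _, _, o in graph}


def values(graph: Graph, node: str, path: str) -> Set[str]:
    """Value nodes reached from `node` via the predicate `path`."""
    return {o for s, p, o in graph if s == node and p == path}


def min_count(path: str, n: int) -> Shape:
    """Rough analogue of sh:minCount on a single predicate path."""
    return lambda graph, node: len(values(graph, node, path)) >= n


def has_value(path: str, value: str) -> Shape:
    """Rough analogue of sh:hasValue on a single predicate path."""
    return lambda graph, node: value in values(graph, node, path)


def target_nodes(graph: Graph, candidates: Set[str], filter_shape: Shape) -> Set[str]:
    """target nodes = candidate targets filtered by the filter shape."""
    return {n for n in candidates if filter_shape(graph, n)}


def violations(graph: Graph, candidates: Set[str],
               filter_shape: Shape, constraint: Shape) -> Set[str]:
    """Filtered target nodes that fail the shape's own constraint."""
    return {n for n in target_nodes(graph, candidates, filter_shape)
            if not constraint(graph, n)}


# Made-up example data.
graph: Graph = {
    ("ex:Alice", KNOWS, "ex:Bob"),
    ("ex:Alice", KNOWS, "ex:Carol"),
    ("ex:Alice", KNOWS, "ex:Steve"),
    ("ex:Dave", KNOWS, "ex:Bob"),
    ("ex:Dave", KNOWS, "ex:Carol"),
    ("ex:Dave", KNOWS, "ex:Erin"),
    ("ex:Erin", KNOWS, "ex:Bob"),
}

knows_three = min_count(KNOWS, 3)
knows_steve = has_value(KNOWS, "ex:Steve")
everything = all_subjects(graph) | all_objects(graph)

print(sorted(target_nodes(graph, everything, knows_three)))
print(sorted(violations(graph, everything, knows_three, knows_steve)))
```

With a minCount filter like this one, passing only all_subjects(graph) as 
the candidates yields the same target nodes, because a node that never 
appears as a subject has no foaf:knows values at all.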

Holger


>
> The TargetShape would produce all subjects or objects in the data 
> graph that are considered valid according to the interpretation of 
> the shape of the TargetShape.
>
> As a simple rule, a Shape with a clone of itself as the TargetShape 
> would end up validating only targets that are known to be valid and 
> would consequently return no violations.
>
> Håvard
>
>> On 21 May 2020, at 04:19, Holger Knublauch <holger@topquadrant.com> 
>> wrote:
>>
>> 
>>
>>
>> On 20/05/2020 22:23, Håvard Ottestad wrote:
>>> Hi,
>>>
>>> For the RDF4J SHACL implementation we would be able to much better 
>>> optimise for something like filters than we ever could for SPARQL 
>>> targets. Currently our benchmarks show that our custom targeting 
>>> approach is considerably faster than SPARQL targets, milliseconds 
>>> vs. seconds. This wouldn’t necessarily apply to other 
>>> implementations though.
>>>
>>> My idea about using filters as SHACL advanced targets would look 
>>> something like this:
>>>
>>> Shape explanation: Anyone who knows three or more people must also 
>>> know Steve.
>>>
>>> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>>>     a sh:Shape ;
>>>     sh:target [
>>>         a sh:TargetShape ;
>>>         sh:property [
>>>             sh:path foaf:knows ;
>>>             sh:minCount 3 ;
>>>         ]
>>>     ] ;
>>>     sh:property [
>>>         sh:path foaf:knows ;
>>>         sh:hasValue ex:Steve ;
>>>     ] .
>>>
>>> Which would essentially have the same results as:
>>>
>>> ex:EveryoneWhoKnowsThreePeopleMustKnowSteve
>>>     a sh:Shape ;
>>>     sh:targetSubjectsOf foaf:knows ;
>>>     sh:or (
>>>         [ sh:path foaf:knows ; sh:minCount 3 ; sh:hasValue ex:Steve ; ]
>>>         [ sh:path foaf:knows ; sh:maxCount 2 ; ]
>>>     ) .
>>>
>>> Anyone think that this is a good (or maybe a particularly bad) idea?
>>
>> In general I agree that richer targets are needed. While there might 
>> not be an official WG to produce such a thing, we as implementers 
>> could establish a de-facto standard. I had designed sh:target to 
>> serve as an extension point here, allowing custom systems to plug in 
>> their own extensions. The use of a single property (sh:target) at 
>> least indicates to a processor that *some* target exists, so that it 
>> can print a warning if it doesn't know what to do with it.
>>
>> Down the road, if we agree on something as fundamental as a 
>> filterShape-like mechanism, then we could introduce a new keyword 
>> such as sh:targetNodesConforming which would take a shape declaration 
>> as its value.
>>
>> My specific question (and I may be blind right now) is: what would be 
>> the target nodes of the TargetShape in your example? Formally it 
>> would need to be the set of all nodes in the universe, which doesn't 
>> even exist. Without target nodes, most constraints cannot be 
>> interpreted because they are formulated with a given focus node in mind.
>>
>> That's why reopening sh:filterShape might be a better approach. It 
>> has the advantage that filters can be added to any shape including 
>> shapes imported from a 3rd party, to narrow its targets down for a 
>> specific application. I don't remember exactly why we dropped that. 
>> Dimitris is correct that it was due to lack of time - there was quite 
>> some panic at the end of the WG. The reason was probably the 
>> complexity due to recursion. The minutes SHOULD have a resolution 
>> which may explain more.
>>
>> Holger
>>
>>
>>>
>>> Håvard
>>>
>>>
>>>
>>>> On 20 May 2020, at 12:48, Varytimou, Natasa (Refinitiv) 
>>>> <Natasa.Varytimou@refinitiv.com> wrote:
>>>>
>>>> Hi all
>>>>
>>>> We also had a big performance issue with SHACL SPARQL targets, which 
>>>> are incredibly useful.
>>>> Is there anything that can be done to improve performance?
>>>> And the same question for filters (which I agree would be useful 
>>>> to include): will we have performance issues there as well?
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Håvard Ottestad <hmottestad@gmail.com>
>>>> Sent: 20 May 2020 11:25
>>>> To: Andy Seaborne <andy@apache.org>
>>>> Cc: public-shacl@w3.org
>>>> Subject: Re: SHACL target extension
>>>>
>>>> Hi Andy and Dimitris
>>>>
>>>> Filters look like particularly useful constructs. They also look 
>>>> very powerful, which is both good and bad.
>>>>
>>>> It’s quite close to what I want. I would want to have the filter 
>>>> run on all nodes in the data graph, essentially a 
>>>> sh:targetAllSubjects target. I think I saw something along those 
>>>> lines already, but I couldn’t find it now while writing this email.
>>>>
>>>> I can see that a natural extension would be to allow filters to be 
>>>> used as targets themselves, maybe through the SHACL Advanced 
>>>> sh:target property.
>>>>
>>>> Håvard
>>>>
>>>>> On 20 May 2020, at 10:29, Andy Seaborne <andy@apache.org> wrote:
>>>>>
>>>>> Nice!
>>>>>
>>>>> That would be a useful addition to SHACL both on targets and on 
>>>>> property shapes. And for rules.
>>>>>
>>>>> Were there any other features that got dropped that the community 
>>>>> might be interested in?
>>>>>
>>>>>   Andy
>>>>>
>>>>>
>>>>>>> On 19/05/2020 22:29, Dimitris Kontokostas wrote:
>>>>>> Hi Håvard,
>>>>>> I think what you are after is something like the filter shape 
>>>>>> feature
>>>>>> (https://www.w3.org/TR/2016/WD-shacl-20160814/#filterShape).
>>>>>> This is something that existed in the first versions of
>>>>>> SHACL but was dropped due to time restrictions near the end of 
>>>>>> the WG.
>>>>>> Best, Dimitris
>>>>>>> On Tue, May 19, 2020 at 11:05 PM Håvard Ottestad 
>>>>>>> <hmottestad@gmail.com> wrote:
>>>>>>  Hi James and Irene,
>>>>>>  Thanks for the replies.
>>>>>>  This is more a question of the standardisation aspect. Did anyone
>>>>>>  discuss including more elaborate target building blocks? There is
>>>>>>  already sh:targetClass for rdf:type, but did anyone consider other
>>>>>>  class constructs like skos:inScheme?
>>>>>>  We already have two functional solutions within the current syntax:
>>>>>>   - use sh:targetNode with sh:inversePath
>>>>>>   - use SPARQL targets
>>>>>>  The issues with these solutions are:
>>>>>>  1. Using sh:targetNode with sh:inversePath is much harder to
>>>>>>  read than something like the compound target that we are
>>>>>>  considering introducing.
>>>>>>  2. SPARQL targets take too long to evaluate for transactional
>>>>>>  workloads.
>>>>>>  Håvard
>>>>>>>  On 19 May 2020, at 20:37, James Hudson 
>>>>>>> <jameshudson3010@gmail.com> wrote:
>>>>>>>  You may want to check out:
>>>>>>> https://stackoverflow.com/questions/61323857/what-is-the-difference-between-these-shape-graphs-which-use-shor
>>>>>>>  and
>>>>>>> https://stackoverflow.com/questions/61190422/validating-that-every-subject-has-a-type-of-class
>>>>>>>  and other SHACL questions and answers I have on SO. They may help
>>>>>>>  you out.
>>>>>>>  As Irene already pointed out, SPARQL-based targets will solve
>>>>>>>  your problem.
>>>>>>>  On Tue, May 19, 2020 at 11:39 AM Håvard Ottestad
>>>>>>>  <hmottestad@gmail.com> wrote:
>>>>>>>      Hi,
>>>>>>>      I’m the developer for the RDF4J SHACL implementation and we
>>>>>>>      are looking into extending the targeting options in SHACL and
>>>>>>>      are wondering if this is something that was discussed during
>>>>>>>      the development of the standard or if anyone else has run
>>>>>>>      into similar requirements.
>>>>>>>      Essentially extending the current list of sh:targetNode,
>>>>>>>      sh:targetClass, sh:targetSubjectsOf and sh:targetObjectsOf.
>>>>>>>      Our use case can be summed up as:
>>>>>>>      ex:Håvard ex:nationality ex:Norway;
>>>>>>>          ex:norwegianID "12345612345".
>>>>>>>      Where we would essentially like to be able to add a shape
>>>>>>>      that says that all Norwegian citizens should have a Norwegian
>>>>>>>      ID number.
>>>>>>>      We have been testing out the concept of a compound target.
>>>>>>>      For our current tests we have used our own namespace like this:
>>>>>>>      @prefix rdf4j-sh: <http://rdf4j.org/schema/rdf4j-shacl#> .
>>>>>>>      ex:PersonShape
>>>>>>>             a sh:NodeShape  ;
>>>>>>>             rdf4j-sh:compoundTarget [
>>>>>>>                     rdf4j-sh:targetPredicate ex:nationality;
>>>>>>>                     rdf4j-sh:targetObject ex:Norway
>>>>>>>             ];
>>>>>>>             sh:property [
>>>>>>>                    sh:path ex:norwegianID ;
>>>>>>>                    sh:minCount 1 ;
>>>>>>>                    sh:maxCount 1 ;
>>>>>>>             ] .
>>>>>>>      We have also been thinking about allowing
>>>>>>>      rdf4j-sh:targetObject to have multiple values.
>>>>>>>      I also realise that it’s possible to use inversePath to solve
>>>>>>>      this same problem, but I feel it becomes hard to read and
>>>>>>>      grasp the intent.
>>>>>>>      ex:PersonShape
>>>>>>>             a sh:NodeShape  ;
>>>>>>>             sh:targetNode ex:Norway;
>>>>>>>             sh:property [
>>>>>>>                    sh:path [sh:inversePath ex:nationality ];
>>>>>>>                    sh:property [
>>>>>>>                      sh:path ex:norwegianID ;
>>>>>>>                      sh:minCount 1 ;
>>>>>>>                      sh:maxCount 1 ;
>>>>>>>                    ]
>>>>>>>             ] .
>>>>>>>      Concurrently we have been testing the SHACL Advanced SPARQL
>>>>>>>      targets. These allow us to do the same thing, but we are
>>>>>>>      unable to achieve the same level of performance. In one of
>>>>>>>      our benchmarks we see that SPARQL targets are 450x slower per
>>>>>>>      transaction than compound targets. This is mostly due to our
>>>>>>>      SHACL implementation being able to analyse the transactional
>>>>>>>      changes and run a very minimal validation for compound
>>>>>>>      targets. We do think that SPARQL targets could be
>>>>>>>      considerably faster, but the design choices that allow for
>>>>>>>      minimal transactional validation are currently also limiting
>>>>>>>      our options for speeding up SPARQL targets.
>>>>>>>      Does anyone know if this approach to a more flexible
>>>>>>>      targeting has been considered as part of the spec? Or if
>>>>>>>      someone has run into similar needs and is maybe considering
>>>>>>>      implementing something similar.
>>>>>>>      Cheers,
>>>>>>>      Håvard
>>>>>> --
>>>>>> Kontokostas Dimitris
>>>>
>>>>
>>>
Received on Friday, 22 May 2020 05:05:14 UTC