Re: Reopening the discussion on sh:targetShape from Irene Polikoff on 2020-07-06 (public-shacl@w3.org from July 2020)

From: Irene Polikoff <irene@topquadrant.com>
Date: Mon, 6 Jul 2020 10:59:02 -0400
To: Håvard Ottestad <hmottestad@gmail.com>
Cc: Public Shacl W3C <public-shacl@w3.org>, Holger Knublauch <holger@topquadrant.com>
Message-Id: <16D0B766-9A37-481D-8458-E81559D8C136@topquadrant.com>
I can’t really comment on the implementation considerations. 

My concern is that this significantly (fundamentally?) changes the notion of a target declaration and, more generally, how focus nodes are identified. A SHACL shape does not do anything unless there are known focus nodes that must be evaluated against it. The specification lists the following ways of identifying focus nodes:

The set of focus nodes <https://www.w3.org/TR/shacl/#dfn-focus-node> for a shape <https://www.w3.org/TR/shacl/#dfn-shape> may be identified as follows:

- specified in a shape using target declarations <https://www.w3.org/TR/shacl/#dfn-target-declarations>
- specified in any constraint <https://www.w3.org/TR/shacl/#dfn-constraint> that references a shape in parameters of shape-expecting constraint parameters <https://www.w3.org/TR/shacl/#dfn-shape-expecting-constraint-parameters> (e.g. sh:node)
- specified as explicit input to the SHACL processor for validating a specific RDF term against a shape

With sh:targetShape, one seems to need to perform SHACL validation first in order to find the targets for SHACL validation. To date, all target declarations have been separate from the application of SHACL shapes. Further, if the values of sh:targetShape are shapes, then how are the focus nodes for these shapes identified? This seems to require a new mechanism and thus has implications for the standardized SHACL specification.

At a minimum, this needs to be addressed explicitly in any change to the specification and can’t simply be left to examples. For example, one could say that if a shape is a value of sh:targetShape, then all RDF terms in the data graph are its focus nodes.
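
To make the "validation before validation" point concrete, here is a rough sketch of what such a rule would imply. It is only an illustration under stated assumptions, not anything from the SHACL specification or an existing engine: triples are plain tuples, shapes are modelled as plain predicate functions, "all RDF terms" is taken to mean all subjects and objects, and every function name is hypothetical.

```python
# Hypothetical sketch: triples are plain tuples, shapes are plain predicates.
# None of these names come from the SHACL specification or any SHACL library.

def all_terms(data_graph):
    """Every subject and object in the data graph is a candidate focus node."""
    terms = set()
    for s, _p, o in data_graph:
        terms.add(s)
        terms.add(o)
    return terms


def focus_nodes_for_target_shape(data_graph, target_shape):
    """Phase 1: check every candidate node against the target shape.

    Nodes that conform become the focus nodes of the shape that
    declared sh:targetShape.
    """
    return {n for n in all_terms(data_graph) if target_shape(data_graph, n)}


def validate(data_graph, target_shape, shape):
    """Phase 2: validate only the selected focus nodes; return the violators."""
    targets = focus_nodes_for_target_shape(data_graph, target_shape)
    return {n for n in targets if not shape(data_graph, n)}


# Toy data modelled on the "knows at least three people" example in this thread.
data = {
    ("ex:Ann", "foaf:knows", "ex:Bob"),
    ("ex:Ann", "foaf:knows", "ex:Cat"),
    ("ex:Ann", "foaf:knows", "ex:Dan"),
    ("ex:Bob", "foaf:knows", "ex:Steve"),
}


def knows_at_least_three(graph, node):
    """Stand-in for a target shape with sh:path foaf:knows and sh:minCount 3."""
    return len({o for s, p, o in graph if s == node and p == "foaf:knows"}) >= 3


def knows_steve(graph, node):
    """Stand-in for a property shape with sh:hasValue ex:Steve."""
    return (node, "foaf:knows", "ex:Steve") in graph


violations = validate(data, knows_at_least_three, knows_steve)
```

In this toy graph only ex:Ann conforms to the target shape, and she does not know ex:Steve, so she is the single violation. The point of the sketch is that phase 1 touches every term in the graph before any "real" validation starts, which is the behaviour a specification change would have to state explicitly.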


Regards,

Irene


> On Jul 6, 2020, at 10:19 AM, Håvard Ottestad <hmottestad@gmail.com> wrote:
> 
> In my haste to respond I can see that I forgot to account for the sh:class constraint in example 2. This would either amount to a query with a union or two separate queries (one for sh:in and one for sh:class).
> 
> Håvard
> 
> On Mon, Jul 6, 2020 at 4:05 PM Håvard Ottestad <hmottestad@gmail.com> wrote:
> Hi Holger,
> 
> I've typed up some queries I believe will validate examples 2, 3 and 4 at least as efficiently as a SPARQL target.
> 
> If you agree that these queries are correct (or close enough to being correct), then we know that at least those examples can be implemented to perform at least as well as SPARQL targets.
> 
> If there is a way to evaluate all target shapes that is as fast as or faster than using SPARQL targets, then I think that sh:targetShape should be considered on the same terms as SPARQL targets performance-wise, regardless of the poor performance you are currently seeing.
> 
> I can think of one consideration that would either not perform well, or at least be very difficult to implement, and that is recursive SHACL since SPARQL does not support recursion. I believe that this should not hold us back from considering sh:targetShape.
> 
> Here is the list of queries.
> 
> ###############################################################################
> # 2. Instances in namespace "company" must have appropriate class and dc:type #
> ###############################################################################
> ex:CompanyShape a sh:NodeShape;
>   sh:targetShape [
>     sh:nodeKind sh:IRI;
>     sh:pattern "^https://company-graph.example.com/resource/company/";
>   ];
>   sh:class ex:Company;
>   sh:property [sh:path dc:type; sh:in ("conglomerate" "collective" "enterprise")];
> .
> 
> ### Target query ###
> select ?target where {
>  {
>   ?target ?b ?c.
>   FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/") )
>  } union {
>   ?a ?b ?target.
>   FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/") )
>  }
> 
> }
> 
> ### Combined query ###
> select ?target ?value where {
>  {
>   ?target ?b ?c.
>   ?target dc:type ?value.
>   FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>   FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/") )
>  } union {
>   ?a ?b ?target.
>   ?target dc:type ?value.
>   FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>   FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/") )
>  }
> 
> }
> 
> ### Optimized combined query for property shapes ###
> select ?target ?value where {
>  
>  ?target dc:type ?value.
>  FILTER(?value NOT IN ("conglomerate", "collective", "enterprise"))
>  FILTER(isIRI(?target) && regex(str(?target), "^https://company-graph.example.com/resource/company/") )
> 
> }
> 
> 
> #####################################################################
> # 3. All langStrings must have one of a predefined set of languages #
> #####################################################################
> ex:langStringShape a sh:NodeShape;
>   sh:targetShape [sh:datatype rdf:langString];
>   sh:languageIn ("en" "bg");
> .
> 
> 
> ### Target query ###
> select ?target where {
>  ?a ?b ?target.
>  FILTER(DATATYPE(?target) = rdf:langString)
> }
> 
> ### Combined query ###
> select ?target where {
>  ?a ?b ?target.
>  FILTER(DATATYPE(?target) = rdf:langString)
>  FILTER(!langMatches(lang(?target), "en") && !langMatches(lang(?target), "bg"))
> }
> 
> 
> #########################################################################################
> # 4. Steve is very popular, so everyone who knows at least three people must know Steve #
> #########################################################################################
> ex:Personshape a sh:NodeShape;
>   sh:targetShape [sh:path foaf:knows; sh:minCount 3];
>   sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
> .
> 
> 
> ### Target query ###
> select ?target where {
>  ?target foaf:knows ?count_0, ?count_1, ?count_2.
>  FILTER(?count_0 != ?count_1)
>  FILTER(?count_1 != ?count_2)
>  FILTER(?count_2 != ?count_0)
> }
> 
> ### Combined query ###
> select ?target ?value where {
>  ?target foaf:knows ?count_0, ?count_1, ?count_2.
>  FILTER(?count_0 != ?count_1)
>  FILTER(?count_1 != ?count_2)
>  FILTER(?count_2 != ?count_0)
> 
>  ?target foaf:knows ?value. 
>  FILTER NOT EXISTS {?target foaf:knows ex:Steve}
> }
> 
> 
> Cheers,
> Håvard
> 
> PS. I believe that example 2 is actually a bit wrong; I'll comment on the PR instead of in this email.
> 
> On Mon, Jul 6, 2020 at 1:49 PM Irene Polikoff <irene@topquadrant.com> wrote:
> But if there is no agreement, then I am concerned about using sh: namespace for this new construct. This does not seem right.
> 
> TQ has been adding some custom constructs into dash: namespace, for example. Other namespaces for custom extensions could be used, as well.
> 
> I believe sh: namespace should be reserved for things that a majority of implementers have reached consensus on. I recognize that currently we only have 2 implementers in this discussion. It would be better to broaden the circle of people looking at this topic.
> 
> If we have a strong disagreement, then the process to follow is normally:
> 
> 1. Have a call (or calls) between the concerned parties to see if they can reach an agreement to come to some compromise.
> 2. If not, and the objection is strong (as in “will not implement”), then typically the feature would not make it into the spec. Implementers can still do it as a custom extension.
> 
> In the past, we had some hypothetical and/or philosophical objections that slowed progress by many, many months. It was frustrating. An objection based on implementation experience is, however, different. I believe it needs serious consideration and a resolution, even if it slows progress. Unfortunately, there is no way to avoid some sluggishness when multiple parties need to come to consensus; this is a downside of standards development.
> 
> Irene
> 
> > On Jul 6, 2020, at 6:06 AM, Holger Knublauch <holger@topquadrant.com> wrote:
> > 
> >> On 6/07/2020 16:53, Håvard Ottestad wrote:
> >> 
> >> Hi Holger,
> >> 
> >> Could you share the shape that has particularly bad performance so I can see if I can think of an optimal solution?
> > For examples 2, 3 and 4 from https://github.com/w3c/shacl/pull/3
> >> 
> >> My plan has essentially been to convert the targetShape into a SPARQL query. This would put the performance in the same realm as SPARQL targets.
> >> 
> >> The benefit of targetShape over SPARQL targets is that it’s possible to validate changes to a database efficiently: we are seeing O(c) performance, where c is the effective size of the change, instead of the O(n) we were seeing with SPARQL targets (where n is the size of the database).
> > 
> > The same algorithms can be applied to the dash:HasValueTarget - it's just as declarative as the other shapes.
> > 
> > I think we (I at least) had discussed this sufficiently. I think I was clear on the performance issues. I think we can agree to disagree and move on. I was hoping that other implementers may have additional input.
> > 
> > Holger
> > 
> > 
> >> 
> >> Håvard
> >> 
> >>> On 6 Jul 2020, at 03:02, Holger Knublauch <holger@topquadrant.com> wrote:
> >>> 
> >>> There have been various discussions around SHACL target extensions, and there is an open Pull Request https://github.com/w3c/shacl/pull/3 to add sh:targetShape as a new target type to SHACL-AF. I have meanwhile attempted to implement that feature in our code base and have concluded that the feature is not a good idea for SHACL-AF (or even SHACL Core). The main argument is still about performance:
> >>> 
> >>> - I stated that the worst-case performance of this general feature is *catastrophic* as it needs to perform validation on all subjects and objects only to determine which nodes it then needs to validate for real. This means that sh:targetShape is very different from the other 4 built-in target types (sh:targetClass, sh:targetNode, sh:targetSubjectsOf, sh:targetObjectsOf) in that it requires validation before validation (which by itself causes implementation complexity).
> >>> 
> >>> - Håvard stated that the alternative, SPARQL-based targets, has bad performance for his implementation.
> >>> 
> >>> We do have similar use cases to yours, especially around dependencies across multiple properties. For example:
> >>> 
> >>> IF ex:country=USA THEN ex:state sh:in [ "AZ", "CA", "FL" ... ]
> >>> IF ex:country=AU THEN ex:state sh:in [ "NSW", "VIC", "QLD" ... ]
> >>> 
> >>> We also want a declarative solution that can be used by input forms, so that if the user changes the country then the states drop-down list also changes. So relying on SPARQL queries or the like wouldn't solve our use cases either.
> >>> 
> >>> The current proposal, based on the new keyword sh:targetShape was
> >>> 
> >>> ex:USAStateShape
> >>>     sh:targetShape [
> >>>         a sh:PropertyShape ;
> >>>         sh:path ex:country ;
> >>>         sh:hasValue ex:USA ;
> >>>     ] ;
> >>>     sh:property [
> >>>         sh:path ex:state ;
> >>>         sh:in ( "AZ" "CA" "FL" ... )
> >>>     ] .
> >>> 
> >>> I believe the following is better overall:
> >>> 
> >>> ex:USAStateShape
> >>>     sh:target [
> >>>         a dash:HasValueTarget ;
> >>>         dash:predicate ex:country ;
> >>>         dash:object ex:USA ;
> >>>     ] ;
> >>>     sh:property [
> >>>         sh:path ex:state ;
> >>>         sh:in ( "AZ" "CA" "FL" ... )
> >>>     ] .
> >>> 
> >>> Where dash:HasValueTarget is a SPARQL-based Target Type https://w3c.github.io/shacl/shacl-af/#SPARQLTargetType
> >>> 
> >>> Implementations of SHACL-AF already will do the right thing and will be able to do so efficiently. If you cannot use SPARQL efficiently, your platform can simply hard-code this pattern, just like you currently hard-code the common scenarios of the proposed sh:targetShape property to avoid the bad default performance. I expect the difference in your implementation would be marginal, but we neither need to change the spec nor open up SHACL to a feature that is very complex to implement efficiently.
> >>> 
> >>> The downside of using something like dash:HasValueTarget is that it doesn't "cover" all possible use cases. Instead of allowing arbitrary sh:targetShapes, we limit this to hasValue patterns. But those hasValue patterns were the main use cases before we brainstormed that "it would be nice" to also support various other shape types (sh:filterShape etc). hasValue patterns are trivial to look up. If anyone needs additional patterns, such as the one from the PR, then they can be covered by custom targets, which may also get hard-coded by those that cannot use SPARQL.
> >>> 
> >>> BTW the case of http://datashapes.org/constraints.html#HasValueInConstraintComponent can be covered by having multiple dash:HasValueTargets with different dash:objects. A bit more verbose but can reuse the same machinery. If you have long lists, introduce your own dash-like extension backed by a SPARQL query and hard-code against that if performance isn't good.
> >>> 
> >>> Sorry for moving back and forth on this topic, but getting hands-on experience with an implementation revealed to me just how bad the sh:targetShape solution would become. And I couldn't schedule time for such an implementation earlier due to other commitments.
> >>> 
> >>> It would be useful to have input from other SHACL implementers (there are about a dozen SHACL engines out there, and counting). We really don't want to rush something through which then becomes a burden for others.
> >>> 
> >>> Holger
> >>> 
> >>> 
> >>> 
> >
Received on Monday, 6 July 2020 14:59:19 UTC