Re: Reopening the discussion on sh:targetShape from Holger Knublauch on 2020-07-06 (public-shacl@w3.org from July 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 7 Jul 2020 08:58:02 +1000
To: Public Shacl W3C <public-shacl@w3.org>
Message-ID: <2edca38a-0417-06c7-3a78-88098c868a1d@topquadrant.com>
On 7/07/2020 00:05, Håvard Ottestad wrote:
> Hi Holger,
>
> I've typed up some queries I believe will validate examples 2, 3 and 4 
> at least as efficiently as a SPARQL target.

Sure, but that is not the question. The problem with sh:targetShape is 
that it requires to visit all subjects and objects in the whole graph. 
It doesn't matter then whether you execute, say, the sh:pattern as 
SPARQL regex or natively implemented. It is still a horribly slow loop 
that won't scale. The only cases where it will scale are the few 
constraint types that avoid the iteration (e.g. hasValue) and of course 
filtering out nodes if you already have a small set of candidates, e.g. 
from a change set in a transaction.

This is a show stopper for the whole generalized concept of target 
shapes which is why I suggest to limit the use cases that this can cover 
to hasValue scenarios that are efficient either way. As soon as we offer 
a too-powerful but slow feature, users will run into performance issues 
that will backfire on SHACL in general. Let's better be conservative 
with what we put into the language.

Holger


>
> If you agree that these queries are correct (or close enough to being 
> correct), then we know that all those examples at least can be 
> implemented at least as performant as SPARQL targets.
>
> If there is a way to evaluate all target shapes that is as fast or 
> faster than using SPARQL targets then I think that sh:targetShape 
> should be considered on the same terms as SPARQL targets performance 
> wise, regardless of the poor performance you are currently seeing.
>
> I can think of one consideration that would either not perform well, 
> or at least be very difficult to implement, and that is recursive 
> SHACL since SPARQL does not support recursion. I believe that this 
> should not hold us back from considering sh:targetShape.
>
> Here is the list of queries.
>
> ###############################################################################
> # 2. Instances in namespace "company" must have appropriate class and 
> dc:type #
> ###############################################################################
> ex:CompanyShape a sh:NodeShape;
>   sh:targetShape [
>     sh:nodeKind sh:IRI;
>     sh:pattern "^https://company-graph.example.com/resource/company/";
>   ];
>   sh:class ex:Company;
>   sh:property [sh:path dc:type; sh:in ("conglomerate" "collective" 
> "enterprise")];
> .
>
> ### Target query ###
> select ?target where {
> {
> ?target ?b ?c.
> FILTER(isIRI(?target) && regex(str(?target), 
> "^https://company-graph.example.com/resource/company/") )
> } union {
> ?a ?b ?target.
> FILTER(isIRI(?target) && regex(str(?target), 
> "^https://company-graph.example.com/resource/company/") )
> }
>
> }
>
> ### Combined query ###
> select ?target ?value where {
> {
> ?target ?b ?c.
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), 
> "^https://company-graph.example.com/resource/company/") )
> } union {
> ?a ?b ?target.
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), 
> "^https://company-graph.example.com/resource/company/") )
> }
>
> }
>
> ### Optimized combined query for property shapes###
> select ?target ?value where {
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), 
> "^https://company-graph.example.com/resource/company/") )
>
> }
>
>
> #####################################################################
> # 3. All langStrings must have one of a predefined set of languages #
> #####################################################################
> ex:langStringShape a sh:NodeShape;
>   sh:targetShape [sh:datatype rdf:langString];
>   sh:languageIn ("en" "bg");
> .
>
>
> ### Target query ###
> select ?target where {
> ?a ?b ?target.
> FILTER(DATATYPE(?target) = rdf:langString)
> }
>
> ### Combined query ###
> select ?target where {
> ?a ?b ?target.
> FILTER(DATATYPE(?target) = rdf:langString)
> FILTER(!langMatches(lang(?title), "en") && !langMatches(lang(?title), 
> "bg"))
> }
>
>
> #########################################################################################
> # 4. Steve is very popular, so everyone who knows at least three 
> people must know Steve #
> #########################################################################################
> ex:Personshape a sh:NodeShape;
>   sh:targetShape [sh:path foaf:knows; sh:minCount 3];
>   sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
> .
>
>
> ### Target query ###
> select ?target where {
> ?target foaf:knows ?count_0, ?count_1, ?count_2.
> FILTER(?count_0 != ?count_1)
> FILTER(?count_1 != ?count_2)
> FILTER(?count_2 != ?count_0)
> }
>
> ### Combined query ###
> select ?target ?value where {
> ?target foaf:knows ?count_0, ?count_1, ?count_2.
> FILTER(?count_0 != ?count_1)
> FILTER(?count_1 != ?count_2)
> FILTER(?count_2 != ?count_0)
>
> ?target foaf:knows ?value.
> FILTER NOT EXISTS {?target foaf:knows ex:Steve}
> }
>
>
> Cheers,
> Håvard
>
> PS. I believe that example 2 is actually a bit wrong, I'll comment on 
> the PR instead of in this email.
>
> On Mon, Jul 6, 2020 at 1:49 PM Irene Polikoff <irene@topquadrant.com 
> <mailto:irene@topquadrant.com>> wrote:
>
>     But if there is no agreement, then I am concerned about using sh:
>     namespace for this new construct. This does not seem right.
>
>     TQ has been adding some custom constructs into dash: namespace,
>     for example. Other namespaces for custom extensions could be used,
>     as well.
>
>     I believe sh: namespace should be reserved for things that a
>     majority of implementers have reached consensus on. I recognize
>     that currently we only have 2 implementers in this discussion. It
>     would be better to broaden the circle of people looking at this topic.
>
>     If we have a strong disagreement, then the process to follow is
>     normally:
>
>     1. Have a call (or calls) between the concerned parties to see if
>     they can reach an agreement to come to some compromise.
>     2. If not and the objection is strong ( as in “will not
>     implement”), then typically, the feature would not make it to the
>     spec. Implementers can still do it as a custom extension.
>
>     In the past, we had some hypothetical and/or philosophical
>     objections that slowed progress by many many months. It was
>     frustrating. An objection based on the implementation experience
>     is, however, different. I believe it needs a serious consideration
>     and a resolution - even if it slows progress. Unfortunately, there
>     is no way to avoid some sluggishness when multiple parties need to
>     come to consensus - this is a downside of standards development.
>
>     Irene
>
>     > On Jul 6, 2020, at 6:06 AM, Holger Knublauch
>     <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>     >
>     >> On 6/07/2020 16:53, Håvard Ottestad wrote:
>     >>
>     >> Hi Holger,
>     >>
>     >> Could you share the shape that has particularly bad performance
>     so I can see if I can think of an optimal solution?
>     > For examples 2,3,4 from https://github.com/w3c/shacl/pull/3
>     >>
>     >> My plan has essentially been to convert the targetShape into a
>     sparql query. This would put the performance in the same realm as
>     sparql targets.
>     >>
>     >> The benefits of targetShape over sparql targets is that it’s
>     possible to validate the changes to a database efficiently, we are
>     seeing O(c) performance where c is the effective size of the
>     change instead of O(n) which is what we were seeing with sparql
>     targets (where n is the size of the database).
>     >
>     > The same algorithms can be applied to the dash:HasValueTarget -
>     it's just as declarative as the other shapes.
>     >
>     > I think we (I at least) had discussed this sufficiently. I think
>     I was clear on the performance issues. I think we can agree to
>     disagree and move on. I was hoping that other implementers may
>     have additional input.
>     >
>     > Holger
>     >
>     >
>     >>
>     >> Håvard
>     >>
>     >>> On 6 Jul 2020, at 03:02, Holger Knublauch
>     <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>     >>>
>     >>> There have been various discussions around SHACL target
>     extensions, and there is an open Pull Request
>     https://github.com/w3c/shacl/pull/3 to add sh:targetShape as a new
>     target type to SHACL-AF. I have meanwhile attempted to implement
>     that feature in our code base and have concluded that the feature
>     is not a good idea for SHACL-AF (or even SHACL Core). The main
>     argument is still about performance:
>     >>>
>     >>> - I stated that the worst-case performance of this general
>     feature is *catastrophic* as it needs to perform validation on all
>     subjects and objects only to determine which nodes it then needs
>     to validate for real. This means that sh:targetShape is very
>     different from the other 4 built-in target types (sh:targetClass,
>     sh:targetNode, sh:targetSubjectsOf, sh:targetObjectsOf) in that it
>     requires validation before validation (which by itself causes
>     implementation complexity).
>     >>>
>     >>> - Håvard stated that the alternative, SPARQL-based targets has
>     bad performance for his implementation.
>     >>>
>     >>> We do have similar use cases to yours, esp around dependencies
>     across multiple properties. For example:
>     >>>
>     >>> IF ex:country=USA THEN ex:state sh:in [ "AZ", "CA", "FL" ... ]
>     >>> IF ex:country=AU THEN ex:state sh:in [ "NSW", "VIC", "QLD" ... ]
>     >>>
>     >>> We also want a declarative solution that can be used by input
>     forms, so that if the user changes the country then the states
>     drop down list also changes. So relying on SPARQL queries or so
>     wouldn't solve our use cases either.
>     >>>
>     >>> The current proposal, based on the new keyword sh:targetShape was
>     >>>
>     >>> ex:USAStateShape
>     >>>     sh:targetShape [
>     >>>         a sh:PropertyShape ;
>     >>>         sh:path ex:country ;
>     >>>         sh:hasValue ex:USA ;
>     >>>     ] ;
>     >>>     sh:property [
>     >>>         sh:path sh:state ;
>     >>>         sh:in [ "AZ" "CA" "FL" ... ]
>     >>>     ] .
>     >>>
>     >>> I believe the following is better overall:
>     >>>
>     >>> ex:USAStateShape
>     >>>     sh:target [
>     >>>         a dash:HasValueTarget ;
>     >>>         dash:predicate ex:country ;
>     >>>         dash:object ex:USA ;
>     >>>     ] ;
>     >>>     sh:property [
>     >>>         sh:path sh:state ;
>     >>>         sh:in [ "AZ" "CA" "FL" ... ]
>     >>>     ] .
>     >>>
>     >>> Where dash:HasValueTarget is a SPARQL-based Target Type
>     https://w3c.github.io/shacl/shacl-af/#SPARQLTargetType
>     >>>
>     >>> Implementations of SHACL-AF already will do the right thing
>     and will be able to do so efficiently. If you cannot use SPARQL
>     efficiently, your platform can simply hard-code this pattern, just
>     like you currently hard-code the common scenarios of the proposed
>     sh:targetShape property to avoid the bad default performance. I
>     expect the difference in your implementation would be marginal,
>     but we neither need to change the spec nor open up SHACL to a
>     feature that is very complex to implement efficiently.
>     >>>
>     >>> The downside of using something like dash:HasValueTarget is
>     that it doesn't "cover" all possible use cases. Instead of
>     allowing arbitrary sh:targetShapes we limit this to hasValue
>     patterns. But those hasValue patterns were the main use cases
>     before we brainstormed that "it would be nice" to also support
>     various other shape types (sh:filterShape etc). hasValue patterns
>     are trivial to look up. If anyone needs additional patterns, such
>     as the one from the PR then they can be covered by custom targets
>     which may also get hard-coded by those that cannot use SPARQL.
>     >>>
>     >>> BTW the case of
>     http://datashapes.org/constraints.html#HasValueInConstraintComponent
>     can be covered by having multiple dash:HasValueTargets with
>     different dash:objects. A bit more verbose but can reuse the same
>     machinery. If you have long lists, introduce your own dash-like
>     extension backed by a SPARQL query and hard-code against that if
>     performance isn't good.
>     >>>
>     >>> Sorry for moving back and forth on this topic, but getting
>     hands-on experience with an implementation revealed to me just how
>     bad the sh:targetShape solution would become. And I couldn't
>     schedule time for such an implementation earlier due to other
>     commitments.
>     >>>
>     >>> It would be useful to have input from other SHACL implementers
>     (there are about a dozen SHACL engines out there, and counting).
>     We really don't want to rush something through which then becomes
>     a burden for others.
>     >>>
>     >>> Holger
>     >>>
>     >>>
>     >>>
>     >
>
Received on Monday, 6 July 2020 22:58:30 UTC