Re: Reopening the discussion on sh:targetShape from Håvard Ottestad on 2020-07-06 (public-shacl@w3.org from July 2020)

From: Håvard Ottestad <hmottestad@gmail.com>
Date: Mon, 6 Jul 2020 16:05:07 +0200
To: Irene Polikoff <irene@topquadrant.com>
Cc: Public Shacl W3C <public-shacl@w3.org>, Holger Knublauch <holger@topquadrant.com>
Message-ID: <CAEKmdN3qs++18JjDY9Ti-DtA92a+z=Om+68hoAUSPJ7EOxS=fQ@mail.gmail.com>
Hi Holger,

I've typed up some queries I believe will validate examples 2, 3 and 4 at
least as efficiently as a SPARQL target.

If you agree that these queries are correct (or close enough to being
correct), then we know that all those examples at least can be implemented
at least as performant as SPARQL targets.

If there is a way to evaluate all target shapes that is as fast or faster
than using SPARQL targets then I think that sh:targetShape should be
considered on the same terms as SPARQL targets performance wise, regardless
of the poor performance you are currently seeing.

I can think of one consideration that would either not perform well, or at
least be very difficult to implement, and that is recursive SHACL since
SPARQL does not support recursion. I believe that this should not hold us
back from considering sh:targetShape.

Here is the list of queries.

###############################################################################
# 2. Instances in namespace "company" must have appropriate class and
dc:type #
###############################################################################
ex:CompanyShape a sh:NodeShape;
  sh:targetShape [
    sh:nodeKind sh:IRI;
    sh:pattern "^https://company-graph.example.com/resource/company/";
  ];
  sh:class ex:Company;
  sh:property [sh:path dc:type; sh:in ("conglomerate" "collective"
"enterprise")];
.

### Target query ###
select ?target where {
{
?target ?b ?c.
FILTER(isIRI(?target) && regex(str(?target), "^
https://company-graph.example.com/resource/company/") )
} union {
?a ?b ?target.
FILTER(isIRI(?target) && regex(str(?target), "^
https://company-graph.example.com/resource/company/") )
}

}

### Combined query ###
select ?target ?value where {
{
?target ?b ?c.
?target dc:type ?value.
FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
FILTER(isIRI(?target) && regex(str(?target), "^
https://company-graph.example.com/resource/company/") )
} union {
?a ?b ?target.
?target dc:type ?value.
FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
FILTER(isIRI(?target) && regex(str(?target), "^
https://company-graph.example.com/resource/company/") )
}

}

### Optimized combined query for property shapes###
select ?target ?value where {
?target dc:type ?value.
FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
FILTER(isIRI(?target) && regex(str(?target), "^
https://company-graph.example.com/resource/company/") )

}


#####################################################################
# 3. All langStrings must have one of a predefined set of languages #
#####################################################################
ex:langStringShape a sh:NodeShape;
  sh:targetShape [sh:datatype rdf:langString];
  sh:languageIn ("en" "bg");
.


### Target query ###
select ?target where {
?a ?b ?target.
FILTER(DATATYPE(?target) = rdf:langString)
}

### Combined query ###
select ?target where {
?a ?b ?target.
FILTER(DATATYPE(?target) = rdf:langString)
FILTER(!langMatches(lang(?title), "en") && !langMatches(lang(?title), "bg"))
}


#########################################################################################
# 4. Steve is very popular, so everyone who knows at least three people
must know Steve #
#########################################################################################
ex:Personshape a sh:NodeShape;
  sh:targetShape [sh:path foaf:knows; sh:minCount 3];
  sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
.


### Target query ###
select ?target where {
?target foaf:knows ?count_0, ?count_1, ?count_2.
FILTER(?count_0 != ?count_1)
FILTER(?count_1 != ?count_2)
FILTER(?count_2 != ?count_0)
}

### Combined query ###
select ?target ?value where {
?target foaf:knows ?count_0, ?count_1, ?count_2.
FILTER(?count_0 != ?count_1)
FILTER(?count_1 != ?count_2)
FILTER(?count_2 != ?count_0)

?target foaf:knows ?value.
FILTER NOT EXISTS {?target foaf:knows ex:Steve}
}


Cheers,
Håvard

PS. I believe that example 2 is actually a bit wrong, I'll comment on the
PR instead of in this email.

On Mon, Jul 6, 2020 at 1:49 PM Irene Polikoff <irene@topquadrant.com> wrote:

> But if there is no agreement, then I am concerned about using sh:
> namespace for this new construct. This does not seem right.
>
> TQ has been adding some custom constructs into dash: namespace, for
> example. Other namespaces for custom extensions could be used, as well.
>
> I believe sh: namespace should be reserved for things that a majority of
> implementers have reached consensus on. I recognize that currently we only
> have 2 implementers in this discussion. It would be better to broaden the
> circle of people looking at this topic.
>
> If we have a strong disagreement, then the process to follow is normally:
>
> 1. Have a call (or calls) between the concerned parties to see if they can
> reach an agreement to come to some compromise.
> 2. If not and the objection is strong ( as in “will not implement”), then
> typically, the feature would not make it to the spec. Implementers can
> still do it as a custom extension.
>
> In the past, we had some hypothetical and/or philosophical objections that
> slowed progress by many many months. It was frustrating. An objection based
> on the implementation experience is, however, different. I believe it needs
> a serious consideration and a resolution - even if it slows progress.
> Unfortunately, there is no way to avoid some sluggishness when multiple
> parties need to come to consensus - this is a downside of standards
> development.
>
> Irene
>
> > On Jul 6, 2020, at 6:06 AM, Holger Knublauch <holger@topquadrant.com>
> wrote:
> >
> >> On 6/07/2020 16:53, Håvard Ottestad wrote:
> >>
> >> Hi Holger,
> >>
> >> Could you share the shape that has particularly bad performance so I
> can see if I can think of an optimal solution?
> > For examples 2,3,4 from https://github.com/w3c/shacl/pull/3
> >>
> >> My plan has essentially been to convert the targetShape into a sparql
> query. This would put the performance in the same realm as sparql targets.
> >>
> >> The benefits of targetShape over sparql targets is that it’s possible
> to validate the changes to a database efficiently, we are seeing O(c)
> performance where c is the effective size of the change instead of O(n)
> which is what we were seeing with sparql targets (where n is the size of
> the database).
> >
> > The same algorithms can be applied to the dash:HasValueTarget - it's
> just as declarative as the other shapes.
> >
> > I think we (I at least) had discussed this sufficiently. I think I was
> clear on the performance issues. I think we can agree to disagree and move
> on. I was hoping that other implementers may have additional input.
> >
> > Holger
> >
> >
> >>
> >> Håvard
> >>
> >>> On 6 Jul 2020, at 03:02, Holger Knublauch <holger@topquadrant.com>
> wrote:
> >>>
> >>> There have been various discussions around SHACL target extensions,
> and there is an open Pull Request https://github.com/w3c/shacl/pull/3 to
> add sh:targetShape as a new target type to SHACL-AF. I have meanwhile
> attempted to implement that feature in our code base and have concluded
> that the feature is not a good idea for SHACL-AF (or even SHACL Core). The
> main argument is still about performance:
> >>>
> >>> - I stated that the worst-case performance of this general feature is
> *catastrophic* as it needs to perform validation on all subjects and
> objects only to determine which nodes it then needs to validate for real.
> This means that sh:targetShape is very different from the other 4 built-in
> target types (sh:targetClass, sh:targetNode, sh:targetSubjectsOf,
> sh:targetObjectsOf) in that it requires validation before validation (which
> by itself causes implementation complexity).
> >>>
> >>> - Håvard stated that the alternative, SPARQL-based targets has bad
> performance for his implementation.
> >>>
> >>> We do have similar use cases to yours, esp around dependencies across
> multiple properties. For example:
> >>>
> >>> IF ex:country=USA THEN ex:state sh:in [ "AZ", "CA", "FL" ... ]
> >>> IF ex:country=AU THEN ex:state sh:in [ "NSW", "VIC", "QLD" ... ]
> >>>
> >>> We also want a declarative solution that can be used by input forms,
> so that if the user changes the country then the states drop down list also
> changes. So relying on SPARQL queries or so wouldn't solve our use cases
> either.
> >>>
> >>> The current proposal, based on the new keyword sh:targetShape was
> >>>
> >>> ex:USAStateShape
> >>>     sh:targetShape [
> >>>         a sh:PropertyShape ;
> >>>         sh:path ex:country ;
> >>>         sh:hasValue ex:USA ;
> >>>     ] ;
> >>>     sh:property [
> >>>         sh:path sh:state ;
> >>>         sh:in [ "AZ" "CA" "FL" ... ]
> >>>     ] .
> >>>
> >>> I believe the following is better overall:
> >>>
> >>> ex:USAStateShape
> >>>     sh:target [
> >>>         a dash:HasValueTarget ;
> >>>         dash:predicate ex:country ;
> >>>         dash:object ex:USA ;
> >>>     ] ;
> >>>     sh:property [
> >>>         sh:path sh:state ;
> >>>         sh:in [ "AZ" "CA" "FL" ... ]
> >>>     ] .
> >>>
> >>> Where dash:HasValueTarget is a SPARQL-based Target Type
> https://w3c.github.io/shacl/shacl-af/#SPARQLTargetType
> >>>
> >>> Implementations of SHACL-AF already will do the right thing and will
> be able to do so efficiently. If you cannot use SPARQL efficiently, your
> platform can simply hard-code this pattern, just like you currently
> hard-code the common scenarios of the proposed sh:targetShape property to
> avoid the bad default performance. I expect the difference in your
> implementation would be marginal, but we neither need to change the spec
> nor open up SHACL to a feature that is very complex to implement
> efficiently.
> >>>
> >>> The downside of using something like dash:HasValueTarget is that it
> doesn't "cover" all possible use cases. Instead of allowing arbitrary
> sh:targetShapes we limit this to hasValue patterns. But those hasValue
> patterns were the main use cases before we brainstormed that "it would be
> nice" to also support various other shape types (sh:filterShape etc).
> hasValue patterns are trivial to look up. If anyone needs additional
> patterns, such as the one from the PR then they can be covered by custom
> targets which may also get hard-coded by those that cannot use SPARQL.
> >>>
> >>> BTW the case of
> http://datashapes.org/constraints.html#HasValueInConstraintComponent can
> be covered by having multiple dash:HasValueTargets with different
> dash:objects. A bit more verbose but can reuse the same machinery. If you
> have long lists, introduce your own dash-like extension backed by a SPARQL
> query and hard-code against that if performance isn't good.
> >>>
> >>> Sorry for moving back and forth on this topic, but getting hands-on
> experience with an implementation revealed to me just how bad the
> sh:targetShape solution would become. And I couldn't schedule time for such
> an implementation earlier due to other commitments.
> >>>
> >>> It would be useful to have input from other SHACL implementers (there
> are about a dozen SHACL engines out there, and counting). We really don't
> want to rush something through which then becomes a burden for others.
> >>>
> >>> Holger
> >>>
> >>>
> >>>
> >
>
Received on Monday, 6 July 2020 14:05:32 UTC