Re: Reopening the discussion on sh:targetShape

In my haste to respond I can see that I forgot to account for the sh:class
constraint in example 2. This would either amount to a query with a union
or two separate queries (one for sh:in and one for sh:class).

Håvard

On Mon, Jul 6, 2020 at 4:05 PM Håvard Ottestad <hmottestad@gmail.com> wrote:

> Hi Holger,
>
> I've typed up some queries I believe will validate examples 2, 3 and 4 at
> least as efficiently as a SPARQL target.
>
> If you agree that these queries are correct (or close enough to being
> correct), then we know that all those examples at least can be implemented
> at least as performant as SPARQL targets.
>
> If there is a way to evaluate all target shapes that is as fast or faster
> than using SPARQL targets then I think that sh:targetShape should be
> considered on the same terms as SPARQL targets performance wise, regardless
> of the poor performance you are currently seeing.
>
> I can think of one consideration that would either not perform well, or at
> least be very difficult to implement, and that is recursive SHACL since
> SPARQL does not support recursion. I believe that this should not hold us
> back from considering sh:targetShape.
>
> Here is the list of queries.
>
>
> ###############################################################################
> # 2. Instances in namespace "company" must have appropriate class and
> dc:type #
>
> ###############################################################################
> ex:CompanyShape a sh:NodeShape;
>   sh:targetShape [
>     sh:nodeKind sh:IRI;
>     sh:pattern "^https://company-graph.example.com/resource/company/";
>   ];
>   sh:class ex:Company;
>   sh:property [sh:path dc:type; sh:in ("conglomerate" "collective"
> "enterprise")];
> .
>
> ### Target query ###
> select ?target where {
> {
> ?target ?b ?c.
> FILTER(isIRI(?target) && regex(str(?target), "^
> https://company-graph.example.com/resource/company/") )
> } union {
> ?a ?b ?target.
> FILTER(isIRI(?target) && regex(str(?target), "^
> https://company-graph.example.com/resource/company/") )
> }
>
> }
>
> ### Combined query ###
> select ?target ?value where {
> {
> ?target ?b ?c.
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), "^
> https://company-graph.example.com/resource/company/") )
> } union {
> ?a ?b ?target.
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), "^
> https://company-graph.example.com/resource/company/") )
> }
>
> }
>
> ### Optimized combined query for property shapes###
> select ?target ?value where {
> ?target dc:type ?value.
> FILTER(NOT IN ("conglomerate" "collective" "enterprise"))
> FILTER(isIRI(?target) && regex(str(?target), "^
> https://company-graph.example.com/resource/company/") )
>
> }
>
>
> #####################################################################
> # 3. All langStrings must have one of a predefined set of languages #
> #####################################################################
> ex:langStringShape a sh:NodeShape;
>   sh:targetShape [sh:datatype rdf:langString];
>   sh:languageIn ("en" "bg");
> .
>
>
> ### Target query ###
> select ?target where {
> ?a ?b ?target.
> FILTER(DATATYPE(?target) = rdf:langString)
> }
>
> ### Combined query ###
> select ?target where {
> ?a ?b ?target.
> FILTER(DATATYPE(?target) = rdf:langString)
> FILTER(!langMatches(lang(?title), "en") && !langMatches(lang(?title),
> "bg"))
> }
>
>
>
> #########################################################################################
> # 4. Steve is very popular, so everyone who knows at least three people
> must know Steve #
>
> #########################################################################################
> ex:Personshape a sh:NodeShape;
>   sh:targetShape [sh:path foaf:knows; sh:minCount 3];
>   sh:property [sh:path foaf:knows; sh:hasValue ex:Steve];
> .
>
>
> ### Target query ###
> select ?target where {
> ?target foaf:knows ?count_0, ?count_1, ?count_2.
> FILTER(?count_0 != ?count_1)
> FILTER(?count_1 != ?count_2)
> FILTER(?count_2 != ?count_0)
> }
>
> ### Combined query ###
> select ?target ?value where {
> ?target foaf:knows ?count_0, ?count_1, ?count_2.
> FILTER(?count_0 != ?count_1)
> FILTER(?count_1 != ?count_2)
> FILTER(?count_2 != ?count_0)
>
> ?target foaf:knows ?value.
> FILTER NOT EXISTS {?target foaf:knows ex:Steve}
> }
>
>
> Cheers,
> Håvard
>
> PS. I believe that example 2 is actually a bit wrong, I'll comment on the
> PR instead of in this email.
>
> On Mon, Jul 6, 2020 at 1:49 PM Irene Polikoff <irene@topquadrant.com>
> wrote:
>
>> But if there is no agreement, then I am concerned about using sh:
>> namespace for this new construct. This does not seem right.
>>
>> TQ has been adding some custom constructs into dash: namespace, for
>> example. Other namespaces for custom extensions could be used, as well.
>>
>> I believe sh: namespace should be reserved for things that a majority of
>> implementers have reached consensus on. I recognize that currently we only
>> have 2 implementers in this discussion. It would be better to broaden the
>> circle of people looking at this topic.
>>
>> If we have a strong disagreement, then the process to follow is normally:
>>
>> 1. Have a call (or calls) between the concerned parties to see if they
>> can reach an agreement to come to some compromise.
>> 2. If not and the objection is strong ( as in “will not implement”), then
>> typically, the feature would not make it to the spec. Implementers can
>> still do it as a custom extension.
>>
>> In the past, we had some hypothetical and/or philosophical objections
>> that slowed progress by many many months. It was frustrating. An objection
>> based on the implementation experience is, however, different. I believe it
>> needs a serious consideration and a resolution - even if it slows progress.
>> Unfortunately, there is no way to avoid some sluggishness when multiple
>> parties need to come to consensus - this is a downside of standards
>> development.
>>
>> Irene
>>
>> > On Jul 6, 2020, at 6:06 AM, Holger Knublauch <holger@topquadrant.com>
>> wrote:
>> >
>> >> On 6/07/2020 16:53, Håvard Ottestad wrote:
>> >>
>> >> Hi Holger,
>> >>
>> >> Could you share the shape that has particularly bad performance so I
>> can see if I can think of an optimal solution?
>> > For examples 2,3,4 from https://github.com/w3c/shacl/pull/3
>> >>
>> >> My plan has essentially been to convert the targetShape into a sparql
>> query. This would put the performance in the same realm as sparql targets.
>> >>
>> >> The benefits of targetShape over sparql targets is that it’s possible
>> to validate the changes to a database efficiently, we are seeing O(c)
>> performance where c is the effective size of the change instead of O(n)
>> which is what we were seeing with sparql targets (where n is the size of
>> the database).
>> >
>> > The same algorithms can be applied to the dash:HasValueTarget - it's
>> just as declarative as the other shapes.
>> >
>> > I think we (I at least) had discussed this sufficiently. I think I was
>> clear on the performance issues. I think we can agree to disagree and move
>> on. I was hoping that other implementers may have additional input.
>> >
>> > Holger
>> >
>> >
>> >>
>> >> Håvard
>> >>
>> >>> On 6 Jul 2020, at 03:02, Holger Knublauch <holger@topquadrant.com>
>> wrote:
>> >>>
>> >>> There have been various discussions around SHACL target extensions,
>> and there is an open Pull Request https://github.com/w3c/shacl/pull/3 to
>> add sh:targetShape as a new target type to SHACL-AF. I have meanwhile
>> attempted to implement that feature in our code base and have concluded
>> that the feature is not a good idea for SHACL-AF (or even SHACL Core). The
>> main argument is still about performance:
>> >>>
>> >>> - I stated that the worst-case performance of this general feature is
>> *catastrophic* as it needs to perform validation on all subjects and
>> objects only to determine which nodes it then needs to validate for real.
>> This means that sh:targetShape is very different from the other 4 built-in
>> target types (sh:targetClass, sh:targetNode, sh:targetSubjectsOf,
>> sh:targetObjectsOf) in that it requires validation before validation (which
>> by itself causes implementation complexity).
>> >>>
>> >>> - Håvard stated that the alternative, SPARQL-based targets has bad
>> performance for his implementation.
>> >>>
>> >>> We do have similar use cases to yours, esp around dependencies across
>> multiple properties. For example:
>> >>>
>> >>> IF ex:country=USA THEN ex:state sh:in [ "AZ", "CA", "FL" ... ]
>> >>> IF ex:country=AU THEN ex:state sh:in [ "NSW", "VIC", "QLD" ... ]
>> >>>
>> >>> We also want a declarative solution that can be used by input forms,
>> so that if the user changes the country then the states drop down list also
>> changes. So relying on SPARQL queries or so wouldn't solve our use cases
>> either.
>> >>>
>> >>> The current proposal, based on the new keyword sh:targetShape was
>> >>>
>> >>> ex:USAStateShape
>> >>>     sh:targetShape [
>> >>>         a sh:PropertyShape ;
>> >>>         sh:path ex:country ;
>> >>>         sh:hasValue ex:USA ;
>> >>>     ] ;
>> >>>     sh:property [
>> >>>         sh:path sh:state ;
>> >>>         sh:in [ "AZ" "CA" "FL" ... ]
>> >>>     ] .
>> >>>
>> >>> I believe the following is better overall:
>> >>>
>> >>> ex:USAStateShape
>> >>>     sh:target [
>> >>>         a dash:HasValueTarget ;
>> >>>         dash:predicate ex:country ;
>> >>>         dash:object ex:USA ;
>> >>>     ] ;
>> >>>     sh:property [
>> >>>         sh:path sh:state ;
>> >>>         sh:in [ "AZ" "CA" "FL" ... ]
>> >>>     ] .
>> >>>
>> >>> Where dash:HasValueTarget is a SPARQL-based Target Type
>> https://w3c.github.io/shacl/shacl-af/#SPARQLTargetType
>> >>>
>> >>> Implementations of SHACL-AF already will do the right thing and will
>> be able to do so efficiently. If you cannot use SPARQL efficiently, your
>> platform can simply hard-code this pattern, just like you currently
>> hard-code the common scenarios of the proposed sh:targetShape property to
>> avoid the bad default performance. I expect the difference in your
>> implementation would be marginal, but we neither need to change the spec
>> nor open up SHACL to a feature that is very complex to implement
>> efficiently.
>> >>>
>> >>> The downside of using something like dash:HasValueTarget is that it
>> doesn't "cover" all possible use cases. Instead of allowing arbitrary
>> sh:targetShapes we limit this to hasValue patterns. But those hasValue
>> patterns were the main use cases before we brainstormed that "it would be
>> nice" to also support various other shape types (sh:filterShape etc).
>> hasValue patterns are trivial to look up. If anyone needs additional
>> patterns, such as the one from the PR then they can be covered by custom
>> targets which may also get hard-coded by those that cannot use SPARQL.
>> >>>
>> >>> BTW the case of
>> http://datashapes.org/constraints.html#HasValueInConstraintComponent can
>> be covered by having multiple dash:HasValueTargets with different
>> dash:objects. A bit more verbose but can reuse the same machinery. If you
>> have long lists, introduce your own dash-like extension backed by a SPARQL
>> query and hard-code against that if performance isn't good.
>> >>>
>> >>> Sorry for moving back and forth on this topic, but getting hands-on
>> experience with an implementation revealed to me just how bad the
>> sh:targetShape solution would become. And I couldn't schedule time for such
>> an implementation earlier due to other commitments.
>> >>>
>> >>> It would be useful to have input from other SHACL implementers (there
>> are about a dozen SHACL engines out there, and counting). We really don't
>> want to rush something through which then becomes a burden for others.
>> >>>
>> >>> Holger
>> >>>
>> >>>
>> >>>
>> >
>>
>

Received on Monday, 6 July 2020 14:20:20 UTC