Re: SHACL target extension

Hi Holger! Thanks for the comments!

> introduce a new property instead of
> sh:target, because the meaning of sh:target would otherwise be
> overloaded and it is possible for targets to also be sh:NodeShapes


SHACL-AF says "The algorithm that is used for this computation depends on
the rdf:type of the custom target (sh:target)",
and then specifies two such types (sh:SPARQLTarget and sh:SPARQLTargetType).
My proposal is to use sh:NodeShape itself as that rdf:type, because what
we've described is targeting by node shape.
I don't see why it's confusing to use sh:NodeShape both for targeting and
for its normal purpose (validation),
and it's important for us to be able to reuse shapes in this way (see the
last 2 examples).
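To make the reuse point concrete, here is a rough sketch (the ex: and
schema: names are only illustrative): the same filter shape serves once as
a custom target and can equally be referenced from ordinary validation.

ex:PersonFilterShape a sh:NodeShape ;
  sh:property [ sh:path rdf:type ; sh:hasValue schema:Person ] .

ex:PersonShape a sh:NodeShape ;
  # per the proposal, nodes conforming to the referenced node shape become the focus nodes
  sh:target ex:PersonFilterShape ;
  sh:property [ sh:path schema:name ; sh:minCount 1 ] .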

> IMHO it should be something like sh:targetShape

I'd be fine with this (as long as we stick with type sh:NodeShape), but I
don't see why it's needed:
- my proposal: sh:target [a sh:NodeShape; ...]
- your proposal: sh:targetShape [a sh:NodeShape; ...]

sh:target is polymorphic by the SHACL-AF definition, so I don't see why we
need a specialized property name.
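To spell out the polymorphism (the SPARQL query and ex: names here are
only illustrative):

# SHACL-AF already dispatches on the rdf:type of the value of sh:target:
ex:ShapeA sh:target [ a sh:SPARQLTarget ;
    sh:select "SELECT ?this WHERE { ?this a <http://example.org/Person> }" ] .

# my proposal just adds sh:NodeShape to the recognized types:
ex:ShapeB sh:target [ a sh:NodeShape ;
    sh:property [ sh:path rdf:type ; sh:hasValue ex:Person ] ] .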

> I remain very nervous about performance implications.


That was also my concern, because we're paying Havard to implement what we
need for the Onto platform,
which is a limited form of targeting (a conjunction of disjunctions of hasValue).
But Havard assures us that he has already implemented more generic targeting
(though still not full SHACL shapes: only atomic sh:path is supported)
and that it's efficient.
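To make the limited form concrete (same data as the example at the end of
this mail, here written in the sh:target style): the two property shapes
form the conjunction, and sh:in supplies the disjunction of values, each
answerable by a direct lookup.

ex:PoliticianShape sh:target [ a sh:NodeShape ;
    # conjunction: both property shapes must match
    sh:property [ sh:path rdf:type ; sh:in (dbo:Person schema:Person) ] ;  # disjunction of values
    sh:property [ sh:path dt:type  ; sh:in ("politician" "president") ]
  ] .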

Havard has answered with a lot more detail about performance.

I'll add a warning that such targeting is potentially expensive, so users
must be careful when using it and check with their specific SHACL
implementation.


> "is node N in the target of S" requires iterating over all
> sh:targetShapes each time. This can be very expensive.
>

Yes, that's also a concern, so we'll give Havard sizable schemas to test
with (say 100 shapes, with each node matching say 5-10 shapes, that being
the depth of the "semantic type hierarchy").

> The implementation cost of this feature is significant, because it
> requires the implementation of an "inverse validation" algorithm.
> Validation starts with a focus node and returns a result.


In rdf4j, validation starts with a transaction, on the assumption that the
data at rest is already valid.
I believe Havard can "index" all the targeting shapes, so it's efficient to
check all of them against the set of nodes in the transaction.

> I guess most of them are hard to execute in the inverse order:
> sh:datatype, sh:nodeKind, sh:minExclusive etc, sh:minLength etc,
> sh:pattern, sh:languageIn, sh:uniqueLang, sh:lessThan etc, sh:closed,


You're right in many cases.
Any user who selects nodes by string length is shooting themselves in the foot.
So we'd better add warnings about which constructs are wise to use in a
target shape, and which ones are not.
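For the warning, something like this contrast (illustrative ex: and schema:
names): the first target shape can be answered by a direct value lookup,
while the second would have to scan and measure every candidate literal.

# cheap: target membership is a direct lookup by value
ex:CheapTargetShape sh:target [ a sh:NodeShape ;
    sh:property [ sh:path rdf:type ; sh:hasValue schema:Person ] ] .

# expensive: has to be evaluated in the "inverse" direction, over all candidate nodes
ex:ExpensiveTargetShape sh:target [ a sh:NodeShape ;
    sh:property [ sh:path schema:name ; sh:minLength 100 ] ] .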


> So what if we simply introduce a new target type sh:targetHasValue V
> where the targets can be identified by a direct look-up. For example
>
> ex:KiwiShape
>      sh:targetHasValue [
>          sh:path ex:nationality ;
>          sh:hasValue ex:NewZealand ;
>      ] .

We need somewhat more than that, though:

ex:PoliticianShape a sh:NodeShape ;
  sh:semanticTarget (
    [ sh:path rdf:type ; sh:in (dbo:Person schema:Person) ]
    [ sh:path dt:type  ; sh:in ("politician" "president") ]
  ) .

That's what I started with, but then you guys said "filter shapes are very
useful", so I wrote up the more general case.

Received on Friday, 5 June 2020 16:18:54 UTC