Re: read-through of Sections 5 and 7 (response to section 7) from Holger Knublauch on 2016-11-28 (public-rdf-shapes@w3.org from November 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 28 Nov 2016 14:41:53 +1000
To: public-rdf-shapes@w3.org
Message-ID: <7bd1ca73-c923-528d-7c41-0f74ac511edb@topquadrant.com>
On 27/11/2016 0:56, Peter F. Patel-Schneider wrote:
> Section 7 SPARQL-based Targets
>
> "All subjects of sh:target triples must be IRIs."  This makes absolutely no
> sense.  Why should the shapes that have SPARQL-based targets be IRIs.

I do not remember the history of this, nor does it seem to make a 
difference in my implementation. So I have removed this restriction for 
now. If its absence comes back to haunt any implementors in the future, 
we can bring the restriction back.

>
> "The SPARQL queries linked to a target via sh:select must be of the query
> form SELECT."   This doesn't apply the prefix handling rules.

Added: (after applying the <a href="#sparql-prefixes">prefix handling 
rules</a>)

>
> "The SELECT queries must project to the result variable this."  There is no
> definition for "project to".

Switched to "must return...".

>
> "The resulting target consists of all distinct bindings for the variable
> this."  The target is the value of sh:target so how can it be the bindings?

Switched to "The resulting <a>focus nodes</a> are the distinct bindings 
for the variable <code>this</code>."

> "The SELECT queries must also be executable when converted to an ASK query
> and with a pre-bound value for ?this."    There is no definition for
> converting a SELECT query to an ASK query.  There is no notion of a query
> being executable.

See below:

>
> "The set of bindings for ?this that return true for such ASK queries must be
> identical to the set produced by the SELECT query. This design makes sure
> that SHACL Full processors can validate whether a given shape applies to a
> given individual focus node."   A SHACL Full processor can always just run
> the SELECT query and check whether the individual focus node is in the
> result set.   So this condition, which is difficult and maybe even possible
> to check, is unnecessary.  The checking can even be done completely within
> SPARQL by appending a values clause to the query, which can then be
> optimized by the SPARQL processor, so there is not even any particular
> reason to have this condition for efficiency purposes.

You suggest dropping the restriction that the SELECT queries of targets 
are convertible into equivalent ASK queries. I disagree that this is the 
right way forward.

It is important to be able to validate individual nodes only, not just 
the complete graph. To do so, there needs to be a facility to identify 
all relevant shapes based on their target. So the question is not: "here 
is a shape, what nodes does it target?" but instead "here is a node, 
which shapes target the node?". These are entirely different 
questions from a performance perspective. The alternative of first running 
the SELECT and then testing whether the potential focus node is among the 
results is not practical - a target SELECT may easily return millions of 
nodes. With an ASK query, the answer often becomes clear with a trivial 
BGP in O(1). I do not understand your comment that a VALUES clause could 
be appended - how would that work? To populate a VALUES clause you would 
already need the results of the query, which is again too slow in practice.
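
As a rough illustration of the two questions, here is a minimal Python 
sketch. It is not the algorithm from the spec: the graph is a 
hypothetical in-memory set of triples, the names ex:Person, ex:acme and 
both functions are assumptions made up for this example, and the SPARQL 
queries appear only in comments.

```python
# Hypothetical in-memory graph: a set of (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"


def build_graph(n):
    """Create n nodes typed ex:Person plus one node typed ex:Company."""
    graph = {(f"ex:p{i}", RDF_TYPE, "ex:Person") for i in range(n)}
    graph.add(("ex:acme", RDF_TYPE, "ex:Company"))
    return graph


def select_targets(graph):
    """Analogue of: SELECT DISTINCT ?this WHERE { ?this rdf:type ex:Person }

    Enumerates every target node, so the cost grows with the graph size.
    """
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == "ex:Person"}


def ask_target(graph, this):
    """Analogue of: ASK { ?this rdf:type ex:Person } with ?this pre-bound.

    With ?this fixed, the pattern is a single triple lookup.
    """
    return (this, RDF_TYPE, "ex:Person") in graph


graph = build_graph(1000)

# "Here is a shape, what nodes does it target?" then filter for one node.
in_select = "ex:p42" in select_targets(graph)

# "Here is a node, does this target apply to it?"
in_ask = ask_target(graph, "ex:p42")
```

Both routes give the same answer for a given node, which is the 
invariant under discussion; only the ASK-style lookup avoids 
materialising the full result set first.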

So instead of dropping this restriction and pushing the cost to all end 
users, I'd rather work on getting the terminology clear and precise 
enough. I have rewritten the corresponding paragraph (as shown in the 
diff) as an attempt to address this. (Testing whether this particular 
invariant holds is intentionally not required, i.e. a MAY, as it may 
indeed be tricky to specify for all scenarios. I guess it could be done 
by checking that the query contains a BGP that binds ?this instead of 
BIND, but writing this down formally is probably overkill.)

>
> "Similar to constraint components, such targets take parameters that are
> interpreted when the target is evaluated."  There is no notion of evaluation
> of targets.

Dropped the part after "that are...".

>
> "All parameters of target types are expected to have sh:maxCount 1."  What
> does expectation mean in the context of parameters?

Switched to "All parameters of target types must have at most one value 
for the same subject."

>
> SPARQL-based target types appear to have many of the same characteristics of
> sh:SPARQLTarget targets.  Why then do the underlying queries not have the same
> restrictions?

Yes that was the intention, and I have added a corresponding cross-link.

> Why are the queries not restricted to those whose top-level
> SELECT includes (or has only) this in the variables of its select clause?

Yes that was the intention, and I have clarified that ?this is the only 
permitted result variable.

> Why are queries not restricted to ones that can be converted to ASK queries
> that have the same behaviour?

See above, this is now handled via the cross-link.

Diff: 
https://github.com/w3c/data-shapes/commit/9e3c9c410558d2e46be956f90106929cf35d644d

Thanks for your comments!
Holger
Received on Monday, 28 November 2016 04:42:31 UTC