Re: ISSUE-95: Template Simplifications from Holger Knublauch on 2015-10-29 (public-data-shapes-wg@w3.org from October 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 29 Oct 2015 15:29:31 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <5631AEBB.2020104@topquadrant.com>
On 10/29/2015 14:14, Peter F. Patel-Schneider wrote:
> Here is my proposal for a better simplification, based on changes to Part 2 of
> http://w3c.github.io/data-shapes/shacl/index-ISSUE-95.html.   The basic idea
> is to
> 1/ get rid of template injection, instead using supertemplates, which I think
> can do everything needed;
> 2/ get rid of abstract classes;
> 3/ get rid of validation functions, as they are not needed; and
> 4/ get rid of functions, as there is no way to call them.
>
>
>
> Part 2 of the SHACL spec is very hard to read.  It has quite a few undefined
> terms, and depends on several very difficult-to-do operations.  It is
> insufficiently specific in many places.

I agree this requires more editorial work (and thanks for reading the 
details nevertheless!). I can certainly elaborate on this whole part, 
but was waiting to have a final design first.

>
> I here propose several changes to the normative bits of the version of Part 2
> that was prepared for ISSUE-95,
> http://w3c.github.io/data-shapes/shacl/index-ISSUE-95.html, that fix many of
> the problems there.  I have not proposed changes for any of the bits marked
> non-normative or any of the examples.
>
>
> 6.2
>
> The SPARQL queries linked to a constraint via sh:sparql must be string
> literals that can be parsed into legal SPARQL 1.1 queries of the query form
> SELECT.
> ->
> The values for sh:sparql must be either the empty string literal ("") or
> string literals that can be parsed into legal SPARQL 1.1 queries of the
> query form SELECT.  The empty string literal indicates a vacuous constraint,
> i.e., one that never produces any violations.

Could you clarify why the "" literals are needed? Why not use no value 
instead?

>
> SHACL also includes a more general superclass sh:Template that may be used
> for other kinds of templates (rules, stored queries etc). Well-defined,
> non-abstract templates must provide at least one body using a property such
> as sh:sparql.
> ->
> Well-defined templates must provide at least one body using a property such
> as sh:sparql.

Why drop sh:Template? It is already used as shared superclass of 
sh:ConstraintTemplate and sh:ScopeTemplate, plus we have various other 
types of templates in production and having a shared superclass 
streamlines the infrastructure to manage them.

>
> 7.4
>
> [All of 7.4]
> ->
> It is sometimes desirable to mix multiple templates so that
> they can be used within the same constraint.  This is done by making a
> template class that is a subclass of multiple other template classes.  An
> instance of the child template class then combines the effects of all these
> templates, because it is an instance of them all.

You state above that you want to get rid of abstract superclasses. In 
the current TTL file, there is a class sh:AbstractPropertyConstraint 
which defines the argument sh:predicate once and for all its subclasses. 
Similarly, there is now a shared superclass for the two templates 
defining sh:qualifiedMinCount and sh:qualifiedMaxCount, to define their 
shared argument sh:qualifiedValueShape. How would your design handle 
these cases?

Also, why would anyone want to instantiate something like 
sh:AbstractDatatypePropertyConstraint directly? The language only really 
supports instantiating sh:PropertyConstraint.

The reason for me to introduce template injection was to be able to 
distinguish the "inheritance" of arguments from "merging" them into a 
single node. My previous design was mixing those two aspects, blurring 
the lines between those two use cases.

While not completely out of the question, a major problem that I ran 
into was the treatment of optional arguments. Examples of this include 
sh:ignoredProperties (from closed shapes) and sh:flags (from 
sh:pattern). If we only have a single mechanism, then a superclass such 
as sh:PropertyConstraint would "inherit" all properties such as 
sh:minCount and sh:pattern as non-optional. Do you have a better 
solution to this?
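
The optional-argument concern can be illustrated with a toy model. This is only a sketch under stated assumptions: the template names MinCountTemplate and PatternTemplate and the dictionary layout are hypothetical and not taken from the SHACL draft; only sh:minCount, sh:pattern, sh:flags and sh:PropertyConstraint come from the discussion above.

```python
# Hypothetical template declarations: argument name -> is the argument optional?
TEMPLATES = {
    "MinCountTemplate": {"supers": [], "args": {"sh:minCount": False}},
    "PatternTemplate": {"supers": [], "args": {"sh:pattern": False, "sh:flags": True}},
    # A single inheritance mechanism makes the combined class a subclass of both.
    "PropertyConstraint": {"supers": ["MinCountTemplate", "PatternTemplate"], "args": {}},
}

def inherited_args(name):
    """Collect argument declarations from a template and all its supertemplates."""
    args = {}
    for sup in TEMPLATES[name]["supers"]:
        args.update(inherited_args(sup))
    args.update(TEMPLATES[name]["args"])
    return args

def missing_required(name, provided):
    """Return the non-optional arguments that an instance does not supply."""
    return sorted(arg for arg, optional in inherited_args(name).items()
                  if not optional and arg not in provided)
```

In this model, missing_required("PropertyConstraint", {"sh:minCount"}) reports sh:pattern as missing, even though a property constraint that only wants a minimum count should be legal.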

>
> 7.6
>
> If a sh:PropertyValueConstraintTemplate has a value for
> sh:validationFunction, ... [to end of section]
> ->
> [empty]

My mistake: The branch mentions sh:validationFunction, but that is a 
left-over that needs to be replaced. It should check for instance-of 
sh:NodeValidationFunction instead.

However, you seem to want to delete the whole mechanism of using 
functions here. The problem that this design was supposed to address is 
that we otherwise need to introduce many duplicate templates, e.g. to 
share the functionality of sh:class between sh:property and 
sh:inverseProperty. In some cases, we would need to define four 
templates, while only a single function would be needed. Having worked 
with the former approach for too long (and countless copy-and-pastes 
later), I really want to move to functions, and I am confident that 
other users of advanced SHACL will share this sentiment. The functions 
have the additional benefit that they only need to be of the ASK format, 
reducing the boilerplate of the SELECT clauses.
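
The duplication argument can be sketched as follows. This is a minimal illustration, assuming a toy in-memory graph of string triples; the data, the function names and the simplified class check (no rdfs:subClassOf traversal) are all hypothetical and do not reproduce the SHACL definitions.

```python
RDF_TYPE = "rdf:type"

# Toy data graph: a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:carol", "ex:knows", "ex:alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:carol", RDF_TYPE, "ex:Robot"),
}

def has_class(graph, node, cls):
    """Single reusable check, playing the role of an ASK-style function."""
    return (node, RDF_TYPE, cls) in graph

def violations_property(graph, focus, predicate, cls):
    """sh:property direction: check the objects of the focus node's triples."""
    return sorted(o for (s, p, o) in graph
                  if s == focus and p == predicate and not has_class(graph, o, cls))

def violations_inverse_property(graph, focus, predicate, cls):
    """sh:inverseProperty direction: check the subjects pointing at the focus node."""
    return sorted(s for (s, p, o) in graph
                  if o == focus and p == predicate and not has_class(graph, s, cls))
```

Both directions call the same has_class check, so the class logic is written once; only the way value nodes are collected differs.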

>
>
> 8
>
> All this is analogous to how constraints work, but with
> the additional restrictions:
> * All subjects of sh:scope triples must be IRIs
> * The arguments of a scope template must not be blank nodes
> ->
> All this is analogous to how constraints work.

I had introduced those two clauses for a reason, when I implemented the 
SPARQL code generation. The issue is complicated. If the subjects of 
sh:scope triples are blank nodes, then it becomes impossible to generate 
SPARQL code that "points" at the scope declaration. As far as I 
remember, the problem was that each scope essentially becomes a nested 
SELECT DISTINCT clause. Due to the inside-out-evaluation policy of 
SPARQL, it becomes impossible to pass pre-bound variables into such 
clauses, especially not blank nodes (see second bullet item above). So 
my work-around was to rely on property functions (magic properties) that 
I defined for Jena to produce the bindings, passing in the scope shape 
as a URI.

Do you have examples where scope template arguments must be blank nodes? 
Do you have arguments for blank nodes as subjects of sh:scope? Although 
I could understand why conceptually such things should not matter, I 
believe allowing either will vastly complicate the implementation of 
this feature. It would be good to have a second implementation look into 
this problem space to confirm or reject the problems that I have 
encountered. If someone has a better solution then I am happy to change 
my viewpoint. Meanwhile, I'd suggest staying conservative with an 
approach that is better under control.
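
The blank-node difficulty can be sketched with a small code generator. This is an illustration only: ex:inScopeOf is a hypothetical magic property invented for this example, not the property function actually defined for Jena, and terms are represented as plain strings with blank nodes written as "_:label".

```python
def scope_pattern(shape_term):
    """Build a graph pattern that points at a shape from generated SPARQL.

    ex:inScopeOf is a hypothetical magic property, used for illustration only.
    """
    if shape_term.startswith("_:"):
        # A blank node label in a SPARQL pattern behaves like a variable,
        # so it cannot identify one particular node of the shapes graph.
        raise ValueError("cannot reference blank node %s from a query" % shape_term)
    return "?this ex:inScopeOf <%s> ." % shape_term
```

An IRI such as http://example.org/MyShape can be embedded in the query text, while a blank node has no stable name that the generated query could use.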

>
> 8.1
>
> The SPARQL queries linked to a scope via sh:sparql must be of the query form
> SELECT, or a fragment that produces a valid SELECT query if wrapped by
> SELECT ?this WHERE { ... }. The SELECT queries must project to the result
> variable ?this.
> The SELECT queries must also be executable when converted to an ASK query
> and with a pre-bound value for ?this. The set of bindings for ?this that
> return true for such ASK queries must be identical to the set produced by
> the SELECT query. This constraint makes sure that engines can validate
> whether a given shape applies to a given focus node as part of the
> validateNode operation.
> ->
> The SPARQL queries linked to a scope via sh:sparql must be of the query form
> SELECT ?this WHERE { ... }.

Ok, I could live without allowing the fragments, for simplification 
purposes.

The reason for the second paragraph (on the pre-bound variable for 
?this) is the validation of individual nodes. For example, when someone 
has a shape with a custom scope and you have ex:MyInstance, then the 
algorithm to determine whether the shape applies to the instance can be 
much more efficient than having to evaluate the whole scope and check 
whether the result set contains ex:MyInstance. The latter would become 
prohibitively slow for large databases.

Do you have examples of scopes where that restriction would be an 
obstacle? The (few) examples of custom scopes that I have seen were 
easily convertible into ASK queries without changing the WHERE clause.

An alternative design to dropping this bidirectionalism would be to have 
an optional second property sh:inverseSPARQL that can be attached to a scope 
for those cases where the original scope query cannot be converted to 
ASK. I would be OK with that.
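
The difference between the two evaluation directions can be sketched on a toy graph. The data and function names are hypothetical, and the Python functions merely stand in for the SPARQL queries named in their docstrings.

```python
RDF_TYPE = "rdf:type"

# Toy data graph: a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:MyInstance", RDF_TYPE, "ex:Person"),
    ("ex:Other", RDF_TYPE, "ex:Person"),
    ("ex:Rex", RDF_TYPE, "ex:Dog"),
}

def select_scope(graph):
    """SELECT ?this WHERE { ?this rdf:type ex:Person }: scans the whole graph."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == "ex:Person"}

def ask_scope(graph, this):
    """ASK { ?this rdf:type ex:Person } with ?this pre-bound: a single lookup."""
    return (this, RDF_TYPE, "ex:Person") in graph

def nodes(graph):
    """All subjects and objects that occur in the graph."""
    return {s for (s, _, _) in graph} | {o for (_, _, o) in graph}
```

Checking ask_scope(GRAPH, "ex:MyInstance") touches one triple, whereas testing membership in select_scope(GRAPH) first computes the full scope. The draft's requirement corresponds to the set of nodes accepted by ask_scope being identical to the result of select_scope.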

>
>
> 9
>
> [Remove entirely, as there is no defined way to call functions.]

This is neither true nor helpful. SHACL functions can be called from 
every SPARQL query (e.g. constraint or scope). Regardless of whether we 
keep sh:NodeValidationFunctions, the general mechanism has proven to be 
extremely successful in SPIN, leading to vastly more compact and better 
maintainable SPARQL queries. The fact that sh:NodeValidationFunctions 
are also normal SPARQL functions means that the business logic can be 
reused in multiple places. There are approved requirements for 
functions, even "concise language" falls into that category.

>
> Well-defined, non-abstract functions must provide at least one
> body property such as sh:sparql.
> ->
> Well-defined functions must provide at least one
> body property such as sh:sparql.
>
> 9.4
>
> [Remove, if implementations want to analyze functions to see if they are
> cacheable they are free to do so.]

I could certainly live with making this a TopBraid-only feature and I 
will not fight for it. We did however have some use cases where this 
analysis is not easily possible. Examples are queries against read-only 
graphs with background data. How would an engine determine that all ?x 
in GRAPH ?x { ... } are read-only graphs?

>
> 9.5
>
> [Remove, as implementing this will require parsing and modifying SPARQL
> bodies.]

What modifications are required? I had explained how these are invoked 
in 7.6.

Thanks,
Holger
Received on Thursday, 29 October 2015 05:30:10 UTC