Re: ISSUE-68 and ISSUE-131: sh:hasShape and pre-binding from Dimitris Kontokostas on 2016-06-11 (public-data-shapes-wg@w3.org from June 2016)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Sun, 12 Jun 2016 00:33:18 +0300
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <CA+u4+a0k4PEy-qzPk9tyKenC7ibTvCMQFS85WSK4wmvtqOX+iw@mail.gmail.com>
On Fri, Jun 10, 2016 at 4:21 PM, Peter F. Patel-Schneider <
pfpschneider@gmail.com> wrote:

> Pre-binding and sh:hasShape form a large part of the meaning of SHACL.
> They are not just part of the extension mechanism in SHACL but are used in
> the definition of the core of SHACL.
>
> In Section 1.5 there is
>
>   This specification uses parts of SPARQL 1.1 in the normative definition
>   of the semantics of the SHACL Core constraints and scopes.
>
>   SPARQL variables using $ marker represent external values that must be
>   pre-bound in the SPARQL query before execution.
>
>   Some SHACL constraints are defined with the use of the sh:hasShape
>   function.
>
> In Section 4 there is
>
>   The SPARQL definitions in this section also assume the existence of a
>   built-in SPARQL function sh:hasShape.
>
> Then pre-binding shows up in the normative definition of every core
> constraint component and sh:hasShape shows up in the normative definitions
> of sh:not, sh:and, sh:or, sh:shape, and sh:qualifiedValueShape.
> It is possible to implement the core of SHACL without using sh:hasShape and
> pre-binding but this implementation will be implementing something that is
> defined in large part by sh:hasShape and pre-binding.
>

Hi Peter,

Could these core components be defined in the spec only with prose or would
we need another way (like a query generation algorithm)?



> In the extension part of SHACL, sh:hasShape and pre-binding are used
> directly when writing the SPARQL code that implement templates.  Problems
> with sh:hasShape and pre-binding thus are not just problems with an
> underlying definition of SHACL but also directly affect the meaning of
> constructs that are employed by users of SHACL.
>
> It is possible to have a SPARQL-based extension mechanism for SHACL that
> does not use sh:hasShape and does not use pre-binding.  Thus neither
> sh:hasShape nor pre-binding is needed for SHACL.
>
>
> sh:hasShape is currently defined in Appendix A of the SHACL specification,
> http://w3c.github.io/data-shapes/shacl/#hasShape.  sh:hasShape currently
> produces three results: undefined if recursion is encountered, true if no
> violation validation result is produced, and false if some violation result
> is produced.
>
> This description of sh:hasShape has several problems.  First, it is unclear
> as to which validation results count in the description.  Is it only results
> from the direct validation of the focus node or do results from embedded
> shapes count?  Second, the three possibilities are not disjoint.  Third,
> recursion is not possible in SHACL so the undefined result can never occur.
>
> However, the biggest problem with sh:hasShape is that it depends on
> pre-binding.  sh:hasShape has to evaluate SPARQL queries in a context where
> several query variables are limited to certain values.  This is an innate
> peculiarity of using a SPARQL function that in turn initiates further
> SPARQL query processing, so problems in pre-binding are problems for
> sh:hasShape.
>
>
> Pre-binding of variables in SHACL is currently defined in Appendix B of the
> SHACL specification, http://w3c.github.io/data-shapes/shacl/#pre-binding.
>
> Pre-binding is defined, in full, as
>
>   Pre-binding a variable with a value means that the SPARQL processor needs
>   to evaluate all occurrences of variables with that same name (including
>   occurrences in inner scopes and nested SELECT queries) so that they have
>   the provided value. In other words, whenever a SPARQL processor evaluates
>   a pre-bound variable, it must use the given value.
>
> This definition does not align with the definition of SPARQL at all.
> SPARQL is a query language and often does not evaluate query variables.  In
> particular, SPARQL does not evaluate query variables in basic graph
> patterns.  The definition of basic graph pattern matching in SPARQL, from
> https://www.w3.org/TR/sparql11-query/#BasicGraphPattern, is
>
>   Let BGP be a basic graph pattern and let G be an RDF graph.
>   μ is a solution for BGP from G when there is a pattern instance mapping P
>   such that P(BGP) is a subgraph of G and μ is the restriction of P to the
>   query variables in BGP.
>
> Note that there is no notion of evaluation here at all.  Using evaluation
> as the basis of the definition of pre-binding is thus disconnected from a
> large part of the behaviour of SPARQL.
>
> This disconnect shows up in even the simplest of SPARQL queries that
> implement constraint components.  Consider the normative SPARQL definition
> of sh:class in property constraints
>
>   SELECT $this ($this AS ?subject) $predicate (?value AS ?object)
>   WHERE {
>         $this $predicate ?value .
>         FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* $class } .
>   }
>
> The pre-binding of $this and $predicate does not affect the meaning of the
> basic graph pattern
>
>         $this $predicate ?value .
>
> so that, according to the definitions of SHACL and SPARQL, the solution
> sequence generated from matching this basic graph pattern will have
> solutions for each triple in the data graph.  This is already a total
> failure but what happens next?  Well, the filter is used to remove some of
> the solutions, using the SPARQL semantic Filter function.  Each solution is
> checked to see whether the filter evaluates to true for that solution.
> Because the filter expression is an EXISTS expression it uses the SPARQL
> substitute function, which for each query variable in ?value
> rdf:type/rdfs:subClassOf* $class replaces it by its mapping in the
> solution, if any.  Because there is a solution for each triple in the data
> graph, this will result in that many substitutions.  Next each of these
> substitutions is separately matched against the data graph.  This matching
> will have a result for values of $this that are the subject of an rdf:type
> triple and then these solutions are filtered out.  So the end result will
> have a solution for every triple in the data graph where the subject of the
> triple is not the subject of an rdf:type triple.
>
> Of course this is completely not what the result should be.  However, it is
> what the current definition of SHACL says the result is.
>
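
To make the basic graph pattern point above concrete, here is a minimal
sketch in Python of matching a single triple pattern against a toy graph.
It is only an illustration under stated assumptions: the graph, the ex:
names and the substitute() helper are hypothetical, and treating
pre-binding as substitution before matching is just one possible reading,
not what the current spec text defines.

```python
# Toy model of SPARQL basic graph pattern (BGP) matching for one triple pattern.
# Terms are plain strings; anything starting with "?" or "$" is a variable.

def is_var(term):
    return term.startswith("?") or term.startswith("$")

def match_pattern(pattern, graph):
    """Return every solution (dict: variable -> term) for one triple pattern."""
    solutions = []
    for triple in graph:
        mu = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must map to the same term each time.
                if mu.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break  # constant term in the pattern does not match
        else:
            solutions.append(mu)
    return solutions

def substitute(pattern, prebound):
    """ASSUMPTION: read pre-binding as replacing variables before matching."""
    return tuple(prebound.get(term, term) for term in pattern)

# Hypothetical data graph and pre-bound values, for illustration only.
graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
pattern = ("$this", "$predicate", "?value")
prebound = {"$this": "ex:alice", "$predicate": "ex:knows"}

# Plain BGP matching: nothing restricts $this or $predicate, so every
# triple in the graph yields a solution.
unrestricted = match_pattern(pattern, graph)

# Substitution first: only triples about the focus node and predicate match.
restricted = match_pattern(substitute(pattern, prebound), graph)

print(len(unrestricted), restricted)
```

With plain matching the pattern yields one solution per triple in the
graph, which is the behaviour described above; with substitution applied
first, only the ex:alice ex:knows triple matches.
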
>
> Some SPARQL expert is going to have to take a close look at pre-binding to
> determine what its definition should be.  However, before that there needs
> to be a closer look taken at how pre-binding should operate.  For example,
> should pre-binding affect variables throughout the query or only variables
> that would be affected by a BIND construct at the beginning of the query?
> There should be some examples generated to show how pre-binding works under
> these two options so that the working group can make an informed decision.
>
>
> Peter F. Patel-Schneider
> Nuance Communications
>
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
Received on Saturday, 11 June 2016 21:34:13 UTC