Re: fundamental problems with SHACL from Peter F. Patel-Schneider on 2016-04-08 (public-data-shapes-wg@w3.org from April 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 8 Apr 2016 07:11:28 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <5707BC10.7010908@gmail.com>
On 04/07/2016 05:20 PM, Holger Knublauch wrote:
> Hi Peter,
> 
> I believe you are repeating similar things over and over again. Is there a
> reason for that, other than reminding the group of some perceived urgency?
> 
> On 8/04/2016 7:15, Peter F. Patel-Schneider wrote:
>> So here are some fundamental problems that I currently see in SHACL.
>>
>>
>> The meaning of SHACL is not well defined.  It importantly depends on both
>> pre-binding and sh:hasShape, both of which have significant problems.
> 
> None of these are IMHO significant. It's just editorial work that needs to
> happen, and we all have day jobs in parallel.

I do not believe that this is editorial.  There are significant issues in how
both sh:hasShape and pre-binding work that affect the behaviour of SHACL.

>> I had thought that pre-binding was the easy one.  To do pre-binding you
>> first need to extend SPARQL so that blank nodes can be used in SPARQL
>> queries, i.e., that if you have access to an RDF graph you can extract
>> identifiers from that graph and use these identifiers in a SPARQL query just
>> as if they were IRIs.  Then pre-binding just augments the (outer) SPARQL
>> query with a VALUES construct that binds variables to values.
>>
>> However, apparently this is not the case, as the current document makes
>> pre-binding out to be something quite different.  I do not have the
>> expertise to fix all the problems with the treatment of pre-binding in the
>> current document but I have pointed out a number of problems in it.
> 
> This is ISSUE-68. I tried various ways of responding to your concerns, but you
> were not happy with any of them. And I agree this is work in progress. I would like
> to be able to finish this once and for all, but always other things pop up in
> between. You are raising many other ISSUEs including a full-blown counter
> proposal that would replace basically everything, and at the same time put
> pressure on me to not do my homework. It shouldn't come as a surprise that I
> never have time if I am forced to spend my time responding to all your other
> issues. Meanwhile, nobody else in the group steps up to this task either. The
> last time I looked into pre-binding a few weeks ago, I was experimenting with
> the syntax transform package in Jena. I found a bug that had to be fixed
> first, halting my progress:
> 
> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
> 
> I then went on vacation and had plenty of other TopQuadrant work on my plate.
> I will try to get back to this topic soon.
> 
> At the same time I still do not understand your problem with the semantics of
> pre-binding. Simply using VALUES is not going to work, because we need to be
> able to walk into nested scopes and even nested SELECT queries. I had
> explained this before. Not sure why you keep repeating the same issue.

Pre-binding is currently defined in a very complex manner.

There is an initial substitution into SPARQL code.  This substitution changes
the behaviour of the SPARQL code in many different ways.  First, there is the
change that would occur if the affected variable had a top-level binding.
However, there are other changes.  Distinct variables with the same name in
sub-queries are also changed, which changes the meaning of the sub-queries in
a way different from that of a top-level binding.  Second, the substitution
makes certain bits of previously-valid syntax invalid, including bindings,
GRAPH constructs, the bound function, GROUP BY constructs, and ORDER BY
constructs.  Each of these has to be fixed up by a set of compensating code
transformations.  There is no certainty that no other compensations are
needed to handle invalid syntax caused by the substitution.  I can easily
think of several: simple variables in select clauses, variables in group
conditions, variables in bindings, and variables in data blocks.  There could
easily be others.  There is also no certainty that the initial substitution
does not change the meaning of SPARQL code in further ways.  I pointed out
above that it does change the meaning of subqueries, but there could easily
be other changes.
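
To make the contrast concrete, here is a minimal Python sketch.  It is not
the mechanism in the current document; it only treats queries as plain
strings.  The helper names prebind_by_values and prebind_by_substitution,
the example query, and the example.org IRIs are all hypothetical.  The
first helper appends a trailing VALUES clause, which the SPARQL 1.1 grammar
permits at the end of a query.  The second performs a naive textual
substitution of the variable.

```python
import re


def prebind_by_values(query: str, var: str, term: str) -> str:
    """Append a trailing VALUES clause that binds ?var to term."""
    return f"{query.rstrip()}\nVALUES ?{var} {{ {term} }}"


def prebind_by_substitution(query: str, var: str, term: str) -> str:
    """Naively replace every textual occurrence of ?var with term."""
    pattern = rf"\?{re.escape(var)}\b"
    # A function replacement avoids any escape processing of term.
    return re.sub(pattern, lambda _match: term, query)


# Hypothetical query.  The ?this inside the sub-query is not projected,
# so it is a distinct variable from the outer ?this.
QUERY = """SELECT ?this ?n
WHERE {
  ?this <http://example.org/p> ?v .
  FILTER(bound(?this))
  { SELECT (COUNT(?x) AS ?n) WHERE { ?this <http://example.org/q> ?x } }
}"""

NODE = "<http://example.org/a>"

with_values = prebind_by_values(QUERY, "this", NODE)
substituted = prebind_by_substitution(QUERY, "this", NODE)

print(with_values)
print("----")
print(substituted)
```

In the substituted text the select clause and the call to bound now contain
an IRI where the grammar requires a variable, and the sub-query counts only
the triples whose subject is the example node instead of all matching
triples.  The VALUES version leaves the query text, and so the sub-query,
untouched.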

Blank nodes then add another complication.  The current document does not give
an actual method for handling pre-bound blank nodes.  The document suggests
that using an algebra approach would work and so would a substitution
approach.  However, there are no details of how to do either and no
specification of what either should actually do.


>> As far as I can tell, sh:hasShape has never had a correct definition in the
>> document.  It has severe problems relating to recursion, which I pointed
>> out, and is still described as if arbitrary recursion is part of SHACL.
> 
> This is ISSUE-131 which I have addressed today. We should continue discussion
> on that thread:
> 
> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Apr/0026.html

Yes, discussion is happening concerning sh:hasShape.  At the time I wrote my
message, however, sh:hasShape had never had a correct definition, which makes
this a fundamental problem in SHACL.

>> There are other recent problems with the meaning of SHACL.  I recently
>> pointed out one of them having to do with nodes in a shape graph that have
>> rdf:type links to both sh:PropertyConstraint and
>> sh:InversePropertyConstraint.
>>
>>
>> The syntax of SHACL is not well defined.
>>
>> The current solution to the problems with nodes that belong to  both
>> sh:PropertyConstraint and sh:InversePropertyConstraint is to make them
>> illegal syntax.  However, this is quite tricky as SHACL performs several
>> kinds of inference on shapes graphs.  Several partial fixes for determining
>> whether a node is a legal value for sh:Property, sh:InverseProperty, or
>> sh:Constraint have been proposed, but all of them have been incomplete and
>> not well founded.
> 
> This is ISSUE-134. Again, we already have several threads open for that topic
> and I will get to this in due course. I don't find it helpful to have yet
> another email thread with yet more of the same here.

This is again an illustration of a fundamental problem with SHACL, and thus
suitable for inclusion in this email message which arose from a comment on
yesterday's teleconference.

> Overall all this just serves to give the impression that there are countless
> problems, while on closer examination each individual issue is quite solvable.

I disagree here.  My view is that instead the design of SHACL is becoming more
and more complex.  Problems are patched, the patch is patched, and so on.
Each patch adds complexity and introduces more places for things to go wrong,
and they do.

>> None of these fixes have attacked the underlying problem which is that the
>> syntactic category of a constraint node is partly based on rdf:type links of
>> that node and partly based on how that node fits into a shape.  This split
>> in syntactic determination makes for a complex, error-prone, and hard to
>> understand syntax.
>>
>>
>> There are other problems with the syntax that may not be individually
>> fundamental, but together are quite significant.
>>
>> Lists are used in various places in the syntax.  Several constraint
>> components have lists as values of their main property.  However, there is
>> no definition in the document as to what makes a valid list, or even any
>> definition of what constitutes the members of a list.
> 
> Hmmm, isn't it clear that we are talking about rdf:Lists, and that of course
> the usual rdf:List syntax from the existing specs will be used? Why do we need
> to repeat any of this in the SHACL spec? It would be like explaining the
> meaning of the various XSD datatypes...

From where?  I don't know of any suitable document.
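
To show the kind of detail a definition would have to settle, here is a
minimal Python sketch of one possible reading of "valid list".  The rules
are assumptions made for illustration and are not taken from any
specification: every list node has exactly one rdf:first and exactly one
rdf:rest, the chain ends at rdf:nil, and there are no cycles.  The function
name list_members and the representation of a graph as a set of
(subject, predicate, object) string tuples are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_members(triples, head):
    """Return the members of the list starting at head, or None if the
    structure is not well-formed under the assumed rules."""
    members, seen, node = [], set(), head
    while node != NIL:
        if node in seen:
            return None  # cycle in the rdf:rest chain
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == FIRST]
        rests = [o for s, p, o in triples if s == node and p == REST]
        if len(firsts) != 1 or len(rests) != 1:
            return None  # missing or repeated rdf:first / rdf:rest
        members.append(firsts[0])
        node = rests[0]
    return members


good = {("_:l1", FIRST, "ex:a"), ("_:l1", REST, "_:l2"),
        ("_:l2", FIRST, "ex:b"), ("_:l2", REST, NIL)}
two_firsts = good | {("_:l1", FIRST, "ex:c")}
cyclic = {("_:l1", FIRST, "ex:a"), ("_:l1", REST, "_:l1")}

print(list_members(good, "_:l1"))
print(list_members(two_firsts, "_:l1"))
print(list_members(cyclic, "_:l1"))
```

Other readings are possible, for example tolerating extra triples on list
nodes or treating a repeated rdf:first as two members, which is why the
document needs to say which one it means.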

>> The syntax has several unnecessary restrictions.  It is not possible to
>> repeat properties in constraints (but it is almost necessary to repeat
>> properties in shapes).
> 
> This is ISSUE-133 for which we seem to be very close to a resolution (see
> PROPOSALS page), allowing repeated properties. With more time, we could have
> closed that issue today.
> 
>> Constraints and shapes are different, leading to
>> verbose syntax, even for an RDF encoding.
> 
> This is (mostly) ISSUE-135. Merging shapes and constraints introduces new
> problems and throws things together that do not really belong together.

ShEx seems to get along fine without having this distinction, so I don't see
why these two things do not really belong together.  OWL also does not make
this distinction.

> I assume you want to use all this (and similar) emails to make a case for your
> Proposal 4. I have enumerated several serious problems with that proposal, but
> you have not responded to them. Do you seriously believe that once we switch
> to your proposal then suddenly all issues will go away, and we will not
> discover many new problems? The current syntax has been around for quite a
> while now and many people around the world have worked with it. I personally
> have in-depth experience with this approach now and like it a lot. I don't see
> "fundamental" problems other than that we are progressing too slowly.
> 
> Holger

peter
Received on Friday, 8 April 2016 14:11:58 UTC