Re: fundamental problems with SHACL from Holger Knublauch on 2016-04-11 (public-data-shapes-wg@w3.org from April 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 11 Apr 2016 10:32:55 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <570AF0B7.9020507@topquadrant.com>
On 9/04/2016 0:11, Peter F. Patel-Schneider wrote:
>
> On 04/07/2016 05:20 PM, Holger Knublauch wrote:
>> Hi Peter,
>>
>> I believe you are repeating similar things over and over again. Is there a
>> reason for that, other than reminding the group of some perceived urgency?
>>
>> On 8/04/2016 7:15, Peter F. Patel-Schneider wrote:
>>> So here are some fundamental problems that I currently see in SHACL.
>>>
>>>
>>> The meaning of SHACL is not well defined.  It importantly depends on both
>>> pre-binding and sh:hasShape, both of which have significant problems.
>> None of these are IMHO significant. It's just editorial work that needs to
>> happen, and we all have day jobs in parallel.
> I do not believe that this is editorial.  There are significant issues in how
> both sh:hasShape and pre-binding work that affect the behaviour of SHACL.
>
>>> I had thought that pre-binding was the easy one.  To do pre-binding you
>>> first need to extend SPARQL so that blank nodes can be used in SPARQL
>>> queries, i.e., that if you have access to an RDF graph you can extract
>>> identifiers from that graph and use these identifiers in a SPARQL query just
>>> as if they were IRIs.  Then pre-binding just augments the (outer) SPARQL
>>> query with a VALUES construct that binds variables to values.
>>>
>>> However, apparently this is not the case, as the current document makes
>>> pre-binding out to be something quite different.  I do not have the
>>> expertise to fix all the problems with the treatment of pre-binding in the
>>> current document but I have pointed out a number of problems in it.
>> This is ISSUE-68. I tried various ways of responding to your concerns, but you
>> were not happy with any of them. And I agree this is work in progress. I would like
>> to be able to finish this once and for all, but always other things pop up in
>> between. You are raising many other ISSUEs including a full-blown counter
>> proposal that would replace basically everything, and at the same time put
>> pressure on me to not do my homework. It shouldn't come as a surprise that I
>> never have time if I am forced to spend my time responding to all your other
>> issues. Meanwhile, nobody else in the group steps up to this task either. The
>> last time I looked into pre-binding a few weeks ago, I was experimenting with
>> the syntax transform package in Jena. I found a bug that had to be fixed
>> first, halting my progress:
>>
>> https://github.com/apache/jena/commit/bc5ace0e9460ae979079532f610a88b6363e96e5
>>
>> I then went on vacation and had plenty of other TopQuadrant work on my plate.
>> I will try to get back to this topic soon.
>>
>> At the same time I still do not understand your problem with the semantics of
>> pre-binding. Simply using VALUES is not going to work, because we need to be
>> able to walk into nested scopes and even nested SELECT queries. I had
>> explained this before. Not sure why you keep repeating the same issue.
> Pre-binding is currently defined in a very complex manner.
>
> There is an initial substitution into SPARQL code.  This substitution changes
> the behaviour of the SPARQL code in many different ways.  First there is the
> change that would occur if the affected variable had a top-level binding.
> However, there are other changes.   Distinct variables with the same name in
> sub-queries are also changed.  This changes the meaning of sub-queries in a
> way different than that of a top-level binding.  Second, the substitution
> makes certain bits of previously-valid syntax invalid, including bindings,
> GRAPH constructs, the bound function, GROUP BY constructs, and ORDER BY
> constructs.  Each of these has to be fixed up by a set of compensating code
> transformations.   There is no certainty that there are not other
> compensations that need to be made to handle invalid syntax caused by the
> substitution.  I can easily think of several - simple variables in select
> clauses, variables in group conditions, variables in bindings, and variables
> in data blocks.  There could easily be others.  There is also no certainty
> that the initial substitution does not change the meaning of SPARQL code.  I
> pointed out above that it does change the meaning of subqueries but there
> could easily be other changes.
>
> Blank nodes then add another complication.  The current document does not give
> an actual method for handling pre-bound blank nodes.  The document suggests
> that using an algebra approach would work and so would a substitution
> approach.  However, there are no details of how to do either and no
> specification of what either should actually do.
>
>
>>> As far as I can tell, sh:hasShape has never had a correct definition in the
>>> document.  It has severe problems relating to recursion, which I pointed
>>> out, and is still described as if arbitrary recursion is part of SHACL.
>> This is ISSUE-131 which I have addressed today. We should continue discussion
>> on that thread:
>>
>> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Apr/0026.html
> Yes, discussion is happening concerning sh:hasShape.  As I wrote this message,
> however, sh:hasShape had never had a correct definition, making this a
> fundamental problem in SHACL.
>
>>> There are other recent problems with the meaning of SHACL.  I recently
>>> pointed out one of them having to do with nodes in a shape graph that have
>>> rdf:type links to both sh:PropertyConstraint and
>>> sh:InversePropertyConstraint.
>>>
>>>
>>> The syntax of SHACL is not well defined.
>>>
>>> The current solution to the problems with nodes that belong to  both
>>> sh:PropertyConstraint and sh:InversePropertyConstraint is to make them
>>> illegal syntax.  However, this is quite tricky as SHACL performs several
>>> kinds of inference on shapes graphs.  Several partial fixes for determining
>>> whether a node is a legal value for sh:Property, sh:InverseProperty, or
>>> sh:Constraint have been proposed, but all of them have been incomplete and
>>> not well founded.
>> This is ISSUE-134. Again, we already have several threads open for that topic
>> and I will get to this in due course. I don't find it helpful to have yet
>> another email thread with yet more of the same here.
> This is again an illustration of a fundamental problem with SHACL, and thus
> suitable for inclusion in this email message which arose from a comment on
> yesterday's teleconference.

All topics above have now been continued in separate threads.
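For readers following the pre-binding thread, the contrast between the
two approaches quoted above (textual substitution versus a VALUES block)
can be sketched roughly as follows. This is only an illustration in
Python using plain string handling, not the algorithm in the current
SHACL draft and not how Jena does it; the helper names, the ex: prefix
and the example query are all assumptions made up for this sketch.

```python
import re


def substitute_variable(query: str, var: str, term: str) -> str:
    # Hypothetical helper: naive textual pre-binding that replaces every
    # ?var or $var token in the query text with the given RDF term.
    pattern = re.compile(r"[?$]" + re.escape(var) + r"\b")
    return pattern.sub(lambda _match: term, query)


def add_values_block(query: str, var: str, term: str) -> str:
    # Hypothetical helper: VALUES-style pre-binding that leaves the query
    # text untouched and adds a data block to the outermost group only.
    head, brace, body = query.partition("{")
    return f"{head}{brace}\n  VALUES ?{var} {{ {term} }}{body}"


# Made-up example query; the inner ?this is not projected by the sub-query.
QUERY = """SELECT ?this ?friend
WHERE {
  ?this ex:knows ?friend .
  { SELECT ?friend WHERE { ?this ex:likes ?friend } }
}"""

ALICE = "<http://example.org/alice>"

substituted = substitute_variable(QUERY, "this", ALICE)
with_values = add_values_block(QUERY, "this", ALICE)

print(substituted)
print(with_values)
```

In the substituted text the projection becomes
SELECT <http://example.org/alice> ?friend, which is no longer valid
SPARQL, and the ?this inside the sub-query (a distinct variable under
SPARQL scoping, because it is not projected) is rewritten as well. The
VALUES version stays syntactically valid but does not reach that inner
?this, which is the nested-scope concern.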

>
>> Overall all this just serves to give the impression that there are countless
>> problems, while on closer examination each individual issue is quite solvable.
> I disagree here.  My view is that instead the design of SHACL is becoming more
> and more complex.  Problems are patched, the patch is patched, and so on.
> Each patch adds complexity and introduces more places for things to go wrong,
> and they do.

Let's agree to disagree then. I think it's a normal process in which we 
have a starting point, find bugs, fix the bugs and do the next 
iteration. We are exploring this space of Shape languages together - 
nobody has ever done this as a W3C standard. Of course not everything is 
perfect, and of course we change our opinions and viewpoints as we go.

I am making mistakes - particularly many mistakes because my editing 
work is visible to everyone. You are making mistakes. Other people have 
made mistakes (or remain silent). There is nothing unusual or alarming 
here at all. I appreciate your efforts in reading and re-reading the 
spec, and reporting the issues that you find. That's the right way to 
make progress - not throwing up one's hands or rubbishing everything.

You may have noticed that there has been a recent surge in (early) 
implementations of SHACL:

https://twitter.com/HolgerKnublauch/status/718362533405540352
https://twitter.com/konigdev/status/718852278048919553
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1604&L=DC-ARCHITECTURE&F=&S=&P=58

This happens despite the allegedly broken spec. There is no reason to 
"restart" - we would even lose credibility if we did.

>
>>> None of these fixes have attacked the underlying problem which is that the
>>> syntactic category of a constraint node is partly based on rdf:type links of
>>> that node and partly based on how that node fits into a shape.  This split
>>> in syntactic determination makes for a complex, error-prone, and hard to
>>> understand syntax.
>>>
>>>
>>> There are other problems with the syntax that may not be individually
>>> fundamental, but together are quite significant.
>>>
>>> Lists are used in various places in the syntax.  Several constraint
>>> components have lists as values of their main property.  However, there is
>>> no definition in the document as to what makes a valid list, or even any
>>> definition of what constitutes the members of a list.
>> Hmmm, isn't it clear that we are talking about rdf:Lists, and then of course
>> the usual rdf:List syntax from the existing specs will be used. Why do we need
>> to repeat any of this in the SHACL spec? It would be like explaining the
>> meaning of the various XSD datatypes...
>  From where?  I don't know of any suitable document.

Many other W3C specs have happily used rdf:Lists, including OWL. Did 
they all repeat the syntax rules of well-formed rdf:Lists?
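To make the question concrete, one possible reading of "well-formed
rdf:List" can be sketched as below. This is an assumption for
illustration only, not a definition taken from any W3C document: every
list node has exactly one rdf:first and exactly one rdf:rest, and the
chain ends at rdf:nil without cycles. Triples are modelled as plain
Python tuples of strings, and the function name is hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def list_members(triples, head):
    # Hypothetical helper: walk the list starting at head and return its
    # members in order, or raise ValueError if it is ill-formed under the
    # assumed rules (one rdf:first, one rdf:rest, no cycles).
    members = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"{node} needs exactly one rdf:first and rdf:rest")
        members.append(firsts[0])
        node = rests[0]
    return members


# A two-element list, and a variant whose head has two rdf:rest values.
good = [
    ("_:l1", RDF_FIRST, "a"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "b"),
    ("_:l2", RDF_REST, RDF_NIL),
]
bad = good + [("_:l1", RDF_REST, RDF_NIL)]

print(list_members(good, "_:l1"))
```

Under these assumed rules the second graph is rejected, because the
head node has two rdf:rest values and its members are therefore
ambiguous.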

Holger


>
>>> The syntax has several unnecessary restrictions.  It is not possible to
>>> repeat properties in constraints (but it is almost necessary to repeat
>>> properties in shapes).
>> This is ISSUE-133 for which we seem to be very close to a resolution (see
>> PROPOSALS page), allowing repeated properties. With more time, we could have
>> closed that issue today.
>>
>>> Constraints and shapes are different, leading to
>>> verbose syntax, even for an RDF encoding.
>> This is (mostly) ISSUE-135. Merging shapes and constraints introduces new
>> problems and throws things together that do not really belong together.
> ShEx seems to get along fine without having this distinction, so I don't see
> why these two things do not really belong together.  OWL also does not make
> this distinction.
>
>> I assume you want to use all this (and similar) emails to make a case for your
>> Proposal 4. I have enumerated several serious problems with that proposal, but
>> you have not responded to them. Do you seriously believe that once we switch
>> to your proposal then suddenly all issues will go away, and we will not
>> discover many new problems? The current syntax has been around for quite a
>> while now and many people around the world have worked with it. I personally
>> have in-depth experience with this approach now and like it a lot. I don't see
>> "fundamental" problems other than that we are progressing too slowly.
>>
>> Holger
> peter
>
Received on Monday, 11 April 2016 00:33:32 UTC