Static Analysis of SHACL [Was Re: STRAWPOLL on Approach for SHACL] from Jerven Tjalling Bolleman on 2015-04-01 (public-data-shapes-wg@w3.org from April 2015)

From: Jerven Tjalling Bolleman <jerven.bolleman@isb-sib.ch>
Date: Wed, 01 Apr 2015 10:24:04 +0200
To: public-data-shapes-wg@w3.org
Message-ID: <551BAB24.2080704@isb-sib.ch>
Hi Everyone,

TL;DR: I very strongly agree with Richard!

Some in the group want to use SPARQL to define the semantics of
SHACL lite; others want to use a different formalism.
For practical reasons I am very much in the "just use SPARQL" camp,
for the following simple reasons.

a) Many more people can read
   SELECT (?this AS ?root) (?predicate AS ?path) ?value
   WHERE {
           ?this ?predicate ?value .
           FILTER NOT EXISTS {
                   ?allowedValues sh:member ?value
           }
   }
than
   evalTermSet : P MatchValue → RDFTerm → OptValidity
   ∀ mvs : P MatchValue; n : RDFTerm • evalTermSet mvs n =
   if ∃ mv : mvs •
     ((mv ∈ ran mviri ∧ n ∈ ran iri ∧ (iri · n) = mviri · mv ) ∨
     (mv ∈ ran mviris ∧ n ∈ ran iri ∧
     (iri · n) ∈ IRIstemRange (mviris · mv )) ∨
     (n ∈ ran literal ∧ mvlit · mv = literal · n))
   then p
   else f

This alone is, for me, a major selling point for going with the SPARQL
approach. By the way, I am not ashamed to admit that I don't really
understand the Z style, and if we are complaining that SPARQL is
difficult, then this style is extremely difficult. (My logic classes as a
bioinformatician by training were extremely limited.)
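
To make concrete what the SPARQL query above computes, here is a minimal
Python sketch over an in-memory set of triples. It rests on assumptions
the quoted query does not show: that ?predicate and ?allowedValues are
pre-bound to the property and the set named in the shape, and that every
subject in the graph is checked. The function name and the sample data
(ex:a, ex:b, ex:Value2) are made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
SH_MEMBER = "sh:member"


def violations(graph, predicate, allowed_set):
    """Return (root, path, value) rows for values of `predicate`
    that are not sh:member of `allowed_set`.

    Assumption: `predicate` and `allowed_set` play the role of the
    pre-bound ?predicate and ?allowedValues variables.
    """
    # Collect the members of the allowed-values set.
    members = {o for (s, p, o) in graph if s == allowed_set and p == SH_MEMBER}
    # Keep every triple on the constrained predicate whose value is not a member.
    return sorted(
        (s, p, o) for (s, p, o) in graph if p == predicate and o not in members
    )


graph = {
    ("ex:AllowedValuesExampleSet", "sh:member", "ex:Value1"),
    ("ex:a", "ex:someProperty", "ex:Value1"),  # conforms
    ("ex:b", "ex:someProperty", "ex:Value2"),  # violates
}

print(violations(graph, "ex:someProperty", "ex:AllowedValuesExampleSet"))
```

Only the ex:b triple is reported, since ex:Value2 is not a member of the
set.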


b) For static analysis neither notation matters, because what you need
to determine is whether

ex:AllowedValuesExampleShape
         a sh:Shape ;
         sh:property [
                 sh:predicate ex:someProperty ;
                 sh:allowedValues ex:AllowedValuesExampleSet ;
         ] .

ex:AllowedValuesExampleSet
         a sh:Set ;
         sh:member ex:Value1 .

is equivalent to

ex:HasValueExampleShape
         a sh:Shape ;
         sh:property [
                 sh:predicate ex:someProperty ;
                 sh:hasValue ex:Value1 ;
         ] .

And whether that is defined in Z notation, SPARQL, or another formalism
does not matter in any practically significant way.
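
As a rough illustration of such a check, here is a minimal Python sketch
that looks for value sets on which the two constraints disagree. It
assumes one particular reading of the two properties (allowedValues:
every value of the property is in the set; hasValue: the value occurs at
least once); that reading is an assumption, not something stated in this
thread. It is also only a bounded search over a tiny universe of values,
so it can find counterexamples but cannot prove equivalence. All
function names are hypothetical.

```python
from itertools import combinations


def allowed_values(allowed):
    """Assumed reading: every value of the property is in `allowed`."""
    return lambda values: values <= allowed


def has_value(required):
    """Assumed reading: `required` occurs among the property's values."""
    return lambda values: required in values


def counterexamples(c1, c2, universe):
    """Value sets (drawn from `universe`) on which c1 and c2 disagree."""
    found = []
    for size in range(len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            values = frozenset(combo)
            if c1(values) != c2(values):
                found.append(sorted(values))
    return found


# ex:Value2 is a made-up extra value so that disagreement can show up.
universe = {"ex:Value1", "ex:Value2"}
diff = counterexamples(
    allowed_values(frozenset({"ex:Value1"})),
    has_value("ex:Value1"),
    universe,
)
print(diff)
```

Under the assumed reading the two shapes disagree on a node with no
value at all and on a node with both values. The point for this
discussion is that the check works on the meaning of the constraints,
however that meaning happens to be written down.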

c) SPARQL in its free form presents difficulties for static analysis. 
However, SHACL being defined in SPARQL does not present these 
difficulties at all. SHACL core uses a very limited number of SPARQL 
constructs, constructs that are analyzable, and in practical terms 
replaceable with other formalisms if desired.

d) Static analysis does not depend on a formalism to be able to work.
Take, for example, the excellent FindBugs program for Java: the
formalism in the Java Language Specification is relatively weak, as is
its default type system. That does not mean that static analysis is
impossible or even very hard.

e) If you find a formalism other than SPARQL queries important, well,
even if we use SPARQL, that already exists; see e.g. the algebraic
syntax for SPARQL 1 described in "On the Semantics of SPARQL" by
Arenas et al.

f) If I can do some static analysis on SPIN rules with OWL, then some of 
the smart people on this list can do this on SHACL-lite as well.

Regards,
Jerven

PS: So far I have not seen any construct in SHACL-lite that could not
be expressed in SPARQL1+recursion, just like the current
SPARQL1.1+recursion, so the formalism of Arenas et al. still applies.

On 01/04/15 09:58, Richard Cyganiak wrote:
> Static analysis of SPARQL queries may be hard, but not very hard. The inclusion problem over the subset known as “well-designed SPARQL”, which includes most if not all practically occurring queries, is decidable. The semantics of SPARQL is a variation of relational algebra, a well-studied formalism. For SHACL to become widely used, the number of implementers has to exceed the number of academics researching it, and the semantics must be accessible to them. The semantics of SHACL being defined in SPARQL doesn’t prevent the use of other formalisms for analysing it.
>
> Best,
> Richard
>
>
>> On 1 Apr 2015, at 08:19, Iovka Boneva <iovka.boneva@univ-lille1.fr> wrote:
>>
>> As I am against using SPARQL for defining the semantics, I catch the question to Jose to give you my reasons.
>>
>> I consider very important to be able to perform static analysis on shapes. The semantics of SPARQL is by itself complex, and would make static analysis very hard.
>>
>>
>> If SHACL becomes widely used (which I hope will be the case), then lots of people will be interested in performing static analysis. Static analysis is useful at least for query optimization and simplification of shapes, and certainly for many other reasons that will come up with applications. As an example, my colleagues and myself have ideas on how to use shapes for easier data integration, and would need to perform static analysis for that.
>>
>> Typical static analysis problems are testing inclusion and equivalence of shapes, testing whether two shapes are compatible (non disjoint), testing whether a shape is compatible with a query.
>>
>> A SPARQL based semantics would make static analysis very hard. For instance, inclusion of SPARQL queries is undecidable, and in order to have decidable problems, one would need to end up with different kinds of restrictions of SPARQL, which always complicates things.
>>
>> For making static analysis easier, it is very much preferable to adopt a restricted core language with controlled expressive power, and with declarative semantics based on a well studied formalism.
>>
>> Iovka
>>
>>
>> Le lun. 30 mars 2015 22:42:09 CEST, Peter F. Patel-Schneider a écrit :
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi Jose:
>>>
>>> Are you against using SPARQL as a specification formalism for the high-level
>>> language of SHACL even if there is no need to use a SPARQL implementation
>>> (or equivalent) to actually implement the high-level language of SHACL?
>>>
>>> If so, what aspects or features of such a specification are you against?
>>>
>>> peter
>>>
>>>
>>> On 03/26/2015 10:52 AM, Jose Emilio Labra Gayo wrote:
>>>>
>>>> I was going to vote but reading the options, they are both options
>>>> reasonable, what worries me is if there is some hidden implication about
>>>> the relationship between SHACL and SPARQL.
>>>>
>>>> If option a) doesn't imply that the high-level language constructs will
>>>> be merged with the SPARQL definitions, I would not have a problem if
>>>> they are in the same document but in separate sections.
>>>>
>>>> However, if voting option (a) implies that the high-level language will
>>>> be tied to SPARQL as it currently is, the my vote will be against.
>>>>
>>>> Best regards, Jose Labra
>>>>
>>>>
>>>> On Thu, Mar 26, 2015 at 2:36 AM, Arnaud Le Hors <lehors@us.ibm.com
>>>> <mailto:lehors@us.ibm.com>> wrote:
>>>>
>>>> There has been a lot (!) of discussion on the mailing list and I'd like
>>>> to get an update on where the WG stands with regard to the different
>>>> approaches being proposed. I know this doesn't capture all the issues
>>>> (obviously) and some will feel that this isn't the right question but at
>>>> least this is one point of contention that we need to address so,
>>>> please, bear with me.
>>>>
>>>> Rather than doing this just on a teleconference I set up a wiki page so
>>>> that who can't attend the teleconference can still respond:
>>>> https://www.w3.org/2014/data-shapes/wiki/Strawpoll_On_Approach
>>>>
>>>> Thank you. -- Arnaud Le Hors - Senior Technical Staff Member, Open Web
>>>> Technologies - IBM Software Group
>>>>
>>>>
>>>>
>>>>
>>>> -- -- Jose Labra
>>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v2
>>>
>>> iQEcBAEBAgAGBQJVGbUhAAoJECjN6+QThfjz4YcIAKPfBa9XucZ1sRmxghaHqlq1
>>> 1M0umxM79QXRBjNL3qO1R2QZRGo3tgrrZf0cKTBJOXFSRaEyX5g7nZuy3gGkFHNz
>>> jrjN393t9wNZ4ZVJLPTN+zsETECeSlwho6k2qGheX1DS/biwfeUg9ZSJx2d/Av+v
>>> faT8yQN7dR4I3GDiSl4uL3VZfdkqqzq56mGlzTocPDfiLNMThBaJX6dn/WuWnzMF
>>> XjPITr42kwFeow4Cq0DQ4OgqvoahRA0EtVmI+IoBEVcj/yJgxSzn9qc6UvwV7HPd
>>> EQEe1dHN9eUwUlHdQWynn8PlO942qroeP4Nqa120Vo3Egowr8A94RcK0q3+l7Ig=
>>> =pLs4
>>> -----END PGP SIGNATURE-----
>>>
>>
>>
>>
>
>

-- 
-------------------------------------------------------------------
Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------
Received on Wednesday, 1 April 2015 08:24:30 UTC