Formal objection on syntax checking in SHACL from Peter F. Patel-Schneider on 2017-02-23 (public-rdf-shapes@w3.org from February 2017)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 22 Feb 2017 20:02:18 -0800
To: public-rdf-shapes@w3.org
Message-ID: <eb0c28cb-9802-914d-2267-286c4e733087@gmail.com>
This is a formal objection to the decision of the RDF Data Shapes working
group to close ISSUE-233 without requiring that all SHACL implementations
check for correct SHACL syntax.


SHACL implementations need to provide an interface that signals whether the
shapes graph argument to validation conforms to the syntactic requirements
of SHACL Core, for SHACL Core implementations, or SHACL-SPARQL, for
SHACL-SPARQL implementations.

For example, it is currently possible for SHACL implementations to implement
the following shape as requiring that all SHACL instances of ex:C1 be any of
ex:i1, ex:i2, ex:i3; or as requiring that all SHACL instances of ex:C1 be
either ex:i1 or ex:i2; or by signalling a syntax error; or indeed by any
behaviour whatever.

se:s1 rdf:type sh:NodeShape ;
  sh:targetClass ex:C1 ;
  sh:in _:b1 .
_:b1 rdf:first ex:i1 .
_:b1 rdf:rest _:b2 .
_:b2 rdf:first ex:i2 .
_:b1 rdf:rest _:b3 .
_:b3 rdf:first ex:i3 .
_:b3 rdf:rest rdf:nil .

The first of these permissible behaviours is the same as the required
behaviour on the following shapes graph:

se:s1 rdf:type sh:NodeShape ;
  sh:targetClass ex:C1 ;
  sh:in _:b1 .
_:b1 rdf:first ex:i1 .
_:b1 rdf:rest _:b2 .
_:b2 rdf:first ex:i2 .
_:b2 rdf:rest _:b3 .
_:b3 rdf:first ex:i3 .
_:b3 rdf:rest rdf:nil .

Users will thus have no tool to determine whether their shapes graphs are
syntactically valid, and thus whether their shapes graphs will be processed
the same way by different SHACL implementations.
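
The divergence can be made concrete with a small sketch.  It assumes the
first shapes graph above is held as plain Python tuples rather than in a real
RDF store.  The two reader functions, follow_one_rest and collect_reachable,
are hypothetical names standing in for two plausible implementation
strategies; neither is taken from an actual SHACL implementation.

```python
# Triples of the malformed list from the first example:
# _:b1 has two rdf:rest values and _:b2 has none.
TRIPLES = [
    ("_:b1", "rdf:first", "ex:i1"),
    ("_:b1", "rdf:rest", "_:b2"),
    ("_:b2", "rdf:first", "ex:i2"),
    ("_:b1", "rdf:rest", "_:b3"),
    ("_:b3", "rdf:first", "ex:i3"),
    ("_:b3", "rdf:rest", "rdf:nil"),
]

def objects(triples, subject, predicate):
    """Return all objects for a subject and predicate, in stored order."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def follow_one_rest(triples, node):
    """Hypothetical reader that follows only the first rdf:rest value it finds."""
    members, seen = [], set()
    while node != "rdf:nil" and node not in seen:
        seen.add(node)
        members.extend(objects(triples, node, "rdf:first"))
        rests = objects(triples, node, "rdf:rest")
        if not rests:
            break
        node = rests[0]
    return members

def collect_reachable(triples, node):
    """Hypothetical reader that gathers rdf:first values of every reachable node."""
    members, seen, stack = [], set(), [node]
    while stack:
        current = stack.pop()
        if current == "rdf:nil" or current in seen:
            continue
        seen.add(current)
        members.extend(objects(triples, current, "rdf:first"))
        stack.extend(objects(triples, current, "rdf:rest"))
    return sorted(members)
```

On the malformed graph the first reader yields ex:i1 and ex:i2 while the
second yields ex:i1, ex:i2 and ex:i3, and neither reports a problem.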


Therefore, if there is no requirement for complete syntax checking,
interoperability will be severely compromised.  Users of SHACL
implementations that do not signal whether the shapes graph conforms to the
syntactic requirements of SHACL Core or SHACL-SPARQL will have no tool for
determining whether their shapes graphs, whether they conform to SHACL
syntactic requirements or not, can be interoperably processed by other SHACL
implementations.

Optional signalling of whether the shapes graph conforms to SHACL syntactic
requirements, as in the resolution of ISSUE-233, is in no way a replacement
for the requirement that all SHACL implementations check for correct SHACL
syntax.  Users need to be able to know whether their shapes graph conforms
to SHACL syntactic requirements or not.


Implementing a complete check for SHACL Core is quite easy.  SHACL Core
implementations will of necessity have to check most of the syntactic
requirements of SHACL Core anyway.  For example, an implementation
will have to check whether a property path refers to itself, or it will go
into an infinite loop.
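
A minimal sketch of the self-reference check mentioned above follows.  It
assumes the shapes graph is held as plain Python tuples, and that a path
refers to itself when it can be reached from itself through the SHACL
path-building properties and list links; the function name and the exact set
of predicates followed are assumptions made for illustration.

```python
# Predicates that link a path node to its sub-paths (assumed set).
PATH_PREDICATES = {
    "rdf:first", "rdf:rest",
    "sh:alternativePath", "sh:inversePath",
    "sh:zeroOrMorePath", "sh:oneOrMorePath", "sh:zeroOrOnePath",
}

def path_refers_to_itself(triples, path):
    """Return True if path is reachable from itself via path-building links."""
    def successors(node):
        return [o for s, p, o in triples if s == node and p in PATH_PREDICATES]

    seen = set()
    stack = successors(path)
    while stack:
        node = stack.pop()
        if node == path:
            return True
        if node in seen:
            continue  # already explored, avoids looping on other cycles
        seen.add(node)
        stack.extend(successors(node))
    return False
```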

The major place where implementations can get away without complete SHACL
Core syntax checking is SHACL lists, as a simple SPARQL query can retrieve
elements of SHACL lists without doing complete syntax checking for SHACL
lists.  However, it is quite easy to implement complete syntax checking for
SHACL lists, for example with the following Python program (which uses rdflib):

from rdflib import RDF

class SHACLSyntaxError(Exception) :
    def __init__(self,message) :
        super().__init__(message)
        self.message = message

def countValues(g,subject,predicate) :
    # number of triples in g with the given subject and predicate
    return sum( 1 for _ in g.objects(subject,predicate) )

def wflist(g,node,elements=None,visited=None) :
    # return the members of the SHACL list starting at node,
    # raising SHACLSyntaxError if the list is not well formed
    elements = [] if elements is None else elements
    visited = set() if visited is None else visited
    # track list nodes (not members) so that a cycle is detected
    if ( node in visited ) : raise SHACLSyntaxError("Circular list")
    if ( node == RDF.nil ) :
        if ( (RDF.nil,RDF.first,None) in g ) :
            raise SHACLSyntaxError("rdf:nil has rdf:first")
        if ( (RDF.nil,RDF.rest,None) in g ) :
            raise SHACLSyntaxError("rdf:nil has rdf:rest")
        return elements
    if ( countValues(g,node,RDF.first) != 1 ) :
        raise SHACLSyntaxError("List element has wrong number of values for rdf:first")
    if ( countValues(g,node,RDF.rest) != 1 ) :
        raise SHACLSyntaxError("List element has wrong number of values for rdf:rest")
    return wflist(g,g.value(node,RDF.rest),
                  elements + [g.value(node,RDF.first)],
                  visited | {node})

Implementing a complete syntax check for SHACL-SPARQL is more difficult,
but the structural requirements are not too hard and the SPARQL syntax
checks can be done by a SPARQL implementation, which a SHACL-SPARQL
implementation needs to use anyway.


It is not necessary that SHACL implementations always check for correct
SHACL syntax, just that they can be made to do so.  If a SHACL
implementation finds it too computationally expensive to always check for
correct SHACL syntax it could provide an "unsafe" interface that does not do
complete syntax checking.
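
One possible shape for such an interface is sketched below.  All names
(validate, check_shapes_syntax, run_validation, the check_syntax flag) are
hypothetical and come neither from the SHACL specification nor from any
implementation.  The syntax check is a placeholder that only tests list-node
cardinality, and the validation step is a stub.

```python
class ShapesSyntaxError(Exception):
    """Raised when the shapes graph fails the syntax check (hypothetical name)."""

def check_shapes_syntax(shapes):
    """Placeholder check: each list node needs exactly one rdf:first and rdf:rest."""
    nodes = {s for s, p, _ in shapes if p in ("rdf:first", "rdf:rest")}
    for node in nodes:
        for predicate in ("rdf:first", "rdf:rest"):
            count = sum(1 for s, p, _ in shapes if s == node and p == predicate)
            if count != 1:
                raise ShapesSyntaxError(f"{node} has {count} values for {predicate}")

def run_validation(data, shapes):
    """Stand-in for the real validation step; reports conformance unconditionally."""
    return {"conforms": True}

def validate(data, shapes, check_syntax=True):
    """Validate data against shapes; the syntax check is on unless disabled."""
    if check_syntax:
        check_shapes_syntax(shapes)
    return run_validation(data, shapes)
```

With this design the safe behaviour is the default, and a caller has to opt
in explicitly, via check_syntax=False, to skip the check.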


Complete syntax checking is neither difficult nor expensive.  Not requiring
complete syntax checking produces severe interoperability problems.
Complete syntax checking thus needs to be a requirement on all SHACL
implementations.


Peter F. Patel-Schneider
Nuance Communications
Received on Thursday, 23 February 2017 04:09:58 UTC