formal objection on excluding many well-behaved shapes from SHACL

This is a formal objection to the exclusion from SHACL of numerous shapes
that have well-behaved intuitive meanings.  A number of these exclusions
first appear in the 11 April 2017 version of SHACL.  The overall severity of
these exclusions was noticed during an attempt to produce an implementation
of SHACL Core.

There is no technical reason to exclude the many shapes that have been
excluded from SHACL, and there are good reasons to permit them.  The
exclusions make SHACL harder for users to write.  Because SHACL
implementations are free to behave however they want on shapes graphs that
contain these excluded shapes, the exclusions are an impediment to
interoperability in SHACL.

In most cases the only change required to SHACL to no longer exclude these
shapes, and thereby improve the utility and the interoperability of SHACL,
is to remove the syntax rules that exclude them.  There is no need to change
the semantics of SHACL at all to include these shapes.  The burden on
implementers will be very low, and in many cases there will be no burden
at all, as these excluded shapes act like similar non-excluded shapes.


Some of these exclusions are as if a programming language made (x>4 && x>6)
illegal, but left (x>4 && x>=6) legal.  These syntax restrictions make it
hard for SHACL users to write legal SHACL shapes graphs.  If there is a need
to warn users about these shapes, it is better to do so using a lint-like
tool that is not tied to the syntax of SHACL.

Actually, the situation is even worse in SHACL than it would be if a
programming language had these sorts of exclusions.  Completely conforming
SHACL implementations are free to do anything at all on most invalid syntax,
so a SHACL implementation can signal an error or return any set of
validation results on shapes that contain the analog of (x>4 && x>6), even
though the intuitive behaviour of these shapes is completely clear.


Some of the excluded shapes are degenerate in that they will produce
validation results for all focus nodes or all focus nodes that have a value
for a particular property.  For example, the excluded shape
  ex:false a sh:PropertyShape ;
    sh:path ex:p ;
    sh:datatype xsd:integer ;
    sh:datatype xsd:int .
will produce a validation result for every focus node that has a value for
ex:p.   However, many shapes that are degenerate in this sense are not
excluded.

Others of the excluded shapes are not degenerate in this way but instead
have constraints that are redundant.  For example, in the excluded shape
  ex:redundant a sh:PropertyShape ;
    sh:path ex:p ;
    sh:minInclusive 5 ;
    sh:minInclusive 9 .
one of the constraints is redundant.  However, many similar shapes that also
have redundant constraints are not excluded.
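
The intuitive reading of repeated sh:minInclusive values can be sketched as
follows.  This is only an illustration, assuming totally ordered values such
as integers; the function names are hypothetical and are not part of SHACL
or of any SHACL implementation.

```python
def conforms_to_all(value, bounds):
    """Conjunctive reading: the value must satisfy every sh:minInclusive bound."""
    return all(bound <= value for bound in bounds)


def conforms_to_strongest(value, bounds):
    """For totally ordered values, only the largest bound matters."""
    return max(bounds) <= value


# The two sh:minInclusive values of ex:redundant.
bounds = [5, 9]

# Both readings agree on every integer tried here, which is the sense in
# which one of the two constraints is redundant.
agree = all(
    conforms_to_all(v, bounds) == conforms_to_strongest(v, bounds)
    for v in range(0, 20)
)
```

Under this reading the shape behaves exactly like the non-excluded shape
that has only sh:minInclusive 9.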

Some of the excluded shapes don't even have any redundant constraints.  For
example, in the excluded shape
  ex:dateTime a sh:PropertyShape ;
    sh:path ex:p ;
    sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
    sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime .
neither of the constraints is redundant, because a dateTime with a timezone
and a dateTime without one are not always comparable.  Excluding all
dateTime values before "2002-10-10T12:00:00-05:00"^^xsd:dateTime and also
all those before "2002-10-10T12:00:00"^^xsd:dateTime requires two uses of
sh:minInclusive.

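Why neither bound subsumes the other can be sketched as follows.  This
assumes the partial order of XML Schema 1.0, in which a dateTime with a
timezone and one without are comparable only when they differ by more than
14 hours; SPARQL implementations may treat such comparisons differently.
The name bound_met and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(hours=14)  # largest timezone offset in XML Schema


def bound_met(bound, value):
    """True only when bound <= value is determinate and true."""
    bound_has_tz = bound.tzinfo is not None
    value_has_tz = value.tzinfo is not None
    if bound_has_tz == value_has_tz:
        return bound <= value
    if bound_has_tz:
        # The untimezoned value may denote an instant as early as
        # value - 14h (UTC), so the bound must precede even that.
        earliest = value.replace(tzinfo=timezone.utc) - MAX_OFFSET
        return bound < earliest
    # The untimezoned bound may denote an instant as late as
    # bound + 14h (UTC), so the value must follow even that.
    latest = bound.replace(tzinfo=timezone.utc) + MAX_OFFSET
    return latest < value


# The two sh:minInclusive values of ex:dateTime.
WITH_TZ = datetime(2002, 10, 10, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
WITHOUT_TZ = datetime(2002, 10, 10, 12, 0)

zoned_value = datetime(2002, 10, 10, 17, 0, tzinfo=timezone.utc)
plain_value = datetime(2002, 10, 10, 12, 0)
late_value = datetime(2003, 1, 1, tzinfo=timezone.utc)
```

In this sketch zoned_value meets only the bound with a timezone,
plain_value meets only the bound without one, and late_value meets both,
so dropping either constraint changes which values are accepted.
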
Some others of the excluded shapes are somewhat degenerate for standard RDF
graphs but not for generalized RDF graphs, where literals can be subjects of
triples.  For example, in standard RDF the excluded shape
  ex:generalized a sh:NodeShape ;
     sh:lessThan ex:p .
produces a validation result for every node that has a value for ex:p and
can be better stated for standard RDF graphs as a property shape with path
ex:p and sh:maxCount 0.  In generalized RDF this shape is not degenerate, as
it also produces a validation result for some literals.  However, many
shapes that are degenerate in this way for standard RDF graphs but not
degenerate for generalized RDF graphs are not excluded.

Yet others of the excluded shapes are not excluded for any discernible
reason at all.  For example, the excluded shape
  ex:comments a sh:PropertyShape ;
    sh:path [ rdfs:comment "inverse of ex:p" ;
           sh:inversePath ex:p ] ;
    sh:class ex:C .
does not have any problems whatsoever.  Its meaning is clear.  It does not
accept or reject all nodes.  It does not have any redundant pieces.  It is
excluded for no discernible reason.

It is easy for users to accidentally write these excluded shapes.  Automatic
generation of SHACL shapes is made harder by these exclusions.  There are
extra costs when implementing SHACL syntax checking because of these
exclusions.

Interoperability is particularly harmed by these exclusions.  Fully
conforming SHACL implementations may implement ex:false as producing a
validation result for every focus node that has a value for ex:p.  They may
instead implement ex:false as never producing a validation result, on the
argument that multiple sh:datatype constraints should just be removed.  They
may implement ex:false as not producing a validation result for literals
with datatype xsd:integer, or for literals with datatype xsd:int, because
the implementation assumes that only one sh:datatype can be present.  They
may even implement ex:false as sometimes not producing a validation result
for literals with datatype xsd:integer, and sometimes not producing a
validation result for literals with datatype xsd:int.
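
The divergent readings described above can be sketched as follows.  This is
a toy model, not the behaviour of any actual implementation: literals are
(lexical form, datatype) pairs, ill-formed literals are not modelled, and
all function names are hypothetical.

```python
XSD_INTEGER = "xsd:integer"
XSD_INT = "xsd:int"


def violations_conjunctive(values, datatypes):
    """Every sh:datatype must match the literal's datatype exactly."""
    return [v for v in values if any(v[1] != d for d in datatypes)]


def violations_ignore_repeated(values, datatypes):
    """Treat repeated sh:datatype constraints as if they were removed."""
    if len(datatypes) > 1:
        return []
    return violations_conjunctive(values, datatypes)


def violations_single(values, chosen):
    """Assume only one sh:datatype is present and check whichever is found."""
    return [v for v in values if v[1] != chosen]


# Two values of ex:p, checked against the two sh:datatype values of ex:false.
data = [("1", XSD_INTEGER), ("2", XSD_INT)]
declared = [XSD_INTEGER, XSD_INT]

all_flagged = violations_conjunctive(data, declared)
none_flagged = violations_ignore_repeated(data, declared)
# RDF graphs are unordered, so which datatype an implementation finds
# first is arbitrary; both outcomes are shown.
int_flagged = violations_single(data, XSD_INTEGER)
integer_flagged = violations_single(data, XSD_INT)
```

The four results differ on the same data, and all four behaviours are
permitted for a conforming implementation because the shape is excluded.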


These exclusions should be removed from SHACL by removing the syntax rules

  datatype-maxCount, nodeKind-maxCount, minCount-scope, minCount-maxCount,
  maxCount-scope, maxCount-maxCount, minExclusive-maxCount,
  minInclusive-maxCount, maxExclusive-maxCount, maxInclusive-maxCount,
  minLength-maxCount, maxLength-maxCount, languageIn-maxCount,
  uniqueLang-scope, lessThan-scope, lessThanOrEquals-scope,
  qualifiedValueShape-scope, and in-maxCount

and changing the syntax rules

  path-metarule, path-non-recursive, path-predicate, path-sequence,
  path-alternative, path-inverse, path-zero-or-more, path-one-or-more, and
  path-zero-or-one

to

path-non-recursive  A node p is not a well-formed SHACL property path if
    p is a blank node and any of the following rules require,
    directly or indirectly, determining whether p is a
    well-formed SHACL property path.

path-metarule  A node is a well-formed SHACL property path if it
    satisfies exactly one of the following rules and, if it is a
    blank node, it does not have a value for more than one of
    rdf:first or rdf:rest (taken together as one alternative),
    sh:alternativePath, sh:inversePath, sh:zeroOrMorePath,
    sh:oneOrMorePath, and sh:zeroOrOnePath.

path-predicate  A predicate path is any IRI.

path-sequence  A sequence path is a blank node that is a SHACL list with
    at least two members and each member of the list is a
    well-formed SHACL property path.

path-alternative  An alternative path is a blank node that has exactly one
    value for sh:alternativePath and that value is a SHACL
    list with at least two members and each member of the list
    is a well-formed SHACL property path.

path-inverse  An inverse path is a blank node that has exactly one value
    for sh:inversePath and that value is a well-formed
    SHACL property path.

path-zero-or-more  A zero-or-more path is a blank node that has exactly one
    value for sh:zeroOrMorePath and that value is a
    well-formed SHACL property path.

path-one-or-more  A one-or-more path is a blank node that has exactly one
    value for sh:oneOrMorePath and that value is a
    well-formed SHACL property path.

path-zero-or-one  A zero-or-one path is a blank node that has exactly one
    value for sh:zeroOrOnePath and that value is a
    well-formed SHACL property path.

The change to path syntax not only permits paths that should not be excluded
but also excludes paths that should not be included, such as
  [ rdf:first ex:p ; rdf:rest ( ex:q ) ; sh:inversePath ex:p ]
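
One way to check the proposed path rules is sketched below.  It is not a
reference implementation.  It assumes a toy graph model (a dict from node to
predicate to list of values, with blank nodes written as strings starting
with "_:") in which literals never occur as paths; the function names and
the sample graph are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
ALT = "sh:alternativePath"
UNARY = ("sh:inversePath", "sh:zeroOrMorePath",
         "sh:oneOrMorePath", "sh:zeroOrOnePath")


def is_blank(node):
    return node.startswith("_:")


def list_members(graph, head):
    """Return the members of the SHACL list at head, or None if ill-formed."""
    nil = graph.get(RDF_NIL, {})
    if nil.get(RDF_FIRST) or nil.get(RDF_REST):
        return None
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            return None  # recursive list
        seen.add(node)
        firsts = graph.get(node, {}).get(RDF_FIRST, [])
        rests = graph.get(node, {}).get(RDF_REST, [])
        if len(firsts) != 1 or len(rests) != 1:
            return None
        members.append(firsts[0])
        node = rests[0]
    return members


def is_well_formed_path(graph, node, in_progress=frozenset()):
    if not is_blank(node):
        return True  # path-predicate: any IRI
    if node in in_progress:
        return False  # path-non-recursive
    in_progress = in_progress | {node}
    props = graph.get(node, {})
    groups = []
    if props.get(RDF_FIRST) or props.get(RDF_REST):
        groups.append("list")
    groups += [p for p in (ALT,) + UNARY if props.get(p)]
    if len(groups) != 1:
        return False  # path-metarule: exactly one kind of path property
    kind = groups[0]
    if kind == "list" or kind == ALT:
        if kind == ALT and len(props[ALT]) != 1:
            return False
        head = node if kind == "list" else props[ALT][0]
        members = list_members(graph, head)
        return (members is not None and len(members) >= 2 and
                all(is_well_formed_path(graph, m, in_progress)
                    for m in members))
    targets = props[kind]
    return (len(targets) == 1 and
            is_well_formed_path(graph, targets[0], in_progress))


graph = {
    # The path of ex:comments: an inverse path with an extra rdfs:comment.
    "_:inv": {"rdfs:comment": ["inverse of ex:p"],
              "sh:inversePath": ["ex:p"]},
    # The mixed list and inverse path shown just above.
    "_:mixed": {RDF_FIRST: ["ex:p"], RDF_REST: ["_:tail"],
                "sh:inversePath": ["ex:p"]},
    "_:tail": {RDF_FIRST: ["ex:q"], RDF_REST: [RDF_NIL]},
    "_:seq": {RDF_FIRST: ["ex:p"], RDF_REST: ["_:tail"]},
    "_:loop": {"sh:zeroOrMorePath": ["_:loop"]},
}
```

In this sketch "_:inv" and "_:seq" are accepted, while "_:mixed" and
"_:loop" are rejected.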

These changes to the syntax of SHACL result in a SHACL that is easier to
write, easier to generate, easier to implement, and more interoperable.

Received on Friday, 5 May 2017 14:13:11 UTC