Re: a SHACL specification based on SPARQL from Holger Knublauch on 2015-03-04 (public-data-shapes-wg@w3.org from March 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 04 Mar 2015 11:11:57 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <54F65BDD.8040807@topquadrant.com>

Hi Peter,

thanks for this proposal. Interesting to see alternative ways of 
introducing/explaining/specifying this technology. Of course lots of 
details are missing, so this would need much more work.

As a first step, I am trying to enumerate the differences between your 
proposal and what's currently in the draft spec. Comments in-line.

On 3/4/2015 4:02, Peter F. Patel-Schneider wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Here is the core of what I think a SHACL specification based on SPARQL
> should look like.
>
> peter
>
>
>
>
>   SHACL Specification
>
>
> Preliminaries
>
> Throughout the text of this document IRIs are written in CURIE form using
> the following prefixes:
> rdf = ...
> rdfs = ...
> xsd = ...
> shacl = ...
>
>
> The SHACL Core Constraint Language
>
> SHACL is based on the SPARQL 1.1 Query Language.  In SHACL, certain results
> of the evaluation of SPARQL 1.1 Query Language queries (hereafter SPARQL
> queries) on an RDF graph or dataset under the RDFS entailment regime are
> interpreted as constraint violations.

Do you want to make RDFS entailment mandatory?

You seem to want to give SPARQL a more dominant role than in my
proposal, where other "native" languages could be more easily integrated
in the future. Correct?

>
> The different kinds of results for SPARQL queries require different ways of
> interpreting results in SHACL.  For a SELECT query, each separate mapping is
> interpreted as a separate violation of the constraint.  If there are no
> mappings then the constraint is not violated.  For a CONSTRUCT query, each
> RDFS instance of shacl:violation (node whose denotation is in the class
> extension of shacl:violation in all RDFS models of the constructed graph) is
> a separate violation of the constraint.  If there are no RDFS instances of
> shacl:violation in the constructed graph then the constraint is not
> violated.  For an ASK query, a true result is interpreted as a violation and
> a false result is interpreted as not a violation.  (This interpretation
> makes ASK constraints similar to the other kinds of constraints.)  DESCRIBE
> queries are not used in SHACL.

This is similar, bar syntactic details at the RDF triple level.
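
To make the comparison concrete, here is a minimal Python sketch of the
interpretation rules quoted above. It assumes the query has already been
evaluated elsewhere; count_violations and the tuple-based triple
representation are hypothetical names for illustration only, and the
CONSTRUCT branch only looks at direct rdf:type triples rather than full
RDFS entailment.

```python
RDF_TYPE = "rdf:type"
SHACL_VIOLATION = "shacl:violation"


def count_violations(query_form, result):
    """Interpret an already-evaluated query result as a violation count.

    query_form: "SELECT", "CONSTRUCT" or "ASK"
    result: list of mappings (SELECT), iterable of (s, p, o) triples
            (CONSTRUCT), or a bool (ASK)
    """
    if query_form == "SELECT":
        # Each separate mapping is a separate violation.
        return len(result)
    if query_form == "CONSTRUCT":
        # Simplification: count nodes with a direct rdf:type shacl:violation
        # triple; the proposal asks for RDFS instances instead.
        nodes = {s for (s, p, o) in result
                 if p == RDF_TYPE and o == SHACL_VIOLATION}
        return len(nodes)
    if query_form == "ASK":
        # true means violated, false means not violated.
        return 1 if result else 0
    raise ValueError("DESCRIBE and other query forms are not used")
```

For example, a SELECT result with two mappings would count as two
violations, and an ASK result of false as none.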

>
>
> The SHACL Control Language
>
> SHACL constraints can be directly evaluated, just as SPARQL queries are.
> All that is needed is a SHACL constraint and an RDF graph or dataset.  The
> result of the constraint is the result of the query, and is interpreted as
> above.
>
> SHACL constraints can also be encoded and collected in RDF graphs.  Each
> node in such a graph that is an RDFS instance of shacl:Constraint is the
> control node of a SHACL constraint.  A SHACL engine evaluates the
> constraints in an RDF graph by taking each such node and evaluating the
> node's constraint against an RDF graph or dataset.

Does this relate to global constraints in the current SHACL draft?

>
> SHACL control nodes can also have a shacl:severity link to one or more of
> shacl:fatal, shacl:warning, or shacl:informative, indicating the severity of
> any violations of the constraint.

This property is currently called sh:level, but sh:severity would work 
too. I have added a corresponding note.

>
> The SHACL Core Control Language
>
> The simplest kind of SHACL control node is a node linked via a
> shacl:constraint triple to a SPARQL constraint encoded as an RDF string
> literal.  These nodes are called SHACL Core Control Nodes.

Looks equivalent to sh:NativeConstraint + sh:constraint.

>
> The SHACL Extended Control Language
>
> Other SHACL control nodes allow the separation of a constraint into three
> sections: a scope section, a shape section, and a reporting section.  These
> SHACL control nodes, called SHACL Extended Control Nodes, must have
> precisely one of the ways below of specifying the scope and the shape, at
> most one way of specifying reporting, and at most one way of specifying
> severity.
>
> The scope of a SHACL constraint is specified via
> 1/ a shacl:individualScope link to an IRI literal,
> 2/ a shacl:classScope link to an IRI literal,

What are "IRI literals"? xsd:anyURI?

shacl:individualScope appears similar to rdf:type/sh:nodeShape?

I don't understand shacl:classScope.

Overall you seem to invert the direction of the linkage: constraints
appear to be stand-alone entities that have a forward reference into a
class or individual. While these relationships can be walked in either
direction, my current draft does the linkage in the opposite direction
from yours, because I believe this is far easier to write down when you
start at a class or shape.

> 3/ a shacl:shapeScope link to a SHACL shape (see below), or
> 4/ a shacl:constraintScope link to a string literal.

I don't understand shacl:constraintScope.

>
> The shape of a SHACL constraint is specified via
> 1/ a shacl:shape link to a SHACL shape (see below), or

How is shapeScope different from shacl:shape?

> 2/ a shacl:constraint link to a string literal.
>
> The reporting for a SHACL constraint is specified via
> 1/ a shacl:report link to a string literal.

Is this sh:message?

>
> These kinds of SHACL control nodes are handled by first constructing three
> parts of the SHACL constraint.
> 1/ The control portion of the constraint, <control>, is
>     a/ VALUES ?this { <IRI> }
>        for a shacl:individualScope link to "<IRI>"^^xsd:anyURI
>     b/ ?this rdf:type <IRI> .
>        for a shacl:classScope link to "<IRI>"^^xsd:anyURI

SPIN/current SHACL would also walk the subClassOf triples here, not just 
the direct rdf:type. This means that RDFS entailment is not required.
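
In SPARQL 1.1 this walk can be written as a property path such as
?this rdf:type/rdfs:subClassOf* <IRI>. As a rough illustration of what
that computes without any entailment regime, here is a small Python
sketch over an in-memory set of triples. The function names and the
tuple representation are assumptions made for this example, not part of
either proposal.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subclasses_of(triples, cls):
    """Return cls together with all of its transitive subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(triples, cls):
    """Nodes typed with cls or any subclass, found by walking subClassOf."""
    classes = subclasses_of(triples, cls)
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


# Hypothetical example data
triples = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
}
scope = instances_of(triples, "ex:Person")
```

Here ex:alice ends up in the scope of ex:Person although she only has a
direct rdf:type of ex:Student.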

Why encode IRIs as strings first?

>     c/ <shape>
>        for a shacl:shapeScope link to a node that encodes <shape> (see below)
>     d/ <query>
>        for a shacl:queryScope link to "<query>"^^xsd:string
> 2/ The shape portion of the constraint, <shape>, is
>     a/ <shape>
>        for a shacl:shape link to a node that encodes <shape> (see below)
>     b/ <query>
>        for a shacl:query link to the string "<query>"^^xsd:string
> 3/ The reporting portion of the constraint, <report>, is
>     a/ SELECT ?this
>        when there is no shacl:report link
>     b/ <report>
>        for a shacl:report link to "<report>"^^xsd:string
>
> The constraint for a SHACL Extended Control Node is then constructed as
>
>    PREFIX rdf: ...
>    PREFIX rdfs: ...
>    PREFIX xsd: ...
>    PREFIX shacl: ...
>    <report> WHERE
>    { <control> MINUS { <shape> } }

I cannot claim that I have understood this. Examples and more complete 
snippets would help.
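
One possible reading of the construction, sketched in Python purely as
string assembly: the control and shape portions are spliced into the
<report> WHERE { <control> MINUS { <shape> } } template. The helper
names, the example class IRI and the example shape pattern are all
hypothetical, the PREFIX lines are left out because the proposal elides
the namespaces, and nothing here claims that the resulting query
evaluates as intended.

```python
DEFAULT_REPORT = "SELECT ?this"


def control_pattern(scope_kind, value):
    """Build the <control> portion for one kind of scope link."""
    if scope_kind == "individualScope":
        return "VALUES ?this { <" + value + "> }"
    if scope_kind == "classScope":
        return "?this rdf:type <" + value + "> ."
    if scope_kind in ("shapeScope", "queryScope"):
        # The value is taken to be SPARQL text already.
        return value
    raise ValueError("unknown scope kind: " + scope_kind)


def build_constraint_query(control, shape, report=DEFAULT_REPORT):
    """Splice the three portions into the template from the proposal."""
    return report + " WHERE\n{ " + control + " MINUS { " + shape + " } }"


# Hypothetical example: every ex:Person is expected to have an ex:name.
query = build_constraint_query(
    control_pattern("classScope", "http://example.org/Person"),
    "?this ex:name ?n .",
)
```

With these inputs the assembled text selects each ?this that matches the
class scope and is not removed by the shape pattern, so the solutions
would be the nodes violating the constraint.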

>
>
> SHACL Simple Shape Language
>
> SHACL provides a vocabulary for generating the shape portion of constraints.
> In conjunction with the SHACL Extended Control Language this vocabulary
> permits the construction of many, but not all, constraints without needing
> to write SPARQL queries.
>
> A SHACL shape node is a node that is an RDFS instance of shacl:Shape.  Each
> SHACL shape encodes some SPARQL syntax, their shape, that can be used in
> SHACL constraints.
>
> Many SHACL shapes utilize an RDF property.  This property is specified by a
> shacl:property link to an IRI literal.

Is this equivalent to sh:predicate?

Also, why do "shapes" point at a property - shouldn't this be the 
constraints that the shape uses? Or maybe your term "Shape" is my term 
"Template"?

>
> A SHACL property shape with a shacl:valueType link to an IRI literal limits
> the values of a property to be RDFS instances of a class.  The shape by a
> SHACL property node with a shacl:property link to "<property>"^^xsd:anyURI
> and a shacl:valueType link to "<valueType>"^^xsd:anyURI is
>    FILTER NOT EXISTS { ?this <property> V .
>                 FILTER NOT EXISTS { V rdf:type <valueType> . } }
> where V is a fresh variable.
>
> ... add other high-level-language constructs here ...

Looks largely equivalent.
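
For comparison, the intended meaning of the valueType shape (every
value of the property must be typed with the given class) can be
sketched over an in-memory set of triples. This is an illustration
under simplifying assumptions: value_type_violations is a hypothetical
name, only direct rdf:type triples are considered, and no SPARQL engine
is involved.

```python
RDF_TYPE = "rdf:type"


def value_type_violations(triples, focus_nodes, prop, value_type):
    """Return focus nodes with at least one value of prop that is not
    directly typed with value_type."""
    typed = {s for s, p, o in triples
             if p == RDF_TYPE and o == value_type}
    return {s for s, p, o in triples
            if s in focus_nodes and p == prop and o not in typed}


# Hypothetical example data
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:rex"),
    ("ex:carol", "ex:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
bad = value_type_violations(
    triples, {"ex:alice", "ex:carol"}, "ex:knows", "ex:Person")
```

Here only ex:alice is reported, because one of her ex:knows values is
not an ex:Person.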

>
>
> SHACL Errors
>
> If a node in an RDF graph is both a core control node and an extended
> control node the result of evaluating the graph is undefined.  If a node in
> an RDF graph that is an RDFS instance of shacl:constraint is neither a SHACL
> Core Control Node nor a SHACL Extended Control Node the result of evaluating
> the graph is undefined.  If a SHACL shape node in an RDF graph encodes more
> than shape then the result of evaluating the graph is undefined.  SHACL
> engines should signal an error on such graphs.

This seems to exclude the possibility of checking constraints over a
SHACL graph, i.e. applying SHACL to itself.

>
> If evaluation of a constraint would produce a SPARQL error the constraint is
> invalid.  SHACL engines should signal an error for that constraint.

I need many more details before I can comment further on this
proposal.

Much of it overlaps with the current SHACL draft, yet some aspects seem 
to depart quite a bit. I wonder why we should replace design patterns 
that have already been used successfully in SPIN for many years with 
something experimental, especially your proposed mechanism to represent 
and group constraints together. In the absence of strong reasons, I'd 
vote for established patterns. Also your use of xsd:anyURI instead of 
real IRI references looks very unusual. I also anticipate a lot of 
resistance on the strong binding to SPARQL within the WG, and I'd rather 
be willing to compromise on that.

Thanks
Holger
Received on Wednesday, 4 March 2015 01:13:24 UTC