Re: a SHACL specification based on SPARQL from Peter F. Patel-Schneider on 2015-03-04 (public-data-shapes-wg@w3.org from March 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 03 Mar 2015 18:20:04 -0800
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <54F66BD4.1060007@gmail.com>



On 03/03/2015 05:11 PM, Holger Knublauch wrote:
> Hi Peter,
> 
> thanks for this proposal. Interesting to see alternative ways of 
> introducing/explaining/specifying this technology. Of course lots of
> details are missing, so this would need much more work.
> 
> As a first step, I am trying to enumerate the differences between your 
> proposal and what's currently in the draft spec. Comments in-line.
> 
> On 3/4/2015 4:02, Peter F. Patel-Schneider wrote:
>> Here is the core of what I think a SHACL specification based on SPARQL 
>> should look like.
>> 
>> peter
>> 
>> 
>> 
>> 
>> SHACL Specification
>> 
>> 
>> Preliminaries
>> 
>> Throughout the text of this document IRIs are written in CURIE form
>> using the following prefixes: rdf = ... rdfs = ... xsd = ... shacl =
>> ...
>> 
>> 
>> The SHACL Core Constraint Language
>> 
>> SHACL is based on the SPARQL 1.1 Query Language.  In SHACL, certain
>> results of the evaluation of SPARQL 1.1 Query Language queries
>> (hereafter SPARQL queries) on an RDF graph or dataset under the RDFS
>> entailment regime are interpreted as constraint violations.
> 
> Do you want to make RDFS entailment mandatory?

Yes.

> You seem to want to give SPARQL a more dominant role than in my
> proposal, where other "native" languages would be more easily integrated
> in the future. Correct?

Correct.  I'm not in favour of having alternative execution engines.  Let's
make SPARQL dominant.

This doesn't mean that there could not be another semantics for part of
SHACL, but any such semantics would have to be shown to be equivalent to the
definitive SPARQL semantics.


>> 
>> The different kinds of results for SPARQL queries require different
>> ways of interpreting results in SHACL.  For a SELECT query, each
>> separate mapping is interpreted as a separate violation of the
>> constraint.  If there are no mappings then the constraint is not
>> violated.  For a CONSTRUCT query, each RDFS instance of shacl:violation
>> (node whose denotation is in the class extension of shacl:violation in
>> all RDFS models of the constructed graph) is a separate violation of
>> the constraint.  If there are no RDFS instances of shacl:violation in
>> the constructed graph then the constraint is not violated.  For an ASK
>> query, a true result is interpreted as a violation and a false result
>> is interpreted as not a violation.  (This interpretation makes ASK
>> constraints similar to the other kinds of constraints.)  DESCRIBE 
>> queries are not used in SHACL.
> 
> This is similar, bar syntactic details on RDF triple level.

Yes.  I have shamelessly stolen from SPIN and elsewhere.
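
The interpretation rules quoted above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the function
name count_violations and the in-memory result shapes (a list of mappings
for SELECT, a boolean for ASK, a list of string triples for CONSTRUCT) are
hypothetical, and the CONSTRUCT case only looks at asserted rdf:type
triples rather than computing RDFS instances as the proposal requires.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SHACL_VIOLATION = "shacl:violation"


def count_violations(kind: str, result) -> int:
    """Count constraint violations for one query result (hypothetical helper)."""
    if kind == "SELECT":
        # Each separate solution mapping is a separate violation.
        return len(list(result))
    if kind == "ASK":
        # A true result is a violation, a false result is not.
        return 1 if result else 0
    if kind == "CONSTRUCT":
        # Simplification: only asserted rdf:type triples are inspected.
        # The proposal asks for RDFS instances of shacl:violation.
        triples: Iterable[Triple] = result
        return len({s for (s, p, o) in triples
                    if p == RDF_TYPE and o == SHACL_VIOLATION})
    # DESCRIBE queries are not used in SHACL.
    raise ValueError(f"query form not used in SHACL: {kind}")
```

A constraint is satisfied exactly when the count is zero, which is what
makes the three query forms behave alike.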

>> The SHACL Control Language
>> 
>> SHACL constraints can be directly evaluated, just as SPARQL queries
>> are. All that is needed is a SHACL constraint and an RDF graph or
>> dataset.  The result of the constraint is the result of the query, and
>> is interpreted as above.
>> 
>> SHACL constraints can also be encoded and collected in RDF graphs.
>> Each node in such a graph that is an RDFS instance of shacl:Constraint
>> is the control node of a SHACL constraint.  A SHACL engine evaluates
>> the constraints in an RDF graph by taking each such node and evaluating
>> the node's constraint against an RDF graph or dataset.
> 
> Does this relate to global constraints in the current SHACL draft?

Not really.  This is a way of encoding SHACL so that tools can read and
execute it.

>> SHACL control nodes can also have a shacl:severity link to one or more
>> of shacl:fatal, shacl:warning, or shacl:informative, indicating the
>> severity of any violations of the constraint.
> 
> This property is currently called sh:level, but sh:severity would work
> too. I have added a corresponding note.

I don't care about the precise vocabulary.

>> The SHACL Core Control Language
>> 
>> The simplest kind of SHACL control node is a node linked via a 
>> shacl:constraint triple to a SPARQL constraint encoded as an RDF
>> string literal.  These nodes are called SHACL Core Control Nodes.
> 
> Looks equivalent to sh:NativeConstraint + sh:constraint.

Probably.

>> The SHACL Extended Control Language
>> 
>> Other SHACL control nodes allow the separation of a constraint into
>> three sections: a scope section, a shape section, and a reporting
>> section.  These SHACL control nodes, called SHACL Extended Control
>> Nodes, must have precisely one of the ways below of specifying the
>> scope and the shape, at most one way of specifying reporting, and at
>> most one way of specifying severity.
>> 
>> The scope of a SHACL constraint is specified via 1/ a
>> shacl:individualScope link to an IRI literal, 2/ a shacl:classScope
>> link to an IRI literal,
> 
> What are "IRI literals"? xsd:anyURI?

Yes, as shown below.  I have tried to be as representationally pure as
possible here.

> shacl:individualScope appears similar to rdf:type/sh:nodeShape?
> 
> I don't understand shacl:classScope.

shacl:individualScope checks a single node.  shacl:classScope checks all
RDFS instances of a class.


> Overall you seem to invert the direction of the linkage: constraints
> appear to be stand-alone entities that have a forward reference into a
> class or individual.

Correct, and this was a particular choice.

> While these relationships can be walked in either direction, my current
> draft does the linkage in the opposite direction from yours, because I
> believe this is far easier to write down when you start at a class or
> shape.


>> 3/ a shacl:shapeScope link to a SHACL shape (see below), or 4/ a
>> shacl:constraintScope link to a string literal.
> 
> I don't understand shacl:constraintScope.

This permits arbitrary SPARQL to be the scope of a constraint.

>> The shape of a SHACL constraint is specified via 1/ a shacl:shape link
>> to a SHACL shape (see below), or
> 
> How is shapeScope different from shacl:shape?

One is the scope, the other is the constraining shape.  This allows one, for
example, to check that all nodes with an ex:bug link to ex:verified satisfy
a particular shape.  The shapeScope would be something like [ shacl:property
"ex:bug"^^xsd:anyURI, shacl:value "ex:verified"^^xsd:anyURI ].  (This shows
how pedantic the proposal is about separating use and mention.)

>> 2/ a shacl:constraint link to a string literal.
>> 
>> The reporting for a SHACL constraint is specified via 1/ a shacl:report
>> link to a string literal.
> 
> Is this sh:message?

Not really.  This would for example allow for things like SELECT ( ?this
?status) in the encoded SPARQL.   Turning this into actual messages is not
in the proposal as of yet.

>> These kinds of SHACL control nodes are handled by first constructing
>> three parts of the SHACL constraint.
>> 
>> 1/ The control portion of the constraint, <control>, is
>>    a/ VALUES ?this { <IRI> } for a shacl:individualScope link to
>>       "<IRI>"^^xsd:anyURI
>>    b/ ?this rdf:type <IRI> . for a shacl:classScope link to
>>       "<IRI>"^^xsd:anyURI
> 
> SPIN/current SHACL would also walk the subClassOf triples here, not just
> the direct rdf:type. This means that RDFS entailment is not required.

Yes, and I am violently against going half-way to RDFS.

> Why encode IRIs as strings first?

To separate use and mention.  It may be that I am being too pedantic but I
thought that I should provide the most representational purity as a start.

>>    c/ <shape> for a shacl:shapeScope link to a node that encodes
>>       <shape> (see below)
>>    d/ <query> for a shacl:queryScope link to "<query>"^^xsd:string
>> 
>> 2/ The shape portion of the constraint, <shape>, is
>>    a/ <shape> for a shacl:shape link to a node that encodes <shape>
>>       (see below)
>>    b/ <query> for a shacl:query link to the string
>>       "<query>"^^xsd:string
>> 
>> 3/ The reporting portion of the constraint, <report>, is
>>    a/ SELECT ?this when there is no shacl:report link
>>    b/ <report> for a shacl:report link to "<report>"^^xsd:string
>> 
>> The constraint for a SHACL Extended Control Node is then constructed
>> as
>> 
>> PREFIX rdf: ...
>> PREFIX rdfs: ...
>> PREFIX xsd: ...
>> PREFIX shacl: ...
>> <report> WHERE { <control> MINUS { <shape> } }
> 
> I cannot claim that I have understood this. Examples and more complete 
> snippets would help.

I'll put together a couple of examples shortly, but the basic idea is that
all this machinery lets you encode many kinds of SPARQL queries in a
mix-and-match fashion.
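
As a rough illustration of that mix-and-match assembly, here is a minimal
Python sketch that builds the query text for an extended control node from
its scope, shape, and optional report.  The function names scope_pattern
and build_constraint, and the ex: terms in the usage note, are hypothetical;
IRIs are written as CURIEs, and the PREFIX declarations are left out
because the proposal elides them.

```python
from typing import Optional


def scope_pattern(kind: str, value: str) -> str:
    """Return the <control> text for one scope link (hypothetical helper)."""
    if kind == "individualScope":
        # A single node is checked.
        return f"VALUES ?this {{ {value} }}"
    if kind == "classScope":
        # All instances of a class are checked (RDFS entailment assumed).
        return f"?this rdf:type {value} ."
    if kind in ("shapeScope", "queryScope"):
        # The value is already SPARQL text: an encoded shape or a raw query.
        return value
    raise ValueError(f"unknown scope kind: {kind}")


def build_constraint(scope_kind: str, scope_value: str, shape: str,
                     report: Optional[str] = None) -> str:
    """Assemble <report> WHERE { <control> MINUS { <shape> } }."""
    control = scope_pattern(scope_kind, scope_value)
    # With no shacl:report link the reporting portion is SELECT ?this.
    report_clause = report if report is not None else "SELECT ?this"
    return f"{report_clause} WHERE {{ {control} MINUS {{ {shape} }} }}"
```

For example, build_constraint("classScope", "ex:Bug", "?this ex:status
ex:verified .") yields a query that selects every ex:Bug lacking that
status, so each returned mapping is one violation.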

>> 
>> SHACL Simple Shape Language
>> 
>> SHACL provides a vocabulary for generating the shape portion of
>> constraints. In conjunction with the SHACL Extended Control Language
>> this vocabulary permits the construction of many, but not all,
>> constraints without needing to write SPARQL queries.
>> 
>> A SHACL shape node is a node that is an RDFS instance of shacl:Shape.
>> Each SHACL shape encodes some SPARQL syntax, its shape, that can be
>> used in SHACL constraints.
>> 
>> Many SHACL shapes utilize an RDF property.  This property is specified
>> by a shacl:property link to an IRI literal.
> 
> Is this equivalent to sh:predicate?

I think so.

> Also, why do "shapes" point at a property - shouldn't this be the
> constraints that the shape uses? Or maybe your term "Shape" is my term
> "Template"?

I don't think so.  A shape here is really a bit of SPARQL that has
conceptually one free variable.  These shapes can be combined (using
conjunction, for example) to create the shape that is the top-level part of
a constraint, which is then combined with the other bits to create a SPARQL
query that returns violations of the constraint.

>> A SHACL property shape with a shacl:valueType link to an IRI literal
>> limits the values of a property to be RDFS instances of a class.  The
>> shape encoded by a SHACL property node with a shacl:property link to
>> "<property>"^^xsd:anyURI and a shacl:valueType link to
>> "<valueType>"^^xsd:anyURI is
>> 
>>   FILTER NOT EXISTS { ?this <property> V .
>>     FILTER NOT EXISTS { V rdf:type <valueType> . } }
>> 
>> where V is a fresh variable.
>> 
>> ... add other high-level-language constructs here ...
> 
> Looks largely equivalent.

This would have the same stuff you would have, maybe with minor syntactic
differences.
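
To make the shacl:valueType encoding quoted above concrete, here is a
minimal Python sketch of one such shape encoder.  The names fresh_variable
and value_type_shape are hypothetical, IRIs are written as CURIEs, and the
counter-based variable naming is an assumption: it does not check for
clashes with variables that appear elsewhere in a query.

```python
import itertools
from typing import Optional

# Simple counter used to mint variable names such as ?v1, ?v2, ...
_counter = itertools.count(1)


def fresh_variable() -> str:
    """Return a new variable name (not checked against user variables)."""
    return f"?v{next(_counter)}"


def value_type_shape(prop: str, value_type: str,
                     var: Optional[str] = None) -> str:
    """Encode a shacl:property / shacl:valueType pair as SPARQL text."""
    v = var if var is not None else fresh_variable()
    # Outer filter: there is no value of the property ...
    # Inner filter: ... that lacks the required rdf:type.
    return (f"FILTER NOT EXISTS {{ ?this {prop} {v} . "
            f"FILTER NOT EXISTS {{ {v} rdf:type {value_type} . }} }}")
```

Calling value_type_shape("ex:assignee", "ex:Person") twice gives two texts
that differ only in the generated variable, which is what lets several such
shapes be conjoined without their variables colliding.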

>> SHACL Errors
>> 
>> If a node in an RDF graph is both a core control node and an extended 
>> control node the result of evaluating the graph is undefined.  If a
>> node in an RDF graph that is an RDFS instance of shacl:Constraint is
>> neither a SHACL Core Control Node nor a SHACL Extended Control Node the
>> result of evaluating the graph is undefined.  If a SHACL shape node in
>> an RDF graph encodes more than one shape then the result of evaluating the
>> graph is undefined.  SHACL engines should signal an error on such
>> graphs.
> 
> This seems to exclude the possibility to check constraints over a SHACL
> graph, i.e. apply SHACL to itself.

No.  There is no prohibition against having an RDF graph that contains SHACL
constraints as the data that is being checked by another RDF graph (or even
the same RDF graph) that encodes SHACL constraints.

>> If evaluation of a constraint would produce a SPARQL error the
>> constraint is invalid.  SHACL engines should signal an error for that
>> constraint.
> 
> I need much more details before I'd be able to comment further on this
> proposal.
> 
> Much of it overlaps with the current SHACL draft, yet some aspects seem
> to depart quite a bit. I wonder why we should replace design patterns
> that have already been used successfully in SPIN for many years with
> something experimental, especially your proposed mechanism to represent
> and group constraints together. In the absence of strong reasons, I'd
> vote for established patterns. Also your use of xsd:anyURI instead of
> real IRI references looks very unusual. I also anticipate a lot of
> resistance on the strong binding to SPARQL within the WG, and I'd rather
> be willing to compromise on that.
> 
> Thanks Holger

My intent is to get rid of the parts of SPIN that impinge on RDFS to produce
a pure constraint/shape specification.

This specification is essentially complete.  It provides the only control
structures permissible.  All that is missing is the set of simple shape
encoders, and maybe a template mechanism to define new shape encoders.


peter

Received on Wednesday, 4 March 2015 02:20:38 UTC