Scoping of constraints (was: a SHACL specification based on SPARQL) from Holger Knublauch on 2015-03-04 (public-data-shapes-wg@w3.org from March 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 04 Mar 2015 15:27:17 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <54F697B5.6070606@topquadrant.com>
Thanks for your clarifications. As a first pass, and as an attempt to 
avoid an unnecessary fragmentation of the "SPARQL camp", I have recorded 
some of your ideas into the current spec [1]. Some ended up becoming 
ISSUEs, but I did add a new section on the idea of Scoping:

     http://w3c.github.io/data-shapes/shacl/#shapeconstraints-scope

This is similar to what Michel had requested in earlier emails [2] and I 
believe I agree with the general idea. We still do not have a 
corresponding requirement in our catalog though.

The general SPARQL-based scope selection appears to be only relevant for 
global constraints, and there it would only be syntactic sugar. I would 
like to see stronger evidence that this use case is really needed in 
practice. I believe it makes most sense to have such 
scoping/precondition mechanisms in conjunction with local shapes, 
because this way it is easier to answer the question "does resource X 
have any constraint violations". Globally scoped or SPARQL-selected 
constraints make it difficult/expensive to figure out which constraints 
to run, while if we already have a shape as a starting point, we only 
need to look at the conditions for each node that is associated with 
that shape.
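
A minimal Python sketch of the cost argument above, under illustrative assumptions: the graph is a set of (subject, predicate, object) string triples, `node_shapes` and `shape_constraints` are hypothetical lookup tables for nodes already associated with shapes, and each global constraint is modelled by a hypothetical `select` function standing in for its SPARQL scope query. With local shapes, finding the constraints that apply to one resource is a lookup; with SPARQL-selected scopes, every scope query has to be run over the whole graph first.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

# Hypothetical stand-in for a SPARQL-selected (global) scope: given the whole
# graph, it returns every node the constraint applies to.
ScopeQuery = Callable[[Graph], Set[str]]


def constraints_for_local(node: str,
                          node_shapes: Dict[str, List[str]],
                          shape_constraints: Dict[str, List[str]]) -> List[str]:
    """Local shapes: start from the shapes associated with `node` and
    collect only their constraints; no other constraint is consulted."""
    found: List[str] = []
    for shape in node_shapes.get(node, []):
        found.extend(shape_constraints.get(shape, []))
    return found


def constraints_for_global(node: str, graph: Graph,
                           global_constraints: Dict[str, ScopeQuery]) -> List[str]:
    """Global / SPARQL-selected scopes: every scope query is run over the
    whole graph just to find out whether `node` is among the selected nodes."""
    return [name for name, select in global_constraints.items()
            if node in select(graph)]


graph: Graph = {("ex:Bug1", "rdf:type", "ex:Bug"),
                ("ex:Bug1", "ex:status", "ex:verified")}
node_shapes = {"ex:Bug1": ["ex:BugShape"]}
shape_constraints = {"ex:BugShape": ["statusRequired"]}
global_constraints: Dict[str, ScopeQuery] = {
    "verifiedBugs": lambda g: {s for (s, p, o) in g
                               if p == "ex:status" and o == "ex:verified"},
}
print(constraints_for_local("ex:Bug1", node_shapes, shape_constraints))  # ['statusRequired']
print(constraints_for_global("ex:Bug1", graph, global_constraints))      # ['verifiedBugs']
```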

Holger

[1] https://github.com/w3c/data-shapes/commit/40880c4fb380fa8368dcaa3f34c094afd96d4edd
[2] https://lists.w3.org/Archives/Public/public-rdf-shapes/2015Feb/0011.html


On 3/4/2015 12:20, Peter F. Patel-Schneider wrote:
>
> On 03/03/2015 05:11 PM, Holger Knublauch wrote:
>> Hi Peter,
>>
>> thanks for this proposal. Interesting to see alternative ways of
>> introducing/explaining/specifying this technology. Of course lots of
>> details are missing, so this would need much more work.
>>
>> As a first step, I am trying to enumerate the differences between your
>> proposal and what's currently in the draft spec. Comments in-line.
>>
>> On 3/4/2015 4:02, Peter F. Patel-Schneider wrote:
>>> Here is the core of what I think a SHACL specification based on SPARQL
>>> should look like.
>>>
>>> peter
>>>
>>>
>>>
>>>
>>> SHACL Specification
>>>
>>>
>>> Preliminaries
>>>
>>> Throughout the text of this document IRIs are written in CURIE form
>>> using the following prefixes: rdf = ... rdfs = ... xsd = ... shacl =
>>> ...
>>>
>>>
>>> The SHACL Core Constraint Language
>>>
>>> SHACL is based on the SPARQL 1.1 Query Language.  In SHACL, certain
>>> results of the evaluation of SPARQL 1.1 Query Language queries
>>> (hereafter SPARQL queries) on an RDF graph or dataset under the RDFS
>>> entailment regime are interpreted as constraint violations.
>> Do you want to make RDFS entailment mandatory?
> Yes.
>
>> You seem to want to give SPARQL a more dominant role than in my
>> proposal, where other "native" languages could more easily be integrated
>> in the future. Correct?
> Correct.  I'm not in favour of having alternative execution engines.  Let's
> make SPARQL dominant.
>
> This doesn't mean that there could not be another semantics for part of
> SHACL, but any such semantics would have to be shown to be equivalent to the
> definitive SPARQL semantics.
>
>
>>> The different kinds of results for SPARQL queries require different
>>> ways of interpreting results in SHACL.  For a SELECT query, each
>>> separate mapping is interpreted as a separate violation of the
>>> constraint.  If there are no mappings then the constraint is not
>>> violated.  For a CONSTRUCT query, each RDFS instance of shacl:violation
>>> (node whose denotation is in the class extension of shacl:violation in
>>> all RDFS models of the constructed graph) is a separate violation of
>>> the constraint.  If there are no RDFS instances of shacl:violation in
>>> the constructed graph then the constraint is not violated.  For an ASK
>>> query, a true result is interpreted as a violation and a false result
>>> is interpreted as not a violation.  (This interpretation makes ASK
>>> constraints similar to the other kinds of constraints.)  DESCRIBE
>>> queries are not used in SHACL.
>> This is similar, bar syntactic details on RDF triple level.
> Yes.  I have shamelessly stolen from SPIN and elsewhere.
>
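A minimal Python sketch of the result interpretation described above, assuming SELECT results arrive as a list of solution mappings, CONSTRUCT results as a set of string triples, and ASK results as a bool. Finding RDFS instances of shacl:violation is simplified to rdf:type plus rdfs:subClassOf closure; full RDFS entailment (rdfs:domain, rdfs:range, rdfs:subPropertyOf) is deliberately left out, so this is not the complete semantics the proposal asks for.

```python
from typing import Dict, List, Set, Tuple, Union

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]
QueryResult = Union[List[Mapping], Set[Triple], bool]


def superclasses(graph: Set[Triple], cls: str) -> Set[str]:
    """cls together with everything reachable from it via rdfs:subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(graph: Set[Triple], cls: str) -> Set[str]:
    """Nodes typed cls directly or through rdfs:subClassOf.
    Simplification: domain, range and subPropertyOf entailments are ignored."""
    return {s for (s, p, o) in graph
            if p == "rdf:type" and cls in superclasses(graph, o)}


def count_violations(kind: str, result: QueryResult) -> int:
    if kind == "SELECT":
        # every solution mapping is a separate violation
        return len(result)
    if kind == "CONSTRUCT":
        # every RDFS instance of shacl:violation in the constructed graph
        return len(instances_of(result, "shacl:violation"))
    if kind == "ASK":
        # true is read as one violation, false as no violation
        return 1 if result else 0
    raise ValueError(f"{kind} queries are not used in SHACL")


select_result = [{"this": "ex:a"}, {"this": "ex:b"}]
construct_result = {
    ("_:v1", "rdf:type", "ex:CardinalityViolation"),
    ("ex:CardinalityViolation", "rdfs:subClassOf", "shacl:violation"),
    ("_:v2", "rdf:type", "shacl:violation"),
}
print(count_violations("SELECT", select_result))        # 2
print(count_violations("CONSTRUCT", construct_result))  # 2
print(count_violations("ASK", False))                   # 0
```
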
>>> The SHACL Control Language
>>>
>>> SHACL constraints can be directly evaluated, just as SPARQL queries
>>> are. All that is needed is a SHACL constraint and an RDF graph or
>>> dataset.  The result of the constraint is the result of the query, and
>>> is interpreted as above.
>>>
>>> SHACL constraints can also be encoded and collected in RDF graphs.
>>> Each node in such a graph that is an RDFS instance of shacl:Constraint
>>> is the control node of a SHACL constraint.  A SHACL engine evaluates
>>> the constraints in an RDF graph by taking each such node and evaluating
>>> the node's constraint against an RDF graph or dataset.
>> Does this relate to global constraints in the current SHACL draft?
> Not really.  This is a way of encoding SHACL so that tools can read and
> execute it.
>
>>> SHACL control nodes can also have a shacl:severity link to one or more
>>> of shacl:fatal, shacl:warning, or shacl:informative, indicating the
>>> severity of any violations of the constraint.
>> This property is currently called sh:level, but sh:severity would work
>> too. I have added a corresponding note.
> I don't care about the precise vocabulary.
>
>>> The SHACL Core Control Language
>>>
>>> The simplest kind of SHACL control node is a node linked via a
>>> shacl:constraint triple to a SPARQL constraint encoded as an RDF
>>> string literal.  These nodes are called SHACL Core Control Nodes.
>> Looks equivalent to sh:NativeConstraint + sh:constraint.
> Probably.
>
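A minimal Python sketch of how an engine might pick up SHACL Core Control Nodes from an RDF graph and evaluate each node's constraint, assuming string triples and a hypothetical `run_query` hook in place of a real SPARQL engine. It looks only at direct rdf:type shacl:Constraint triples rather than all RDFS instances, and rejecting a node with several shacl:constraint links is a choice made for this sketch, not something the proposal specifies.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


def core_control_nodes(graph: Set[Triple]) -> Dict[str, str]:
    """Map each node typed shacl:Constraint to the SPARQL text of its
    shacl:constraint link. Nodes without such a link are skipped (they may
    be extended control nodes, which this sketch does not handle)."""
    controls = {s for (s, p, o) in graph
                if p == "rdf:type" and o == "shacl:Constraint"}
    result: Dict[str, str] = {}
    for node in controls:
        queries = [o for (s, p, o) in graph
                   if s == node and p == "shacl:constraint"]
        if len(queries) > 1:
            # sketch-specific choice: treat several constraint strings as an error
            raise ValueError(f"{node} has more than one shacl:constraint link")
        if queries:
            result[node] = queries[0]
    return result


def evaluate_constraints(shapes_graph: Set[Triple], data: object,
                         run_query: Callable[[str, object], object]) -> Dict[str, object]:
    """Evaluate every core control node's constraint against `data`
    via the hypothetical `run_query` hook."""
    return {node: run_query(query, data)
            for node, query in core_control_nodes(shapes_graph).items()}


shapes = {
    ("ex:c1", "rdf:type", "shacl:Constraint"),
    ("ex:c1", "shacl:constraint", "ASK { ?s ex:bug ex:open }"),
    ("ex:c2", "rdf:type", "shacl:Constraint"),
}
print(core_control_nodes(shapes))  # {'ex:c1': 'ASK { ?s ex:bug ex:open }'}
```
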
>>> The SHACL Extended Control Language
>>>
>>> Other SHACL control nodes allow the separation of a constraint into
>>> three sections: a scope section, a shape section, and a reporting
>>> section.  These SHACL control nodes, called SHACL Extended Control
>>> Nodes, must use precisely one of the ways below to specify the scope,
>>> precisely one to specify the shape, at most one to specify reporting,
>>> and at most one to specify severity.
>>>
>>> The scope of a SHACL constraint is specified via 1/ a
>>> shacl:individualScope link to an IRI literal, 2/ a shacl:classScope
>>> link to an IRI literal,
>> What are "IRI literals"? xsd:anyURI?
> Yes, as shown below.  I have tried to be as representationally pure as
> possible here.
>
>> shacl:individualScope appears similar to rdf:type/sh:nodeShape?
>>
>> I don't understand shacl:classScope.
> shacl:individualScope checks a single node.  shacl:classScope checks all
> RDFS instances of a class.
>
>
>> Overall you seem to invert the direction of the linkage: constraints
>> appear as stand-alone entities that have a forward reference to a class or
>> individual.
> Correct, and this was a particular choice.
>
>> While these relationships can be walked in either direction, my current
>> draft does the linkage in the opposite direction from yours, because I
>> believe this is far easier to write down when you start at a class or
>> shape.
>
>>> 3/ a shacl:shapeScope link to a SHACL shape (see below), or 4/ a
>>> shacl:constraintScope link to a string literal.
>> I don't understand shacl:constraintScope.
> This permits arbitrary SPARQL to be the scope of a constraint.
>
>>> The shape of a SHACL constraint is specified via 1/ a shacl:shape link
>>> to a SHACL shape (see below), or
>> How is shapeScope different from shacl:shape?
> One is the scope, the other is the constraining shape.  This allows one, for
> example, to check that all nodes with an ex:bug link to ex:verified satisfy
> a particular shape.  The shapeScope would be something like [ shacl:property
> "ex:bug"^^xsd:anyURI, shacl:value "ex:verified"^^xsd:anyURI ].  (This shows
> how pedantic the proposal is about separating use and mention.)
>
>>> 2/ a shacl:constraint link to a string literal.
>>>
>>> The reporting for a SHACL constraint is specified via 1/ a shacl:report
>>> link to a string literal.
>> Is this sh:message?
> Not really.  This would for example allow for things like SELECT ?this
> ?status in the encoded SPARQL.  Turning this into actual messages is not
> yet part of the proposal.
>
>>> These kinds of SHACL control nodes are handled by first constructing
>>> three parts of the SHACL constraint.
>>>
>>> 1/ The control portion of the constraint, <control>, is
>>>    a/ VALUES ?this { <IRI> } for a shacl:individualScope link to
>>>       "<IRI>"^^xsd:anyURI
>>>    b/ ?this rdf:type <IRI> . for a shacl:classScope link to
>>>       "<IRI>"^^xsd:anyURI
>> SPIN/current SHACL would also walk the subClassOf triples here, not just
>> the direct rdf:type. This means that RDFS entailment is not required.
> Yes, and I am violently against going half-way to RDFS.
>
>> Why encode IRIs as strings first?
> To separate use and mention.  It may be that I am being too pedantic but I
> thought that I should provide the most representational purity as a start.
>
>>>    c/ <shape> for a shacl:shapeScope link to a node that encodes <shape>
>>>       (see below)
>>>    d/ <query> for a shacl:queryScope link to "<query>"^^xsd:string
>>>
>>> 2/ The shape portion of the constraint, <shape>, is
>>>    a/ <shape> for a shacl:shape link to a node that encodes <shape>
>>>       (see below)
>>>    b/ <query> for a shacl:query link to the string "<query>"^^xsd:string
>>>
>>> 3/ The reporting portion of the constraint, <report>, is
>>>    a/ SELECT ?this when there is no shacl:report link
>>>    b/ <report> for a shacl:report link to "<report>"^^xsd:string
>>>
>>> The constraint for a SHACL Extended Control Node is then constructed
>>> as
>>>
>>> PREFIX rdf: ...
>>> PREFIX rdfs: ...
>>> PREFIX xsd: ...
>>> PREFIX shacl: ...
>>> <report> WHERE { <control> MINUS { <shape> } }
>> I cannot claim that I have understood this. Examples and more complete
>> snippets would help.
> I'll put together a couple of examples shortly, but the basic idea is that
> all this machinery lets you encode many kinds of SPARQL queries in a
> mix-and-match fashion.
>
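To make the construction concrete, here is a minimal Python sketch that assembles `<report> WHERE { <control> MINUS { <shape> } }` from its three parts. It assumes that placeholders such as `<IRI>` are replaced verbatim by the lexical form of the xsd:anyURI literal (so a prefixed name like ex:Issue, or a full IRI already written in angle brackets), that shapeScope and shape values have already been encoded into SPARQL text, and that the PREFIX declarations are added by the caller. The function names are hypothetical.

```python
from typing import Optional


def control_part(scope_kind: str, value: str) -> str:
    """<control>: translate the scope link into a SPARQL fragment binding ?this.
    `value` is the lexical form of the literal, substituted verbatim."""
    if scope_kind == "individualScope":
        return f"VALUES ?this {{ {value} }}"
    if scope_kind == "classScope":
        return f"?this rdf:type {value} ."
    if scope_kind in ("shapeScope", "queryScope"):
        # already SPARQL text: an encoded shape or an arbitrary query pattern
        # (the query variant is called shacl:constraintScope earlier on)
        return value
    raise ValueError(f"unknown scope kind: {scope_kind}")


def build_constraint_query(scope_kind: str, scope_value: str, shape: str,
                           report: Optional[str] = None) -> str:
    """Assemble <report> WHERE { <control> MINUS { <shape> } };
    without a shacl:report link the report part defaults to SELECT ?this."""
    control = control_part(scope_kind, scope_value)
    report_part = report if report is not None else "SELECT ?this"
    return f"{report_part} WHERE {{ {control} MINUS {{ {shape} }} }}"


print(build_constraint_query("classScope", "ex:Issue", "?this ex:status ?s ."))
# SELECT ?this WHERE { ?this rdf:type ex:Issue . MINUS { ?this ex:status ?s . } }
```

With the default report, this query returns the instances of ex:Issue for which the shape pattern (here, having some ex:status value) does not match.
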
>>> SHACL Simple Shape Language
>>>
>>> SHACL provides a vocabulary for generating the shape portion of
>>> constraints. In conjunction with the SHACL Extended Control Language
>>> this vocabulary permits the construction of many, but not all,
>>> constraints without needing to write SPARQL queries.
>>>
>>> A SHACL shape node is a node that is an RDFS instance of shacl:Shape.
>>> Each SHACL shape encodes some SPARQL syntax, its shape, that can be
>>> used in SHACL constraints.
>>>
>>> Many SHACL shapes utilize an RDF property.  This property is specified
>>> by a shacl:property link to an IRI literal.
>> Is this equivalent to sh:predicate?
> I think so.
>
>> Also, why do "shapes" point at a property - shouldn't this be the
>> constraints that the shape uses? Or maybe your term "Shape" is my term
>> "Template"?
> I don't think so.  A shape here is really a bit of SPARQL that has
> conceptually one free variable.  These shapes can be combined (using
> conjunction, for example) to create the shape that is the top-level part of a
> constraint, and then combined with the other bits to create a SPARQL query that
> returns violations of the constraint.
>
>>> A SHACL property shape with a shacl:valueType link to an IRI literal
>>> limits the values of a property to be RDFS instances of a class.  The
>>> shape encoded by a SHACL property node with a shacl:property link to
>>> "<property>"^^xsd:anyURI and a shacl:valueType link to
>>> "<valueType>"^^xsd:anyURI is FILTER NOT EXISTS { ?this <property> V .
>>> FILTER NOT EXISTS { V rdf:type <valueType> . } } where V is a fresh
>>> variable.
>>>
>>> ... add other high-level-language constructs here ...
>> Looks largely equivalent.
> This would have the same stuff you would have, maybe with minor syntactic
> differences.
>
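A minimal Python sketch of the shacl:valueType encoder, plus a direct check of what the encoded pattern is meant to express: a focus node conforms when every value of the property is an RDFS instance of the class. Property and class are passed as the lexical forms of their xsd:anyURI literals, fresh variables are drawn from a counter, and RDFS instance checking is approximated by rdf:type plus rdfs:subClassOf only. The function names are illustrative.

```python
import itertools
from typing import Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


def value_type_shape(prop: str, value_type: str, fresh: Iterator[int]) -> str:
    """Encoder for a property shape with shacl:property and shacl:valueType.
    `fresh` supplies numbers for fresh variables (?v0, ?v1, ...)."""
    v = f"?v{next(fresh)}"
    return (f"FILTER NOT EXISTS {{ ?this {prop} {v} . "
            f"FILTER NOT EXISTS {{ {v} rdf:type {value_type} . }} }}")


def is_instance(graph: Set[Triple], node: str, cls: str) -> bool:
    """node rdf:type C where C is cls or reaches cls via rdfs:subClassOf
    (a partial approximation of RDFS entailment)."""
    frontier = [o for (s, p, o) in graph if s == node and p == "rdf:type"]
    seen: Set[str] = set()
    while frontier:
        current = frontier.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(o for (s, p, o) in graph
                        if s == current and p == "rdfs:subClassOf")
    return False


def value_type_violations(graph: Set[Triple], focus: Iterable[str],
                          prop: str, value_type: str) -> Set[str]:
    """Focus nodes having some `prop` value that is not an instance of value_type."""
    return {this for this in focus
            if any(not is_instance(graph, v, value_type)
                   for (s, p, v) in graph if s == this and p == prop)}


fresh = itertools.count()
print(value_type_shape("ex:assignee", "ex:Person", fresh))
# FILTER NOT EXISTS { ?this ex:assignee ?v0 . FILTER NOT EXISTS { ?v0 rdf:type ex:Person . } }

graph = {
    ("ex:Bug1", "ex:assignee", "ex:alice"),
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:Bug2", "ex:assignee", "ex:printer"),
}
print(value_type_violations(graph, ["ex:Bug1", "ex:Bug2"], "ex:assignee", "ex:Person"))
# {'ex:Bug2'}
```
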
>>> SHACL Errors
>>>
>>> If a node in an RDF graph is both a core control node and an extended
>>> control node the result of evaluating the graph is undefined.  If a
>>> node in an RDF graph that is an RDFS instance of shacl:Constraint is
>>> neither a SHACL Core Control Node nor a SHACL Extended Control Node the
>>> result of evaluating the graph is undefined.  If a SHACL shape node in
>>> an RDF graph encodes more than one shape then the result of evaluating the
>>> graph is undefined.  SHACL engines should signal an error on such
>>> graphs.
>> This seems to exclude the possibility to check constraints over a SHACL
>> graph, i.e. apply SHACL to itself.
> No.  There is no prohibition on having an RDF graph that contains SHACL
> constraints as the data that is being checked by another RDF graph (or even
> the same RDF graph) that encodes SHACL constraints.
>
>>> If evaluation of a constraint would produce a SPARQL error the
>>> constraint is invalid.  SHACL engines should signal an error for that
>>> constraint.
>> I need much more details before I'd be able to comment further on this
>> proposal.
>>
>> Much of it overlaps with the current SHACL draft, yet some aspects seem
>> to depart quite a bit. I wonder why we should replace design patterns
>> that have already been used successfully in SPIN for many years with
>> something experimental, especially your proposed mechanism to represent
>> and group constraints together. In the absence of strong reasons, I'd
>> vote for established patterns. Also your use of xsd:anyURI instead of
>> real IRI references looks very unusual. I also anticipate a lot of
>> resistance on the strong binding to SPARQL within the WG, and I'd rather
>> be willing to compromise on that.
>>
>> Thanks Holger
> My intent is to get rid of the parts of SPIN that impinge on RDFS to produce
> a pure constraint/shape specification.
>
> This specification is essentially complete.  It provides the only control
> structures permissible.  All that is missing is the set of simple shape
> encoders, and maybe a template mechanism to define new shape encoders.
>
>
> peter
>
Received on Wednesday, 4 March 2015 05:29:16 UTC