Re: comments on SHACL 3 March editors draft from Holger Knublauch on 2016-03-07 (public-data-shapes-wg@w3.org from March 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 7 Mar 2016 14:46:42 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <56DD07B2.7030100@topquadrant.com>
Thanks for the feedback, Peter. I have tried to address it here:

https://github.com/w3c/data-shapes/commit/def77377ebf992a7dcde23e5eff1c74187385bd0

More inline...

On 7/03/2016 6:59, Peter F. Patel-Schneider wrote:
> General
>
> There are quite MUSTs in the document that are inappropriately used.  For
> example, instead of "a SHACL processor MUST use the union of the focus nodes
> produced by these scopes" say "the scope of a shape is the union of the sets
> of nodes produced by these scopes".  The place to use MUST is in wording
> like "a SHACL processor MUST validate a shape against a data graph as
> described herein".
>
> The document specifies too much about how a SHACL engine is supposed to
> perform validation.  Instead it SHOULD only specify what validation is and let
> developers decide how to implement validation.
>
> There are bad uses of "supposed to" and "expected to be".  What happens if
> these suppositions or expectations are not met?  Instead use precise
> wording.
>
> SHACL should treat both the shapes graph and the data graph as unchanging.
> Wording should be inserted to the effect that if the shapes graph or the
> data graph changes during validation that the results are undefined.
> Wording that implies temporal effects should be removed.
>
> There needs to be some definition of just what a SHACL validation engine
> does, something like "A SHACL validation engine takes two RDF graphs, a
> shapes graph and a data graph, and validates the data graph against the
> shapes graph as described herein.  SHACL tools may provide other interfaces,
> e.g., one that takes an RDF graph in an RDF dataset, adds other triples to
> that graph to produce the data graph, e.g., by using owl:imports triples,
> and/or use triples in that graph to produce the shapes graph, e.g., using
> sh:shapesGraph triples as described herein.  A SHACL validation engine MUST
> implement all constructs in the core of SHACL (Sections 2, 3, 5).  A SHACL
> engine MAY not implement the other parts of SHACL but if it does implement a
> portion of ... it must implement all of ...."
>
>
> Abstract
>
> The abstract says that in SHACL "[a]dditional constraints can be associated
> with shapes using SPARQL and similar extension languages."  This is not the
> case as there is no specification for any execution language other than
> SPARQL.
>
> 1. Introduction
>
> The shapes graph is not a set of shapes.  Instead it contains a set of
> shapes.
>
> "data nodes" -> "nodes in the data graph"
>
> The output of the validation process may be just a YES/NO.

This is not my understanding. I thought SHACL compliant engines must 
return the validation results as defined for each constraint type. 
Otherwise, they would only implement a part of the SHACL core, and 
compromise interchangeability of databases. My understanding is that 
fully compliant engines MAY return more results, but MUST at least 
return the results as specified.

>
> The validation report may be truncated, and not include all validations.
>
> The scope of the shape ex:IssueShape in EXAMPLE 1 is not "all nodes that
> have an rdf:type link to ex:Issue".  The scope of the shape ex:IssueShape in
> EXAMPLE 1 does not state "that all instances of the class ex:Issue shall be
> validated against the shape ex:IssueShape".  The actual scope is instead
> somewhere between these two sets of nodes.
>
> It is not sufficient to say in 1.1 that SHACL has unique versions of types
> and instances.  These notions are in very widespread use.  Each time that
> SHACL deviates from the common, accepted W3C practice it should be called
> out, e.g., "SHACL type" or "SHACL instance".

I hope this doesn't need to be repeated each time as this may render the 
document harder to read. Furthermore, the terms "SHACL type" and "SHACL 
instance" would first need to be formally defined too.

Instead, I suggest we should define what "type", "instance" and 
"subclass" mean in the remainder of the document. I have put a 
corresponding terminology block at the end of section 1.1

>
> The paragraph on special cases is hard to understand.  At a minimum, the
> effects of all these special cases need to be called out whenever they
> occur.  It would be much better to remove the special cases.

Could you clarify what paragraph you mean? Thanks.

>
> The document doesn't actually use SPARQL 1.1 as the normative definition of
> SHACL constraints and scopes.   What it does is intersperse SPARQL with
> hasShape.  A cleaner statement is needed here to clarify the role of SPARQL
> 1.1 in the semantics of the core of SHACL.

I think this is already clear enough - sh:hasShape is mentioned 
explicitly. It didn't state that "only" SPARQL 1.1 is needed anywhere.

>
> The wording about sh:hasShape seems to imply SPARQL engines that only cover
> the core need not provide a mechanism that mirrors sh:hasShape.  This might
> be true at the moment, as recursion is not supported, but may not be true in
> the future.

I don't see why the current wording implies what you state.

>
> Graphs do not have URIs in general.  Instead they may be available at a URL.
> They may also be in a dataset, in which case they may have an IRI (or
> whatever) in that dataset.
>
> 2. Shapes
>
> 2.1 Scopes
>
> Class-based scopes use rdfs:subClassOf as well.
>
> "that are supposed to be validated against" -> "that are in the scope of"
>
> "The scope of a shape is a set of nodes.  It is not necessary that a node be
> present in the data graph to belong to a scope."
>
> "Triples with predicate sh:scopeNode in the shapes graph from a shape to a
> node create a scope for the shape containing only the node.  SPARQL
> implementations MAY produce a warning if this node is not present in the
> data graph."
>
> Don't laud RDF Schema.  SHACL doesn't use RDF Schema.
>
> "Triples with predicate sh:scopeClass in the shapes graph from a shape to a
> node create a scope for the shape containing the nodes in the data graph
> that are SHACL instances of the node in the data graph (linked to the node
> via rdf:type or is linked to the node via rdf:type followed by a chain of
> rdfs:subClassOf links in the data graph)."
>
>
> At this point I stopped reading so carefully, but similar problems continue.
>
>
> 2.2 Filters
>
> A SHACL processor might not begin by validating filters.  It might instead
> look at all in-scope nodes and only later remove those that don't pass the
> filters.  The document makes this illegal but it might be useful if the
> filter shape is expensive to compute and few violations are expected.

I believe this is already covered by stating that SHACL engines can 
produce additional validation results, see discussion above.

>
> The UML-like diagram is misleading.  It uses rdfs:Resource in a way
> different from SHACL.

Do others agree that I should take that diagram out? Could you clarify 
what is wrong about its use of rdfs:Resource?

>
> 3. Core Constraint Types
>
> "points at" is used with two different meanings in the document.  Variables
> don't point at anything.  They have a value.
>
>
> At this point I started skipping large chunks.
>
>
> 4. Declaring the Shapes Graph
>
> RDF graphs don't have to be part of a dataset.  What happens then?

We simply assume a dataset, the rest is an implementation detail. A 
dataset may include virtual graphs that are created on the fly, or 
graphs that are dynamically loaded from their URL, or whatever. But in 
the data interface, each graph has a name (IRI).

>
> This should be moved to a section that discusses how to get to the stage
> where there is a shapes graph and a data graph that can be validated against
> it.
>
>
> 12. Entailment
>
> RDF does not define the notion of the IRI of a graph.  Therefore if SHACL is
> going to use this notion it must define it on its own.

Isn't this already clear? When we talk about a shapes graph, then this 
refers to the role of a graph as validation input. We even have a 
variable $shapesGraph for it. From this, a validation engine can find 
the IRI node in the graph (possibly a owl:Ontology instance).

Holger
Received on Monday, 7 March 2016 04:47:18 UTC