Re: More comments on SHACL Editor's Draft of 29 April from Arnaud Le Hors on 2016-05-09 (public-rdf-shapes@w3.org from May 2016)

From: Arnaud Le Hors <lehors@us.ibm.com>
Date: Mon, 9 May 2016 12:57:33 -0700
To: Thomas Baker <tom@tombaker.org>
Cc: RDF Shapes <public-rdf-shapes@w3.org>
Message-Id: <201605091957.u49Jvc6O022212@d01av05.pok.ibm.com>
Hi Tom,

On behalf of the WG I want to thank you for taking the time to carefully
review our current draft and send detailed comments. Your feedback is
very valuable to us.

We already spent a considerable amount of time discussing your comments on
our WG call last week, and the editors are already taking action to address
some of them. I'd like to update the TR draft soon because we haven't done
that in several months. We probably won't be able to address all of your
comments in that time frame, but rest assured that this doesn't mean we are
ignoring them.

Thank you.
--
Arnaud Le Hors - Senior Technical Staff Member, Open Web Technologies - IBM Cloud




From:   Thomas Baker <tom@tombaker.org>
To:     RDF Shapes <public-rdf-shapes@w3.org>
Date:   05/05/2016 01:25 AM
Subject:        More comments on SHACL Editor's Draft of 29 April



More comments on SHACL [1], Editor's Draft 29 April 2016
http://w3c.github.io/data-shapes/shacl/

I posted a previous batch of comments on 1 May [1] but have learned a few
things since then.  I remain unsure what the specification really means in
some respects, so the following reflects what I think the specification
"really" means -- what I infer it to mean -- with some suggestions on how the
spec could help the reader by articulating some key assumptions up-front.

1. SHACL provides a vocabulary for describing shapes and a simple 
   algorithm for "validating" an arbitrary graph of RDF data (Data Graph)
   against an RDF description of data shapes (Shapes Graph).

2. The SHACL validation algorithm checks the conformance of triples in 
   the Data Graph to "constraints" described in the Shapes Graph.

3. Validation evaluates a target Data Graph at the level of its abstract
   syntax.  In accordance with RDF 1.1 Concepts and Abstract Syntax [2],
   RDF abstract syntax consists of triples, or subject and object nodes
   connected with predicates, with nodes that may be IRIs, blank nodes, or
   datatyped literals. The SHACL spec's use of "focus nodes" fits with
   the use of "node" in rdf11-concepts [2].
 
4. In accordance with the Closed-World Assumption (CWA), the validation
   algorithm limits itself to matching constraint patterns, as described in
   the Shapes Graph, against the abstract-syntactic components of the triples
   actually asserted in the target Data Graph, with no further interpretation
   of the Data Graph or inferencing based on its formal semantics.

5. A Shapes Graph is expressed in RDF.  Even though the primary use of
   a Shapes Graph is for CWA-based validation, it should be noted that the
   semantics of the Shapes Graph itself, as of any other expression in RDF,
   follows the Open-World Assumption (OWA).
 
6. The inherently open-world meaning of the Shapes Graph, however, does not
   seem to be of practical consequence for its use in CWA-based validation --
   unless, perhaps, one were to construct or augment a Shapes Graph with
   inferred triples -- with the caveat that shapes graphs could potentially
   pollute "real" data by adding meaning that is not intended to be
   interpreted as real data, e.g., when the practical hack of using a class
   IRI to name a shape is followed (Section 2.1.2.1, "Implicit Class Scopes").

7. A Shapes Graph may specify a potential set of "focus nodes" as the
   "scope" of validation in the Data Graph.  A Shapes Graph may also specify
   a potential set of "focus nodes" to be dropped out of the validation scope
   ("filtered").  Potential focus nodes may or may not match actual nodes in
   the Data Graph.
 
8. Validation based on closed-world assumptions applies to the relationship
   between constraints (as described in the Shapes Graph) and triples in the
   Data Graph viewed at the level of their RDF abstract-syntactic components
   (e.g., the "focus nodes").
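
The reading in points 1-8 can be illustrated with a minimal Python sketch.
It is not taken from the SHACL draft: the names `Triple`, `focus_nodes`, and
`validate_min_count` are hypothetical, the scope is assumed to be a class
IRI matched against `rdf:type` triples, and the single constraint is modeled
loosely on a minimum-count check. Prefixed names are plain strings here.

```python
# Hypothetical sketch of closed-world validation over asserted triples only.
# None of these names come from the SHACL draft.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "rdf:type"


def focus_nodes(data_graph, scope_class, filtered=frozenset()):
    """Select subjects typed with scope_class, minus any filtered nodes."""
    in_scope = {
        t.subject
        for t in data_graph
        if t.predicate == RDF_TYPE and t.obj == scope_class
    }
    return in_scope - set(filtered)


def validate_min_count(data_graph, nodes, predicate, min_count):
    """Return the focus nodes with fewer than min_count asserted values.

    Only triples actually present in data_graph are counted (closed world);
    nothing is inferred.
    """
    violations = set()
    for node in nodes:
        values = {
            t.obj
            for t in data_graph
            if t.subject == node and t.predicate == predicate
        }
        if len(values) < min_count:
            violations.add(node)
    return violations


data_graph = {
    Triple("ex:alice", RDF_TYPE, "ex:Person"),
    Triple("ex:alice", "ex:name", '"Alice"'),
    Triple("ex:bob", RDF_TYPE, "ex:Person"),
    Triple("ex:carol", RDF_TYPE, "ex:Person"),
}

# ex:carol is in scope by type but is filtered out before validation.
nodes = focus_nodes(data_graph, "ex:Person", filtered={"ex:carol"})
violations = validate_min_count(data_graph, nodes, "ex:name", 1)
print(sorted(violations))  # ['ex:bob']
```

Under these assumptions, ex:bob fails only because no `ex:name` triple is
asserted for it; under an open-world reading the absence of that triple
would not, by itself, mean anything.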

Note: An earlier iteration of these comments was posted on the
DC-ARCHITECTURE list [3].  The resulting thread drew out some additional
comments and insights that could be of interest to members of Data Shapes.
 
[1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2016May/0000.html
[2] https://www.w3.org/TR/rdf11-concepts/
[3] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1605&L=dc-architecture&P=3148


----------------------------------------------------------------------
Discussion

Because SHACL is expressed in RDF, like it or not, a Shapes Graph is
interpreted according to OWA.  Since the design decision was made to express
the Shapes Graph in RDF, and not in a completely different syntax -- as in
the case of SPARQL or, for that matter, DCMI's DSP -- the native OWA
interpretation of a Shapes Graph cannot be papered over, ignored, or
otherwise contradicted.

The design choice of expressing Shapes Graphs in RDF does somewhat limit
SHACL, in certain respects, compared to SPARQL or DSP.  In SPARQL, for
example, `rdfs:subClassOf*` is interpreted as referring to the transitive
closure of `rdfs:subClassOf`; the asterisk is a sort of syntactic sugar, a
convenience notation, that triggers specific inferences.  As there is no
equivalent way to express `rdfs:subClassOf*` in RDFS, there is no way to say
that `rdfs:subClassOf` actually _means_ the transitive closure without, in
effect, arbitrarily overriding its global semantics.
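
To make the `rdfs:subClassOf*` point concrete, here is a small Python sketch
of what such a path computes over the triples actually asserted in a graph.
The function name `superclasses` and the example data are hypothetical. In
SPARQL 1.1 property paths, `*` matches zero or more steps, so the starting
class itself is included in the result.

```python
# Hypothetical sketch: follow zero or more asserted rdfs:subClassOf triples.
# Triples are plain (subject, predicate, object) tuples of strings.

def superclasses(data_graph, cls):
    """Classes reachable from cls via zero or more rdfs:subClassOf triples."""
    seen = {cls}          # zero steps: the class itself is always included
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in data_graph:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


data_graph = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:alice", "rdf:type", "ex:Student"),
}

print(sorted(superclasses(data_graph, "ex:Student")))
# ['ex:Agent', 'ex:Person', 'ex:Student']
```

The traversal is a property of the query, not of the graph: no triple
("ex:Student", "rdfs:subClassOf", "ex:Agent") is ever added to the data.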

Perhaps this is why the SHACL spec says that "SHACL does not always use this
vocabulary or these concepts in exactly the way that they are formally
defined in RDF and RDFS" (Section 1.3) -- a notion which gratuitously sets
SHACL at odds with W3C Semantic Web standards.

One could perhaps sidestep the issue by dropping _all_ consideration of
inferencing from the normative SHACL specification; saying only that there
may be a need for inferencing in a pre-processing phase; then discussing
those pre-processing options in a separate guidance document.  Putting
inferencing out of scope would make the SHACL spec simpler, clearer, and
shorter.

Abstract syntax issues

Because SHACL is viewing RDF data graphs through a closed-world lens, the
meaning of the graph is beside the point -- just as the meaning of a graph
is beside the point with SPARQL.  A Data Graph is validated against a SHACL
Shapes Graph at the level of the abstract syntax of the Data Graph.
According to RDF 1.1 Concepts and Abstract Syntax, RDF graphs are sets of
subject-predicate-object triples, where the elements may be IRIs, blank
nodes, or datatyped literals [1].
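
As a rough illustration of that data model, the following Python sketch
represents a graph as a set of triples whose terms are IRIs, blank nodes, or
datatyped literals. The class names and example IRIs are assumptions made
for this sketch, and language-tagged literals are left out for brevity.

```python
# Hypothetical sketch of the RDF abstract syntax: three kinds of terms,
# and a graph as a set of (subject, predicate, object) triples.
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: IRI


XSD_STRING = IRI("http://www.w3.org/2001/XMLSchema#string")
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

graph = {
    (IRI("http://example.org/alice"), RDF_TYPE,
     IRI("http://example.org/Person")),
    (IRI("http://example.org/alice"), IRI("http://example.org/name"),
     Literal("Alice", XSD_STRING)),
    (BlankNode("b0"), IRI("http://example.org/knows"),
     IRI("http://example.org/alice")),
}

# Collect the kinds of terms that occur anywhere in the graph.
term_kinds = {type(term).__name__ for triple in graph for term in triple}
print(sorted(term_kinds))  # ['BlankNode', 'IRI', 'Literal']
```

At this level, http://example.org/Person is just another IRI in the object
position of a triple; nothing in the structure marks it as special.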

Note that at the level of their abstract syntax, RDF Graphs have no
"classes" and no "instances"!  A search in rdf11-concepts [1] for the words
"instance" or "class" will find no mention of either one, anywhere in the
spec.

Confusingly, the SHACL spec makes reference to "instances", "classes", or
"instances of classes" in the Data Graph, viewing the Data Graph through a
semantic lens.  Coining a new SHACL-specific notion of "instance" (and
"class", etc.) next to the existing notions of RDF "instance" and OO
"instance" makes SHACL particularly hard to grok.  At the end of Section
1.3, for example, the definition for "instance" starts off by saying:

  "A node is an instance of a class..."

which I take to mean:

  "A node [in the Data Graph] is an instance of a class..."

By comparison, the SPARQL spec specifies a SPARQL-specific syntax to express
triple patterns composed of variables and RDF-abstract-syntactic things such
as IRIs and Literals.  SPARQL itself does not "understand" that something is
a class or an instance -- it simply supports the formation of triple
patterns and leaves it to Primers and other usage guides to express queries,
informally, in semantic terms (e.g., "What data is stored about instances of
class X?").  This separation of concerns makes the SPARQL specification much
easier to understand.  It is worth noting that DCMI's Description Set
Profile Constraint Language [3] also defines its own syntax.

As an aside, it is unclear to me why it is even necessary for the SHACL spec
to redefine an already-loaded, overdetermined term such as "class" to refer
to a set of what one might call "type-matched focus nodes".  If the
intention is to make SHACL more understandable to people who are unfamiliar
with RDF, this should be done not in the formal spec but in a primer or
tutorial, where an explanation can be customized for a specific audience,
such as programmers.

A year ago, it was proposed that an abstract syntax be developed for SHACL
[4].  There was little discussion and the issue remains open but neglected.
Since SHACL is natively expressed in RDF, its abstract syntax is in effect
the abstract syntax for RDF.  It is not clear to me whether this is actually
a good idea.  If a Shapes Graph only exists to be used in a closed-world
process validating a Data Graph, what is the specific advantage of
expressing it in RDF?  Might a proper abstract syntax for SHACL, based on
its own BNF, etc., further focus and clarify the SHACL language?  On the
other hand, I see no specific reasons why SHACL should _not_ use RDF to
express shapes graphs as it does -- provided that the spec (or a primer)
point out any potential pitfalls, as touched on above.
 
[1] https://www.w3.org/TR/rdf11-concepts/
[2] https://www.w3.org/TR/rdf11-concepts/#data-model
[3] http://dublincore.org/documents/dc-dsp/
[4] https://www.w3.org/2014/data-shapes/track/issues/52


-- 
Tom Baker <tom@tombaker.org>
Received on Monday, 9 May 2016 19:58:18 UTC