- From: Thomas Baker <tom@tombaker.org>
- Date: Thu, 5 May 2016 10:15:11 +0200
- To: RDF Shapes <public-rdf-shapes@w3.org>
More comments on SHACL [1], Editor's Draft 29 April 2016
http://w3c.github.io/data-shapes/shacl/

I posted a previous batch of comments on 1 May [1] but have learned a few things since then. I remain unsure what the specification really means in some respects, so the following reflects what I think the specification "really" means -- what I infer it to mean -- with some suggestions on how the spec could help the reader by articulating some key assumptions up-front.

1. SHACL provides a vocabulary for describing shapes and a simple algorithm for "validating" an arbitrary graph of RDF data (Data Graph) against an RDF description of data shapes (Shapes Graph).

2. The SHACL validation algorithm checks the conformance of triples in the Data Graph to "constraints" described in the Shapes Graph.

3. Validation evaluates a target Data Graph at the level of its abstract syntax. In accordance with RDF 1.1 Concepts and Abstract Syntax [2], RDF abstract syntax consists of triples, or subject and object nodes connected by predicates, where nodes may be IRIs, blank nodes, or datatyped literals. The SHACL spec's use of "focus nodes" fits with the use of "node" in rdf11-concepts [2].

4. In accordance with the Closed-World Assumption (CWA), the validation algorithm limits itself to matching constraint patterns, as described in the Shapes Graph, against the abstract-syntactic components of the triples actually asserted in the target Data Graph, with no further interpretation of the Data Graph or inferencing based on its formal semantics.

5. A Shapes Graph is expressed in RDF. Even though the primary use of a Shapes Graph is for CWA-based validation, it should be noted that the semantics of the Shapes Graph itself, like that of any other expression in RDF, follows the Open-World Assumption (OWA).

6. The inherently open-world meaning of the Shapes Graph, however, does not seem to be of practical consequence for its use in CWA-based validation -- unless, perhaps, one were to construct or augment a Shapes Graph with inferred triples. One caveat: shapes graphs could potentially pollute "real" data by adding meaning that is not intended to be interpreted as real data, e.g., when the practical hack of using a class IRI to name a shape is followed (Section 2.1.2.1, "Implicit Class Scopes").

7. A Shapes Graph may specify a potential set of "focus nodes" as the "scope" of validation in the Data Graph. A Shapes Graph may also specify a potential set of "focus nodes" to be dropped from the validation scope ("filtered"). Potential focus nodes may or may not match actual nodes in the Data Graph.

8. Validation based on closed-world assumptions applies to the relationship between constraints (as described in the Shapes Graph) and triples in the Data Graph, viewed at the level of their RDF abstract-syntactic components (e.g., the "focus nodes").

Note: An earlier iteration of these comments was posted to the DC-ARCHITECTURE mailing list [3]. The resulting thread drew out some additional comments and insights that could be of interest to members of Data Shapes.

[1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2016May/0000.html
[2] https://www.w3.org/TR/rdf11-concepts/
[3] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1605&L=dc-architecture&P=3148

----------------------------------------------------------------------
Discussion

Because SHACL is expressed in RDF, like it or not, a Shapes Graph is interpreted according to OWA. Since the design decision was made to express the Shapes Graph in RDF, and not in a completely different syntax -- as in the case of SPARQL or, for that matter, DCMI's DSP -- the native OWA interpretation of a Shapes Graph cannot be papered over, ignored, or otherwise contradicted.
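To make points 4, 7 and 8 concrete, and to contrast them with the open-world reading just described, here is a minimal Python sketch. It assumes a toy model in which a Data Graph is simply a set of (subject, predicate, object) strings; the names `focus_nodes` and `validate_min_count`, the `ex:` data, and the report format are hypothetical illustrations, not SHACL vocabulary or any implementation's API. Under OWA, the absence of an `ex:name` triple for `ex:bob` says nothing about whether Bob has a name; a closed-world check simply counts the triples actually asserted and reports the shortfall. The explicitly scoped `ex:dave`, who appears nowhere in the data, shows a potential focus node that matches no actual node.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical toy Data Graph: only the triples actually asserted, nothing inferred.
DATA: Set[Triple] = {
    Triple("ex:alice", "rdf:type", "ex:Person"),
    Triple("ex:alice", "ex:name", '"Alice"'),
    Triple("ex:bob", "rdf:type", "ex:Person"),
    Triple("ex:carol", "rdf:type", "ex:Person"),
}


def focus_nodes(graph: Set[Triple], scope_class: str,
                explicit: Iterable[str] = (),
                filtered: Iterable[str] = ()) -> Set[str]:
    """Potential focus nodes: subjects with an asserted rdf:type scope_class
    triple, plus explicitly named nodes (which need not occur in the graph),
    minus nodes dropped by a hypothetical filter."""
    scoped = {t.s for t in graph if t.p == "rdf:type" and t.o == scope_class}
    return (scoped | set(explicit)) - set(filtered)


def validate_min_count(graph: Set[Triple], nodes: Set[str], predicate: str,
                       min_count: int) -> List[Tuple[str, str, int]]:
    """Closed-world check: count only asserted (node, predicate, *) triples;
    no RDFS/OWL inference is applied to the Data Graph."""
    report = []
    for node in sorted(nodes):
        count = sum(1 for t in graph if t.s == node and t.p == predicate)
        if count < min_count:
            report.append((node, predicate, count))
    return report


if __name__ == "__main__":
    nodes = focus_nodes(DATA, "ex:Person", explicit={"ex:dave"},
                        filtered={"ex:carol"})
    print(validate_min_count(DATA, nodes, "ex:name", 1))
    # prints [('ex:bob', 'ex:name', 0), ('ex:dave', 'ex:name', 0)]
```

Nothing about the meaning of `ex:Person` or `ex:name` enters the check; only the presence or absence of asserted triples does.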
The design choice of expressing Shapes Graphs in RDF does somewhat limit SHACL, in certain respects, compared to SPARQL or DSP. In SPARQL, for example, the property path `rdfs:subClassOf*` is interpreted as referring to the (reflexive) transitive closure of `rdfs:subClassOf`, i.e., a chain of zero or more `rdfs:subClassOf` steps; the asterisk is a sort of syntactic sugar, a convenience notation, that triggers specific inferences. As there is no equivalent way to express `rdfs:subClassOf*` in RDFS, there is no way to say that `rdfs:subClassOf` actually _means_ the transitive closure without, in effect, arbitrarily overriding its global semantics. Perhaps this is why the SHACL spec says that "SHACL does not always use this vocabulary or these concepts in exactly the way that they are formally defined in RDF and RDFS" (Section 1.3) -- a notion which gratuitously sets SHACL at odds with W3C Semantic Web standards.

One could perhaps sidestep the issue by dropping _all_ consideration of inferencing from the normative SHACL specification, saying only that there may be a need for inferencing in a pre-processing phase, and then discussing those pre-processing options in a separate guidance document. Putting inferencing out of scope would make the SHACL spec simpler, clearer, and shorter.

Abstract syntax issues

Because SHACL views RDF data graphs through a closed-world lens, the meaning of the graph is beside the point -- just as the meaning of a graph is beside the point with SPARQL. A Data Graph is validated against a SHACL Shapes Graph at the level of the abstract syntax of the Data Graph. According to RDF 1.1 Concepts and Abstract Syntax, RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals [1]. Note that at the level of their abstract syntax, RDF graphs have no "classes" and no "instances"! A search in rdf11-concepts [1] for the words "instance" or "class" will find no mention of either one, anywhere in the spec.
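The abstract-syntax point, together with the SPARQL `rdfs:subClassOf*` example, can be illustrated with a second minimal Python sketch. It assumes a hypothetical term model: the `IRI`, `BNode` and `Literal` classes and the helper functions are illustrations only, not the API of any RDF library. A graph is nothing but a set of triples of terms; "instance of class C" is only a pattern over `rdf:type` triples, and the `rdfs:subClassOf*` chain is followed by the matcher, in the spirit of a SPARQL zero-or-more property path, without adding any triples to the graph or changing what `rdfs:subClassOf` means.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: IRI


# The three kinds of RDF term; frozen dataclasses are hashable and only
# compare equal to terms of the same kind.
Term = Union[IRI, BNode, Literal]
Triple = Tuple[Term, Term, Term]

RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
SUBCLASS_OF = IRI("http://www.w3.org/2000/01/rdf-schema#subClassOf")
XSD_STRING = IRI("http://www.w3.org/2001/XMLSchema#string")


def ex(name: str) -> IRI:
    """Hypothetical example namespace."""
    return IRI("http://example.org/" + name)


GRAPH: Set[Triple] = {
    (ex("Student"), SUBCLASS_OF, ex("Person")),
    (ex("PhDStudent"), SUBCLASS_OF, ex("Student")),
    (ex("alice"), RDF_TYPE, ex("PhDStudent")),
    (ex("alice"), ex("name"), Literal("Alice", XSD_STRING)),
    (ex("bob"), RDF_TYPE, ex("Person")),
    (BNode("b0"), RDF_TYPE, ex("Student")),
}


def direct_type_matches(graph: Set[Triple], cls: IRI) -> Set[Term]:
    """Match the pattern ?n rdf:type cls against asserted triples only."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}


def subclass_star(graph: Set[Triple], cls: IRI) -> Set[Term]:
    """Classes c linked to cls by a chain of zero or more asserted
    rdfs:subClassOf triples; the graph itself is left untouched."""
    found: Set[Term] = {cls}
    frontier: List[Term] = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def type_path_matches(graph: Set[Triple], cls: IRI) -> Set[Term]:
    """Match ?n rdf:type/rdfs:subClassOf* cls (cf. a SPARQL property path)."""
    classes = subclass_star(graph, cls)
    return {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}
```

Here `direct_type_matches(GRAPH, ex("Person"))` returns only `ex("bob")`, while `type_path_matches(GRAPH, ex("Person"))` also returns `ex("alice")` and the blank node `BNode("b0")`. Nothing in the graph changed between the two calls; only the pattern did.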
Confusingly, the SHACL spec makes reference to "instances", "classes", or "instances of classes" in the Data Graph, viewing the Data Graph through a semantic lens. Coining a new SHACL-specific notion of "instance" (and "class", etc.) alongside the existing notions of RDF "instance" and OO "instance" makes SHACL particularly hard to grok. At the end of Section 1.3, for example, the definition for "instance" starts off by saying: "A node is an instance of a class..." which I take to mean: "A node [in the Data Graph] is an instance of a class..."

By comparison, the SPARQL spec defines a SPARQL-specific syntax for expressing triple patterns composed of variables and RDF abstract-syntax terms such as IRIs and literals. SPARQL itself does not "understand" that something is a class or an instance -- it simply supports the formation of triple patterns and leaves it to primers and other usage guides to express queries, informally, in semantic terms (e.g., "What data is stored about instances of class X?"). This separation of concerns makes the SPARQL specification much easier to understand. It is worth noting that DCMI's Description Set Profile Constraint Language [3] also defines its own syntax.

As an aside, it is unclear to me why it is even necessary for the SHACL spec to redefine an already-loaded, overdetermined term such as "class" to refer to a set of what one might call "type-matched focus nodes". If the intention is to make SHACL more understandable to people who are unfamiliar with RDF, this should be done not in the formal spec but in a primer or tutorial, where an explanation can be customized for a specific audience, such as programmers.

A year ago, it was proposed that an abstract syntax be developed for SHACL [4]. There was little discussion, and the issue remains open but neglected. Since SHACL is natively expressed in RDF, its abstract syntax is in effect the abstract syntax of RDF. It is not clear to me whether this is actually a good idea.
If a Shapes Graph only exists to be used in a closed-world process validating a Data Graph, what is the specific advantage of expressing it in RDF? Might a proper abstract syntax for SHACL, based on its own BNF, etc., further focus and clarify the SHACL language? On the other hand, I see no specific reasons why SHACL should _not_ use RDF to express shapes graphs as it does -- provided that the spec (or a primer) points out any potential pitfalls, as touched on above.

[1] https://www.w3.org/TR/rdf11-concepts/
[2] https://www.w3.org/TR/rdf11-concepts/#data-model
[3] http://dublincore.org/documents/dc-dsp/
[4] https://www.w3.org/2014/data-shapes/track/issues/52

--
Tom Baker <tom@tombaker.org>
Received on Thursday, 5 May 2016 08:24:21 UTC