- From: Irene Polikoff <irene@topquadrant.com>
- Date: Sun, 01 May 2016 11:59:30 -0400
- To: Thomas Baker <tom@tombaker.org>, RDF Shapes <public-rdf-shapes@w3.org>
[ "Class-based scopes define the scope as the set of all instances of a
class." Okay, yes... classes have extensions... after all, RDF Schema 1.1
says that "Associated with each class is a set, called the class extension
of the class, which is the set of the instances of the class" [3]. But what
does this have to do with defining the set of focus nodes for a shape? The
scope of a shape is _not_ a specific data graph but the set of all
instances of a class in the world? ]

This could be fixed by saying something like:

"Class-based scopes define the scope as the set of all instances of a
class present in the data graph."

However, fundamentally, SHACL operates under the closed world assumption
(CWA). So it could never concern itself with whatever exists or doesn't
exist in the world in general; it is always only about the information
that has been made available or submitted to a SHACL engine. Negation as
failure and so on.

Perhaps it is worth explaining this somewhere early on, so that there is
no need to clarify points like this repeatedly throughout the document. If
a reader approaches the spec with an open world assumption (OWA) mindset,
they are bound to draw wrong conclusions or become confused. Thus, it is
important that a reader puts themselves into the CWA mindset.

Irene Polikoff

On 5/1/16, 10:40 AM, "Thomas Baker" <tom@tombaker.org> wrote:

>Comments on
>
>Shapes Constraint Language (SHACL)
>Editors Draft 29 April 2016
>http://w3c.github.io/data-shapes/shacl/
>
>Some context: I have followed this activity since participating in the
>workshop on RDF validation in 2013 [1]. The activity seemed like it might
>achieve the goals pursued a decade ago with the DCMI Working Draft,
>Description Set Profile Constraint Language [2].
>I have tried to keep up with the excellent work by Karen Coyle, Antoine
>Isaac, Hugo Manguinhas, Thomas Hartmann, and others on comparing the
>emerging SHACL specification to requirements that have accumulated over
>the years in the Dublin Core community.
>
>There is a lot to like in SHACL but I must confess that each time I tried
>to actually read the specification I found myself getting stuck at the
>same places. I'd set it aside, assuming that the issues would shake out.
>Many months later, however, I find the same sticking points, unchanged.
>This time I pressed on through the introduction to Section 2.1.
>
>These comments convey my thoughts while reading the text and end with some
>suggestions. I have made no effort to catch up on discussion in the
>relevant mailing lists [4,5], so please forgive me if I simply cover
>issues here that are already well-understood.
>
>Abstract
>
>  First sentence (also first sentence of Introduction):
>
>    "SHACL is a language for describing and constraining the contents of
>    RDF graphs"
>
>  So I ask myself: If an RDF graph is an immutable set of triples, in what
>  sense can it be "constrained"? If an RDF graph is a description with a
>  meaning determined by RDF semantics, what does it mean for that
>  _description_ to be "described"? Surely SHACL is not meant to somehow
>  limit the RDF-semantic meaning of an RDF graph, which would make no
>  sense, but then what does "constraining" mean? Surely the specification
>  of a "constraint language" should start by defining "constraint".
>
>  Further on, one finds that the "constraint language" actually has
>  nothing to do with somehow constraining RDF graphs and everything to do
>  with describing an instance of the class "shape", which can be used with
>  a process for determining whether a given RDF graph conforms to the set
>  of constraints described in that shape ("validation").
>  In the Abstract, however, validation is mentioned only in passing ("can
>  be used to communicate information about data structures... generate or
>  validate data, or drive user interfaces").
>
>  The Abstract concludes with an unsettling reference to the "underlying
>  semantics" of SHACL. We already have RDF semantics. Will this document
>  define another?
>
>1. Introduction
>
>    "This document defines what it means for an RDF graph... to conform
>    to a graph containing SHACL shapes"
>
>  An improvement over the Abstract.
>
>1.2. SHACL example
>
>    "A shapes graph containing shape definitions and other information
>    that can be utilized to determine what validation is to be done"
>
>  The wording is odd. How about:
>
>    "A shapes graph, which describes a set of constraints, can be used to
>    determine whether a given data graph conforms to the constraints."
>
>  Up to this point, has the text actually said that SHACL shape graphs are
>  expressed in RDF? The Document Outline does say that examples are
>  expressed in Turtle syntax, which strongly implies RDF. But that SHACL
>  shape graphs are expressed in RDF is actually not obvious for anyone who
>  knows that SPARQL also expresses shape-like constructs for matching
>  against RDF data, and that SPARQL constructs are not themselves
>  expressed in RDF.
>
>  (As an aside, readers of RDF 1.1 Turtle will find instances with
>  prefixed names in lowercase, whereas in the SHACL spec the prefixed
>  names are in uppercase. A sentence about the naming conventions used in
>  this document could make this explicit.)
>
>  Section 1.2 continues:
>
>    "ex:IssueShape... [has constraints that apply]... to a (transitive)
>    subclass of ex:Issue following rdf:subClassOf triples"
>
>  Hmm - nothing in the spec has yet hinted that the process of validating
>  a data graph against a shape graph will _require_ additional,
>  out-of-band information such as schema definitions.
>
>1.3. Relationship between SHACL and RDF
>
>    "SHACL uses RDF and RDFS vocabulary... and concepts... [but] SHACL
>    does not always use this vocabulary or these concepts in exactly the
>    way that they are formally defined in RDF and RDFS."
>
>  Hang on, so SHACL does _not_ use RDF/S vocabulary as defined by the
>  RDF/S specs?? It is jarring to read this in a W3C rec-track
>  specification. How is this not a show-stopper?
>
>  One then learns that SHACL validation is about more than matching an
>  immutable data graph against an immutable shapes graph. Apparently it
>  involves the prior creation of an _expanded_ data graph through
>  selective materialization of inferred triples.
>
>  The notion of "SHACL processors" having (selectively) to support
>  inferencing goes far beyond just defining a vocabulary for describing a
>  shape and a process for evaluating that shape against a data graph. It
>  implies a software application with SHACL-specific features and an
>  inferencing style that is SHACL-specific -- both of which, to my way of
>  thinking, should be completely orthogonal to the language specification,
>  which could quite reasonably focus on just the vocabulary and validation
>  algorithm.
>
>  If, as the spec points out, "SHACL implementations may operate on RDF
>  graphs that include entailments", couldn't the SHACL spec be helpfully
>  simplified by leaving the materialization of inferred triples out of
>  scope entirely -- as something done in a pre-processing phase, perhaps
>  according to a few well-known patterns as described in a separate
>  specification?
>
>  The section ends with very puzzling definitions for "subclass", "type",
>  and "instance" -- "A node is an instance of a class if one of its types
>  is the given class"?? -- but I press on, hoping the next section will
>  bring some clarity...
>
>2. Shapes
>
>  The first paragraph says:
>
>    "Shape scopes define the selection criteria"
>
>  but then Figure 1 says:
>
>    "Scope selects focus nodes"
>
>  If a shape is just a graph (or part of a shapes graph), then surely that
>  graph cannot actually perform an action, like "selects", as if executed
>  like a Java method. Figure 1 also talks about filter shapes that
>  "refine" or "eliminate" and constraints that "produce". Talking about
>  graphs as agents is deeply confusing.
>
>    "Class-based scopes define the scope as the set of all instances of a
>    class."
>
>  Okay, yes... classes have extensions... after all, RDF Schema 1.1 says
>  that "Associated with each class is a set, called the class extension of
>  the class, which is the set of the instances of the class" [3]. But what
>  does this have to do with defining the set of focus nodes for a shape?
>  The scope of a shape is _not_ a specific data graph but the set of all
>  instances of a class in the world?
>
>  I stop reading.
>
>Summary and suggestions
>
>The spec looks quite nice on the surface but the explanation is
>conceptually muddled. Would it not be simpler and clearer to define a
>SHACL where, to paraphrase the 2008 DSP specification [2], "the
>fundamental usage model for a [shape] is to examine whether a [data graph]
>matches the [shape]"? Everything else could be out of scope. Some
>suggestions:
>
>1. Define "constraint" up-front.
>
>2. If a shape is described in RDF, say so early on, then avoid implying
>   that a SHACL shape is based on any semantics other than RDF semantics.
>
>3. Come up with better names than 'subclass', 'superclass', 'type', and
>   'instance' for whatever it is that is being described. Anyone familiar
>   with classes and instances in RDF -- or classes and instances in OOP --
>   will surely be led astray by yet another completely different re-use of
>   terminology that only _seems_ familiar. Repurposing these well-worn
>   terms actually gets in the way of understanding.
>
>4. Move anything about materializing additional triples as a
>   pre-processing step -- even sub-class relationships -- into a separate
>   document specifically for implementation advice, such as a primer. In
>   other words, split out all references to inferencing from the SHACL
>   language itself. To keep the language specification clear, an immutable
>   data graph need only be validated against an immutable shape graph,
>   full stop. Anything else can be moved elsewhere.
>
>5. Move Sections 6 through 11 into a separate document or primer. Far
>   better to put this into its own shorter, focused specification than
>   tack it onto a specification that is already much too long -- 108
>   pages, had I printed it out.
>
>Simpler, clearer specs stand a correspondingly greater chance of actually
>being read -- and used.
>
>Tom
>
>[1] https://www.w3.org/blog/SW/2013/10/04/w3c-workshop-report-rdf-validation-practical-assurances-for-quality-rdf-data/
>[2] http://dublincore.org/documents/dc-dsp/
>[3] https://www.w3.org/TR/rdf-schema/#ch_classes
>[4] https://lists.w3.org/Archives/Public/public-rdf-shapes/
>[5] https://lists.w3.org/Archives/Public/public-data-shapes-wg/
>
>--
>Tom Baker <tom@tombaker.org>
>
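
To make the closed-world reading of class-based scopes concrete, here is a
minimal sketch in Python. It is not taken from the SHACL draft or from any
SHACL engine: the tuple representation of triples, the function names
subclasses_of and class_scope, and the ex: sample data are all assumptions
made for illustration. The point is only that the focus nodes are computed
from the triples present in the data graph and nothing else.

```python
# Hypothetical illustration, not a SHACL implementation: a data graph is a
# set of (subject, predicate, object) string tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subclasses_of(graph, cls):
    """Return cls plus every class reachable from it by following
    rdfs:subClassOf triples that are present in graph."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def class_scope(graph, cls):
    """Return the focus nodes for a class-based scope: every node typed in
    graph as cls or as one of its (transitive) subclasses."""
    classes = subclasses_of(graph, cls)
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}


# Made-up sample data graph.
data_graph = {
    ("ex:Bug", SUBCLASS_OF, "ex:Issue"),
    ("ex:issue1", RDF_TYPE, "ex:Issue"),
    ("ex:bug7", RDF_TYPE, "ex:Bug"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
}

print(sorted(class_scope(data_graph, "ex:Issue")))
# prints ['ex:bug7', 'ex:issue1']
```

A class with no typed nodes in the data graph simply yields an empty scope,
whatever instances may exist elsewhere in the world.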
Received on Sunday, 1 May 2016 16:00:08 UTC