Re: Comments on SHACL Editors Draft of 29 April

[    "Class-based scopes define the scope as the set of all instances of a
    class."

  Okay, yes... classes have extensions... after all, RDF Schema 1.1 says
  that "Associated with each class is a set, called the class extension of
  the class, which is the set of the instances of the class" [3].  But what
  does this have to do with defining the set of focus nodes for a shape?
  The scope of a shape is _not_ a specific data graph but the set of all
  instances of a class in the world?]

This could be fixed by saying something like:

    "Class-based scopes define the scope as the set of all instances of a
    Class present in the data graph."
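To make that wording concrete, here is a minimal sketch of a class-based scope evaluated against nothing but the data graph. It is an illustration, not any SHACL engine's API: the data graph is assumed to be a Python set of (subject, predicate, object) string tuples, and the helper names `subclasses` and `class_scope` are hypothetical. Following rdfs:subClassOf transitively mirrors the draft's Section 1.2 example, assuming those triples are themselves present in the data graph.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

def subclasses(data, cls):
    """Return cls plus every class that reaches it via rdfs:subClassOf in data."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in data:
            if p == RDFS_SUBCLASS_OF and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result

def class_scope(data, cls):
    """Focus nodes: subjects typed with cls (or a subclass) in THIS graph only."""
    classes = subclasses(data, cls)
    return {s for s, p, o in data if p == RDF_TYPE and o in classes}

# Hypothetical data graph submitted to the engine.
data = {
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:bug7", "rdf:type", "ex:Bug"),
    ("ex:Bug", "rdfs:subClassOf", "ex:Issue"),
    ("ex:alice", "rdf:type", "ex:User"),
}
print(sorted(class_scope(data, "ex:Issue")))  # ['ex:bug7', 'ex:issue1']
```

A resource that might be an ex:Issue "in the world" but has no rdf:type triple in the submitted graph simply never becomes a focus node.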

However, fundamentally, SHACL operates under the closed world assumption.
So it never concerns itself with what does or doesn't exist in the world in
general; it is always only about the information that has been made
available or submitted to a SHACL engine. Negation as failure, and so on.
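Negation as failure can be sketched the same way. Under the same illustrative assumption (a data graph as a Python set of string triples; `missing_property` is a hypothetical helper, not part of SHACL), a triple that is absent from the submitted graph is treated as false and reported, rather than treated as "unknown" the way an open-world reasoner would.

```python
RDF_TYPE = "rdf:type"

def missing_property(data, focus_nodes, prop):
    """Return focus nodes that have no `prop` triple in `data`.

    Closed world / negation as failure: if the triple is not in the
    submitted graph, it is taken to be false, not unknown.
    """
    return sorted(
        node for node in focus_nodes
        if not any(s == node and p == prop for s, p, _ in data)
    )

# Hypothetical data graph as a set of (subject, predicate, object) tuples.
data = {
    ("ex:issue1", "rdf:type", "ex:Issue"),
    ("ex:issue1", "ex:reportedBy", "ex:alice"),
    ("ex:issue2", "rdf:type", "ex:Issue"),
}
focus = {s for s, p, o in data if p == RDF_TYPE and o == "ex:Issue"}
print(missing_property(data, focus, "ex:reportedBy"))  # ['ex:issue2']
```

An OWA reader would object that ex:issue2 may well have a reporter somewhere; under the CWA the graph is all there is, so the check fails.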


Perhaps it is worth explaining this somewhere early on; then there is no
need to keep clarifying points like this throughout the document. A reader
who approaches the spec with an OWA mindset is bound to draw wrong
conclusions or become confused, so it is important that readers put
themselves into a CWA mindset.


Irene Polikoff






On 5/1/16, 10:40 AM, "Thomas Baker" <tom@tombaker.org> wrote:

>Comments on 
>
>Shapes Constraint Language (SHACL)
>Editors Draft 29 April 2016
>http://w3c.github.io/data-shapes/shacl/
>
>Some context: I have followed this activity since participating in the
>workshop on RDF validation in 2013 [1].  The activity seemed like it might
>achieve the goals pursued a decade ago with the DCMI Working Draft,
>Description Set Profile Constraint Language [2].  I have tried to keep up
>with the excellent work by Karen Coyle, Antoine Isaac, Hugo Manguinhas,
>Thomas Hartmann, and others on comparing the emerging SHACL specification
>to requirements that have accumulated over the years in the Dublin Core
>community.
>
>There is a lot to like in SHACL, but I must confess that each time I tried
>to actually read the specification I found myself getting stuck at the
>same places.  I'd set it aside, assuming that the issues would shake out.
>Many months later, however, I find the same sticking points, unchanged.
>This time I pressed on through the introduction to Section 2.1.
>
>These comments convey my thoughts while reading the text and end with some
>suggestions.  I have made no effort to catch up on discussion in the
>relevant mailing lists [4,5], so please forgive me if I simply cover issues
>here that are already well understood.
>
>Abstract
>
>  First sentence (also first sentence of Introduction):
>  
>    "SHACL is a language for describing and constraining the contents of
>    RDF graphs"
>
>  So I ask myself: If an RDF graph is an immutable set of triples, in what
>  sense can it be "constrained"?  If an RDF graph is a description with a
>  meaning determined by RDF semantics, what does it mean for that
>  _description_ to be "described"?  Surely SHACL is not meant to somehow
>  limit the RDF-semantic meaning of an RDF graph, which would make no
>  sense, but then what does "constraining" mean?  Surely the specification
>  of a "constraint language" should start by defining "constraint".
>
>  Further on, one finds that the "constraint language" actually has nothing
>  to do with somehow constraining RDF graphs and everything to do with
>  describing an instance of the class "shape", which can be used with a
>  process for determining whether a given RDF graph conforms to the set of
>  constraints described in that shape ("validation").  In the Abstract,
>  however, validation is mentioned only in passing ("can be used to
>  communicate information about data structures...  generate or validate
>  data, or drive user interfaces").
>
>  The Abstract concludes with an unsettling reference to the "underlying
>  semantics" of SHACL.  We already have RDF semantics. Will this document
>  define another?
>
>1. Introduction
>
>    "This document defines what it means for an RDF graph... to conform to
>    a graph containing SHACL shapes"
>    
>  An improvement over the Abstract.
>
>1.2. SHACL example
>
>    "A shapes graph containing shape definitions and other information that
>    can be utilized to determine what validation is to be done"
>
>  The wording is odd.  How about:
>  
>    "A shapes graph, which describes a set of constraints, can be used to
>    determine whether a given data graph conforms to the constraints."
>
>  Up to this point, has the text actually said that SHACL shape graphs are
>  expressed in RDF?  The Document Outline does say that examples are
>  expressed in Turtle syntax, which strongly implies RDF.  But that SHACL
>  shape graphs are expressed in RDF is actually not obvious to anyone who
>  knows that SPARQL also expresses shape-like constructs for matching
>  against RDF data, and that SPARQL constructs are not themselves expressed
>  in RDF.
>  
>  (As an aside, readers of RDF 1.1 Turtle will find instances with prefixed
>  names in lowercase, whereas in the SHACL spec the prefixed names are in
>  uppercase.  A sentence about the naming conventions used in this
>  document could make this explicit.)
>
>  Section 1.2 continues:
>
>    "ex:IssueShape... [has constraints that apply]... to a (transitive)
>    subclass of ex:Issue following rdfs:subClassOf triples"
>
>  Hmm - nothing in the spec has yet hinted that the process of validating a
>  data graph against a shape graph will _require_ additional, out-of-band
>  information such as schema definitions.
>
>1.3. Relationship between SHACL and RDF
>
>    "SHACL uses RDF and RDFS vocabulary... and concepts... [but] SHACL does
>    not always use this vocabulary or these concepts in exactly the way that
>    they are formally defined in RDF and RDFS."
>
>  Hang on, so SHACL does _not_ use RDF/S vocabulary as defined by the RDF/S
>  specs??  It is jarring to read this in a W3C rec-track specification.
>  How is this not a show-stopper?
>
>  One then learns that SHACL validation is about more than matching an
>  immutable data graph against an immutable shapes graph.  Apparently it
>  involves the prior creation of an _expanded_ data graph through selective
>  materialization of inferred triples.
>  
>  The notion of "SHACL processors" having (selectively) to support
>  inferencing goes far beyond just defining a vocabulary for describing a
>  shape and a process for evaluating that shape against a data graph.  It
>  implies a software application with SHACL-specific features and an
>  inferencing style that is SHACL-specific -- both of which, to my way of
>  thinking, should be completely orthogonal to the language specification,
>  which could quite reasonably focus on just the vocabulary and validation
>  algorithm.
>
>  If, as the spec points out, "SHACL implementations may operate on RDF
>  graphs that include entailments", couldn't the SHACL spec be helpfully
>  simplified by leaving the materialization of inferred triples out of
>  scope entirely -- as something done in a pre-processing phase, perhaps
>  according to a few well-known patterns as described in a separate
>  specification?
>
>  The section ends with very puzzling definitions for "subclass", "type",
>  and "instance" -- "A node is an instance of a class if one of its types
>  is the given class"?? -- but I press on, hoping the next section will
>  bring some clarity...
>
>2. Shapes
>
>  The first paragraph says:
>
>    "Shape scopes define the selection criteria"
>
>  but then Figure 1 says:
>
>    "Scope selects focus nodes"
>
>  If a shape is just a graph (or part of a shapes graph), then surely that
>  graph cannot actually perform an action, like "selects", as if it were a
>  Java method being executed.  Figure 1 also talks about filter shapes that
>  "refine" or "eliminate" and constraints that "produce".  Talking about
>  graphs as agents is deeply confusing.
>
>    "Class-based scopes define the scope as the set of all instances of a
>    class."
>
>  Okay, yes... classes have extensions... after all, RDF Schema 1.1 says
>  that "Associated with each class is a set, called the class extension of
>  the class, which is the set of the instances of the class" [3].  But what
>  does this have to do with defining the set of focus nodes for a shape?
>  The scope of a shape is _not_ a specific data graph but the set of all
>  instances of a class in the world?
>  
>  I stop reading.
>
>Summary and suggestions
>
>The spec looks quite nice on the surface, but the explanation is
>conceptually muddled.  Would it not be simpler and clearer to define a
>SHACL where, to paraphrase the 2008 DSP specification [2], "the fundamental
>usage model for a [shape] is to examine whether a [data graph] matches the
>[shape]"?  Everything else could be out of scope.  Some suggestions:
>
>1. Define "constraint" up-front.
>
>2. If a shape is described in RDF, say so early on, then avoid implying
>   that a SHACL shape is based on any semantics other than RDF semantics.
>
>3. Come up with better names than 'subclass', 'superclass', 'type', and
>   'instance' for whatever it is that is being described.  Anyone familiar
>   with classes and instances in RDF -- or classes and instances in OOP --
>   will surely be led astray by yet another completely different re-use of
>   terminology that only _seems_ familiar.  Repurposing these well-worn
>   terms actually gets in the way of understanding.
>
>4. Move anything about materializing additional triples as a
>   pre-processing step -- even sub-class relationships -- into a separate
>   document specifically for implementation advice, such as a primer.  In
>   other words, split out all references to inferencing from the SHACL
>   language itself.  To keep the language specification clear, an immutable
>   data graph need only be validated against an immutable shape graph, full
>   stop.  Anything else can be moved elsewhere.
>
>5. Move Sections 6 through 11 into a separate document or primer.  Far
>   better to put this into its own shorter, focused specification than tack
>   it onto a specification that is already much too long -- 108 pages, had
>   I printed it out.
>
>Simpler, clearer specs stand a correspondingly greater chance of actually
>being read -- and used.
>
>Tom
>
>[1] https://www.w3.org/blog/SW/2013/10/04/w3c-workshop-report-rdf-validation-practical-assurances-for-quality-rdf-data/
>[2] http://dublincore.org/documents/dc-dsp/
>[3] https://www.w3.org/TR/rdf-schema/#ch_classes
>[4] https://lists.w3.org/Archives/Public/public-rdf-shapes/
>[5] https://lists.w3.org/Archives/Public/public-data-shapes-wg/
>
>-- 
>Tom Baker <tom@tombaker.org>
>

Received on Sunday, 1 May 2016 16:00:08 UTC