Re: Comments on SHACL Editors Draft of 29 April from Irene Polikoff on 2016-05-01 (public-rdf-shapes@w3.org from May 2016)

From: Irene Polikoff <irene@topquadrant.com>
Date: Sun, 01 May 2016 11:59:30 -0400
To: Thomas Baker <tom@tombaker.org>, RDF Shapes <public-rdf-shapes@w3.org>
Message-ID: <D34B9CD6.9C94F%irene@topquadrant.com>
[    "Class-based scopes define the scope as the set of all instances of a
    class."

  Okay, yes... classes have extensions... after all, RDF Schema 1.1 says that
  "Associated with each class is a set, called the class extension of the
  class, which is the set of the instances of the class" [3].  But what does
  this have to do with defining the set of focus nodes for a shape?  The scope
  of a shape is _not_ a specific data graph but the set of all instances of a
  class in the world?]

This could be fixed by saying something like:

    "Class-based scopes define the scope as the set of all instances of a
    Class present in the data graph."
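
To illustrate the reworded definition, here is a minimal sketch in Python. It assumes the data graph is simply a set of (subject, predicate, object) string tuples; the function name and the ex: sample data are made up for illustration and are not taken from the spec or from any SHACL engine. The point is only that the focus nodes are computed from the triples actually present in the data graph, following rdfs:subClassOf triples found there.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def class_scope_focus_nodes(data_graph, scope_class):
    """Return the nodes in data_graph typed as scope_class or a subclass of it.

    Hypothetical helper: only triples present in data_graph are consulted.
    """
    # Collect scope_class plus every transitive subclass stated in the graph.
    classes = {scope_class}
    frontier = [scope_class]
    while frontier:
        current = frontier.pop()
        for s, p, o in data_graph:
            if p == RDFS_SUBCLASS_OF and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    # Focus nodes are the subjects of rdf:type triples pointing at those classes.
    return {s for s, p, o in data_graph if p == RDF_TYPE and o in classes}


# Assumed sample data, not from the SHACL draft.
data_graph = {
    ("ex:Bug", RDFS_SUBCLASS_OF, "ex:Issue"),
    ("ex:issue1", RDF_TYPE, "ex:Issue"),
    ("ex:bug7", RDF_TYPE, "ex:Bug"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
}

print(sorted(class_scope_focus_nodes(data_graph, "ex:Issue")))
# → ['ex:bug7', 'ex:issue1']
```

Nothing outside the submitted triples is consulted, so an ex:Issue that exists "in the world" but not in the data graph is simply not a focus node.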

However, fundamentally, SHACL operates under the closed world assumption.
So, it could never concern itself with whatever exists or doesn't exist in
the world in general - it is always only about the information that has
been made available/submitted to a SHACL engine. Negation as failure
and so on.
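
A rough sketch of what negation as failure means in practice, again in Python with the graph modelled as a set of string triples. The function name, the ex:reporter property and the sample data are assumptions made for illustration only; this is not the validation algorithm from the spec.

```python
def min_count_violations(graph, focus_nodes, prop, min_count=1):
    """Return focus nodes with fewer than min_count values for prop in graph.

    Hypothetical helper: a value that is not in the submitted graph is treated
    as absent (closed world), not as unknown (open world).
    """
    violations = []
    for node in sorted(focus_nodes):
        count = sum(1 for s, p, o in graph if s == node and p == prop)
        if count < min_count:
            violations.append(node)
    return violations


# Assumed sample data: ex:issue2 has no ex:reporter triple in the graph.
submitted_graph = {
    ("ex:issue1", "ex:reporter", "ex:alice"),
    ("ex:issue2", "ex:status", "ex:Open"),
}

print(min_count_violations(submitted_graph, {"ex:issue1", "ex:issue2"}, "ex:reporter"))
# → ['ex:issue2']
```

Under the OWA, the missing ex:reporter for ex:issue2 would only mean that the reporter is not known; here it is reported as a failure because the triple was not submitted.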


Perhaps it is worth making a point of explaining this somewhere early on,
and then there is no need to constantly clarify points like this
everywhere in the document. If a reader approaches the spec with the OWA
mindset, they are bound to draw wrong conclusions or become confused.
Thus, it is important that a reader puts themselves into the CWA
mindset.


Irene Polikoff






On 5/1/16, 10:40 AM, "Thomas Baker" <tom@tombaker.org> wrote:

>Comments on 
>
>Shapes Constraint Language (SHACL)
>Editors Draft 29 April 2016
>http://w3c.github.io/data-shapes/shacl/
>
>Some context: I have followed this activity since participating in the
>workshop on RDF validation in 2013 [1].  The activity seemed like it might
>achieve the goals pursued a decade ago with the DCMI Working Draft,
>Description Set Profile Constraint Language [2].  I have tried to keep up
>with the excellent work by Karen Coyle, Antoine Isaac, Hugo Manguinhas,
>Thomas Hartmann, and others on comparing the emerging SHACL specification
>to requirements that have accumulated over the years in the Dublin Core
>community.
>
>There is a lot to like in SHACL but I must confess that each time I tried
>to actually read the specification I found myself getting stuck at the same
>places.  I'd set it aside, assuming that the issues would shake out.  Many
>months later, however, I find the same sticking points, unchanged.  This
>time I pressed on through the introduction to Section 2.1.
>
>These comments convey my thoughts while reading the text and end with some
>suggestions.  I have made no effort to catch up on discussion in the
>relevant mailing lists [4,5], so please forgive me if I simply cover issues
>here that are already well-understood.
>
>Abstract
>
>  First sentence (also first sentence of Introduction):
>  
>    "SHACL is a language for describing and constraining the contents of
>    RDF graphs"
>
>  So I ask myself: If an RDF graph is an immutable set of triples, in what
>  sense can it be "constrained"?  If an RDF graph is a description with a
>  meaning determined by RDF semantics, what does it mean for that
>  _description_ to be "described"?  Surely SHACL is not meant to somehow
>  limit the RDF-semantic meaning of an RDF graph, which would make no sense,
>  but then what does "constraining" mean?  Surely the specification of a
>  "constraint language" should start by defining "constraint".
>
>  Further on, one finds that the "constraint language" actually has nothing
>  to do with somehow constraining RDF graphs and everything to do with
>  describing an instance of the class "shape", which can be used with a
>  process for determining whether a given RDF graph conforms to the set of
>  constraints described in that shape ("validation").  In the Abstract,
>  however, validation is mentioned only in passing ("can be used to
>  communicate information about data structures...  generate or validate
>  data, or drive user interfaces").
>
>  The Abstract concludes with an unsettling reference to the "underlying
>  semantics" of SHACL.  We already have RDF semantics. Will this document
>  define another?
>
>1. Introduction
>
>    "This document defines what it means for an RDF graph... to conform to
>    a graph containing SHACL shapes"
>    
>  An improvement over the Abstract.
>
>1.2. SHACL example
>
>    "A shapes graph containing shape definitions and other information that
>    can be utilized to determine what validation is to be done"
>
>  The wording is odd.  How about:
>  
>    "A shapes graph, which describes a set of constraints, can be used to
>    determine whether a given data graph conforms to the constraints."
>
>  Up to this point, has the text actually said that SHACL shape graphs are
>  expressed in RDF?  The Document Outline does say that examples are
>  expressed in Turtle syntax, which strongly implies RDF.  But that SHACL
>  shape graphs are expressed in RDF is actually not obvious for anyone who
>  knows that SPARQL also expresses shape-like constructs for matching
>  against RDF data, and that SPARQL constructs are not themselves expressed
>  in RDF.
>  
>  (As an aside, readers of RDF 1.1 Turtle will find instances with prefixed
>  names in lowercase, whereas in the SHACL spec the prefixed names are in
>  uppercase.  A sentence about the naming conventions used in this document
>  could make this explicit.)
>
>  Section 1.2 continues:
>
>    "ex:IssueShape... [has constraints that apply]... to a (transitive)
>    subclass of ex:Issue following rdfs:subClassOf triples"
>    
>  Hmm - nothing in the spec has yet hinted that the process of validating a
>  data graph against a shape graph will _require_ additional, out-of-band
>  information such as schema definitions.
>
>1.3. Relationship between SHACL and RDF
>
>    "SHACL uses RDF and RDFS vocabulary... and concepts... [but] SHACL does
>    not always use this vocabulary or these concepts in exactly the way that
>    they are formally defined in RDF and RDFS."
>
>  Hang on, so SHACL does _not_ use RDF/S vocabulary as defined by the RDF/S
>  specs??  It is jarring to read this in a W3C rec-track specification.  How
>  is this not a show-stopper?
>
>  One then learns that SHACL validation is about more than matching an
>  immutable data graph against an immutable shapes graph.  Apparently it
>  involves the prior creation of an _expanded_ data graph through selective
>  materialization of inferred triples.
>  
>  The notion of "SHACL processors" having (selectively) to support
>  inferencing goes far beyond just defining a vocabulary for describing a
>  shape and a process for evaluating that shape against a data graph.  It
>  implies a software application with SHACL-specific features and an
>  inferencing style that is SHACL-specific -- both of which, to my way of
>  thinking, should be completely orthogonal to the language specification,
>  which could quite reasonably focus on just the vocabulary and validation
>  algorithm.
>
>  If, as the spec points out, "SHACL implementations may operate on RDF
>  graphs that include entailments", couldn't the SHACL spec be helpfully
>  simplified by leaving the materialization of inferred triples out of scope
>  entirely -- as something done in a pre-processing phase, perhaps according
>  to a few well-known patterns as described in a separate specification?
>
>  The section ends with very puzzling definitions for "subclass", "type",
>  and "instance" -- "A node is an instance of a class if one of its types is
>  the given class"?? -- but I press on, hoping the next section will bring
>  some clarity...
>
>2. Shapes
>
>  The first paragraph says:
>
>    "Shape scopes define the selection criteria"
>
>  but then Figure 1 says:
>
>    "Scope selects focus nodes"
>
>  If a shape is just a graph (or part of a shapes graph), then surely that
>  graph cannot actually perform an action, like "selects", as if executed
>  like a Java method.  Figure 1 also talks about filter shapes that "refine"
>  or "eliminate" and constraints that "produce".  Talking about graphs as
>  agents is deeply confusing.
>
>    "Class-based scopes define the scope as the set of all instances of a
>    class."
>
>  Okay, yes... classes have extensions... after all, RDF Schema 1.1 says
>  that "Associated with each class is a set, called the class extension of
>  the class, which is the set of the instances of the class" [3].  But what
>  does this have to do with defining the set of focus nodes for a shape?
>  The scope of a shape is _not_ a specific data graph but the set of all
>  instances of a class in the world?
>  
>  I stop reading.
>
>Summary and suggestions
>
>The spec looks quite nice on the surface but the explanation is
>conceptually muddled.  Would it not be simpler and clearer to define a SHACL
>where, to paraphrase the 2008 DSP specification [2], "the fundamental usage
>model for a [shape] is to examine whether a [data graph] matches the
>[shape]"?  Everything else could be out of scope.  Some suggestions:
>
>1. Define "constraint" up-front.
>
>2. If a shape is described in RDF, say so early on, then avoid implying
>   that a SHACL shape is based on any semantics other than RDF semantics.
>
>3. Come up with better names than 'subclass', 'superclass', 'type', and
>   'instance' for whatever it is that is being described.  Anyone familiar
>   with classes and instances in RDF -- or classes and instances in OOP --
>   will surely be led astray by yet another completely different re-use of
>   terminology that only _seems_ familiar.  Repurposing these well-worn
>   terms actually gets in the way of understanding.
>
>4. Move anything about materializing additional triples as a pre-processing
>   step -- even sub-class relationships -- into a separate document
>   specifically for implementation advice, such as a primer.  In other
>   words, split out all references to inferencing from the SHACL language
>   itself.  To keep the language specification clear, an immutable data
>   graph need only be validated against an immutable shape graph, full
>   stop.  Anything else can be moved elsewhere.
>
>5. Move Sections 6 through 11 into a separate document or primer.  Far
>   better to put this into its own shorter, focused specification than tack
>   it onto a specification that is already much too long -- 108 pages, had
>   I printed it out.
>
>Simpler, clearer specs stand a correspondingly greater chance of actually
>being read -- and used.
>
>Tom
>
>[1] https://www.w3.org/blog/SW/2013/10/04/w3c-workshop-report-rdf-validation-practical-assurances-for-quality-rdf-data/
>[2] http://dublincore.org/documents/dc-dsp/
>[3] https://www.w3.org/TR/rdf-schema/#ch_classes
>[4] https://lists.w3.org/Archives/Public/public-rdf-shapes/
>[5] https://lists.w3.org/Archives/Public/public-data-shapes-wg/
>
>-- 
>Tom Baker <tom@tombaker.org>
>
Received on Sunday, 1 May 2016 16:00:08 UTC