Analysis of Example in ShEx paper submitted to SWJ from Peter F. Patel-Schneider on 2015-12-28 (public-data-shapes-wg@w3.org from December 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 28 Dec 2015 09:05:20 -0800
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <56816BD0.1070609@gmail.com>
I took a look at "Validating and Describing Linked Data Portals using
Shapes", as submitted to the Semantic Web Journal in early December.
The current version of the submitted paper is available at
www.semantic-web-journal.net/system/files/swj1260.pdf, but this version may
differ in unknown ways from the version that I looked at.

The submission extensively uses an example about measuring the World Wide
Web's contribution to global development and human rights.  This example
comes from a previous paper by J. E. L. Gayo, H. Farham, J. C. Fernández,
and J. M. Á. Rodríguez, "Representing statistical indexes as linked data
including metadata about their computation process".  The ShEx provided in
the submission for the example has some significant unexplained differences
from the example in the published paper.


I was unable to determine the exact details of the example, as there is no
definition of the formalism used for the bulk of the information about the
example (Figure 2 in the submission).  Here is my reconstruction of the data
model in Figure 2, plus the suborganization relationship and a little bit
more from the earlier paper.  I am using a ShEx-like syntax to capture
roughly the form of the example, but this isn't necessarily ShEx, just a
syntax to show the data model for the example.

dataset {
  rdf:type ( qb:DataSet ) [1,1],
  qb:structure wf:DSD [1,1],
  rdfs:label xsd:string [1,1],
  dct:publisher @organization [1,1],
  qb:slice @ slice [1,*] }

slice {
  rdf:type ( qb:Slice ) [1,1],
  qb:sliceStructure wf:sliceByArea [1,1],
  qb:observation @ observation [1,*],
  cex:indicator @ indicator [1,1] }

organization {
  rdf:type ( org:Organization ) [1,1],
  rdfs:label xsd:string [1,1],
  foaf:homepage URI [1,1],
  org:hasSubOrganization @ organization [0,*] }

observation {
  rdf:type ( qb:Observation ) [1,1],
  cex:value xsd:float [1,1],
  dcterms:issued xsd:dateTime [1,1],
  rdfs:label xsd:string [1,1],
  cex:ref-year xsd:gYear [1,1],
  cex:ref-area @country [1,1],
  cex:indicator @indicator [1,1],
  cex:computation @computation }

indicator {
  rdf:type ( cex:Primary cex:Secondary ) [1,1],
  rdfs:label xsd:string [1,1],
  rdfs:comment xsd:string [1,1],
  skos:notation xsd:string [1,1],
  wf:provider @organization [1,1] }

country {
  rdf:type ( wf:Country ) [1,1],
  wf:iso2 xsd:string [1,1],
  wf:iso3 xsd:string [1,1],
  rdfs:label xsd:string [1,1] }

computation {
  rdf:type ( cex:Computation ) [1,1] }


The actual task to be performed is not described in the submission.  It
appears to me that the natural task to be done is to determine whether an
RDF graph containing information about observations conforms to this data
model, for some definition of conforms.
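One possible reading of "conforms" for a single non-recursive shape is that
every property constraint holds at a focus node: the number of values lies
within [min, max] and every value passes a local test.  The following is a
minimal sketch of that reading only, assuming a toy graph of
(subject, predicate, object) tuples with prefixed names kept as plain
strings.  Literal, PropertyConstraint, has_datatype, violations and the ex:
prefix are all hypothetical names introduced for illustration.

```python
from typing import Callable, NamedTuple, Optional

class Literal(NamedTuple):
    """Hypothetical minimal RDF literal: lexical form plus datatype name."""
    lexical: str
    datatype: str

class PropertyConstraint(NamedTuple):
    """One property line of a non-recursive shape: predicate, value test, [min, max]."""
    predicate: str
    value_ok: Callable[[object], bool]
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded (the '*' in [1,*])

def has_datatype(dt: str) -> Callable[[object], bool]:
    """Value test: the value is a literal whose datatype is dt."""
    return lambda v: isinstance(v, Literal) and v.datatype == dt

def violations(graph: set, focus: str, constraints: list) -> list:
    """Return readable violations of every constraint at one focus node."""
    found = []
    for c in constraints:
        vals = [o for (s, p, o) in graph if s == focus and p == c.predicate]
        if len(vals) < c.min_count:
            found.append(f"{c.predicate}: {len(vals)} values, expected at least {c.min_count}")
        if c.max_count is not None and len(vals) > c.max_count:
            found.append(f"{c.predicate}: {len(vals)} values, expected at most {c.max_count}")
        # Each value is checked locally; no other shape is consulted.
        found.extend(f"{c.predicate}: bad value {v!r}" for v in vals if not c.value_ok(v))
    return found

# The country entry of the data model above, written as data.
COUNTRY = [
    PropertyConstraint("wf:iso2", has_datatype("xsd:string"), 1, 1),
    PropertyConstraint("wf:iso3", has_datatype("xsd:string"), 1, 1),
    PropertyConstraint("rdfs:label", has_datatype("xsd:string"), 1, 1),
]

graph = {
    ("ex:es", "rdf:type", "wf:Country"),
    ("ex:es", "wf:iso2", Literal("ES", "xsd:string")),
    ("ex:es", "wf:iso3", Literal("ESP", "xsd:string")),
}
print(violations(graph, "ex:es", COUNTRY))
# ['rdfs:label: 0 values, expected at least 1']
```

Under this reading, a whole graph conforms when no focus node selected for
any shape reports a violation; how focus nodes are selected is exactly where
the recursive and the scoped encodings differ.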

This determination could be done in a number of ways in SHACL.  The approach
taken in the submission is to use a set of mutually recursive SHACL shapes.
However, it seems to me that it would be better to use non-recursive SHACL
shapes with scopes instead, as follows:

dataset sh:scopeClass qb:DataSet ;
  sh:property [ sh:predicate qb:structure; sh:class wf:DSD ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate rdfs:label; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate dct:publisher; sh:class org:Organization ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate qb:slice; sh:class qb:Slice ;
                sh:minCount 1 ] .

slice sh:scopeClass qb:Slice ;
  sh:property [ sh:predicate qb:sliceStructure; sh:class wf:sliceByArea ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate qb:observation; sh:class qb:Observation ;
                sh:minCount 1 ] ;
  sh:constraint [ a sh:OrConstraint ;
   sh:shapes ( [ sh:property [ sh:predicate cex:indicator; sh:class cex:Primary ;
                               sh:minCount 1 ; sh:maxCount 1 ] ]
               [ sh:property [ sh:predicate cex:indicator; sh:class cex:Secondary ;
                               sh:minCount 1 ; sh:maxCount 1 ] ] ) ] .

organization sh:scopeClass org:Organization ;
  sh:property [ sh:predicate rdfs:label; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate foaf:homepage; sh:nodeKind sh:IRI ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate org:hasSubOrganization; sh:class org:Organization ] .

observation sh:scopeClass qb:Observation ;
  sh:property [ sh:predicate cex:value; sh:datatype xsd:float ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate dcterms:issued; sh:datatype xsd:dateTime ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate rdfs:label; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate cex:ref-year; sh:datatype xsd:gYear ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate cex:ref-area; sh:class wf:Country ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:constraint [ a sh:OrConstraint ;
   sh:shapes ( [ sh:property [ sh:predicate cex:indicator; sh:class cex:Primary ;
                               sh:minCount 1 ; sh:maxCount 1 ] ]
               [ sh:property [ sh:predicate cex:indicator; sh:class cex:Secondary ;
                               sh:minCount 1 ; sh:maxCount 1 ] ] ) ] ;
  sh:property [ sh:predicate cex:computation; sh:class cex:Computation ;
                sh:minCount 1 ; sh:maxCount 1 ] .

indicator sh:scopeClass cex:Primary ;
          sh:scopeClass cex:Secondary ;
  sh:property [ sh:predicate rdfs:label; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate rdfs:comment; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate skos:notation; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate wf:provider; sh:class org:Organization ;
                sh:minCount 1 ; sh:maxCount 1 ] .

country sh:scopeClass wf:Country ;
  sh:property [ sh:predicate wf:iso2; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate wf:iso3; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] ;
  sh:property [ sh:predicate rdfs:label; sh:datatype xsd:string ;
                sh:minCount 1 ; sh:maxCount 1 ] .


The significant difference between the treatment here and the treatment in
the submission is that type information is used for scopes: the shape that a
portion of the data must have is determined by its type, not by its position
as the value of some property of another portion of the data.  This results
in a difference in behaviour, but I think that this SHACL encoding matches
the intended use better than the ShEx encoding does.
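The difference in behaviour can be made concrete by comparing which nodes
each approach ends up checking against the observation shape.  This is a
simplified sketch under stated assumptions, not the semantics of either
language: type-driven selection stands in for sh:scopeClass (subclasses are
ignored), and position-driven selection stands in for following shape
references from dataset nodes along qb:slice and then qb:observation.  Start
shapes and shape maps are not modelled, and by_type, by_position and the ex:
nodes are hypothetical.

```python
RDF_TYPE = "rdf:type"

def objects(graph, subject, predicate):
    """All values of predicate at subject."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

def by_type(graph, cls):
    """Type-driven selection, roughly sh:scopeClass (subclasses ignored here)."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}

def by_position(graph, roots, path):
    """Position-driven selection: nodes reached from roots along a fixed
    predicate path, roughly how a shape reference reaches the nodes it checks."""
    frontier = set(roots)
    for predicate in path:
        frontier = {o for n in frontier for o in objects(graph, n, predicate)}
    return frontier

g = {
    ("ex:ds", RDF_TYPE, "qb:DataSet"),
    ("ex:ds", "qb:slice", "ex:slice1"),
    ("ex:slice1", RDF_TYPE, "qb:Slice"),
    ("ex:slice1", "qb:observation", "ex:obs1"),
    ("ex:slice1", "qb:observation", "ex:obs2"),  # ex:obs2 has no rdf:type
    ("ex:obs1", RDF_TYPE, "qb:Observation"),
    ("ex:obs3", RDF_TYPE, "qb:Observation"),     # typed, but in no slice
}

positional = by_position(g, by_type(g, "qb:DataSet"), ["qb:slice", "qb:observation"])
typed = by_type(g, "qb:Observation")
print(sorted(positional))  # ['ex:obs1', 'ex:obs2']
print(sorted(typed))       # ['ex:obs1', 'ex:obs3']
```

In this toy graph both views flag ex:obs2, though in different places (the
slice's sh:class constraint in the scoped encoding, the observation's
rdf:type requirement in the positional one).  The real difference is ex:obs3:
it is checked only when scopes are driven by type.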


The point here is mostly to show that a major example of recursive shapes
does not appear to need recursive shapes, nor even shapes referring to other
shapes at all.
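The reason no shape has to refer to another shape is that sh:class only asks
whether each value is an instance of a class, which is a test on the value's
rdf:type triples.  Whether that value has the right properties is left to the
shape scoped on its own class.  A minimal sketch of such a type-only check
follows, assuming the published SHACL Recommendation's reading of sh:class,
where instances of subclasses (via rdfs:subClassOf) also count; I have not
confirmed that the December 2015 draft said the same.  is_instance,
class_violations and the ex: nodes are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def is_instance(graph, node, cls):
    """Does node have rdf:type cls, directly or via an rdfs:subClassOf chain?"""
    todo = [o for (s, p, o) in graph if s == node and p == RDF_TYPE]
    seen = set()
    while todo:
        t = todo.pop()
        if t == cls:
            return True
        if t not in seen:  # guard against subclass cycles
            seen.add(t)
            todo.extend(o for (s, p, o) in graph if s == t and p == SUBCLASS_OF)
    return False

def class_violations(graph, focus, predicate, cls):
    """sh:class-style check: every value of predicate at focus must be an
    instance of cls.  Only types are consulted; no other shape is evaluated."""
    return [o for (s, p, o) in graph
            if s == focus and p == predicate and not is_instance(graph, o, cls)]

g = {
    ("ex:obs1", RDF_TYPE, "qb:Observation"),
    ("ex:obs1", "cex:ref-area", "ex:es"),
    ("ex:obs1", "cex:ref-area", "ex:atlantis"),
    ("ex:es", RDF_TYPE, "wf:Country"),  # no wf:iso2, wf:iso3 or rdfs:label
}
print(class_violations(g, "ex:obs1", "cex:ref-area", "wf:Country"))  # ['ex:atlantis']
```

Here ex:es passes the observation's check even though it lacks its ISO codes
and label; those omissions are reported separately when the country shape is
applied to every wf:Country.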

peter
Received on Monday, 28 December 2015 17:05:49 UTC