Re: Analysis of Example in ShEx paper submitted to SWJ from Peter F. Patel-Schneider on 2015-12-31 (public-data-shapes-wg@w3.org from December 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 31 Dec 2015 05:02:32 -0800
To: Jose Emilio Labra Gayo <jelabra@gmail.com>
Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <56852768.8000902@gmail.com>
So the paper then works something like this:

Here is some sort of an E-R diagram (Figure 2) that somehow describes an
actual linked data use case (although even it is modified from the publication
that describes the actual use case).  Here are some ShEx shapes (Section 3)
that do something different - more disjunction, for example.  Therefore ShEx
is suitable for validating and describing linked data portals.

This doesn't sound very convincing.

peter

PS: Many of the shapes actually do use rdf:type (as "a").  It is just :Country
that has dropped the rdf:type from the previous paper.


On 12/30/2015 10:49 PM, Jose Emilio Labra Gayo wrote:
> On Mon, Dec 28, 2015 at 6:05 PM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com <mailto:pfpschneider@gmail.com>> wrote:
> 
>     I took a look at "Validating and Describing Linked Data Portals using
>     Shapes", as submitted to the Semantic Web Journal in early December.
>     The current version of the submitted paper is currently available at
>     www.semantic-web-journal.net/system/files/swj1260.pdf
>     <http://www.semantic-web-journal.net/system/files/swj1260.pdf> but this
>     version has
>     unknown differences from the version that I looked at.
> 
>     The submission extensively uses an example about measuring the World Wide
>     Web's contribution to global development and human rights.  This example
>     comes from a previous paper by J. E. L. Gayo, H. Farham, J. C. Fernández,
>     and J. M. Á. Rodríguez, "Representing statistical indexes as linked data
>     including metadata about their computation process".  The ShEx provided in
>     the submission for the example has some significant unexplained differences
>     from the example in the published paper.
> 
> 
> The differences were introduced to better explain some features from ShEx. The
> paper uses the WebIndex data as a use case to introduce those features to the
> reader. The paper is self-contained in that sense because the problem
> statement is described using the figure 2 diagram and the ShEx definitions
> from section 3.
> 
>     I was unable to determine the exact details of the example as there is no
>     definition of the formalism used for the bulk of information about the
>     example - Figure 2 in the submission.  Here is my reconstruction of the data
>     model in Figure 2 plus the suborganization relationship and a little bit
>     more from the earlier paper.  
> 
> 
> The details are given in section 3 using ShEx.
> 
> From this email and another private email you sent me with your review, I
> guess that one misunderstanding is that you considered this paper as a
> comparison between ShEx and SHACL, while the paper was not written with that
> purpose in mind.
> 
> As you can read in the conclusions: "In general we consider that the benefits
> of validation using either ShEx or SHACL can help the adoption of RDF based
> solutions where the quality of data is an important issue."
> 
> The purpose of the paper is to show that both ShEx and SHACL can be used to
> validate linked data portals.
> 
> The paper introduces the problem statement in an informal way in section 2,
> then, it describes the dataset using ShEx in section 3 showing that a linked
> data portal can be described in ShEx. Later on, it shows how those definitions
> can be expressed in SHACL and proposes that dataset as a benchmark.
>  
> 
>     I am using a ShEx-like syntax to capture
>     something like the form of the example, but this isn't necessarily ShEx,
>     just a syntax to show the data model for the example. 
> 
> [...]
>  
> 
>     country {
>       rdf:type ( wf:Country ) [1,1],
>       wf:iso2 xsd:string [1,1],
>       wf:iso3 xsd:string [1,1],
>       rdf:label xsd:string [1,1] }
> 
> 
> Notice that in the paper we omitted the "rdf:type" declaration. Although that
> declaration was in the original data model, we thought that it was better to
> omit it in the new paper. The reason is precisely to show that we can model
> data models which don't depend on "rdf:type" declarations.
> 
> The paper explains that as:
> 
> "It should be noted that rdf:type may or may not be included in shape
> definitions. In the above example, we deliberately omitted any rdf:type
> requirement declaration, meaning that, in order to satisfy the :Country shape,
> a node need only have those properties."
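
The passage quoted above can be made concrete with a minimal sketch. It is an illustration only, not ShEx or any real validator: the graph is assumed to be a plain set of (subject, predicate, object) string triples, the property names are copied from the quoted reconstruction of the country shape, the node names `:es` and `:fr` are made up, and the xsd:string datatype check is left out.

```python
# Hypothetical, simplified model (not ShEx semantics): a graph is a set of
# (subject, predicate, object) triples of plain strings.

# Property -> (min, max) cardinality, taken from the quoted "country" block
# with the rdf:type line deliberately left out.
COUNTRY_SHAPE = {
    "wf:iso2": (1, 1),
    "wf:iso3": (1, 1),
    "rdf:label": (1, 1),
}


def conforms(graph, node, shape):
    """Return True if `node` has each listed property an allowed number of times.

    Properties not listed in the shape (including rdf:type) are ignored.
    """
    for prop, (lo, hi) in shape.items():
        count = sum(1 for s, p, _ in graph if s == node and p == prop)
        if not lo <= count <= hi:
            return False
    return True


graph = {
    # :es has no rdf:type at all, but has every required property.
    (":es", "wf:iso2", "ES"),
    (":es", "wf:iso3", "ESP"),
    (":es", "rdf:label", "Spain"),
    # :fr is typed as a country but lacks wf:iso3 and rdf:label.
    (":fr", "rdf:type", "wf:Country"),
    (":fr", "wf:iso2", "FR"),
}

es_ok = conforms(graph, ":es", COUNTRY_SHAPE)
fr_ok = conforms(graph, ":fr", COUNTRY_SHAPE)
```

In this sketch the untyped node `:es` satisfies the shape while the typed node `:fr` does not, which is the sense in which the shape does not depend on rdf:type.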
> 
> 
>     The actual task to be performed is not described in the submission. It
>     appears to me that the natural task to be done is to determine whether an
>     RDF graph containing information about observations conforms to this data
>     model, for some definition of conforms.
> 
> 
> The task to be performed can be guessed from the context of the paper.
> 
> 
>     This determination could be done in a number of ways in SHACL.  The approach
>     taken in the submission is to use a set of mutually recursive SHACL shapes.
>     However, it seems to me that it would be better to instead use non-recursive
>     SHACL shapes with scopes as follows:
> 
> 
> [...]
> 
>     The significant difference between the treatment here and the treatment in
>     the submission is to use the type information as scopes, so that the shape of
>     portions of the data is not mandated from its position as a value for some
>     other portion of the data but is instead mandated by its type.  
> 
> 
> Yes, that's the most significant difference and that's why we omitted the
> mandatory "rdf:type" declaration in the country shape. While having "rdf:type"
> declarations in linked data portals for every node is probably a good
> practice, it is not mandatory and validating linked data portals should not
> depend on those declarations.
> 
> In principle, a node in an RDF graph can have zero, one or more "rdf:type"
> declarations, and the validation tool should be able to handle those situations.
>  
> 
>     The point here is mostly to show that a major example of recursive shapes
>     does not appear to need recursive shapes, nor even shapes referring to
>     other shapes at all.
> 
> 
> What you have shown is that if every node has a discriminating "rdf:type"
> declaration, then the validation can be done easily and without recursive
> shapes by referring to the corresponding type instead of the shape.
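
The difference between the two approaches discussed above can be sketched as follows. This is a hedged illustration, not SHACL or ShEx: the triple-set graph model, the `ex:Observation` class, the `ex:refArea` property, and all node names are assumptions made for the example (the actual shapes are elided in the quoted text), and only two country properties are checked to keep it short.

```python
def values(graph, node, prop):
    """All objects of triples (node, prop, ?o) in the graph."""
    return [o for s, p, o in graph if s == node and p == prop]


def has_country_properties(graph, node):
    """Simplified country constraints: exactly one wf:iso2 and one wf:iso3."""
    return all(len(values(graph, node, p)) == 1 for p in ("wf:iso2", "wf:iso3"))


def validate_by_type(graph):
    """Type-scoped check: constraints are selected by each node's rdf:type."""
    failures = set()
    for s, p, o in graph:
        if p != "rdf:type":
            continue
        if o == "wf:Country" and not has_country_properties(graph, s):
            failures.add(s)
        if o == "ex:Observation" and len(values(graph, s, "ex:refArea")) != 1:
            failures.add(s)
    return failures


def validate_by_reference(graph, observation):
    """Shape-reference check: the ex:refArea value must itself satisfy the
    country constraints, whether or not it carries an rdf:type."""
    areas = values(graph, observation, "ex:refArea")
    return len(areas) == 1 and has_country_properties(graph, areas[0])


# :es is missing wf:iso3 in both graphs; only the first graph types it.
typed = {
    (":obs1", "rdf:type", "ex:Observation"),
    (":obs1", "ex:refArea", ":es"),
    (":es", "rdf:type", "wf:Country"),
    (":es", "wf:iso2", "ES"),
}
untyped = typed - {(":es", "rdf:type", "wf:Country")}

typed_failures = validate_by_type(typed)
untyped_failures = validate_by_type(untyped)
reference_ok = validate_by_reference(untyped, ":obs1")
```

Under these assumptions the type-scoped check reports `:es` only while the rdf:type triple is present; once it is removed, the incomplete country goes unreported, whereas the reference-based check still rejects `:obs1`.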
> 
> 
>     peter
> 
> -- 
> -- Jose Labra
>
Received on Thursday, 31 December 2015 13:03:12 UTC