Re: Analysis of Example in ShEx paper submitted to SWJ from Jose Emilio Labra Gayo on 2015-12-31 (public-data-shapes-wg@w3.org from December 2015)

From: Jose Emilio Labra Gayo <jelabra@gmail.com>
Date: Thu, 31 Dec 2015 07:49:44 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <CAJadXXJpbzrz5asmN4Kf=mPSoqNqe=S4G9ZeeXvUt80EP3K-7g@mail.gmail.com>
On Mon, Dec 28, 2015 at 6:05 PM, Peter F. Patel-Schneider <
pfpschneider@gmail.com> wrote:

> I took a look at "Validating and Describing Linked Data Portals using
> Shapes", as submitted to the Semantic Web Journal in early December.
> The current version of the submitted paper is currently available at
> www.semantic-web-journal.net/system/files/swj1260.pdf but this version has
> unknown differences from the version that I looked at.
>
> The submission extensively uses an example about measuring the World Wide
> Web's contribution to global development and human rights.  This example
> comes from a previous paper by J. E. L. Gayo, H. Farham, J. C. Fernández,
> and J. M. Á. Rodríguez, "Representing statistical indexes as linked data
> including metadata about their computation process".  The ShEx provided in
> the submission for the example has some significant unexplained differences
> from the example in the published paper.
>

The differences were introduced to better explain some features from ShEx.
The paper uses the WebIndex data as a use case to introduce those features
to the reader. The paper is self-contained in that sense because the
problem statement is described using the figure 2 diagram and the ShEx
definitions from section 3.

> I was unable to determine the exact details of the example as there is no
> definition of the formalism used for the bulk of information about the
> example - Figure 2 in the submission.  Here is my reconstruction of the data
> model in Figure 2 plus the suborganization relationship and a little bit
> more from the earlier paper.


The details are given in section 3 using ShEx.

From this email and another private email you sent me with your review, I
guess that one misunderstanding is that you read this paper as a
comparison between ShEx and SHACL, while the paper was not written with that
purpose in mind.

As you can read in the conclusions: "In general we consider that the
benefits of validation using either ShEx or SHACL can help the adoption of
RDF based solutions where the quality of data is an important issue."

The purpose of the paper is to show that both ShEx and SHACL can be used to
validate linked data portals.

The paper introduces the problem statement informally in section 2. It then
describes the dataset using ShEx in section 3, showing that a linked data
portal can be described in ShEx. Later on, it shows how those definitions
can be expressed in SHACL and proposes that dataset as a benchmark.


> I am using a ShEx-like syntax to capture the
> something like the form of the example, but this isn't necessarily ShEx,
> just a syntax to show the data model for the example.
>
[...]


> country {
>   rdf:type ( wf:Country ) [1,1],
>   wf:iso2 xsd:string [1,1],
>   wf:iso3 xsd:string [1,1],
>   rdf:label xsd:string [1,1] }
>

Notice that in the paper we omitted the "rdf:type" declaration. Although
that declaration was in the original data model, we thought that it was
better to omit it in the new paper. The reason is precisely to show that we
can describe data models which don't depend on "rdf:type" declarations.

The paper explains that as:

"It should be noted that rdf:type may or may not be included in shape
definitions. In the above example, we deliberately omitted the any rdf:type
requirement declaration, meaning that, in order to satisfy the :Country
shape, a node need only have those properties."
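
As a rough illustration of that point, here is a minimal Python sketch of a
structural check that never looks at rdf:type. It is not ShEx and not the
paper's implementation. The property list is taken from the reconstruction
quoted above, and the ex: nodes, the triple representation, and the function
names are all assumptions made for the example.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
# Property names follow the reconstruction quoted above; ex: nodes are hypothetical.
COUNTRY_PROPERTIES = ("wf:iso2", "wf:iso3", "rdf:label")


def values(graph, node, predicate):
    """Return all objects of (node, predicate, ?) in the graph."""
    return [o for (s, p, o) in graph if s == node and p == predicate]


def conforms_to_country(graph, node):
    """True if node has exactly one string value for each listed property.

    rdf:type is never consulted, and other properties on the node are ignored
    (an assumption of this sketch).
    """
    for predicate in COUNTRY_PROPERTIES:
        found = values(graph, node, predicate)
        if len(found) != 1 or not isinstance(found[0], str):
            return False
    return True


graph = {
    # No rdf:type at all, but every required property is present.
    ("ex:spain", "wf:iso2", "ES"),
    ("ex:spain", "wf:iso3", "ESP"),
    ("ex:spain", "rdf:label", "Spain"),
    # Declares a type, but is missing wf:iso3 and rdf:label.
    ("ex:france", "rdf:type", "wf:Country"),
    ("ex:france", "wf:iso2", "FR"),
}

print(conforms_to_country(graph, "ex:spain"))
print(conforms_to_country(graph, "ex:france"))
```

In this sketch the untyped node passes and the typed node fails, because only
the listed properties are inspected.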


> The actual task to be performed is not described in the submission. It
> appears to me that the natural task to be done is to determine whether an
> RDF graph containing information about observations conforms to this data
> model, for some definition of conforms.
>

The task to be performed can be guessed from the context of the paper.

>
> This determination could be done in a number of ways in SHACL.  The approach
> taken in the submission is to use a set of mutually recursive SHACL shapes.
> However, it seems to me that it would be better to instead use non-recursive
> SHACL shapes with scopes as follows:
>

[...]

> The significant difference between the treatment here and the treatment in
> the submission is to use the type information as scopes, so that the shape of
> portions of the data is not mandated from its position as a value for some
> other portion of the data but is instead mandated by its type.


Yes, that's the most significant difference and that's why we omitted the
mandatory "rdf:type" declaration in the country shape. While having
"rdf:type" declarations in linked data portals for every node is probably a
good practice, it is not mandatory and validating linked data portals
should not depend on those declarations.

In principle, a node in an RDF graph can have zero, one or more "rdf:type"
declarations, and the validation tool should be able to handle those
situations.


> The point here is mostly to show that a major example of recursive shapes
> does not appear to need recursive shapes, nor even shapes referring to
> other shapes at all.
>

What you have shown is that if every node has a discriminating "rdf:type"
declaration, then the validation can be done easily and without recursive
shapes by referring to the corresponding type instead of the shape.
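
The difference between the two approaches can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: it is neither
SHACL nor ShEx, the linking property ex:refArea and the ex: nodes are
hypothetical, and the country check is reduced to two properties to keep the
example short.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
# ex:refArea and all ex: nodes are hypothetical names used only for illustration.

def values(graph, node, predicate):
    """Return all objects of (node, predicate, ?) in the graph."""
    return [o for (s, p, o) in graph if s == node and p == predicate]


def has_country_properties(graph, node):
    """Simplified country check: exactly one wf:iso2 and one wf:iso3 value."""
    return all(len(values(graph, node, p)) == 1 for p in ("wf:iso2", "wf:iso3"))


def check_by_type(graph):
    """Type-scoped: check only the nodes that declare rdf:type wf:Country."""
    nodes = {s for (s, p, o) in graph if p == "rdf:type" and o == "wf:Country"}
    return {n: has_country_properties(graph, n) for n in nodes}


def check_by_reference(graph, predicate="ex:refArea"):
    """Reference-driven: check every node used as a value of the given predicate."""
    nodes = {o for (s, p, o) in graph if p == predicate}
    return {n: has_country_properties(graph, n) for n in nodes}


graph = {
    ("ex:obs1", "ex:refArea", "ex:spain"),
    ("ex:spain", "rdf:type", "wf:Country"),
    ("ex:spain", "wf:iso2", "ES"),
    ("ex:spain", "wf:iso3", "ESP"),
    # Referenced as a country, but untyped and incomplete.
    ("ex:obs2", "ex:refArea", "ex:atlantis"),
    ("ex:atlantis", "wf:iso2", "AT"),
}

print(check_by_type(graph))
print(check_by_reference(graph))
```

In this sketch the type-scoped check never visits ex:atlantis because it has
no rdf:type declaration, while the reference-driven check reaches it through
ex:obs2 and reports that it does not conform.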


> peter
>
>
>
>


-- 
-- Jose Labra
Received on Thursday, 31 December 2015 06:50:32 UTC