Re: Analysis of Example in ShEx paper submitted to SWJ from Jose Emilio Labra Gayo on 2016-01-01 (public-data-shapes-wg@w3.org from January 2016)

From: Jose Emilio Labra Gayo <jelabra@gmail.com>
Date: Fri, 1 Jan 2016 08:58:01 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <CAJadXX+_FmWfbv-TU0onjMU_L1O1UBRzBnotLUAp9RMEOqwuBA@mail.gmail.com>
On Thu, Dec 31, 2015 at 2:02 PM, Peter F. Patel-Schneider <
pfpschneider@gmail.com> wrote:

> So the paper then works something like this:
>
> Here is some sort of an E-R diagram (Figure 2) that somehow describes an
> actual linked data use case (although even it is modified from the
> publication that describes the actual use case).  Here are some ShEx
> shapes (Section 3) that do something different - more disjunction, for
> example.  Therefore ShEx is suitable for validating and describing linked
> data portals.
>

Not at all. The paper introduces a real use case using an informal notation
in section 2, then it describes the structure using ShEx notation which can
also be seen as an introduction to ShEx. ShEx was indeed used when we
developed the linked data portal to describe its contents. Sections 4 and 5
describe Shape Expressions tools and how they can be used to validate a
linked data portal. Section 6 is new and describes the same data model
using SHACL. We thought the paper would be useful for readers who wanted to
learn about SHACL use in a real use case. Section 7 describes a tool called
"wiGen" that can generate random instance data on demand based on the
previously defined data model and proposes its use as a performance
benchmarking tool.


> This doesn't sound very convincing.
>

If what doesn't convince you is the set of modifications made to the
original data model, the reasons for those modifications are:

1.- To make the paper self-contained and easier to read by the target
audience. We simplified some parts of the original data model, such as the
definitions of the statistical computations, for readers who are not
interested in those computations.
2.- To be as general as possible. We considered that imposing a "rdf:type"
declaration on every node in a linked data portal was too restrictive.
Although those declarations can be a good practice, they are not mandatory
in RDF and linked data validators should not depend on those declarations
to do their job.
3.- To cover some of the features of ShEx in the context of a real use
case. We added some features like closed shapes, disjunction, Extra
modifiers etc. to help a reader understand those features when they are
applied in practice. Our intention was that section 3 of the paper could be
seen both as a description of the data model and as an introduction to ShEx
by example.
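
As a rough illustration of reasons 2 and 3, here is a minimal Python sketch
of the difference between an open and a closed shape. It is not the ShEx
tooling described in the paper: the helper `conforms`, the sample nodes and
the property names are assumptions made only for this example, and
cardinalities and value types are ignored.

```python
# Minimal sketch of open vs. closed shape checking over a set of triples.
# Assumption: a shape is reduced to a set of required predicates.

def conforms(node, graph, required, closed=False):
    """Return True if `node` has every predicate in `required`.

    With closed=True, any predicate outside `required` makes the node fail.
    """
    predicates = {p for (s, p, _o) in graph if s == node}
    if not required <= predicates:
        return False
    return not (closed and predicates - required)


# Hypothetical data: one country without rdf:type and one with it.
graph = {
    (":spain", "rdfs:label", "Spain"),
    (":spain", "wf:iso2", "ES"),
    (":france", "rdfs:label", "France"),
    (":france", "wf:iso2", "FR"),
    (":france", "rdf:type", "wf:Country"),
}
country = {"rdfs:label", "wf:iso2"}
nodes = (":spain", ":france")

# Open shape: extra triples such as rdf:type are allowed but not required.
open_results = {n: conforms(n, graph, country) for n in nodes}
# Closed shape: the extra rdf:type triple makes :france fail.
closed_results = {n: conforms(n, graph, country, closed=True) for n in nodes}
```

In this sketch both nodes satisfy the open shape, whether or not they carry
an "rdf:type" declaration, while the closed variant rejects the node that
has a triple the shape does not mention.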

> PS: Many of the shapes actually do use rdf:type (as "a").  It is just
> :Country that has dropped the rdf:type from the previous paper.
>

Yes, indeed the original data model contained "rdf:type" declarations for
most of the nodes except for some computations. In the paper we decided to
drop "rdf:type" in :Country for two reasons:

1.- Given that :Country is defined as an open shape, we don't prohibit its
appearance. We just omit it from the shape definition, meaning that it may
or may not appear.
2.- In a later project we noticed that "rdf:type :Country" was too
restrictive for those nodes because we also included regions in the range
of the "cex:ref-area" property.
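
To make the second reason concrete, here is a small Python sketch of what
happens to the values of "cex:ref-area" when the shape does or does not
require a type. The data, the class name "wf:Region" and the helper
`conforms` are hypothetical and only serve this example. It is not the
validator used in the project.

```python
# Sketch: checking the values of cex:ref-area with and without a type
# requirement. All nodes and prefixed names below are hypothetical.

graph = {
    (":obs1", "cex:ref-area", ":spain"),
    (":obs2", "cex:ref-area", ":europe"),
    (":spain", "rdf:type", "wf:Country"),
    (":spain", "rdfs:label", "Spain"),
    (":europe", "rdf:type", "wf:Region"),
    (":europe", "rdfs:label", "Europe"),
}


def conforms(node, graph, required_props, required_type=None):
    """Check required predicates and, optionally, one required rdf:type."""
    props = {p for (s, p, _o) in graph if s == node}
    if not required_props <= props:
        return False
    if required_type is None:
        return True
    return (node, "rdf:type", required_type) in graph


# Nodes reached through cex:ref-area, whatever their declared type is.
ref_areas = sorted(o for (_s, p, o) in graph if p == "cex:ref-area")

# Requiring rdf:type wf:Country rejects the region.
strict = {n: conforms(n, graph, {"rdfs:label"}, "wf:Country") for n in ref_areas}
# Omitting the type requirement accepts both nodes.
relaxed = {n: conforms(n, graph, {"rdfs:label"}) for n in ref_areas}
```

Under these assumptions the strict check fails for the region node, which
is the situation we ran into, while the relaxed check accepts both values.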

Jose Labra

>
>
> On 12/30/2015 10:49 PM, Jose Emilio Labra Gayo wrote:
> > On Mon, Dec 28, 2015 at 6:05 PM, Peter F. Patel-Schneider
> > <pfpschneider@gmail.com <mailto:pfpschneider@gmail.com>> wrote:
> >
> >     I took a look at "Validating and Describing Linked Data Portals using
> >     Shapes", as submitted to the Semantic Web Journal in early December.
> >     The current version of the submitted paper is currently available at
> >     www.semantic-web-journal.net/system/files/swj1260.pdf
> >     <http://www.semantic-web-journal.net/system/files/swj1260.pdf> but
> >     this version has unknown differences from the version that I looked
> >     at.
> >
> >     The submission extensively uses an example about measuring the World
> >     Wide Web's contribution to global development and human rights.  This
> >     example comes from a previous paper by J. E. L. Gayo, H. Farham,
> >     J. C. Fernández, and J. M. Á. Rodríguez, "Representing statistical
> >     indexes as linked data including metadata about their computation
> >     process".  The ShEx provided in the submission for the example has
> >     some significant unexplained differences from the example in the
> >     published paper.
> >
> >
> > The differences were introduced to better explain some features of ShEx.
> > The paper uses the WebIndex data as a use case to introduce those
> > features to the reader. The paper is self-contained in that sense because
> > the problem statement is described using the figure 2 diagram and the
> > ShEx definitions from section 3.
> >
> >     I was unable to determine the exact details of the example as there
> >     is no definition of the formalism used for the bulk of information
> >     about the example - Figure 2 in the submission.  Here is my
> >     reconstruction of the data model in Figure 2 plus the suborganization
> >     relationship and a little bit more from the earlier paper.
> >
> >
> > The details are given in section 3 using ShEx.
> >
> > From this email and another private email you sent me with your review, I
> > guess that one misunderstanding is that you considered this paper as a
> > comparison between ShEx and SHACL, while the paper was not written with
> > that purpose in mind.
> >
> > As you can read in the conclusions: "In general we consider that the
> > benefits of validation using either ShEx or SHACL can help the adoption
> > of RDF based solutions where the quality of data is an important issue."
> >
> > The purpose of the paper is to show that both ShEx and SHACL can be used
> > to validate linked data portals.
> >
> > The paper introduces the problem statement in an informal way in section
> > 2. It then describes the dataset using ShEx in section 3, showing that a
> > linked data portal can be described in ShEx. Later on, it shows how those
> > definitions can be expressed in SHACL and proposes that dataset as a
> > benchmark.
> >
> >
> >     I am using a ShEx-like syntax to capture something like the form of
> >     the example, but this isn't necessarily ShEx, just a syntax to show
> >     the data model for the example.
> >
> > [...]
> >
> >
> >     country {
> >       rdf:type ( wf:Country ) [1,1],
> >       wf:iso2 xsd:string [1,1],
> >       wf:iso3 xsd:string [1,1],
> >       rdf:label xsd:string [1,1] }
> >
> >
> > Notice that in the paper we omitted the "rdf:type" declaration. Although
> > that declaration was in the original data model, we thought that it was
> > better to omit it in the new paper. The reason is precisely to show that
> > we can define data models which don't depend on "rdf:type" declarations.
> >
> > The paper explains that as:
> >
> > "It should be noted that rdf:type may or may not be included in shape
> > definitions. In the above example, we deliberately omitted the any
> > rdf:type requirement declaration, meaning that, in order to satisfy the
> > :Country shape, a node need only have those properties."
> >
> >
> >     The actual task to be performed is not described in the submission.
> >     It appears to me that the natural task to be done is to determine
> >     whether an RDF graph containing information about observations
> >     conforms to this data model, for some definition of conforms.
> >
> >
> > The task to be performed can be guessed from the context of the paper.
> >
> >
> >     This determination could be done in a number of ways in SHACL.  The
> >     approach taken in the submission is to use a set of mutually
> >     recursive SHACL shapes.  However, it seems to me that it would be
> >     better to instead use non-recursive SHACL shapes with scopes as
> >     follows:
> >
> >
> > [...]
> >
> >     The significant difference between the treatment here and the
> >     treatment in the submission is to use the type information as scopes,
> >     so that the shape of portions of the data is not mandated from its
> >     position as a value for some other portion of the data but is instead
> >     mandated by its type.
> >
> >
> > Yes, that's the most significant difference and that's why we omitted the
> > mandatory "rdf:type" declaration in the country shape. While having
> > "rdf:type" declarations in linked data portals for every node is probably
> > a good practice, it is not mandatory and validating linked data portals
> > should not depend on those declarations.
> >
> > In principle, a node in an RDF graph can have zero, one or more
> > "rdf:type" declarations, and the validation tool should be able to handle
> > those situations.
> >
> >
> >     The point here is mostly to show that a major example of recursive
> >     shapes does not appear to need recursive shapes, nor even shapes
> >     referring to other shapes at all.
> >
> >
> > What you have shown is that if every node has a discriminating "rdf:type"
> > declaration, then the validation can be done easily and without recursive
> > shapes by referring to the corresponding type instead of the shape.
> >
> >
> >     peter
> >
> >
> >
> >
> >
> >
> > --
> > -- Jose Labra
> >
>



-- 
-- Jose Labra
Received on Friday, 1 January 2016 07:58:50 UTC