Re: Analysis of Example in ShEx paper submitted to SWJ from Peter F. Patel-Schneider on 2016-01-04 (public-data-shapes-wg@w3.org from January 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 4 Jan 2016 14:44:49 -0800
To: Jose Emilio Labra Gayo <jelabra@gmail.com>
Cc: Eric Prud'hommeaux <eric@w3.org>, RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <568AF5E1.3090808@gmail.com>
On 01/01/2016 11:41 PM, Jose Emilio Labra Gayo wrote:
>     >
>     > 1 a little explanatory text to the effect of "The original
>     >   representation of the web index included type arcs on every
>     >   node. This is not the case for RDF data in general so we are
>     >   modifying the use case to illustrate how validation occurs without
>     >   discriminating type arcs."
>     >
>     > 2 abandon the web index use case and cook up something much less
>     >   documented.
>     >
>     > IMO, 1 seems much more satisfactory to readers in general.
> 
>     I disagree, particularly given the thrust of the submission.  
> 
> 
> Maybe you are trying to impose a thrust on the submission that is not the one
> that the authors intended.

Of course this is entirely possible, as all I have to go on is the submission.

>     Given that the
>     paper appears to be about how suitable ShEx is for linked data portals, the
>     ideal would be to show this with actual use cases.  
> 
> 
> The paper is not "just" about how suitable ShEx is for linked data portals:
> the paper describes a linked data portal using ShEx, 

One of my major concerns with the submission is that there are
inadequately-explained differences between what is described in the submission
that I reviewed and previous publications about the portal.

> talks about how it can be
> used to validate it with some tools, 

Another of my concerns is that the ShEx shapes in the submission do not
capture what is shown for the portal in the submission.

> describes the same data model in SHACL,
> proposes a tool that can generate that data model on demand as a benchmarking
> tool and concludes that "the benefits of validation using either ShEx or SHACL
> can help the adoption of RDF based solutions where the quality of data is an
> important issue."
> 
>     Features of ShEx that go
>     beyond the actual use cases could be covered in a separate section.  
> 
> 
> The features have been included because they fit into the use case that is
> being described. 

I do not think that this is the case.  As I stated above, the ShEx shapes in
the submission do not match the use case as described in the submission.
Although the description of the use case is informal and incomplete, several
parts of it that appear to work the same way are handled with ShEx shapes that
have quite different behaviour.  Some of the differences involve features that
show up in only one place.  These features thus do not appear to fit into the
use case that is being described.

> We already have a section devoted to "Advanced features"
> where we talk about other features that didn't fit so well into that use case
> or whose introduction was considered less important.
>  
> 
>     If
>     something other than actual use cases is being employed it seems to me to be
>     better to make up something and show that this something is illustrative of
>     actual use cases.
> 
>     Using a modified use case just looks like the modifications were added only
>     because they can be handled by the technology at hand.
> 
> 
> All the modifications can be justified by that use case. I have already said
> that some modifications were introduced to make the paper more readable and
> less repetitive. Other modifications were introduced because we considered
> them to be important. For example, the introduction of CLOSED shapes is
> interesting because, at the time we wrote the original paper, the distinction
> between closed and open shapes was not so clear (at least to me). In the
> original data model every shape was closed. However, in the paper we thought
> that it was more interesting to have open shapes by default and to define one
> of the shapes as closed. Something similar happened with the inclusion of
> disjunction, which we didn't use in the original data model because we were
> not sure how to handle it at that time.

I don't understand how you can think that these changes are at all helpful to
the argument that ShEx is good for linked data portals.  Either you are
writing something that handles the WebIndex portal correctly (and if this is
different from what you previously published you have to make a very strong
case that what you have now is better than what you had then) or you are not
(and you have to be very clear that you are only using the WebIndex portal as
a general motivation).

> With regard to the use of "rdf:type" arcs for every node, although in the
> case of countries they all had "rdf:type" arcs, there were other nodes, like
> computations, that didn't have them. For example, you can see
> that the following node doesn't include the "rdf:type" arc:
> 
> http://data.webfoundation.org/webindex/v2013/observation/computed_2009_1386752461095_53574

If there is data that doesn't use rdf:type links, then use that data to
motivate the exclusion of rdf:type links.

> However, as I have already said, we thought that it was better to simplify the
> paper by omitting the definitions of the statistical computations.

Simplification is probably reasonable here, but that is not all that is going on.

> 
> Jose Labra

peter
Received on Monday, 4 January 2016 22:45:20 UTC