Re: ShEx relation to SPIN/OWL from Jose Emilio Labra Gayo on 2014-07-03 (public-rdf-shapes@w3.org from July 2014)

From: Jose Emilio Labra Gayo <jelabra@gmail.com>
Date: Thu, 3 Jul 2014 11:11:02 +0200
Cc: John Snelson <John.Snelson@marklogic.com>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <CAJadXXLac1p+zhC+dgASTc8W_MvDy7_akQU=S+EiTUKQJhnNzA@mail.gmail.com>
On Thu, Jul 3, 2014 at 9:01 AM, Dimitris Kontokostas <
kontokostas@informatik.uni-leipzig.de> wrote:

>
>
> On Wed, Jul 2, 2014 at 6:42 PM, John Snelson <John.Snelson@marklogic.com>
>  wrote:
>
>> There's a big difference between a declarative validation language like
>> ShEx and a more general purpose language like SPARQL in SPIN to
>> validate. By being declarative and stating the validation intent rather
>> than the validation method, the description is available to be used in
>> many different scenarios.
>>
>
> My point was that in the end SPARQL is the most declarative language for
> RDF
> and an equivalent compact SPARQL notation would be the ideal syntax here.
> This remark is not meant to downgrade the effort done; the current syntax handles
> most common use cases and I'd like to see it standardized.
>

At this moment, ShEx is a declarative language specifically tailored for RDF
shape description and validation, with a compact syntax that should be
familiar to users of RelaxNG and Turtle.

The expressiveness of ShEx is lower than that of SPARQL by design, because
SPARQL is a general query language while ShEx is focused on RDF shape
description and validation.

ShEx expressions can be converted to SPARQL queries, as can be seen in the
demo from Eric Prud'hommeaux, which contains a "toSPARQL" button to convert
ShEx expressions to SPARQL queries.
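
To give a rough idea of what such a conversion can look like, here is a
minimal sketch in Python. It is not the output or the algorithm of Eric's
demo: the function name, the example IRIs, and the choice of selecting focus
nodes by rdf:type are all assumptions made for illustration. It turns one
cardinality constraint into a SPARQL query that lists the violating nodes.

```python
from typing import Optional


def cardinality_violation_query(
    target_class: str,
    prop: str,
    min_count: int,
    max_count: Optional[int] = None,
) -> str:
    """Build a SPARQL query listing nodes that break a cardinality constraint.

    Hypothetical helper: focus nodes are picked by rdf:type, which is a
    simplification of how a ShEx shape is matched against a node.
    """
    conditions = []
    if min_count > 0:
        conditions.append(f"COUNT(?o) < {min_count}")
    if max_count is not None:
        conditions.append(f"COUNT(?o) > {max_count}")
    if not conditions:
        raise ValueError("nothing to check: no lower or upper bound given")

    # OPTIONAL keeps nodes with zero values in the result, so that a
    # missing required property is counted as 0 instead of being dropped.
    return (
        "SELECT ?node WHERE {\n"
        f"  ?node a <{target_class}> .\n"
        f"  OPTIONAL {{ ?node <{prop}> ?o }}\n"
        "}\n"
        "GROUP BY ?node\n"
        f"HAVING ({' || '.join(conditions)})"
    )


# Example: every Issue must have exactly one reportedBy value.
print(cardinality_violation_query(
    "http://example.org/Issue", "http://example.org/reportedBy", 1, 1))
```

An empty result from the generated query would mean that no node violates
the constraint; the declarative shape says what is expected, and the query
is only one possible way to check it.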

Also, ShEx contains semantic actions which can augment the expressiveness
of ShEx to provide some functionality like RDF transformation or more
specific validation constraints.

> However, after this thread I am kind of skeptic on how ShEx will perform on
> big datasets and I'd like to propose DBpedia validation (or any other big
> dataset) as a use case.
>

Yes, I think DBpedia will be a nice use case.

Currently the RDFShape tool [http://rdfshape.weso.es/] contains the
functionality to validate an endpoint or a URI by dereferencing. It is
still a work in progress, as we are improving the performance of the
validation algorithm.

> For DBpedia, the ontology consists of a few K axioms in the form of
>  rdfs:domain, range, disjointness & functionality.
> These axioms can be easily captured by the current ShEx status
>
> (See more inline)
>
>
>> As an example, I _could_ write my RDF validation code in Java running
>> against a triple store - but it would be useless in a number of other
>> contexts.
>
>
> You don't have to, you can reuse existing java libraries like ShEx scala
> (Jose must confirm), SPIN or RDFUnit
>

Yes, as ShExcala (http://labra.github.io/ShExcala/) is implemented in
Scala, it can be called from Java like any JAR library.

In fact, there is another project called VaSKOS (
http://vaskos.chemaar.cloudbees.net/) that uses it to validate SKOS; it is
implemented in Java and calls ShExcala.

>
>
>> Using a declarative validation language would also allow the
>> description to be used:
>
>
>> 1) As a description of my RDF format.
>>
>
> ShEx can do a better job here but you could compromise with OWL
>

Yes, I have found ShEx to be a very good language to document data portals.
The reason is that it focuses on the shape of the different types of
resources that are available behind an endpoint. As an example, the
documentation of the WebIndex using ShEx can be found here:
http://weso.github.io/wiDoc/


>> 2) To perform streaming validation of RDF on the wire.
>>
>
> In a general case this is not feasible. RDF does not guarantee statement
> order and there are many cases where you need to validate against other
> resources.
> Take Eric's ShEx example, where would you chop the stream? on every
> resource maybe? then each one would not validate separately, in i.e. Issue7
> you'd miss the ranges for users 1,2,6 and issues 2,3,4
> In an ideal case where you control the streaming or you need to validate
> only a single resource with no cross checking this can be done
>

I think this is a difficult use case, although I would not say that it is
impossible. In fact, I have been thinking about it, and it could be possible
to apply it in some special cases. However, the current implementation does
not handle streaming yet.
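
As an illustration of one such special case, here is a minimal Python
sketch. It is not how ShExcala works (which does not support streaming); it
assumes that the producer sends all triples of a subject consecutively and
that the constraints are purely local (here, a hypothetical list of required
properties), with no cross-checking against other resources.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def validate_stream(
    triples: Iterable[Triple], required: List[str]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, missing_properties) for each run of same-subject triples.

    Only one subject's triples are held at a time. If a subject reappears
    later in the stream, it is treated as a new chunk and may be reported
    as incomplete, which is exactly the ordering problem discussed above.
    """
    for subject, group in groupby(triples, key=lambda t: t[0]):
        seen = {predicate for _, predicate, _ in group}
        yield subject, [p for p in required if p not in seen]


# Example with made-up data: issue2 lacks the required ex:reportedBy.
stream = [
    ("ex:issue1", "ex:state", "ex:open"),
    ("ex:issue1", "ex:reportedBy", "ex:user1"),
    ("ex:issue2", "ex:state", "ex:open"),
]
for subject, missing in validate_stream(stream, ["ex:state", "ex:reportedBy"]):
    print(subject, "OK" if not missing else f"missing {missing}")
```

Checking that ex:user1 itself has the expected shape would require either
buffering or a second pass, which is why the general case is hard.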


>
>
>> 3) To guide an efficient binary compression algorithm.
>>
>
> +1
>
>> 4) To validate the RDF in an HTML document containing RDFa markup.
>>
>
> RDFUnit already does this, not sure about SPIN / ShEx
>

I am planning to add this functionality to ShExcala.

>> However I do agree that a human readable syntax is vastly preferable to
>> an RDF based syntax, and drawing inspiration from the SPARQL/Turtle
>> syntax is the most obvious starting point for that.
>>
>
Notice also that we currently have two syntaxes for ShEx: a compact syntax
and an RDF-based syntax. So one could also have ShEx expressions stored in
RDF.

Best regards, Jose Labra
Received on Thursday, 3 July 2014 09:11:50 UTC