
Re: ShEx relation to SPIN/OWL

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Thu, 3 Jul 2014 10:01:46 +0300
Message-ID: <CA+u4+a1A3oiNphVL9UprSRDWK5jVhT1_n8cEw_H_k9AKR=EjWw@mail.gmail.com>
To: John Snelson <John.Snelson@marklogic.com>
Cc: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
On Wed, Jul 2, 2014 at 6:42 PM, John Snelson <John.Snelson@marklogic.com>
wrote:

> There's a big difference between a declarative validation language like
> ShEx and a more general purpose language like SPARQL in SPIN to
> validate. By being declarative and stating the validation intent rather
> than the validation method, the description is available to be used in
> many different scenarios.
>

My point was that, in the end, SPARQL is the most declarative language for
RDF, and an equivalent compact SPARQL notation would be the ideal syntax here.
This remark is not meant to downgrade the effort done; the current syntax
handles the most common use cases and I'd like to see it standardized.

However, after this thread I am somewhat skeptical about how ShEx will
perform on big datasets, and I'd like to propose DBpedia validation (or any
other big dataset) as a use case.

For DBpedia, the ontology consists of a few thousand axioms in the form of
rdfs:domain, rdfs:range, disjointness and functionality.
These axioms can easily be captured by ShEx in its current state.
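
To make the four axiom kinds concrete, here is a minimal sketch of what
checking them under a closed-world reading amounts to. It is plain Python
over in-memory triples, not ShEx, SPIN or RDFUnit code, and all ex: names
and the validate() helper are hypothetical, made up for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def validate(triples, domains, ranges, functional, disjoint):
    """Closed-world check; returns sorted (kind, subject, property) tuples."""
    # Collect the asserted types of every resource first.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    violations = set()
    values = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # rdfs:domain read as a constraint: the subject must carry the type.
        if p in domains and domains[p] not in types.get(s, set()):
            violations.add(("domain", s, p))
        # rdfs:range read as a constraint: the object must carry the type.
        if p in ranges and ranges[p] not in types.get(o, set()):
            violations.add(("range", s, p))
        if p in functional:
            values[(s, p)].add(o)
    # Functional property: at most one distinct value per subject.
    for (s, p), objs in values.items():
        if len(objs) > 1:
            violations.add(("functional", s, p))
    # Disjointness: no resource may carry both classes of a disjoint pair.
    for s, ts in types.items():
        for a, b in disjoint:
            if a in ts and b in ts:
                violations.add(("disjoint", s, RDF_TYPE))
    return sorted(violations)

# Hypothetical toy data (not taken from the DBpedia ontology).
triples = [
    ("ex:Berlin", RDF_TYPE, "ex:City"),
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Berlin", "ex:mayor", "ex:Alice"),
    ("ex:Berlin", "ex:mayor", "ex:Bob"),   # second value, and Bob is untyped
    ("ex:Alice", RDF_TYPE, "ex:City"),     # clashes with ex:Person
]
report = validate(
    triples,
    domains={"ex:mayor": "ex:City"},
    ranges={"ex:mayor": "ex:Person"},
    functional={"ex:mayor"},
    disjoint=[("ex:City", "ex:Person")],
)
for violation in report:
    print(violation)
```

On a dataset the size of DBpedia the same checks would of course have to run
inside the store (e.g. as one small query per axiom) rather than in memory.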

(See more inline)


> As an example, I _could_ write my RDF validation code in Java running
> against a triple store - but it would be useless in a number of other
> contexts.


You don't have to; you can reuse existing Java libraries such as ShEx Scala
(Jose must confirm), SPIN or RDFUnit.


> Using a declarative validation language would also allow the
> description to be used:


> 1) As a description of my RDF format.
>

ShEx can do a better job here, but you could compromise with OWL.


> 2) To perform streaming validation of RDF on the wire.
>

In the general case this is not feasible. RDF does not guarantee statement
order, and there are many cases where you need to validate against other
resources.
Take Eric's ShEx example: where would you chop the stream? On every
resource, maybe? Then each one would not validate separately; e.g. for
Issue7 you'd miss the ranges for users 1, 2, 6 and issues 2, 3, 4.
In the ideal case, where you control the streaming or need to validate
only a single resource with no cross-checking, this can be done.
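
A small sketch of the chopping problem, in plain Python. The property name
ex:reportedBy, the class names and both helper functions are assumptions for
illustration, not Eric's actual schema. The same range check is run once per
subject chunk and once over the whole graph:

```python
RDF_TYPE = "rdf:type"

def range_violations(triples, ranges):
    """Triples whose object lacks the required type within `triples`."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if p in ranges and ranges[p] not in types.get(o, set())
    )

def chunks_by_subject(triples):
    """Chop the stream whenever the subject changes."""
    chunk = []
    for triple in triples:
        if chunk and chunk[-1][0] != triple[0]:
            yield chunk
            chunk = []
        chunk.append(triple)
    if chunk:
        yield chunk

# Hypothetical stream: the type of ex:User2 arrives after the issue uses it.
stream = [
    ("ex:Issue7", RDF_TYPE, "ex:Issue"),
    ("ex:Issue7", "ex:reportedBy", "ex:User2"),
    ("ex:User2", RDF_TYPE, "ex:User"),
]
ranges = {"ex:reportedBy": "ex:User"}

per_chunk = [
    violation
    for chunk in chunks_by_subject(stream)
    for violation in range_violations(chunk, ranges)
]
whole_graph = range_violations(stream, ranges)

print("per chunk:  ", per_chunk)    # false alarm for ex:Issue7
print("whole graph:", whole_graph)  # no violations
```

The per-chunk run flags Issue7 only because the type of the referenced user
sits in a later chunk; a streaming validator would have to buffer such
pending checks until the end of the stream.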


> 3) To guide an efficient binary compression algorithm.
>

+1

> 4) To validate the RDF in an HTML document containing RDFa markup.
>

RDFUnit already does this; I am not sure about SPIN / ShEx.

Finally, regarding the gray areas between pass/fail that you mentioned in
your initial message, there was a discussion in [1].

Best,
Dimitris


[1] http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jun/0009.html


>
> There's great value in a declarative schema language like XML's Relax NG
> over and above something like Schematron, even though Schematron is
> strictly more expressive.
>
> However I do agree that a human readable syntax is vastly preferable to
> an RDF based syntax, and drawing inspiration from the SPARQL/Turtle
> syntax is the most obvious starting point for that.
>
> John
>
> On 02/07/14 15:55, Dimitris Kontokostas wrote:
> > As discussed on & off the list OWL & SPARQL are sufficient for
> > validation in a CWA.
> > The problem with OWL is the different semantics, so people have to
> > rewrite it - most of the time the same things - in another format /
> > language such as SPIN / ShEx / SPARQL.
> >
> > Some remarks:
> >
> >   * Everything that includes writing RDF manually is not user friendly,
> >     even the ShEx / RDF format; however, with the proper interface (e.g.
> >     TopBraid Composer) the difference is negligible
> >   * I agree that the SPARQL example is misleading, for example RDFUnit
> >     generates automatically 43 different (SPARQL) test cases for this
> >     specific schema.
> >       o I also think this is the way to go for Shex implementations,
> >         huge SPARQL queries tend to fail / timeout in big graphs
> >   * Normally, the RDF you have already has an owl/rdfs schema thus, part
> >     of those declarations will be defined anyway
> >
> > In most cases reusing existing OWL schemas for validation is enough, e.g.
> > foaf already defines the domains, ranges and datatypes for all its
> > properties.
> > What we need in the end is tools that translate OWL to SPARQL - or to
> > something intermediate like Shex or SPIN - to get half of the work done
> >
> > For all other cases we need SPARQL or something that translates to SPARQL.
> > With a proper interface, anything could do :) but if I had to write
> > something by hand I'd choose the compact syntax.
> > However, the only problem with "things" that translate to SPARQL is that
> > they do not have the full SPARQL expressiveness; this applies to all of
> > ShEx, SPIN templates and RDFUnit patterns.
> > Thus, there will always be a case where we'll have to write a manual
> > SPARQL query.
> >
> > just my 2 cents,
> >
> > Best,
> > Dimitris
> >
> >
> >
> >
> > On Wed, Jul 2, 2014 at 4:46 AM, Holger Knublauch <holger@topquadrant.com
> > <mailto:holger@topquadrant.com>> wrote:
> >
> >     Hi Eric, John,
> >
> >
> >     On 7/1/2014 20:31, Eric Prud'hommeaux wrote:
> >
> >>     I intended ShEx to be as human readable as possible for the use cases
> >>     in question so I take your challenge as a call to compare it to
> >>     equivalent expressions in SPIN/SPARQL and OWL.
> >
> >     I am attaching a SPIN version of your challenge. The main motivation
> >     for doing this is to demonstrate that it is very well possible to
> >     create human-readable representations while having a maximum of
> >     expressivity (all of SPARQL) and being compatible with a language
> >     that many people already know.
> >
> >     To start this off, here is a TopBraid Composer screen rendering of
> >     the spin:constraints defined for the class Issue:
> >
> >
> >
> >     You can see that I am using one SPARQL query and three SPIN template
> >     calls.
> >
> >
> >     http://composing-the-semantic-web.blogspot.com.au/2009/01/understanding-spin-templates.html
> >
> >     The Turtle source code of such a template call looks like this:
> >
> >     :Issue
> >              spin:constraint  [
> >                  a             spl:ObjectCountPropertyConstraint ;
> >                  arg:maxCount  1 ;
> >                  arg:property  :reportedBy
> >              ] ;
> >
> >     i.e. it is possible to express a constraint on the maximum
> >     cardinality of a property in just 4 triples (same number as an
> >     owl:Restriction would use).
> >
> >     At execution time, these template calls are substituted by their
> >     SPARQL implementation (spin:body). Here is how the constraint
> >     template used above is defined:
> >
> >
> >
> >     You can see that the template's body is doing the real work, and is
> >     perfectly reusable across many ontologies. Anyone can create and
> >     publish their own templates in RDF. A good example of such a library is
> >
> >     http://semwebquality.org/mediawiki/index.php?title=SemWebQuality.org
> >
> >     and TopBraid also includes their own libraries including the SPL
> >     namespace shown above and attached.
> >
> >     Here is the constraint checking valid phone numbers:
> >
> >
> >
> >     I copied this regex from the internet so I have no idea whether it
> >     is correct, but you get the idea. Internally, this gets executed
> >     using FILTER regex in SPARQL, but the user only needs to select the
> >     template and then fill in the required arguments (here: the property
> >     and the specific regex string).
> >
> >     The tricky bit of your example is that it requires inferencing to
> >     run before it can find all violations. One inference that I have
> >     implemented here infers the rdf:type of a resource if it uses a
> >     property with an rdfs:domain:
> >
> >     CONSTRUCT {
> >          ?instance a ?domain .
> >     }
> >     WHERE {
> >          ?property rdfs:domain ?domain .
> >          ?instance ?property ?anyValue .
> >          FILTER NOT EXISTS {
> >              ?instance a ?anyType .
> >          } .
> >     }
> >
> >     There are many other ways of achieving the same result, e.g. using
> >     an out-of-the-box OWL or RDFS inference engine, but I wanted to make
> >     the example self-contained so this is represented as a SPIN rule.
> >
> >     To run this yourself, you would need TopBraid Composer Free Edition
> >     4.4.1 and replace the version of spl with the attached one because I
> >     made some changes for (the yet unpublished) 4.5 version. Then run
> >     SPIN inferences so that issue4 gets its rdf:type. Then press the
> >     Refresh and show problems button. Output should be:
> >
> >
> >
> >     For this run above I actually made the tel: value invalid - the
> >     regex didn't complain about it so maybe it really is a valid URL. I
> >     skipped the complication of foaf:Agent which would probably require
> >     another inference rule. Rest assured that it could be represented
> >     with similar ease.
> >
> >     Also note that the example file had an error that November 31 did
> >     not exist, so I have corrected that for this demo.
> >
> >     I am sure it would be possible to fiddle with this example more to
> >     highlight strengths and weaknesses, but my quick shot was just
> >     meant as an illustration.
> >
> >
> >>     = SPARQL =
> >>
> >>     The ShEx demo also spits out equivalent SPARQL. You can click View as
> >>     <SPARQL query> to see the SPARQL that captures the same semantics. I
> >>     think you'll find it rather daunting to imagine using that as a
> >>     publication format.
> >
> >     My personal take on your SPARQL example is that nobody would write
> >     such a query. For readability this should be split into multiple
> >     SPARQL queries. SPIN provides a "natural" framework for doing so, by
> >     introducing the concept of attaching rules and constraints to
> >     classes. You will find that the SPIN file looks much less scary than
> >     the SPARQL in your example.
> >
> >
> >>     = SPIN =
> >>
> >>     Spin can add a *this* keyword to the above SPARQL, which would allow
> >>     you to break out the clauses from the SPARQL query produced above. I
> >>     haven't tested an example of this, but perhaps you could provide one
> >>     and we can see what semantics it covers with what syntax.
> >
> >     Done. I am especially highlighting the importance of SPIN Templates.
> >     There is also a concept of SPIN Functions that allows anyone to
> >     define their own SPARQL functions that encapsulate reusable queries
> >     and produce easier-to-maintain rules and constraints. My arguments
> >     presented in
> >
> >
> >     http://composing-the-semantic-web.blogspot.com.au/2010/04/where-owl-fails.html
> >
> >     remain valid: it is quite possible to cover most of the
> >     functionality of OWL with SPIN templates, but templates also enable
> >     other medium advanced users to write their own language extensions.
> >     But not everyone will need to do that and they don't even need to
> >     know that SPARQL exists to use template-based SPIN constraints.
> >
> >     Let me finish by saying that I believe it will be easy to change the
> >     requirements and your example challenge so that other frameworks
> >     than SPIN become severely disadvantaged. SPARQL is very expressive,
> >     so anything involving mathematical operations, string manipulation
> >     etc quickly reaches the limits of other languages.
> >
> >     Happy to discuss further,
> >     Holger
> >
> >
> >
> >
> > --
> > Dimitris Kontokostas
> > Department of Computer Science, University of Leipzig
> > Research Group: http://aksw.org
> > Homepage:http://aksw.org/DimitrisKontokostas
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
Received on Thursday, 3 July 2014 07:02:45 UTC
