Re: ShEx relation to SPIN/OWL from Arthur Ryman on 2014-07-14 (public-rdf-shapes@w3.org from July 2014)

From: Arthur Ryman <ryman@ca.ibm.com>
Date: Mon, 14 Jul 2014 09:05:28 -0400
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-rdf-shapes@w3.org
Message-ID: <OF026C08BD.7F379A54-ON85257D15.00456170-85257D15.0047EB06@ca.ibm.com>
Holger,

The following comments are not specifically about the merits of ShEx
versus SPIN, even though that is the subject of this thread.

I believe we do need a high-level constraint-checking language that has at
least the following characteristics:

1. We need a high-level vocabulary for the most common types of 
constraint, e.g. occurrence, domain, range, etc. SPARQL is too low level, 
i.e. you can't tell what an arbitrary SPARQL query is doing. However, 
these high-level constraints SHOULD be given a precise semantics in terms 
of some other well-specified language. I believe SPARQL is the natural 
choice for defining the semantics of high-level constraints.

2. Constraints SHOULD NOT be attachable only to vocabularies, ontologies,
or models; they SHOULD also be associable with RDF documents, RDF datasets,
or RDF REST APIs (to define the request/response contract). The acid test
here is whether I can define constraints on a document that is entirely
composed of terms defined in other vocabularies, e.g. FOAF, Dublin Core,
OSLC, etc.
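As one illustration of point 1, here is a minimal Python sketch of how a high-level occurrence constraint (a min/max count on one property of one focus node) could be given its semantics by generating a SPARQL ASK query. The helper `cardinality_to_ask` is hypothetical, and the convention that the generated query answers true when the constraint is violated is an assumption made for this example.

```python
from typing import Optional


def cardinality_to_ask(focus_iri: str, prop_iri: str,
                       min_count: int = 0,
                       max_count: Optional[int] = None) -> Optional[str]:
    """Hypothetical helper: return a SPARQL ASK query that is true when
    focus_iri violates the cardinality (min_count, max_count) on prop_iri.
    max_count=None means unbounded; returns None for (0, n), which cannot fail."""
    conditions = []
    if min_count > 0:
        conditions.append(f"?n < {min_count}")
    if max_count is not None:
        conditions.append(f"?n > {max_count}")
    if not conditions:
        return None
    # COUNT without GROUP BY yields a single row, so ?n is 0 when the
    # property is absent and the minimum check still fires.
    return (
        "ASK WHERE {\n"
        "  { SELECT (COUNT(?value) AS ?n) WHERE {\n"
        f"      <{focus_iri}> <{prop_iri}> ?value .\n"
        "  } }\n"
        f"  FILTER ({' || '.join(conditions)})\n"
        "}"
    )


# "Exactly one dcterms:title" expressed as a high-level (1, 1) constraint
# (the focus IRI is an illustrative example, not from the thread):
print(cardinality_to_ask("http://example.org/request",
                         "http://purl.org/dc/terms/title", 1, 1))
```

A tool that only understands the high-level (property, min, max) form never needs to parse the generated SPARQL; the SPARQL is there to pin down precisely what the constraint means.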

Does SPIN satisfy these two requirements? The Resource Shape submission 
does. 
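The acid test in requirement 2 can be sketched as follows, assuming a toy in-memory list of triples and a hypothetical `validate` function. The shape is chosen by the caller (for example, per REST operation) and applied to a focus node, so nothing is attached to the FOAF or Dublin Core vocabularies themselves and the node needs no rdf:type. The request IRI and the two shapes are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical shape: property IRI -> (min, max); max None means unbounded.
Shape = Dict[str, Tuple[int, Optional[int]]]

DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
REQ = "http://example.org/request"  # illustrative focus node


def validate(triples: List[Triple], focus: str, shape: Shape) -> List[str]:
    """Check the focus node against a caller-chosen shape. The shape is
    not looked up from the node's type or from any vocabulary."""
    errors = []
    for prop, (lo, hi) in shape.items():
        n = sum(1 for s, p, _ in triples if s == focus and p == prop)
        if n < lo or (hi is not None and n > hi):
            upper = "n" if hi is None else str(hi)
            errors.append(f"{prop}: found {n}, expected {lo}..{upper}")
    return errors


# A request document built only from FOAF and Dublin Core terms.
doc = [
    (REQ, FOAF + "maker", "http://example.org/alice"),
    (REQ, DCT + "description", "Crash on startup"),
]

# Different operations can impose different constraints on the same terms.
create_shape = {DCT + "title": (1, 1), FOAF + "maker": (1, None)}
update_shape = {DCT + "title": (0, 1)}

print(validate(doc, REQ, create_shape))
# ['http://purl.org/dc/terms/title: found 0, expected 1..1']
print(validate(doc, REQ, update_shape))
# []
```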

Regards, 
___________________________________________________________________________
Arthur Ryman, PhD

Chief Data Officer, Rational
Chief Architect, Portfolio & Strategy Management
Distinguished Engineer | Master Inventor | Academy of Technology

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)





From:   Holger Knublauch <holger@topquadrant.com>
To:     public-rdf-shapes@w3.org, 
Date:   07/11/2014 07:25 PM
Subject:        Re: ShEx relation to SPIN/OWL



So what is the conclusion of this thread? Is the working group just 
continuing with the ShEx language (because they have already started
their own baby)? Looking at the parallel thread on expressivity, I
really have no idea why SPARQL (with a lightweight vocabulary such
as SPIN) would not be sufficient and better because of the greater 
flexibility, existing engine implementations etc. SPIN has been in 
production for many years, has an open source API, and "just works". 
SPIN templates with the sp:text syntax are quite trivial to support by a 
pre-processor that produces vanilla SPARQL strings that any engine can 
process.
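The pre-processor idea can be sketched as below. This is a minimal illustration, not the SPIN API's actual mechanism: the template body is a hypothetical max-count query held as SPARQL text (in the spirit of sp:text), and the hypothetical `instantiate` function replaces each template argument, plus the focus variable ?this, with a concrete RDF term to produce a vanilla SPARQL string. The example IRIs are assumptions.

```python
import re
from typing import Dict

# Hypothetical template body; ?this is the focus node, while ?property and
# ?maxCount are the template arguments. The query is true on violation.
MAX_COUNT_BODY = """ASK WHERE {
  { SELECT (COUNT(?value) AS ?n) WHERE { ?this ?property ?value . } }
  FILTER (?n > ?maxCount)
}"""


def instantiate(body: str, bindings: Dict[str, str]) -> str:
    """Replace each ?var with an RDF term already written in SPARQL syntax.
    The trailing word boundary keeps ?property from matching ?propertyX;
    the lambda stops re.sub from interpreting backslashes in the term."""
    for var, term in bindings.items():
        body = re.sub(r"\?" + re.escape(var) + r"\b", lambda m: term, body)
    return body


query = instantiate(MAX_COUNT_BODY, {
    "this": "<http://example.org/issue/42>",
    "property": "<http://example.org/reportedBy>",
    "maxCount": "1",
})
print(query)  # plain SPARQL that any engine can run
```

Substituting terms into the text, rather than pre-binding variables, also sidesteps SPARQL's bottom-up evaluation of the nested SELECT, where an outer binding of ?this would not reach inside.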

Why invent another language with yet another cryptic syntax so that 
those people who prefer editing by hand have to do fewer keystrokes? 
SPIN can be edited in Turtle (and JSON-LD etc).

Again: would the goal of this working group have changed if SPIN had 
already been a W3C recommendation and not just a member submission?

I am hoping for a constructive discussion to understand the limitations 
of SPIN for the given use cases.

Thanks,
Holger


On 7/8/2014 9:11, Eric Prud'hommeaux wrote:
> * Holger Knublauch <holger@topquadrant.com> [2014-07-08 07:16+1000]
>> Yes I agree that in some cases it is important to be able to not be
>> too constrained by declarations that are part of a published model.
>> It is also sometimes not desirable to put constraints into an
>> ontology so that the class and property definitions can be reused
>> from OWL DL tools (and spin:constraint would break that).
>>
>> However, that's exactly what named graphs have been invented for.
>> When we use SPIN, we typically put the classes and properties into
>> one file, and create another file that owl:imports the schema and
>> then only contains the spin:constraints and rules (and helper
>> templates). This way, anyone is able to use just the ontology if
>> they prefer to do that. To get the semantics of both, you simply
>> owl:import the SPIN file.
>>
>> I therefore do not agree that the separation of constraints and
>> schema requires a new language - they can still live in triples (or
>> rather: quads if you like). I may be missing something in your line
>> of thought.
> The purpose of the new language was simply to make the schema more
> presentable and editable. During the W3C RDF Validation Workshop, we
> looked at OSLC Resource Shapes, Dublin Core Description Set Profiles
> and SPIN (as in the W3C Member Submission, where spin:constraint
> connects a node in an RDF graph to a SPARQL ASK query [CNST]), OWL
> with closed world and unique name assumption, some SPARQL data
> validation schemes, and some path expressions from Google. The idea of
> a human-facing language had wide appeal during the wrap-up at the end
> of the workshop.
>
> The spl:ObjectCountPropertyConstraint construct that you describe
> below moves SPIN closer to the "syntax" (i.e. graph topology) of
> Resource Shapes and Description Set Profile:
>
>    SPIN            Resource Shapes    Description Set Profile
>    arg:property    oslc:property      dsp:Property
>    arg:minCount    oslc:occurs¹       dsp:minOccurs
>    arg:maxCount    oslc:occurs¹       dsp:maxOccurs
>
> ¹ single terms stand for (0,1), (0,n), (1,1), (1,n) cardinalities
>
>
> I think Arthur's point about separating schema from data was just
> that, if you want re-use of data, you can't attach your structural
> constraints to the types of the nodes. We don't want everyone who uses
> a foaf:Person to have to follow the same rules about whether or not
> their application permits/requires a givenName, an mbox, etc. Nor do
> we want a node to be limited to a single purpose, e.g. unable to act
> as both a User and an Employee [UEMP].
>
>
> [CNST] http://www.w3.org/Submission/2011/SUBM-spin-modeling-20110222/#spin-constraints
> [UEMP] http://www.w3.org/brief/Mzc3
>
>
>> HTH
>> Holger
>>
>>
>>
>> On 7/7/14, 10:45 PM, Arthur Ryman wrote:
>>> Holger,
>>>
>>> Thx. I agree that the example SPIN syntax is high-level enough, like OWL
>>> or RDFS. However, given a common, high-level constraint, we need a
>>> standard name and definition for it so other tools don't have to
>>> understand SPARQL. SPIN can give it a semantics in terms of SPARQL.
>>>
>>> There is one aspect of SPIN that I believe is a problem. You wrote:
>>>
>>> This information "belongs" into the
>>> ontology together with the class and property definitions, and should be
>>> shared as such.
>>>
>>> That agrees with my understanding of SPIN, but I think it's a problem. I
>>> agree that in some cases you do want to associate constraints with a class
>>> and in that case the constraints should be part of the ontology. However,
>>> there are other use cases where we design RDF documents using existing
>>> vocabularies. There are many vocabularies in common use and it is a best
>>> practice to reuse their terms instead of defining new synonyms. For
>>> example, Dublin Core is widely reused and has very few constraints. In
>>> fact, the fewer constraints in a vocabulary, the more reusable it is. You
>>> might want to constrain how Dublin Core is used in a particular case.
>>>
>>> Consider the case where you define a REST service that accepts RDF
>>> requests, and the documents use terms from several existing vocabularies.
>>> We need a way to describe the document and the constraints it must
>>> satisfy. For example, the REST service may require that the document
>>> contains exactly one dcterms:title property. Of course, the constraints
>>> must be compatible with the existing vocabularies, i.e. the constraints
>>> must not change the semantics of any term. Furthermore, the constraints
>>> may vary depending on the REST operation.
>>>
>>> We need to decouple the constraint language from the ontology or
>>> vocabulary. There is an analogy with natural language. In natural
>>> languages, a dictionary defines the meaning of terms and a grammar says
>>> how the terms may be combined into text. In RDF, a vocabulary or ontology
>>> is like a dictionary, and a shape language is like a grammar. We can't put
>>> everything in the ontology.
>>>
>>> Regards,
>>> ___________________________________________________________________________
>>> Arthur Ryman, PhD
>>>
>>> Chief Data Officer, Rational
>>> Chief Architect, Portfolio & Strategy Management
>>> Distinguished Engineer | Master Inventor | Academy of Technology
>>>
>>> Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)
>>>
>>>
>>>
>>>
>>>
>>> From:   Holger Knublauch <holger@topquadrant.com>
>>> To:     public-rdf-shapes@w3.org,
>>> Date:   07/07/2014 01:02 AM
>>> Subject:        Re: ShEx relation to SPIN/OWL
>>>
>>>
>>>
>>> On 7/4/2014 0:58, Arthur Ryman wrote:
>>>> Ideally, we'd have a high-level, human-friendly language for constraining
>>>> RDF and it would be easily translatable to SPARQL to make its semantics
>>>> clear and to give us a standard way to automatically check the
>>>> constraints. The high-level language should have an RDF representation,
>>>> but could also have other serializations that are easier to author. IMHO,
>>>> Turtle is easy to author, but other syntaxes could be more compact. A
>>>> complete solution would also provide a way to drop down into SPARQL syntax
>>>> so you can describe arbitrary constraints.
>>> I agree, and believe SPIN ticks most of these boxes already. When you
>>> instantiate SPIN templates you basically end up with the same syntax as
>>> the RDF representation of ShEx. And assuming that a standard library of
>>> templates exists (including templates to represent cardinality and other
>>> frequently needed concepts), SPIN is also a *declarative* model that
>>> can be queried without even knowing SPARQL. The declarative triples of
>>> the SPIN template calls can be queried just like any other ontological
>>> data to drive user interfaces (selection of widgets) etc.
>>>
>>> Just to reiterate, here is what an example SPIN template call looks like
>>> in Turtle:
>>>
>>> :Issue
>>>     spin:constraint [
>>>         rdf:type spl:ObjectCountPropertyConstraint ;
>>>         arg:maxCount 1 ;
>>>         arg:property :reportedBy ;
>>>       ] ;
>>>
>>> It would be trivial to define a template just for that use case, and
>>> reformat it so that it becomes
>>>
>>> :Issue
>>>     spin:constraint [ a ex:Maximum-one ;  ex:property :reportedBy  ] ;
>>>
>>> which is hopefully easy enough to edit. In the snippet above,
>>> ex:Maximum-one would be a SPIN template that defines the actual SPARQL
>>> query to execute via spin:body. I believe this mechanism combines the
>>> flexibility of SPARQL with the simple declarative syntax of RDF triples.
>>> SPIN is self-describing and therefore extensible to any constraint
>>> patterns needed in the future, and it grows with the evolution and
>>> adoption of SPARQL.
>>>
>>> I do wonder what this working group would do if SPIN had already been a
>>> W3C standard (and not just a member submission). Would this have changed
>>> anything?
>>>
>>> FWIW I would strongly recommend encouraging an RDF-based representation
>>> over yet another custom syntax. This information "belongs" into the
>>> ontology together with the class and property definitions, and should be
>>> shared as such. Other languages also require new parsers and just add
>>> to the learning curve. A good example of what *not* to do with a W3C
>>> standard has been the OWL/XML syntax introduced in OWL 2; it only leads
>>> to fragmentation and additional implementation costs (even Jena doesn't
>>> support OWL/XML).
>>>
>>> Holger
>>>
>>>
>>>
>>>
>>
Received on Monday, 14 July 2014 13:06:47 UTC