Re: Two Standards ? from Holger Knublauch on 2015-02-14 (public-data-shapes-wg@w3.org from February 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Sat, 14 Feb 2015 10:24:49 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <54DE95D1.6090202@topquadrant.com>
Thanks Harold, I think I agree with most of what you are saying. I would 
like to figure out where our thinking differs.

I think most people here agree on the basic structure of how to define 
property constraints, and that there needs to be a mechanism for complex 
constraints. It is also OK to use reasoners in conjunction with these 
constraints, assuming that the reasoners produce additional triples that 
can be queried just like any other triples. I don't think there is much 
to discuss about global constraints, nor about the individual 
constraints in general. The big open question is about the grouping of constraints 
into "shapes". And here the only open question is about how to represent 
those shapes *in RDF triples* - the ShExC syntax doesn't use triples anyway.

- Option "Shape-centric":

     ex:IssueShape a ldom:Shape ;
         ldom:property [ ... ] .

     ex:MyIssue ldom:nodeShape ex:IssueShape .

- Option "Class-centric":

     ex:IssueShape a rdfs:Class ;
         ldom:property [ ... ] .

     ex:MyIssue rdf:type ex:IssueShape .

- Option "Both" (allow either or, see also ldom:ShapeSelectors).
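
To make the difference between the three options concrete, here is a minimal Python sketch of how a processor might look up the shapes that apply to a node. It is only an illustration under stated assumptions: triples are modelled as plain string tuples, the helper function names are hypothetical, and rdfs:subClassOf is ignored. Only ldom:nodeShape, rdf:type and the ex: names come from the snippets above.

```python
# Triples are plain (subject, predicate, object) string tuples.
# The helper names below are hypothetical, chosen for illustration only.
RDF_TYPE = "rdf:type"
NODE_SHAPE = "ldom:nodeShape"


def shapes_shape_centric(graph, node):
    """Shape-centric: shapes are linked explicitly via ldom:nodeShape."""
    return {o for s, p, o in graph if s == node and p == NODE_SHAPE}


def shapes_class_centric(graph, node):
    """Class-centric: the node's classes act as its shapes (via rdf:type)."""
    return {o for s, p, o in graph if s == node and p == RDF_TYPE}


def shapes_both(graph, node):
    """'Both': a processor has to consult both linking mechanisms."""
    return shapes_shape_centric(graph, node) | shapes_class_centric(graph, node)


shape_centric_data = {
    ("ex:IssueShape", RDF_TYPE, "ldom:Shape"),
    ("ex:MyIssue", NODE_SHAPE, "ex:IssueShape"),
}

class_centric_data = {
    ("ex:IssueShape", RDF_TYPE, "rdfs:Class"),
    ("ex:MyIssue", RDF_TYPE, "ex:IssueShape"),
}

print(shapes_shape_centric(shape_centric_data, "ex:MyIssue"))
print(shapes_class_centric(class_centric_data, "ex:MyIssue"))
```

In this sketch the "Both" lookup is simply the union of the other two, which is one way to see the extra work that option puts on every processor.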

My concerns are that the "Both" option is unnecessarily complex, and the 
shape-centric option creates a parallel semantic web that only mirrors 
and competes with the current ontology-driven semantic web.

Every ShExC file can be compiled into an equivalent "Shape-centric" RDF 
model. The question is: why would it not work to also translate all 
ShExC files into classes internally - these classes are never seen by 
anyone, but they provide a formal grounding and a back-up execution 
engine in case no native ShExC processor is available. Likewise, the 
linkage of resources to shapes (ldom:nodeShape) does not seem to get 
encoded in ShExC, so why would this topic matter to ShEx people? ShExC 
files are their own data format that only serves as input to the ShEx 
engine, and this makes sense for some applications because it prevents 
unwanted side effects between shapes and the data.

In case we can agree that from a ShExC perspective, an internal mapping 
into Class-centric shapes is acceptable, then the remaining question is 
whether any other use cases require the shape-centric triple syntax. 
From what Arthur wrote recently, it sounds like the OSLC people would 
have been OK with using (OWL) classes if there had been a closed-world 
standard for that. In the absence of that, they created their own 
"shapes" structure that works very similarly to classes. In general, as 
long as classes can play the role of a shape, there is nothing in 
ldom:Shape that could not be covered by classes. If it helps, we could 
easily introduce a class ldom:Shape as subclass of rdfs:Class.
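
As a rough illustration of what that subclass declaration would buy, the following Python sketch applies the standard RDFS subclass rule (rdfs9: if x has type C and C is a subclass of D, then x has type D) to a tiny graph. The tuple representation and the function name are assumptions made for this example; it is not part of any LDOM proposal.

```python
def rdfs9_closure(graph):
    """Repeatedly apply rdfs9: (x type C), (C subClassOf D) => (x type D)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        # Snapshot the current type and subclass statements for this pass.
        subclass_of = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        typed = {(s, o) for s, p, o in inferred if p == "rdf:type"}
        for x, c in typed:
            for sub, sup in subclass_of:
                triple = (x, "rdf:type", sup)
                if c == sub and triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


graph = {
    ("ldom:Shape", "rdfs:subClassOf", "rdfs:Class"),
    ("ex:IssueShape", "rdf:type", "ldom:Shape"),
}

closure = rdfs9_closure(graph)
# With the subclass axiom in place, a resource declared as an ldom:Shape
# is also entailed to be an rdfs:Class.
print(("ex:IssueShape", "rdf:type", "rdfs:Class") in closure)
```

Under these assumptions, anything declared as an ldom:Shape could also be used wherever an rdfs:Class is expected, provided the RDFS entailment is actually computed.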

I'd be happy to be convinced otherwise, and I welcome real examples 
where classes *in the RDF syntax* would not work. But in the absence of 
such evidence, I would strongly vote in favor of a solution that is as 
simple as possible, and as compatible with established semantic web 
principles and mainstream terminology as possible. Therefore:

- if a user wants to slice and dice data using Shapes in isolation from 
the ontologies, they should use ShExC
- if a user wants to build or extend ontologies with constraints, they 
should use classes

These are different use cases that deserve different solutions.

Holger



On 2/14/2015 8:06, Solbrig, Harold R. wrote:
> Folks,
>
> Apologies for not catching the call for feedback below.  I very much 
> like Holger's suggestion, "Another option would be to define a compiler 
> from ShExC into LDOM RDF and back", as it would get us closer to our 
> (Mayo/CIMI's)  primary goal — a formal definition of the 
> /semantics/ of RDF data shapes.  If we can compile back and forth, we 
> are (hopefully) demonstrating semantic equivalence.  The Mayo/CIMI 
> goal is to arrive at:
>
>  1. A consistent set of semantics for the specification of Shape
>     Expressions
>  2. At least one grammar/syntax that can formally represent these
>     semantics
>
> We (again – Mayo/CIMI)  would hope that the grammar meets some of our 
> own goals in terms of succinctness, understandability and the like but 
> we will be able to live with whatever comes out as long as it fulfills 
> our semantic / functional requirements.
>
> It is quite likely that we will end up using other representational 
> forms in some of our projects in any case (one of the representations 
> that we have waiting in the wings is UML). While it might be helpful 
> to have community buy in on those other forms, it isn't essential as 
> long as we can demonstrate that there is an isomorphism between our 
> representation and the (or "a")  standard representational form.   I 
> see uses for both ShExC and LDOM RDF and, as long as we can agree that 
> they are (or share) different representations for the same thing, then 
> we will be quite happy.
>
> Arguing about whether ShExC, LDOM RDF or some other representation is 
> the right way to go is, in my mind, kind of like arguing on the syntax 
> of Turtle vs RDF/XML without first agreeing on the underlying model of 
> RDF itself.  The representations are essential, in the sense that it 
> is danged hard to talk about a model without having a succinct grammar 
> to do so, but we need to use a first approximation of some grammar to 
> discuss the model and, only then, to create final specification(s) for 
> various representational forms.
>
> I would propose that we declare at the outset that we want both ShExC 
> and LDOM RDF to be able to represent the same core semantics (I say 
> "core" because I wouldn't object to either or both of them having 
> additional but optional features that go beyond the core 
> specification).  Let's use whatever formalism makes the most sense in a 
> given context to explain what a given constraint should do and, once 
> we've arrived at some sort of consensus, record the decision using 
> formal logic.  A final step would be to adjust the designs of one or 
> both languages so that we know exactly what an expression means and 
> how the two align.
>
> Harold Solbrig
> Mayo Clinic
>
>
> From: Holger Knublauch <holger@topquadrant.com 
> <mailto:holger@topquadrant.com>>
> Date: Friday, February 13, 2015 at 3:30 PM
> To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org 
> <mailto:public-data-shapes-wg@w3.org>>
> Subject: Re: Two Standards ?
> Resent-From: <public-data-shapes-wg@w3.org 
> <mailto:public-data-shapes-wg@w3.org>>
> Resent-Date: Friday, February 13, 2015 at 3:30 PM
>
> The upcoming F2F meeting is supposed to deliver the general direction, 
> select editors and deliverables [1]. I don't think my proposal here is 
> premature at all. In fact it touches on the very fundamental questions 
> that Peter suggested we discuss too.
>
> Holger
>
> [1] https://www.w3.org/2014/data-shapes/wiki/F2F2#Objectives
>
>
> On 2/14/15 7:03 AM, Michel Dumontier wrote:
>> I think all this discussion is premature and counter to the intended 
>> focus of this WG. Stay focused on delivering the promised outcomes.
>>
>> m.
>>
>> Michel Dumontier, PhD
>> Associate Professor of Medicine (Biomedical Informatics)
>> Stanford University
>> http://dumontierlab.com
>>
>> On Fri, Feb 13, 2015 at 12:06 PM, Holger Knublauch 
>> <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>>
>>     My concern is not about personal preferences, but about
>>     language(s) that end users will actually want to use. We already
>>     struggle to understand shapes versus classes within the WG. The
>>     separation that I propose would allow us to write two different
>>     primers that will be consistent to understand and use.
>>
>>     If the charter does not give us the possibility to define two
>>     standards, then this becomes a matter of packaging. One approach
>>     is to introduce a small Abstract Syntax for the commonality
>>     between LDOM and ShExC. This may include something like the Shape
>>     Selectors, but not in RDF but "abstract". Another option would be
>>     to define a compiler from ShExC into LDOM RDF and back (I had
>>     proposed that before [1] without getting feedback). Both concrete
>>     syntaxes could still have a similar name, if that helps with the
>>     standardization process.
>>
>>     I also assume that WGs are still allowed to slightly diverge from
>>     the original Charter if they justify their reasons for doing so -
>>     at least that is what I was told when we wrote the original
>>     charter. I believe the discussions over the last half year (and
>>     potentially another half a year well into 2015) provide some of
>>     those reasons. Also, producing a Compact Syntax has been
>>     mentioned in the charter.
>>
>>     Holger
>>
>>     [1]
>>     https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Jan/0223.html
>>
>>
>>
>>
>>
>>     On 2/14/15 5:07 AM, Arnaud Le Hors wrote:
>>>     I don't think there is evidence yet that a common solution can't
>>>     be found. Yesterday's strawpoll tells me there is hope we can
>>>     find some common ground to build on to produce a standard that
>>>     we can all live with. This may not be anyone's personal
>>>     preference but standards are typically not.
>>>
>>>     It may be that eventually some will seek to define other
>>>     standards but this won't happen here. Our charter doesn't give
>>>     us that possibility.
>>>     --
>>>     Arnaud  Le Hors - Senior Technical Staff Member, Open Web
>>>     Technologies - IBM Software Group
>>>
>>>
>>>
>>>
>>>     From: Dean Allemang <dallemang@workingontologist.com>
>>>     <mailto:dallemang@workingontologist.com>
>>>     To: Holger Knublauch <holger@topquadrant.com>
>>>     <mailto:holger@topquadrant.com>
>>>     Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
>>>     <mailto:public-data-shapes-wg@w3.org>
>>>     Date: 02/12/2015 08:08 PM
>>>     Subject: Re: Two Standards ?
>>>     Sent by: deanallemang@gmail.com <mailto:deanallemang@gmail.com>
>>>     ------------------------------------------------------------------------
>>>
>>>
>>>
>>>     I have been talking about Shapes with my FIBO colleagues - we
>>>     continue to face the expressivity issues around OWL (role
>>>     intersections and friendly fire seem to come up a lot for us). 
>>>     We are moving into things like SPIN/SWRL, and/or FIBO-RIF (a
>>>     proposal that I worked on last July that moves everything into
>>>     a subset of RIF) to solve our expressivity issues.  We are
>>>     currently going to do all of this in Informative Annexes (as
>>>     opposed to normative recommendations), because we don't (yet)
>>>     have a good standard in which to write these things.
>>>
>>>     An expressive shapes language, based on SPARQL, would satisfy
>>>     our group's needs quite well.
>>>
>>>     I wonder a bit about the relationship between the two languages
>>>     that Holger proposes - is it important that we be able to define
>>>     how a ShEx shape corresponds to an LDOM definition?  Or are they
>>>     being used in completely different places?  I guess if we take
>>>     the XSD/RelaxNG example, there needn't be a deterministic
>>>     relationship between them.
>>>
>>>     Looking back, it seems to me that it would have been a good
>>>     thing if RELAX-NG had been done through the auspices of the W3C
>>>     instead of OASIS.  As it stands now, it seems as if one has to
>>>     choose one's standard organization to support one's technology. 
>>>     If we simply recognize that there could be two different
>>>     perspectives and develop both standards, we  could actually
>>>     provide coherent (non-competitive) advice about when each one
>>>     should be used.  If we don't, and the other perspective has an
>>>     audience, we'll end up seeing it pursued in some other
>>>     organization.  Ugh.
>>>
>>>
>>>     Prima facie, it would seem like we are doubling our work, but I
>>>     don't think that's the case. As Holger said, each group has done
>>>     enough work now to write up a coherent spec.  It would actually
>>>     be *more* work to try to reconcile them into a single
>>>     Recommendation.
>>>
>>>
>>>     This situation seems to me to be a bit different from the
>>>     profiles of OWL, where we use the same words with different
>>>     constraints on their usage. Here, we are solving parallel
>>>     problems with different mechanisms.  Making two standards, that
>>>     are well-informed by one another, seems like a good idea to me.
>>>
>>>
>>>
>>>     Dean
>>>
>>>
>>>
>>>
>>>
>>>
>>>     On Thu, Feb 12, 2015 at 7:25 PM, Holger Knublauch
>>>     <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>>>     A random thought before the week end:
>>>
>>>     Can this WG (please!) produce two separate standards?
>>>
>>>     1) An RDF vocabulary similar to the original LDOM proposal
>>>     2) The ShEx Compact Syntax aiming at the data reuse scenarios
>>>
>>>     We already have RDF Schema. We already have OWL. We would
>>>     already have a third language (LDOM or whatever). Why not have a
>>>     fourth language?
>>>
>>>     The situation is very similar to XML Schema vs. DTD vs.
>>>     RELAX-NG. They all solve similar problems, but from different
>>>     perspectives.
>>>
>>>     We are currently trying to mix different paradigms together and
>>>     risk producing something that nobody will be happy with. People
>>>     with OO background will wonder what the fuss is about this
>>>     parallel structure called "Shapes", raising the implementation
>>>     costs and creating a mix of parallel semantic webs. And ShEx
>>>     people don't want to worry about the interactions of the various
>>>     triple models at all - instead have the ShExC files live outside
>>>     of the triple store. And that makes sense because even if you
>>>     introduce ldom:instanceShape to separate shapes from classes,
>>>     you'd still run into conflicts with other ShEx models that also
>>>     happen to use ldom:instanceShape. The only proper solution here
>>>     is to not have triples in the first place.
>>>
>>>     Another constant source of conflict will be the role of SPARQL.
>>>     The ShEx camp seems to be more concerned about the balance of
>>>     expressivity and complexity, while the SPIN camp has plenty of
>>>     use cases where expressivity is the main concern. Furthermore, a
>>>     SPIN-like LDOM can more easily be combined with existing RDFS
>>>     and OWL ontologies, filling gaps in that space.
>>>
>>>     We have a handful of ShEx supporters in the WG. I am sure they
>>>     could turn their Member Submission into a formal spec quite
>>>     rapidly. From an LDOM point of view we have plenty of stuff
>>>     already implemented, and I'd be happy to wrestle and collaborate
>>>     with anyone to flesh out the open details. The Requirements
>>>     document is already being split into "Property constraints" and
>>>     "Complex constraints", so both camps can harvest from the same
>>>     catalog of requirements. We can also share test cases and
>>>     produce a small document explaining how to map from one language
>>>     to another. But the aforementioned reasons and the endless
>>>     discussions over the last half a year provide plenty of
>>>     arguments that justify why the WG chose to create two languages.
>>>
>>>     Why would this separation of deliverables not work?
>>>
>>>     Thanks,
>>>     Holger
>>>
>>>
>>>
>>
>>
>
Received on Saturday, 14 February 2015 00:26:39 UTC