Re: relational data as a bona fide member of the SM from Frank Manola on 2011-11-03 (semantic-web@w3.org from November 2011)

From: Frank Manola <fmanola@acm.org>
Date: Thu, 03 Nov 2011 19:23:10 -0400
To: Alexandre Riazanov <alexandre.riazanov@gmail.com>
Cc: Semantic Web List <semantic-web@w3.org>
Message-id: <758D0517-869E-4232-86B3-F6594F65D690@acm.org>
On Nov 3, 2011, at 6:22 PM, Alexandre Riazanov wrote:

> 
> 
> On Thu, Nov 3, 2011 at 5:20 PM, Frank Manola <fmanola@acm.org> wrote:
> On Nov 3, 2011, at 3:19 PM, Alexandre Riazanov wrote:
> 
>> I have been asking this sort of question for a while and the only decent answer I know is that 
>> Description Logics only work with unary and binary predicates (classes and properties),
>> although I believe RDF was initially developed independently from the DL and OWL work. 
>>  
>> RIF and RuleML seem to be going in the relational direction (see also the earlier work http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.7623&rep=rep1&type=pdf by Harold Boley), but it is difficult to break the monopoly
>> of RDF+OWL. 
> 
> From my point of view, a major reason for focusing on unary and binary predicates (the logical forms that underlie RDF triples) is that it's easier to deal with the problems of integrating heterogeneous data (a key issue in the semantic web) if the data is in (or is mapped to being in) that form, as opposed to data in arbitrary-arity relations (for example, with n-aries you need a schema to interpret any tuples you encounter "in the wild"; otherwise you don't know what the "columns" mean). If you go back to the period before the "monopoly of RDF+OWL" :-) and look at the work on integrating heterogeneous relational databases, one of the major approaches to developing the mappings between the various relational schemas was to interpret the various local schemas in terms of unary and binary relations for just this reason (compound keys had to be dealt with in this way too, because the same combinations of columns didn't necessarily constitute the keys in otherwise corresponding relations in the different local schemas). Mind you, if you're NOT worried about integrating heterogeneous data, RDF introduces extra pain of its own (figuring out all those identifiers, for one thing), but if you ARE worried about integrating heterogeneous data, I think you want those identifiers around.
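To make the mapping argument concrete, here is a minimal sketch of interpreting rows from two differently shaped relational tables as binary facts. The table layouts, the `to_binary` helper, and the `ex:` prefix are all hypothetical, and it assumes the column names have already been mapped to shared property identifiers (the "shared understanding" part of the discussion). Once that mapping exists, column order and column sets no longer need to line up, and integration reduces to a set union.

```python
# Hypothetical local schemas: two databases describe employees with
# different column orders and different column sets.
db_a_columns = ("id", "name", "dept")            # emp(id, name, dept)
db_a_rows = [("e1", "Alice", "Sales")]
db_b_columns = ("dept", "id", "name", "hired")   # staff(dept, id, name, hired)
db_b_rows = [("Engineering", "e2", "Bob", 2009)]


def to_binary(rows, columns, key="id", prefix="ex:"):
    """Interpret each n-ary row as binary facts (subject, property, value),
    using the key column as the subject identifier.

    Assumes column names are already shared property names; a real mapping
    would translate each local column name to an agreed identifier."""
    facts = set()
    for row in rows:
        record = dict(zip(columns, row))
        subject = prefix + str(record[key])
        for column, value in record.items():
            if column != key:
                facts.add((subject, prefix + column, value))
    return facts


# Integration is a plain union: no need to align column positions.
merged = to_binary(db_a_rows, db_a_columns) | to_binary(db_b_rows, db_b_columns)
```

Note that a single-column key is assumed here; compound keys, which the message mentions, would need an identifier built from several columns.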
> 
> 
> I don't quite understand your argument. Indeed, interoperability is the target. Syntactic interoperability is not a problem as long as you use the same or convertible syntaxes. 
> Semantic interoperability requires shared understanding of the identifiers being used, which has nothing to do with arity. Reinterpreting legacy relational schemas is a related, but separate issue.
> Binary predicates are often handy for representing attributes, but that does not mean n-ary predicates cannot be helpful in the same tasks (although I cannot recall a real example) and in other KR tasks.  

Let me try again, then (although I can't guarantee I'll be any more understandable this time!). The original question (I thought) was why there weren't relational approaches applied in Semantic-Web-like contexts (where, as you say, interoperability is the target). I cited the integration of heterogeneous relational databases to argue that, in this case, where relations were already being used by all parties, and interoperability was the target, those doing the integration found that using unaries and binaries helped (I agree that shared understanding of the identifiers is necessary for semantic interoperability, but in RDF+OWL, at least the identifiers are *there*; those putting the data on the Web had to create them). All that RDF is doing is starting from the unaries and binaries. This is not an argument that n-ary relations aren't helpful in data modeling. Nor is it an argument that you can't do semantic integration using n-ary relations. I simply think it's *easier* to do that integration with the RDF approach, and I cited an historical example as evidence that others have found that as well. Now, they/we may have simply missed the boat, and if so, someone (possibly you) will have to come along and show us a better way (I'm serious). There have certainly been attempts to provide more general KRs (allowing n-ary predicates) for data/knowledge exchange; KIF and Common Logic come to mind. Perhaps someone with more experience with those languages can chip in here (Pat?) and cite their experiences in using them to integrate large amounts of data, but I'm not aware that they have been, at least so far, notably successful *for that purpose* (they are certainly more powerful, and thus better adapted than RDF/OWL, for other purposes). Of course, you can always debate how successful RDF has been for that purpose too.

> 
>> 
>> A related thing I hate about RDF (as a practitioner) is the poor data model. In particular, the open world assumption does not allow one to fully and unambiguously describe some objects. Pragmatically, it would be nice to have something like the ML data model.
> 
> This isn't a pragmatic vs. theoretical issue, it's a question of what problem you're trying to solve.  RDF is based on the open world assumption because it's designed with the Web in mind, and the Web, unlike a relational database, is open.
> 
> I don't have a problem with the OWA in general. The problem is the OWA is there even when you 
> don't want it, specifically when you want to be able to specify a piece of data completely and unambiguously. With OWA, you cannot compute the length of a list because somebody else can redefine the list somewhere.

What *I* want is for OWA and CWA not to be "assumptions", but rather explicitly specified. However, it seems to me you're complaining about the wrong thing. It's not the OWA per se that makes it possible to take a list you've defined in one place and add stuff to it in other places, is it? Rather, it's the fact that someone can add to the list in other places that makes the OWA the right assumption to use (assuming an app really wants to know all the data the Web holds about the list).
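A minimal sketch of what "explicitly specified" rather than assumed closure might look like: the caller states whether the data is to be treated as closed, and the count is reported as exact only in that case; otherwise it is just a lower bound. The `member_count` helper and the `ex:member` property are hypothetical, and this deliberately uses a flat membership property rather than RDF's `rdf:first`/`rdf:rest` collection vocabulary.

```python
def member_count(facts, collection, closed):
    """Count members of `collection` in a set of (s, p, o) facts.

    Returns (count, is_exact). When the caller explicitly declares the
    data closed, the count is the answer; under an open-world reading it
    is only a lower bound, since other sources may add members."""
    known = {o for (s, p, o) in facts if s == collection and p == "ex:member"}
    return len(known), closed


# One source defines the list; another source on the Web adds to it.
source_1 = {("ex:list1", "ex:member", "a"), ("ex:list1", "ex:member", "b")}
source_2 = {("ex:list1", "ex:member", "c")}
```

For example, `member_count(source_1, "ex:list1", closed=True)` gives `(2, True)`, while with `closed=False` it gives `(2, False)` ("at least 2"), and merging in `source_2` raises the open-world answer to `(3, False)`. It is the ability of `source_2` to exist that makes the open reading the safe default.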

>  
>   A pragmatic approach to dealing with Web data needs to take that into account.  On the other hand, if you want to consider some collection of RDF as being closed, I don't know anything that would stop you from doing that (e.g., stick it in a relational database and use SQL on it).    
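As a sketch of the "stick it in a relational database and use SQL on it" suggestion, the following uses Python's built-in `sqlite3` module with a single hypothetical `triples(s, p, o)` table. SQL evaluates queries over exactly the rows stored, so counting and `NOT EXISTS` behave under a closed-world reading: anything not in the table is treated as false. The table layout and `ex:` names are illustrative assumptions, not a recommended RDF storage schema.

```python
import sqlite3

# A single illustrative table holding RDF-style triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:list1", "ex:member", "a"),
        ("ex:list1", "ex:member", "b"),
        ("ex:alice", "ex:name", "Alice"),
    ],
)

# Closed-world count: the answer is taken to be exactly what is stored.
(count,) = conn.execute(
    "SELECT COUNT(DISTINCT o) FROM triples WHERE s = ? AND p = ?",
    ("ex:list1", "ex:member"),
).fetchone()

# Negation as failure: subjects with no stored ex:name are reported
# as having no name at all.
unnamed = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT s FROM triples t WHERE NOT EXISTS "
        "(SELECT 1 FROM triples u WHERE u.s = t.s AND u.p = 'ex:name')"
    )
]
conn.close()
```

Here `count` is 2 and `unnamed` is `["ex:list1"]`; neither conclusion would be licensed under the open world assumption.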
> 
>> 
>> 
>> On Thu, Nov 3, 2011 at 4:57 AM, Sampo Syreeni <decoy@iki.fi> wrote:
>> As a relational minded guy, I wonder why there aren't any genuinely relational minded formats/syntaxes/data around, which still embody the SemWeb/LinkedData mindset. I mean, that ought to be pretty easy to do, and it then ought to bring all of the benefits which once made RM so great and overpowering.
> 
> Well, why don't you guys have a go at it?   This would be a forum where you could bounce some ideas around.  
> 
> 
> Hmm.. I apologise if this list is not an appropriate place. 

I think this list IS an appropriate place.   That's what I thought I said.  Go to it!

>  
>> 
>> Why precisely do all of the semweb formats stay ternary, thereby forcing themselves to reify any higher arity, and as such complicate the processing of higher arity data by adding an extra reification layer?
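The "extra reification layer" in the question can be sketched as follows, in the spirit of the pattern the W3C describes in its "Defining N-ary Relations on the Semantic Web" note: each n-ary tuple gets its own node, and each argument becomes a binary property of that node. The `reify`/`unreify` helpers, the `_:n` node labels, and the `ex:` role names are hypothetical. Reading a tuple back requires grouping facts by node, which is effectively a join, and that is the processing overhead being complained about.

```python
from collections import defaultdict
from itertools import count

_ids = count(1)  # source of fresh node labels (illustrative only)


def reify(relation, roles, values):
    """Encode one n-ary tuple as binary facts hung off a fresh node."""
    node = f"_:n{next(_ids)}"
    facts = {(node, "rdf:type", relation)}
    facts |= {(node, role, value) for role, value in zip(roles, values)}
    return facts


def unreify(facts, relation, roles):
    """Recover n-ary tuples by grouping binary facts by node (a join).

    Assumes each role appears at most once per node."""
    by_node = defaultdict(dict)
    for s, p, o in facts:
        by_node[s][p] = o
    return {
        tuple(props[r] for r in roles)
        for props in by_node.values()
        if props.get("rdf:type") == relation and all(r in props for r in roles)
    }


roles = ("ex:employee", "ex:employer", "ex:startYear")
facts = reify("ex:Employment", roles, ("ex:alice", "ex:acme", 2009)) | reify(
    "ex:Employment", roles, ("ex:bob", "ex:globex", 2011)
)
```

Two 3-ary tuples become eight binary facts (three role edges plus one type edge per node), and `unreify(facts, "ex:Employment", roles)` regroups them into the original two tuples.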
>> -- 
>> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
>> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>> 
>> 
>> 
>> 
>> -- 
>> ======================================
>> Alexandre Riazanov (Alexander Ryazanov), PhD
>> Saint John, New Brunswick, Canada
>> Skype: alexandre.riazanov
>> http://www.freewebs.com/riazanov/
>> http://www.linkedin.com/in/riazanov
>> http://www.unbsj.ca/sase/csas/faculty.php
>> ======================================
Received on Thursday, 3 November 2011 23:23:47 UTC