Re: Reverse Mapping RDF2RDF

On Sun, 2010-11-21 at 09:45 -0600, Juan Sequeda wrote:
> On Sun, Nov 21, 2010 at 9:31 AM, Alexandre Bertails <bertails@w3.org>
> wrote:
>         [ snip ]
>         > > The relation between RDB, RDF, SPARQL and SQL should be
>         formalized and
>         > > should not rely only on examples. It's called semantics
>         preservation
>         > > -- a well-known problem in the compilation field -- and
>         IMO it should
>         > > be in a normative section.
>         > >
>         > > Eirc and I are totally confident that in the case of the
>         direct
>         > > mapping, we can map any SPARQL query to its equivalent SQL
>         query.
>         >
>         > Where is the proof?
>         
>         
>         heh :-)
>         
>         I think we first need to have a definitive version of RDB +
>         Direct Mapping.
> 
> 
> I don't think the proof doesn't depend on the "definitive version".
> Essentially there is a concept of direct mapping a RDB to RDF. The
> proof must work, regardless the way you represent it (sets, rules).

We don't use sets but more elaborate *data structures* like lists,
multisets. The Direct Mapping is not a set semantics, it's an algebra
based on two main data models (RDB and RDF) and a function to bind
them.

We have motivated this choice by several reasons:
* real-world relational databases diverged a long time ago from the
  relational algebra. It's a nice theoretical framework but it does
  not reflect the reality (I'm the first one being sad about it)
* our entry point is RDB. We *need* to access the RDB structure,
  the table names, the column names, the SQL types, etc. The
  relational algebra does not give you any information about that,so
  does not Datalog unless you encode this information in a relation
  but in that case, you have to define the function doing so.
* we also need access to the data itself. The relational algebra and
  Datalog work on top on sets. We *absolutely need* multisets. Despite
  my repeted questions on that subject, I still don't know how you
  intend to solve that problem.

> The proof is actually simple if you consider the direct mapping rules.
> Angles and Gutierrez proved that "SPARQL and non-recursive Datalog
> with negation have the equivalent recursive power, and hence, by
> classical results, SPARQL is equivalent from an expressive point of
> view to Relational Algebra". By representing the mapping in datalog
> rules, one can represent SPARQL as a datalog query, and then use the
> mapping rules to rewrite the query to relational algebra. 

Well, I guess we should ask the SPARQL folks about this specific
statement :-) If this is true, I suggest that Angles, Gutierrez and
you start a specification at W3C so the whole community could benefit
from such a framework.

At least, in rdb2rdf, we don't have to say a word about that. We just
have to be clean by following the definition of RDF.

> Therefore, thanks to the Angles and Gutierrez result, we don't have to
> worry about the SPARQL to SQL issue, from the direct mapping
> perspective. 

I believe only what I see. I've worked in the past on proving stuff
that was true on 20 years old papers but could not resist to the
formal proof.

That's why I now declare victory only when I see a full embedding on
the problem, its solution and its proof in a proof checker like
Coq [1].

I need at least an implementation if not a constructive proof [2].

> For R2RML, if that language has more expressivity than
> datalog, then we might be in trouble.

I thought that was somewhat the plan. Did I miss anything on that
subject?

Alexandre.

[1] http://coq.inria.fr/
[2] http://en.wikipedia.org/wiki/Constructive_proof

Received on Sunday, 21 November 2010 17:17:43 UTC