Re: Reverse Mapping RDF2RDF from Harry Halpin on 2010-11-21 (public-rdb2rdf-wg@w3.org from November 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Sun, 21 Nov 2010 21:52:32 -0000 (GMT)
To: "Juan Sequeda" <juanfederico@gmail.com>
Cc: "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-ID: <6302982c443d2caedf104dceff3a57ee.squirrel@webmail-mit.w3.org>
Just quick note,

   I'm glad to see this conversation evoked such passion. Note that Soeren
Auer, who is hopefully still reading these e-mails, also put forward
SPARQL to SQL translation very early on in the life of this WG as a way
of making this work. At the time, the WG in general thought that while
doing that would be nice, transforming SPARQL into some kind of optimal
SQL would like depend on vendor-specific SQL functionality, so the idea
was that both transformation of SQL to SPARQL and SPARQL to SQL should
remain a point of market competition, and that such transformations
could obviously be built on top of a more basic relational data to RDF
transform as specified by R2RML.

Even if we could spec SPARQL-to-SQL out, one has to rely on some
idealization of SQL, and then, the transform it's pretty trivial for
reasons already stated - i.e. well-known equivalnce between relational
algebra and FOL (datalog). Given that at least we have a publically
available for spec for SPARQL, it's a bit easier on the SPARQL side
actually. So feel free to write up some formalization and put it in a
proof-checker, and then ship the results to an academic database
conference (or in your product).

Given the state of our schedule (i.e. behind) and given this entire
"reverse" mapping conversation came up from a not-too-serious off-the-cuff
remark, I'd suggest that we stick to charter and not have this on the
agenda. If people do write up formalizations and manage to get them
published at academic database conferences, feel free to post them to the
list.

What *would* be useful at this point would be a strategy for making sure
R2RML covers the requirements in the charter, and an pragmatic strategy
for getting there as regards both semantics and test-cases.

> On Sun, Nov 21, 2010 at 11:17 AM, Alexandre Bertails
> <bertails@w3.org>wrote:
>
>> On Sun, 2010-11-21 at 09:45 -0600, Juan Sequeda wrote:
>> > On Sun, Nov 21, 2010 at 9:31 AM, Alexandre Bertails <bertails@w3.org>
>> > wrote:
>> >         [ snip ]
>> >         > > The relation between RDB, RDF, SPARQL and SQL should be
>> >         formalized and
>> >         > > should not rely only on examples. It's called semantics
>> >         preservation
>> >         > > -- a well-known problem in the compilation field -- and
>> >         IMO it should
>> >         > > be in a normative section.
>> >         > >
>> >         > > Eirc and I are totally confident that in the case of the
>> >         direct
>> >         > > mapping, we can map any SPARQL query to its equivalent SQL
>> >         query.
>> >         >
>> >         > Where is the proof?
>> >
>> >
>> >         heh :-)
>> >
>> >         I think we first need to have a definitive version of RDB +
>> >         Direct Mapping.
>> >
>> >
>> > I don't think the proof doesn't depend on the "definitive version".
>> > Essentially there is a concept of direct mapping a RDB to RDF. The
>> > proof must work, regardless the way you represent it (sets, rules).
>>
>> We don't use sets but more elaborate *data structures* like lists,
>> multisets. The Direct Mapping is not a set semantics, it's an algebra
>> based on two main data models (RDB and RDF) and a function to bind
>> them.
>>
>> We have motivated this choice by several reasons:
>> * real-world relational databases diverged a long time ago from the
>>  relational algebra. It's a nice theoretical framework but it does
>>  not reflect the reality (I'm the first one being sad about it)
>> * our entry point is RDB. We *need* to access the RDB structure,
>>  the table names, the column names, the SQL types, etc. The
>>  relational algebra does not give you any information about that,so
>>  does not Datalog unless you encode this information in a relation
>>  but in that case, you have to define the function doing so.
>> * we also need access to the data itself. The relational algebra and
>>  Datalog work on top on sets. We *absolutely need* multisets. Despite
>>  my repeted questions on that subject, I still don't know how you
>>  intend to solve that problem.
>>
>> > The proof is actually simple if you consider the direct mapping rules.
>> > Angles and Gutierrez proved that "SPARQL and non-recursive Datalog
>> > with negation have the equivalent recursive power, and hence, by
>> > classical results, SPARQL is equivalent from an expressive point of
>> > view to Relational Algebra". By representing the mapping in datalog
>> > rules, one can represent SPARQL as a datalog query, and then use the
>> > mapping rules to rewrite the query to relational algebra.
>>
>> Well, I guess we should ask the SPARQL folks about this specific
>> statement :-) If this is true, I suggest that Angles, Gutierrez and
>> you start a specification at W3C so the whole community could benefit
>> from such a framework.
>
>
>> At least, in rdb2rdf, we don't have to say a word about that. We just
>> have to be clean by following the definition of RDF.
>>
>> > Therefore, thanks to the Angles and Gutierrez result, we don't have to
>> > worry about the SPARQL to SQL issue, from the direct mapping
>> > perspective.
>>
>> I believe only what I see. I've worked in the past on proving stuff
>> that was true on 20 years old papers but could not resist to the
>> formal proof.
>>
>> That's why I now declare victory only when I see a full embedding on
>> the problem, its solution and its proof in a proof checker like
>> Coq [1].
>>
>>
> I'll be publishing the proof done in ACL2 (theorem prover) in a few weeks.
>
>
>> I need at least an implementation if not a constructive proof [2].
>>
>> > For R2RML, if that language has more expressivity than
>> > datalog, then we might be in trouble.
>>
>> I thought that was somewhat the plan. Did I miss anything on that
>> subject?
>>
>
> I guess so... but what is the guarantee that a SPARQL query can be
> translated to SQL?
>
>
>>
>> Alexandre.
>>
>> [1] http://coq.inria.fr/
>> [2] http://en.wikipedia.org/wiki/Constructive_proof
>>
>>
>>
>
Received on Sunday, 21 November 2010 21:52:34 UTC