Re: Reverse Mapping RDF2RDF from ashok malhotra on 2010-11-21 (public-rdb2rdf-wg@w3.org from November 2010)

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Sun, 21 Nov 2010 14:10:40 -0800
To: Harry Halpin <hhalpin@w3.org>
CC: Juan Sequeda <juanfederico@gmail.com>, "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-ID: <4CE998E0.1080608@oracle.com>
  Thanks, Harry!
By 'reverse mapping' I meant mapping from RDF to RDB.  And I think it would be fun if someone
gave this a bit of thought and wrote a paper..

I did not mean SPARQL to SQL translation.  We discussed this during the RDB2RDF XG and decided
that this was an area we would not standardize.  Instead, we decided, that this
required innovation and vendors could compete with their implementations.
All the best, Ashok

On 11/21/2010 1:52 PM, Harry Halpin wrote:
> Just quick note,
>
>     I'm glad to see this conversation evoked such passion. Note that Soeren
> Auer, who is hopefully still reading these e-mails, also put forward
> SPARQL to SQL translation very early on in the life of this WG as a way
> of making this work. At the time, the WG in general thought that while
> doing that would be nice, transforming SPARQL into some kind of optimal
> SQL would like depend on vendor-specific SQL functionality, so the idea
> was that both transformation of SQL to SPARQL and SPARQL to SQL should
> remain a point of market competition, and that such transformations
> could obviously be built on top of a more basic relational data to RDF
> transform as specified by R2RML.
>
> Even if we could spec SPARQL-to-SQL out, one has to rely on some
> idealization of SQL, and then, the transform it's pretty trivial for
> reasons already stated - i.e. well-known equivalnce between relational
> algebra and FOL (datalog). Given that at least we have a publically
> available for spec for SPARQL, it's a bit easier on the SPARQL side
> actually. So feel free to write up some formalization and put it in a
> proof-checker, and then ship the results to an academic database
> conference (or in your product).
>
> Given the state of our schedule (i.e. behind) and given this entire
> "reverse" mapping conversation came up from a not-too-serious off-the-cuff
> remark, I'd suggest that we stick to charter and not have this on the
> agenda. If people do write up formalizations and manage to get them
> published at academic database conferences, feel free to post them to the
> list.
>
> What *would* be useful at this point would be a strategy for making sure
> R2RML covers the requirements in the charter, and an pragmatic strategy
> for getting there as regards both semantics and test-cases.
>
>> On Sun, Nov 21, 2010 at 11:17 AM, Alexandre Bertails
>> <bertails@w3.org>wrote:
>>
>>> On Sun, 2010-11-21 at 09:45 -0600, Juan Sequeda wrote:
>>>> On Sun, Nov 21, 2010 at 9:31 AM, Alexandre Bertails<bertails@w3.org>
>>>> wrote:
>>>>          [ snip ]
>>>>          >  >  The relation between RDB, RDF, SPARQL and SQL should be
>>>>          formalized and
>>>>          >  >  should not rely only on examples. It's called semantics
>>>>          preservation
>>>>          >  >  -- a well-known problem in the compilation field -- and
>>>>          IMO it should
>>>>          >  >  be in a normative section.
>>>>          >  >
>>>>          >  >  Eirc and I are totally confident that in the case of the
>>>>          direct
>>>>          >  >  mapping, we can map any SPARQL query to its equivalent SQL
>>>>          query.
>>>>          >
>>>>          >  Where is the proof?
>>>>
>>>>
>>>>          heh :-)
>>>>
>>>>          I think we first need to have a definitive version of RDB +
>>>>          Direct Mapping.
>>>>
>>>>
>>>> I don't think the proof doesn't depend on the "definitive version".
>>>> Essentially there is a concept of direct mapping a RDB to RDF. The
>>>> proof must work, regardless the way you represent it (sets, rules).
>>> We don't use sets but more elaborate *data structures* like lists,
>>> multisets. The Direct Mapping is not a set semantics, it's an algebra
>>> based on two main data models (RDB and RDF) and a function to bind
>>> them.
>>>
>>> We have motivated this choice by several reasons:
>>> * real-world relational databases diverged a long time ago from the
>>>   relational algebra. It's a nice theoretical framework but it does
>>>   not reflect the reality (I'm the first one being sad about it)
>>> * our entry point is RDB. We *need* to access the RDB structure,
>>>   the table names, the column names, the SQL types, etc. The
>>>   relational algebra does not give you any information about that,so
>>>   does not Datalog unless you encode this information in a relation
>>>   but in that case, you have to define the function doing so.
>>> * we also need access to the data itself. The relational algebra and
>>>   Datalog work on top on sets. We *absolutely need* multisets. Despite
>>>   my repeted questions on that subject, I still don't know how you
>>>   intend to solve that problem.
>>>
>>>> The proof is actually simple if you consider the direct mapping rules.
>>>> Angles and Gutierrez proved that "SPARQL and non-recursive Datalog
>>>> with negation have the equivalent recursive power, and hence, by
>>>> classical results, SPARQL is equivalent from an expressive point of
>>>> view to Relational Algebra". By representing the mapping in datalog
>>>> rules, one can represent SPARQL as a datalog query, and then use the
>>>> mapping rules to rewrite the query to relational algebra.
>>> Well, I guess we should ask the SPARQL folks about this specific
>>> statement :-) If this is true, I suggest that Angles, Gutierrez and
>>> you start a specification at W3C so the whole community could benefit
>>> from such a framework.
>>
>>> At least, in rdb2rdf, we don't have to say a word about that. We just
>>> have to be clean by following the definition of RDF.
>>>
>>>> Therefore, thanks to the Angles and Gutierrez result, we don't have to
>>>> worry about the SPARQL to SQL issue, from the direct mapping
>>>> perspective.
>>> I believe only what I see. I've worked in the past on proving stuff
>>> that was true on 20 years old papers but could not resist to the
>>> formal proof.
>>>
>>> That's why I now declare victory only when I see a full embedding on
>>> the problem, its solution and its proof in a proof checker like
>>> Coq [1].
>>>
>>>
>> I'll be publishing the proof done in ACL2 (theorem prover) in a few weeks.
>>
>>
>>> I need at least an implementation if not a constructive proof [2].
>>>
>>>> For R2RML, if that language has more expressivity than
>>>> datalog, then we might be in trouble.
>>> I thought that was somewhat the plan. Did I miss anything on that
>>> subject?
>>>
>> I guess so... but what is the guarantee that a SPARQL query can be
>> translated to SQL?
>>
>>
>>> Alexandre.
>>>
>>> [1] http://coq.inria.fr/
>>> [2] http://en.wikipedia.org/wiki/Constructive_proof
>>>
>>>
>>>
>
Received on Sunday, 21 November 2010 22:42:50 UTC