Re: Minutes for the 2010-10-21 RDB2RDF meeting + semantics discussion from Marcelo Arenas on 2010-10-22 (public-rdb2rdf-wg@w3.org from October 2010)

From: Marcelo Arenas <marcelo.arenas1@gmail.com>
Date: Fri, 22 Oct 2010 10:47:42 -0300
To: Alexandre Bertails <bertails@w3.org>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTinQDnEvRSTb-5wHGFCTHM6EJGocScYBjC+oBu-2@mail.gmail.com>
Dear All,

On Thu, Oct 21, 2010 at 5:32 PM, Alexandre Bertails <bertails@w3.org> wrote:
> Hello guys,
>
> here are the minutes for "RDB2RDF - Formal mapping - semantics" staged
> at [1]. Sorry for all the @@ but I had troubles to associate the voices
> with the right people.
>
> == Quick Summary ==
>
> Here is a quick summary:
> * importance of the 7 use-cases which should have their own section
> * it's ok to have some examples covering several use-cases (if it's
> said)
> * consensus around the SQL terminology instead of the Relational Algebra
> one, because of the intended public
> * Marcelo and ?? proposed to update EricP's documents based on the
> previous points. Should be done next week.
> * discussion about what "semantics" means in the context of RDB2RDF, no
> consensus. See below for more information.
> * to answer the previous point, Marcello will send an email with the
> right informations (a digitalized book) and will give some context

In the conference call, I argued that we need a syntax and semantics
for the mapping language, but we did not reach a consensus about
whether the mapping language should have a semantics. To explain what
I mean by the semantics of a mapping language, below I give some
information about how the problem of data exchange (or data
translation) is usually formalized in the database context.

In the relational database context, the data exchange (or data
translation) problem is usually formalized as follows. You are given a
source relational schema S, a target relational schema T (T could
consist of the table Triple for storing RDF triples), and a mapping M
that specifies how to translate data from the source into the target
[1]. The problem is then to take data structured under the source
schema S and create an instance of the target schema T according to
the conditions specified by M. An important issue in this setting is
to define a mapping language for expressing mappings like M, which
means defining both the syntax and the semantics of this language:

- The syntax of the mapping language is usually defined by considering
a syntactic restriction of first-order logic, like source-to-target
tuple-generating dependencies (see [1] for the formal definition of
these dependencies, which are widely used in this area).

- The semantics of the mapping language refers to the following
problem: Given a source instance I, a target instance J and a mapping
M, is J a valid translation of I according to M? If M is specified by
using a set F of first-order logic sentences, then the semantics of
the mapping language is given in terms of the semantics for
first-order logic: J is a valid translation of I under M if and only
if (I,J) satisfies F in the usual first-order logic sense (all these
ideas are formalized in [1]).
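
As a rough illustration of the semantic question above (is J a valid
translation of I according to M?), here is a minimal Python sketch. It
is not taken from [1]; the relation names Emp and Triple, the "?x"
convention for variables, and the function names are all assumptions
made for this example. A mapping is a list of source-to-target
tuple-generating dependencies, each written as a (body, head) pair of
atom lists; head variables that do not occur in the body are read as
existentially quantified.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def matches(atoms, instance, binding=None):
    """Yield every extension of `binding` that maps all atoms into `instance`."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (relation, terms), rest = atoms[0], atoms[1:]
    for row in instance.get(relation, set()):
        if len(row) != len(terms):
            continue
        candidate = dict(binding)
        ok = True
        for term, value in zip(terms, row):
            if is_var(term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, candidate)

def satisfies(source, target, tgd):
    """(source, target) satisfies body -> exists head in the first-order sense."""
    body, head = tgd
    return all(
        any(True for _ in matches(head, target, b))
        for b in matches(body, source)
    )

def is_valid_translation(source, target, mapping):
    return all(satisfies(source, target, tgd) for tgd in mapping)

# Emp(n, d) -> exists m . Triple(n, worksIn, d) AND Triple(d, managedBy, m)
mapping = [(
    [("Emp", ("?n", "?d"))],
    [("Triple", ("?n", "worksIn", "?d")), ("Triple", ("?d", "managedBy", "?m"))],
)]
source = {"Emp": {("alice", "sales")}}
good = {"Triple": {("alice", "worksIn", "sales"), ("sales", "managedBy", "carol")}}
bad = {"Triple": {("alice", "worksIn", "sales")}}
print(is_valid_translation(source, good, mapping))
print(is_valid_translation(source, bad, mapping))
```

Note that a target with additional triples would also be accepted by
this check, which is one reason why a source instance can have many
valid translations.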

It is important to notice that in the above setting, there may exist
several possible translations for the same source instance (as M
could, for example, create new values in the target), so one has to
formally define which target instance reflects the source data as
accurately as possible. Once you have done that, you can consider the
mapping M as a function that maps each source instance I to the
"best" translation of I according to M (this "best" solution is
usually the "canonical universal solution" or the "core of the
canonical universal solution", which are formally defined in [1,2]).
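
To make the idea of a mapping as a function more concrete, the
following Python sketch builds one target instance from a source
instance by a naive chase: for every match of a dependency's body, it
adds the head atoms and invents a fresh labeled null for each
existential variable. This is only a sketch under stated assumptions,
not the construction as given in [1,2]; the Null class, the "?x"
variable convention, the relation names, and the function names are
hypothetical.

```python
from itertools import count

class Null:
    """A labeled null: a placeholder for a value invented by the mapping."""
    def __init__(self, label):
        self.label = label
    def __repr__(self):
        return f"_N{self.label}"

def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def body_matches(atoms, instance, binding=None):
    """Yield every variable binding that maps all body atoms into `instance`."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (relation, terms), rest = atoms[0], atoms[1:]
    for row in instance.get(relation, set()):
        if len(row) != len(terms):
            continue
        candidate = dict(binding)
        ok = True
        for term, value in zip(terms, row):
            if is_var(term):
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from body_matches(rest, instance, candidate)

def chase(source, mapping):
    """Apply each dependency once per body match, with fresh nulls."""
    fresh = count(1)
    target = {}
    for body, head in mapping:
        for binding in body_matches(body, source):
            full = dict(binding)
            for relation, terms in head:
                row = []
                for term in terms:
                    if is_var(term) and term not in full:
                        # Existential variable: invent a new labeled null.
                        full[term] = Null(next(fresh))
                    row.append(full[term] if is_var(term) else term)
                target.setdefault(relation, set()).add(tuple(row))
    return target

# Emp(n, d) -> exists m . Triple(n, worksIn, d) AND Triple(d, managedBy, m)
mapping = [(
    [("Emp", ("?n", "?d"))],
    [("Triple", ("?n", "worksIn", "?d")), ("Triple", ("?d", "managedBy", "?m"))],
)]
source = {"Emp": {("alice", "sales"), ("bob", "sales")}}
for triple in chase(source, mapping)["Triple"]:
    print(triple)
```

In this example the result contains two managedBy triples for "sales",
each with its own null. Taking the core in the sense of [2] would
collapse them into a single triple, which is the kind of difference
between the two candidate "best" solutions mentioned above.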

A survey of the tools developed at IBM following the above
approach can be found in [3] (references [1,2,3] can be downloaded
from http://www.almaden.ibm.com/cs/people/fagin). In [4], the author
shows how the data exchange problem is formalized in logical terms
and what some of the important issues in this area are (this survey
can be downloaded from
http://www.sigmod.org/publications/sigmod-record/0903/index.html).
Finally, there are also two short books where you can find information
about the above approach. Reference [5] gives a fairly complete
picture of the main issues in data exchange, including the case of
XML data (note that the above approach is also applicable to other
data models such as XML and RDF). Reference [6] shows how some rule
languages (like non-recursive Datalog with equality and safe
negation, and some of its extensions) have been used in data
integration/exchange. These two short books are available
electronically in many libraries.

All the best,

Marcelo


[1] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa: Data exchange:
semantics and query answering. Theor. Comput. Sci. 336(1): 89-124,
2005.

[2] R. Fagin, P. G. Kolaitis, L. Popa: Data exchange: getting to the
core. ACM Trans. Database Syst. 30(1): 174-210, 2005.

[3] R. Fagin, L. M. Haas, M. A. Hernández, R. J. Miller, L. Popa, Y.
Velegrakis: Clio: Schema Mapping Creation and Data Exchange.
Conceptual Modeling: Foundations and Applications 2009: 198-236.

[4] P. Barcelo: Logical foundations of relational data exchange.
SIGMOD Record 38(1): 49-58, 2009.

[5] M. Arenas, P. Barcelo, L. Libkin, F. Murlak: Relational and XML
Data Exchange. Morgan & Claypool Publishers, 2010.

[6] M. Genesereth: Data Integration: The Relational Logic Approach.
Morgan & Claypool Publishers, 2010.
Received on Friday, 22 October 2010 13:48:14 UTC