- From: Marcelo Arenas <marcelo.arenas1@gmail.com>
- Date: Mon, 23 May 2011 22:09:52 -0400
- To: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>

Dear All,

As far as I can see, two alternative ways of defining information preservation have been discussed by the group. Let me try to explain these two alternatives.

Assume that M is a mapping that takes as input a relational database schema S and an instance I of S, and produces an RDF graph (this function could be the direct mapping defined in http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/). Then we have the following alternatives.

(1) We say that M is information preserving if there exists a mapping F such that:

    (i) F takes as input an RDF graph and produces a relational instance, and
    (ii) for every relational schema S and instance I of S:

        F(M(S,I)) = I

    That is, one can reconstruct the original instance I by using the information in M(S,I).

(2) Assume that we are given a canonical function T that specifies how to translate relational tuples into solution mappings (http://www.w3.org/TR/rdf-sparql-query/#sparqlSolutions). Then we say that mapping M is information preserving if for every relational algebra query Q over a relational schema S, there exists a SPARQL query Q* such that for every instance I of S:

        T(Q(I)) = Q*(M(S,I))

    That is, the answer to Q over I is "equal" to the answer of Q* over M(S,I) (more precisely, the translation according to T of the set of tuples that form the answer to Q over I is equal to the set of solution mappings that form the answer to Q* over M(S,I)).

In my opinion, (1) is a simple and natural definition. The direct mapping defined in http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is not information preserving according to (1). But if this mapping is modified to generate triples that store the initial relational schema (in particular, triples for storing the names of the attributes of a relation), then the mapping will be information preserving according to (1). The main drawback of (1) is that it does not impose any restriction on the function F.
Notion (2) tries to overcome this limitation by imposing the restriction that it must be possible to answer every relational algebra query Q over the initial data by using a SPARQL query Q* over the translated data (notice that this definition does not impose any restriction on the SPARQL operators used in Q*). But to use notion (2), one needs to choose a canonical function T for translating relational tuples into solution mappings.

For example, if this canonical function is defined by associating to each relational tuple t a solution mapping mu such that (notice that relational tuples are treated as functions):

- the domain of mu is equal to the domain of t
- mu(A) = t(A) if t(A) is not null, and mu(A) is a fresh blank node if t(A) is null

then we have that the direct mapping defined in http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is not information preserving.

On the other hand, if the canonical function T is defined by associating to each relational tuple t a solution mapping mu such that:

- the domain of mu is equal to the set of attributes A such that t(A) is not null
- mu(A) = t(A) if t(A) is not null

then we have that the direct mapping defined in http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is information preserving.

In my opinion, one of the first questions to answer is how the canonical function T should be defined. Or, more concretely, suppose that we are given the following tuple from Enrico's example:

    t(ID) = 1
    t(A) = NULL

Which one of the following mappings represents the information in this tuple?

(a) mu_1 with domain {ID, A} and such that mu_1(ID) = 1 and mu_1(A) = _:b

(b) mu_2 with domain {ID} and such that mu_2(ID) = 1

I hope that we can answer this question.

All the best,
Marcelo
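
For concreteness, the two candidate definitions of the canonical function T discussed above could be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: a relational tuple is modeled as a dict from attribute names to values, NULL is modeled as None, and the function names and the blank-node label scheme (_:b0, _:b1, ...) are hypothetical, not taken from any specification.

```python
from itertools import count

# Hypothetical counter used to generate fresh blank-node labels.
_fresh_ids = count()


def fresh_blank_node():
    """Return a new blank-node label; the '_:bN' scheme is an assumption."""
    return f"_:b{next(_fresh_ids)}"


def t_with_blank_nodes(tup):
    """First candidate T: same domain as the tuple; NULL becomes a fresh blank node."""
    return {attr: (fresh_blank_node() if value is None else value)
            for attr, value in tup.items()}


def t_dropping_nulls(tup):
    """Second candidate T: the domain contains only the non-NULL attributes."""
    return {attr: value for attr, value in tup.items() if value is not None}


# The tuple from the example: t(ID) = 1, t(A) = NULL (NULL modeled as None).
t = {"ID": 1, "A": None}

mu_1 = t_with_blank_nodes(t)  # domain {ID, A}; A is bound to a blank-node label
mu_2 = t_dropping_nulls(t)    # domain {ID}

print(mu_1)
print(mu_2)
```

Under these assumptions, mu_1 corresponds to option (a) and mu_2 to option (b); the two differ only in whether the attribute A appears in the domain of the solution mapping.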

Received on Tuesday, 24 May 2011 02:10:20 UTC