ISSUE-41 bNode semantics: Information preservation from Marcelo Arenas on 2011-05-24 (public-rdb2rdf-wg@w3.org from May 2011)

From: Marcelo Arenas <marcelo.arenas1@gmail.com>
Date: Mon, 23 May 2011 22:09:52 -0400
To: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <BANLkTikXjNQHQJNGcRgO+VXBixPRmRKPHg@mail.gmail.com>
Dear All,

As far as I can see, two alternative ways of defining information
preservation have been discussed by the group. Let me try to explain
these two alternatives.

Assume that M is a mapping that takes as input a relational database
schema S and an instance I of S, and produces an RDF graph (this
function could be the direct mapping defined in
http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/). Then we
have the following alternatives.

(1) We say that M is information preserving if there exists a mapping
F such that: (i) F takes as input an RDF graph and produces a
relational instance, and (ii) for every relational schema S and
instance I of S:

F(M(S,I)) = I

That is, one can reconstruct the original instance I by using the
information in M(S,I).

(2) Assume that we are given a canonical function T that specifies how
to translate relational tuples into solution mappings
(http://www.w3.org/TR/rdf-sparql-query/#sparqlSolutions). Then we say
that mapping M is information preserving if for every relational
algebra query Q over a relational schema S, there exists a SPARQL
query Q* such that for every instance I of S:

T(Q(I)) = Q*(M(S,I))

That is, the answer to Q over I is "equal" to the answer of Q* over
M(S,I) (more precisely, the translation according to T of the set of
tuples that form the answer to Q over I is equal to the set of
solution mappings that form the answer to Q* over M(S,I)).
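
As a rough illustration of the equation in (2), here is a minimal Python
sketch that checks it on one tiny instance. Everything in it is a
hypothetical stand-in: the relation "emp", the hand-written triple set
playing the role of M(S,I), and the function q_star, which only imitates
a SPARQL query with a naive join. The query is chosen so that its answer
contains no nulls, so the choice of T does not matter here. Note that the
definition quantifies over every instance I, while the sketch tests a
single one.

```python
# Hypothetical toy data: one relation "emp" with attributes ID and dept.
instance = [{"ID": 1, "dept": "sales"}, {"ID": 2, "dept": "hr"}]

# Hand-written stand-in for M(S, I): one subject per row, one triple per
# value. This is an assumption for illustration, not the W3C direct mapping.
graph = {
    ("emp/1", "emp#ID", 1), ("emp/1", "emp#dept", "sales"),
    ("emp/2", "emp#ID", 2), ("emp/2", "emp#dept", "hr"),
}

def q(rows):
    """Relational query Q: project ID from the rows where dept = 'sales'."""
    return [{"ID": r["ID"]} for r in rows if r["dept"] == "sales"]

def q_star(triples):
    """Stand-in for the SPARQL query
        SELECT ?ID WHERE { ?s <emp#dept> "sales" . ?s <emp#ID> ?ID }
    evaluated by a naive join over the triple set."""
    subjects = {s for s, p, o in triples if p == "emp#dept" and o == "sales"}
    return [{"ID": o} for s, p, o in triples
            if p == "emp#ID" and s in subjects]

def t(tuples):
    """Canonical translation T. With no nulls in the answer, each tuple
    (a dict from attributes to values) already is a solution mapping."""
    return tuples

def as_set(mappings):
    """Compare answers as sets of mappings, ignoring order."""
    return {frozenset(m.items()) for m in mappings}

# T(Q(I)) = Q*(M(S,I)) holds on this one instance.
assert as_set(t(q(instance))) == as_set(q_star(graph))
```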


In my opinion, (1) is a simple and natural definition. The direct
mapping defined in
http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is not
information preserving according to (1). But if this mapping is
modified to generate triples that store the initial relational schema
(in particular, triples for storing the names of the attributes of a
relation), then the mapping will be information preserving according
to (1).
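
The effect of storing the schema can be sketched in Python on a toy
model. The functions toy_m and toy_f below are hypothetical stand-ins
for M and F, and the predicate name "hasAttribute" is made up for the
sketch; none of this is the W3C direct mapping itself. The toy mapping
emits no triple for a NULL value, so without schema triples the name of
an attribute that is NULL in every row cannot be recovered.

```python
# Toy model: a schema is {relation: [attributes]}; an instance is
# {relation: [row dicts]}, with None standing for NULL.
# toy_m and toy_f are hypothetical, not the W3C direct mapping.

def toy_m(schema, instance, with_schema=False):
    """Map (S, I) to a set of (subject, predicate, object) triples."""
    triples = set()
    for rel, rows in instance.items():
        for i, row in enumerate(rows):
            subj = f"{rel}/row{i}"
            triples.add((subj, "rdf:type", rel))
            for attr, value in row.items():
                if value is not None:      # NULLs produce no triple
                    triples.add((subj, f"{rel}#{attr}", value))
    if with_schema:
        # Extra triples that store the attribute names of each relation.
        for rel, attrs in schema.items():
            for attr in attrs:
                triples.add((rel, "hasAttribute", attr))
    return triples

def toy_f(triples):
    """Rebuild an instance using only the information in the triples."""
    attrs, rows = {}, {}
    for s, p, o in triples:
        if p == "hasAttribute":
            attrs.setdefault(s, set()).add(o)
            rows.setdefault(s, {})
    for s, p, o in triples:
        if p == "rdf:type":
            rows.setdefault(o, {})[s] = {}
    for s, p, o in triples:
        if "#" in p:
            rel, attr = p.split("#", 1)
            rows[rel][s][attr] = o
    instance = {}
    for rel, by_subj in rows.items():
        keys = sorted(by_subj, key=lambda k: int(k.rsplit("row", 1)[1]))
        ordered = [by_subj[k] for k in keys]
        for row in ordered:
            for attr in attrs.get(rel, ()):
                row.setdefault(attr, None)  # known attribute, no triple: NULL
        instance[rel] = ordered
    return instance

schema = {"T": ["ID", "A"]}
instance = {"T": [{"ID": 1, "A": None}]}

# Without schema triples the attribute A is lost: F(M(S,I)) != I.
assert toy_f(toy_m(schema, instance)) != instance
# With schema triples the round trip succeeds: F(M(S,I)) = I.
assert toy_f(toy_m(schema, instance, with_schema=True)) == instance
```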

The main drawback of (1) is that it does not impose any restriction on
the function F. Notion (2) tries to overcome this limitation by
imposing the restriction that it must be possible to answer every
relational algebra query Q over the initial data by using a SPARQL
query Q* over the translated data (notice that this definition does
not impose any restriction on the SPARQL operators used in Q*). But to
use notion (2), one needs to choose a canonical function T for
translating relational tuples into solution mappings. For example, if
this canonical function is defined by associating to each relational
tuple t a solution mapping mu such that (notice that relational tuples
are treated as functions):

- the domain of mu is equal to the domain of t
- mu(A) = t(A) if t(A) is not null, and mu(A) is a fresh blank node if
t(A) is null

Then we have that the direct mapping defined in
http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is not
information preserving. On the other hand, if the canonical function T
is defined by associating to each relational tuple t a solution
mapping mu such that:

- the domain of mu is equal to the set of attributes A such that t(A)
is not null
- mu(A) = t(A) if t(A) is not null

Then we have that the direct mapping defined in
http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/ is
information preserving.
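
The two candidate definitions of T above can be written down as a short
Python sketch. The representation is an assumption made for
illustration: a tuple is a dict from attribute names to values, None
stands for NULL, and a blank node is a string of the form "_:bN"
produced by the hypothetical helper fresh_bnode.

```python
import itertools

# Hypothetical blank node generator; each call returns a new label.
_bnode_ids = itertools.count()

def fresh_bnode():
    return f"_:b{next(_bnode_ids)}"

def t_with_bnodes(tup):
    """First candidate: dom(mu) = dom(t), NULL becomes a fresh blank node."""
    return {a: (v if v is not None else fresh_bnode())
            for a, v in tup.items()}

def t_drop_nulls(tup):
    """Second candidate: dom(mu) = attributes A with t(A) not NULL."""
    return {a: v for a, v in tup.items() if v is not None}

tup = {"ID": 1, "A": None}
print(t_with_bnodes(tup))   # {'ID': 1, 'A': '_:b0'}
print(t_drop_nulls(tup))    # {'ID': 1}
```

The two results differ in their domains, which is exactly what decides
whether a given mapping M satisfies definition (2).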

In my opinion, one of the first questions to answer is how the
canonical function T should be defined. Or, more concretely, suppose
that we are given the following tuple from Enrico's example:

t(ID) = 1
t(A) = NULL

Which one of the following mappings represents the information in this
tuple?

(a) mu_1 with domain {ID, A} and such that mu_1(ID) = 1 and mu_1(A) =
_:b
(b) mu_2 with domain {ID} and such that mu_2(ID) = 1


I hope that we can answer this question.

All the best,

Marcelo
Received on Tuesday, 24 May 2011 02:10:20 UTC