Information Preserving


By information preserving, I mean that given the RDF data, I can reconstruct
the relational table with all its values. Informally, given the identity SQL
query (a query that outputs the whole table: SELECT * FROM table), there
exists a SPARQL query that, when executed on the RDF data, returns the same
results as the identity SQL query.

There are two cases for information preserving:

1) We have knowledge of the schema

If the relational schema is directly mapped to RDFS/OWL, then we DO NOT need
to translate nulls in order to preserve information. For example, consider
the table R with attributes A and B and instances:

R(Bob, NULL)
R(Alice, 25)

The ontology from this schema is

<R> <type> <class>
<A> <type> <property>
<A> <domain> <R>
<A> <range> <whatever datatype>
<B> <type> <property>
<B> <domain> <R>
<B> <range> <whatever datatype>
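The ontology above can be read straight off the table definition. A minimal sketch of that step, assuming SQLite as a stand-in database and using a hypothetical helper `schema_to_ontology`. IRIs are plain strings in the message's shorthand, and the declared SQL type is used in place of a real datatype IRI:

```python
import sqlite3

def schema_to_ontology(conn, table):
    """Build RDFS-style triples for one table, following the pattern above.

    Hypothetical helper: IRIs are plain strings, and the range is the
    declared SQL type rather than an XSD datatype.
    """
    triples = [(f"<{table}>", "<type>", "<class>")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
        prop = f"<{name}>"
        triples += [
            (prop, "<type>", "<property>"),
            (prop, "<domain>", f"<{table}>"),
            (prop, "<range>", f"<{col_type}>"),
        ]
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
for triple in schema_to_ontology(conn, "R"):
    print(*triple)
```

Only the schema is consulted here, so these seven triples exist whether or not any cell is NULL.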

And the RDF data, without translating nulls:

<row1> <R#A> "Bob"
<row2> <R#A> "Alice"
<row2> <R#B> "25"
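The data triples follow one rule: one triple per non-NULL cell. A minimal sketch, again assuming SQLite and a hypothetical helper `table_to_triples` that builds row IRIs from SQLite's `rowid`:

```python
import sqlite3

def table_to_triples(conn, table):
    """Map each cell to <rowN> <table#col> "value", skipping NULL cells.

    Hypothetical helper: row IRIs come from SQLite's rowid and every value
    is written as a plain string literal.
    """
    cur = conn.execute(f"SELECT rowid, * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cur.description][1:]  # drop the rowid column
    triples = []
    for rowid, *values in cur:
        for col, value in zip(columns, values):
            if value is not None:  # a NULL cell produces no triple at all
                triples.append((f"<row{rowid}>", f"<{table}#{col}>", f'"{value}"'))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("Bob", None), ("Alice", 25)])
for s, p, o in table_to_triples(conn, "R"):
    print(s, p, o)
```

Running it prints the same three triples listed above; Bob's NULL for B leaves no trace.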

The identity SQL query is:

SELECT * FROM R

Given that we know the schema, we can construct a SPARQL query:

SELECT ?a ?b
WHERE {
  ?x <R#A> ?a .
  OPTIONAL { ?x <R#B> ?b }
}

There we go... with that SPARQL query, we can reconstruct the original
relational table. No triples for NULL values are needed: for row 1 the
OPTIONAL pattern leaves ?b unbound, which plays the role of the NULL. If we
did generate triples for NULL values, the SPARQL query wouldn't need
OPTIONALs. The point here is that we don't need triples for NULL values.
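To check the claim end to end, the sketch below emulates that SPARQL query over the three triples and compares the result with the identity SQL query. Assumptions: a hypothetical `reconstruct` function stands in for a SPARQL engine, literals are plain Python strings, and SQLite provides the original table. Like the query, it treats the first schema column as a required pattern and the others as OPTIONAL:

```python
import sqlite3
from collections import Counter

# Triples from the example above; literals stored as plain Python strings.
TRIPLES = [
    ("<row1>", "<R#A>", "Bob"),
    ("<row2>", "<R#A>", "Alice"),
    ("<row2>", "<R#B>", "25"),
]

def reconstruct(triples, table, columns):
    """Emulate SELECT ?a ?b WHERE { ?x <R#A> ?a . OPTIONAL { ?x <R#B> ?b } }.

    `columns` comes from the schema: the first column is a required pattern,
    the rest are OPTIONAL, so a missing triple becomes None (SQL NULL).
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    rows = []
    for props in by_subject.values():
        if f"<{table}#{columns[0]}>" not in props:
            continue  # the required triple pattern did not match
        rows.append(tuple(props.get(f"<{table}#{c}>") for c in columns))
    return rows

# The original table, read back with the identity SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("Bob", None), ("Alice", 25)])
identity = [tuple(None if v is None else str(v) for v in row)
            for row in conn.execute("SELECT * FROM R")]

rebuilt = reconstruct(TRIPLES, "R", ["A", "B"])
print(rebuilt)                                # [('Bob', None), ('Alice', '25')]
print(Counter(rebuilt) == Counter(identity))  # True
```

The results are compared as bags (`Counter`) because neither SQL nor SPARQL guarantees row order without ORDER BY. The required first pattern also means this particular query would drop a row whose A is NULL. Treating A as always present is an assumption carried over from the query itself.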

2) We don't have knowledge of the schema

If we do not have knowledge of the schema, then we can't create a SPARQL
query like the one in the previous example. Just imagine that you can only
look at the RDF data. For example, consider the following RDF:

<row1> <R#A> "Bob"
<row2> <R#A> "Alice"
<row2> <R#B> "25"

Given that row 2 has <R#B> and row 1 doesn't, I could guess that the value
of row 1 for attribute B is NULL. But what if the original table has a
column C and every single row has a NULL value for that column? Then no
triple mentions <R#C> at all, so nothing in the RDF data reveals that C even
exists. In this case, it would be necessary to explicitly translate NULL
values into RDF triples. Otherwise, the mapping would not be information
preserving.
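The column C problem can be made concrete. The sketch below maps a table R(A, B, C) whose C is NULL everywhere, then tries to recover the column list from the predicates alone, once without and once with explicit NULL triples. Everything here is an illustrative assumption: the helpers `to_triples` and `infer_columns` are hypothetical, and the `<null>` object is a placeholder encoding, since how a NULL triple should look is still open:

```python
# Hypothetical placeholder object for a NULL cell; the real encoding is undecided.
NULL = "<null>"

def to_triples(table, columns, rows, emit_nulls):
    """Map rows to triples, optionally emitting a triple for each NULL cell."""
    triples = []
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is None and not emit_nulls:
                continue  # plain mapping: NULL cells vanish
            obj = NULL if value is None else str(value)
            triples.append((f"<row{i}>", f"<{table}#{col}>", obj))
    return triples

def infer_columns(triples, table):
    """Guess the table's columns from the predicates alone (no schema)."""
    prefix = f"<{table}#"
    return sorted({p[len(prefix):-1] for _, p, _ in triples if p.startswith(prefix)})

COLUMNS = ["A", "B", "C"]
ROWS = [("Bob", None, None), ("Alice", 25, None)]  # column C is NULL everywhere

plain = infer_columns(to_triples("R", COLUMNS, ROWS, emit_nulls=False), "R")
with_nulls = infer_columns(to_triples("R", COLUMNS, ROWS, emit_nulls=True), "R")
print(plain)       # ['A', 'B']  (column C has vanished)
print(with_nulls)  # ['A', 'B', 'C']
```

Without NULL triples, no inspection of the data can bring C back. With them, the column reappears, which is exactly the information-preservation argument above.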


- At this moment, neither the Direct Mapping nor R2RML considers the schema;
therefore, in order for the mappings to be Information Preserving, we must
explicitly translate NULL values to an RDF triple.
- We need to figure out how this triple is going to show up.
- From a theoretical side, if we do not generate triples for NULL values,
then the mapping is monotonic. On the other hand, generating triples for NULL
values will make the mapping non-monotonic. Do we care? Not really. But
implementation- and performance-wise, there can be some overhead when dealing
with non-monotonicity.
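One way to see the monotonicity point is a small sketch. It assumes that "more data" includes replacing a NULL with an actual value, which is one common reading; the message does not spell out which ordering it has in mind. Monotonic then means the triples for the old table are a subset of the triples for the new one. The `to_triples` helper and the `<null>` marker are hypothetical:

```python
NULL = "<null>"  # hypothetical marker for a NULL cell

def to_triples(rows, emit_nulls):
    """Map rows of R(A, B) to a set of triples, optionally encoding NULLs."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        for col, value in zip(("A", "B"), row):
            if value is None:
                if emit_nulls:
                    triples.add((f"<row{i}>", f"<R#{col}>", NULL))
            else:
                triples.add((f"<row{i}>", f"<R#{col}>", str(value)))
    return triples

before = [("Bob", None), ("Alice", 25)]
after = [("Bob", 30), ("Alice", 25)]  # Bob's NULL has been filled in

# Without NULL triples, the old output is contained in the new one.
print(to_triples(before, False) <= to_triples(after, False))  # True
# With NULL triples, the old <row1> <R#B> <null> triple must be retracted.
print(to_triples(before, True) <= to_triples(after, True))    # False
```

Simply appending new rows keeps both variants monotonic in this sketch. The difference appears only when a NULL is later filled in, and it is that retraction step that could bring the implementation overhead.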

Juan Sequeda

Received on Tuesday, 17 May 2011 18:02:19 UTC