Re: Fear for explicit NULL values from Juan Sequeda on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Tue, 14 Jun 2011 08:39:28 -0500
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <BANLkTimRTApanLZq7DhWVhM7J97vUgTKPg@mail.gmail.com>

Why?

Because, IMO, this is what the general RDF audience wants, and we should
create a standard that people are going to use. That is why I disagree with
your proposal, Enrico. We can't simply state that the direct mapping is not
applicable 95% of the time. If we did, we would have wasted 2 years of work
on the direct mapping.

Our task is to bridge the gap and make sure that everything works, and I
believe that everything will work. The main concern is that if we do not map
the NULLs, our mapping will not be information preserving. In other words,
"how to rebuild the correct answers with explicit NULLs using the direct
mapping". So let me break this down. I believe information preservation holds
in the following way:

Let S be a relational schema and Q a relational query over S. Then there
exists a SPARQL query Q* such that for every instance I of S:

T(Q(S,I)) = Q*(M(S,I))

Any relational query Q can be broken down into a set of identity queries,
each of which is essentially SELECT * FROM table, one for every table that is
part of the query. This identity relational query is equal to the following
SPARQL query:

(?x A1 ?A1) OPT ... OPT (?x An ?An)

where A1, ..., An are all the attributes of the table. This is where the
schema comes in. We need to know all the attributes that are part of each
table so we can build this SPARQL query.
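
To make the construction concrete, here is a minimal sketch in Python of
building that query from the schema. The function name identity_query, the
base IRI and the <base + table#attribute> predicate scheme are assumptions
made for illustration only; they are not taken from the direct mapping
draft. As in the pattern above, the first attribute is matched without OPT,
so the sketch assumes it is never NULL (a key such as id).

```python
def identity_query(table, attributes, base="http://example.com/db/"):
    """Build the SPARQL counterpart of SELECT * FROM table.

    `attributes` is the full attribute list taken from the schema.
    The IRI scheme below is hypothetical, chosen only for this sketch.
    """
    if not attributes:
        raise ValueError("the schema must list at least one attribute")

    def pattern(attr):
        # One triple pattern per attribute: (?x Ai ?Ai)
        return f"?x <{base}{table}#{attr}> ?{attr} ."

    first, rest = attributes[0], attributes[1:]
    lines = ["SELECT * WHERE {", "  " + pattern(first)]
    # Every remaining attribute is optional, so a row with a NULL in
    # that column still produces a solution (with the variable unbound).
    lines += [f"  OPTIONAL {{ {pattern(a)} }}" for a in rest]
    lines.append("}")
    return "\n".join(lines)


print(identity_query("person", ["id", "age"]))
```

Without the attribute list from the schema there is no way to know which
OPT branches to emit, which is why the schema is needed here.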

So now what we are missing are the NULLs.

In the SPARQL query Q*, the solution mapping does not output nulls. But the
result of the relational query Q does output nulls. This is where function T
comes in. Function T maps a relational query output to a SPARQL solution
mapping, and all this function does is "not output the nulls". Given that
we have the schema, we can reconstruct the nulls. For example, if I have the
following:

Q(S,I) = {id = 1, age = null}

Then

T(Q(S,I)) = {id = 1}

This is going to be equal to the SPARQL solution mapping. If we want to
reverse this with T', given the schema we know that the attributes consist of
"id" and "age", and because the solution mapping only consists of "id", all
the missing attributes are mapped to null.
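
As a rough sketch of the two functions, assuming a row is represented as a
Python dict and a null as None (both are choices made for this
illustration, and the names T and T_inverse are just labels for T and T'):

```python
def T(row):
    """Map a relational row to a solution mapping by dropping nulls."""
    return {attr: value for attr, value in row.items() if value is not None}


def T_inverse(solution, attributes):
    """Rebuild the relational row from a solution mapping.

    `attributes` comes from the schema; every attribute that is missing
    from the solution mapping is mapped back to null (None).
    """
    return {attr: solution.get(attr) for attr in attributes}


row = {"id": 1, "age": None}          # Q(S,I) = {id = 1, age = null}
solution = T(row)                     # T(Q(S,I)) = {id = 1}
rebuilt = T_inverse(solution, ["id", "age"])

assert solution == {"id": 1}
assert rebuilt == row
```

The round trip only works because T_inverse is given the attribute list;
the solution mapping alone does not say that "age" ever existed.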

In conclusion, we need to be explicit about this function T and state what
it does. My proposal is that in the direct mapping we have this function T,
which maps a null value to "nothing", and T', which maps the missing
attributes to null. I believe that with my proposal, everything should work.

Enrico, where am I wrong?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Mon, Jun 13, 2011 at 3:05 PM, Enrico Franconi <franconi@inf.unibz.it> wrote:

> I have the impression that people are considering the presence of explicit
> NULL values in the data and in the answers as "polluting". In RDBs NULLs are
> everywhere, in the data and in the answers, since day one. You don't have an
> option not to see them in the data or in the answer. They are just there,
> and they have a specific meaning and behaviour (which is the same in Oracle,
> M$-SQL-server, etc). Why, in mapping RDBs to RDF graphs, do you want to hide
> them as if they are bearing a chronic disease? And by doing that, why do you
> want to hamper the possibility to keep in the RDF graph the same behaviour
> (and meaning) NULLs had in the original RDB?
> --e.
>

Received on Tuesday, 14 June 2011 13:40:20 UTC