Re: Fear for explicit NULL values from Juan Sequeda on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Tue, 14 Jun 2011 10:26:59 -0500
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <55786175-FB24-40C9-9F3F-151A75EB4316@gmail.com>
On Jun 14, 2011, at 10:19 AM, Enrico Franconi <franconi@inf.unibz.it> wrote:

> I am now seeing something concrete, finally.

Thanks. I've actually had this email in draft for a while. 

This is actually part of the paper that Marcelo Arenas, Dan Miranker and I are writing. 

> We need to double check the generality of the approach.

Now we can work on this together. I'm sure we can find the correct solution instead of coming up with a dramatic proposal :)

> For example: what if you have a query asking just for the values of an attribute which may contain NULL values (so you don't output the id)?
> And: how do you solve my query (c) in the wiki? It seems to me that you need also to have a notion of the schema of the answer set.
> These are the kind of questions I'd like to see answered :-)

I need to relook at this. Do you have an answer?

Enrico, do you think we can answer your questions over the phone in the next hour? Otherwise, I propose that we skip this topic on today's call so we can make progress on other issues. It looks like we can make more progress over email. Is this OK with you? 

If so, would this be ok with the chairs?

Btw, I'll be in Chile next week with Marcelo, and we could have a separate call with interested parties to address this particular issue. My only concern is to get things done by the Sept 1 deadline.

Looking forward to your comments 
> 
> On 14 Jun 2011, at 15:39, Juan Sequeda <juanfederico@gmail.com> wrote:
> 
>> Why?
>> 
>> Because, IMO, this is what the general RDF audience wants, and we should create a standard that people are going to use. That is why I disagree with your proposal, Enrico. We can't simply state that the direct mapping is not applicable 95% of the time; then we would have wasted 2 years of work on the direct mapping.
>> 
>> Our task is to bridge the gap and make sure that everything works, and I believe that everything will work. The main concern is that if we do not map the NULLs, our mapping will not be information preserving. In other words: how do we rebuild the correct answers, with explicit NULLs, using the direct mapping? So let me break this down. I believe information preservation holds in the following way:
>> 
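>> Let S be a relational schema and Q a relational query over S. Then there exists a SPARQL query Q* such that, for every instance I of S:
>> 
>> T(Q(S,I)) = Q*(M(S,I))
>> 
>> where M(S,I) is the RDF graph produced by the direct mapping and T is the function described below.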
>> 
>> Any relational query Q can be broken down into a set of identity queries, each essentially SELECT * FROM table, one for every table that is part of the query. Each such identity relational query is equal to the following SPARQL query:
>> 
>> ((?x A1 ?A1) OPT ... OPT (?x An ?An))
>> 
>> where A1, ..., An are the attributes of the table. This is where the schema comes in: we need to know all the attributes of each table in order to build this SPARQL query.
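As a rough illustration of the identity query above, the following Python sketch turns a table's attribute list into concrete SPARQL syntax, with the first attribute as a required triple pattern and every other attribute wrapped in OPTIONAL. The function name `identity_sparql`, the base IRI and the `<base/table#attr>` predicate scheme are hypothetical choices for this example, not the IRIs the direct mapping actually generates, and the sketch assumes the first attribute (for instance the key) is never NULL.

```python
def identity_sparql(base, table, attributes):
    """Render SELECT * FROM table as a SPARQL query with OPTIONAL patterns.

    Hypothetical predicate IRIs of the form <base/table#attr>. The first
    attribute gets a required triple pattern (assumed never NULL, e.g. the
    key); every other attribute is wrapped in OPTIONAL so that a missing
    triple (a NULL in the table) does not eliminate the row.
    """
    if not attributes:
        raise ValueError("a table needs at least one attribute")
    first, *rest = attributes
    variables = " ".join(f"?{a}" for a in attributes)
    lines = [f"SELECT ?x {variables} WHERE {{",
             f"  ?x <{base}/{table}#{first}> ?{first} ."]
    for attr in rest:
        lines.append(f"  OPTIONAL {{ ?x <{base}/{table}#{attr}> ?{attr} }}")
    lines.append("}")
    return "\n".join(lines)


print(identity_sparql("http://example.com", "person", ["id", "age"]))
```

In SPARQL, consecutive OPTIONAL blocks in one group are applied left to right, so the generated query has the same left-nested shape as the OPT expression above.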
>> 
>> So what we are still missing are the NULLs.
>> 
>> In the SPARQL query Q*, the solution mappings do not contain NULLs, but the result of the relational query Q does. This is where the function T comes in. T maps the output of a relational query to a SPARQL solution mapping, and all it does is "not output the NULLs". Given the schema, we can reconstruct the NULLs. For example, suppose we have the following:
>> 
>> Q(S,I) = {id = 1, age = null}
>> 
>> Then
>> 
>> T(Q(S,I)) = {id = 1}
>> 
>> This is equal to the SPARQL solution mapping. To reverse this with T': given the schema, we know that the attributes are "id" and "age"; because the solution mapping only contains "id", every missing attribute is mapped to null. 
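A minimal Python sketch of T and T' as described above, assuming a relational row is modeled as a dict in which None stands for SQL NULL, and a SPARQL solution mapping as a dict with no entry for an unbound variable. The names `t` and `t_prime` are hypothetical.

```python
# A row maps attribute names to values, with None standing for SQL NULL.
# A solution mapping has no entry at all for an unbound variable.

def t(row):
    """T: do not output the NULLs (drop every attribute whose value is None)."""
    return {attr: value for attr, value in row.items() if value is not None}


def t_prime(solution, schema):
    """T': use the schema to map every attribute missing from the solution to NULL."""
    return {attr: solution.get(attr) for attr in schema}


row = {"id": 1, "age": None}
print(t(row))                           # {'id': 1}
print(t_prime(t(row), ["id", "age"]))   # {'id': 1, 'age': None}
```

For a fixed schema, T' undoes T on every row of that schema, since T only removes attributes whose value is NULL and T' puts exactly those back as NULL.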
>> 
>> In conclusion, we need to be explicit about this function T and state what it does. My proposal is that the direct mapping includes this function T, which maps the null value to "nothing", and T', which maps the missing attributes to null. I believe that with my proposal, everything should work.
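To make the claim T(Q(S,I)) = Q*(M(S,I)) concrete for the identity query, the sketch below builds a toy direct mapping that emits no triple for a NULL cell, evaluates the OPT pattern over the resulting triples, and compares the bag of answers with T applied to SELECT *. Everything here is an assumption for illustration: a single table with attributes id and age, a non-NULL key id used to mint hypothetical subject IRIs, and None for NULL. It checks one hand-made instance and is not a proof of the general statement.

```python
from collections import Counter

# Hypothetical single-table schema and instance; None stands for SQL NULL.
SCHEMA = ["id", "age"]
INSTANCE = [{"id": 1, "age": None},
            {"id": 2, "age": 30}]


def direct_mapping(rows):
    """Toy M: one (subject, attribute, value) triple per non-NULL cell."""
    triples = []
    for row in rows:
        subject = f"person/id={row['id']}"   # assumes "id" is a non-NULL key
        for attr in SCHEMA:
            if row[attr] is not None:        # a NULL cell produces no triple
                triples.append((subject, attr, row[attr]))
    return triples


def identity_query(triples):
    """Toy Q*: (?x A1 ?A1) OPT ... OPT (?x An ?An), then project ?x away."""
    first, *rest = SCHEMA
    # Required pattern: one partial solution per triple on the first attribute.
    solutions = [(s, {first: o}) for s, p, o in triples if p == first]
    for attr in rest:
        extended = []
        for subject, mu in solutions:
            matches = [o for s, p, o in triples if s == subject and p == attr]
            if matches:
                # Left join: extend mu once per compatible triple.
                extended.extend((subject, {**mu, attr: o}) for o in matches)
            else:
                # No triple for attr: OPT keeps mu with attr left unbound.
                extended.append((subject, mu))
        solutions = extended
    return [mu for _, mu in solutions]


def t(row):
    """T: drop the attributes whose value is NULL."""
    return {a: v for a, v in row.items() if v is not None}


def as_bag(solutions):
    """Compare answers as multisets, ignoring order."""
    return Counter(frozenset(mu.items()) for mu in solutions)


lhs = as_bag(t(row) for row in INSTANCE)                 # T(Q(S,I)) with Q = SELECT *
rhs = as_bag(identity_query(direct_mapping(INSTANCE)))   # Q*(M(S,I))
print(lhs == rhs)  # True
```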
>> 
>> Enrico, where am I wrong?
>> 
>> 
>> Juan Sequeda
>> +1-575-SEQ-UEDA
>> www.juansequeda.com
>> 
>> 
>> On Mon, Jun 13, 2011 at 3:05 PM, Enrico Franconi <franconi@inf.unibz.it> wrote:
>> I have the impression that people are considering the presence of explicit NULL values in the data and in the answers as "polluting". In RDBs, NULLs have been everywhere, in the data and in the answers, since day one. You don't have the option not to see them in the data or in the answers. They are just there, and they have a specific meaning and behaviour (which is the same in Oracle, M$-SQL-server, etc). Why, when mapping RDBs to RDF graphs, do you want to hide them as if they were bearing a chronic disease? And by doing that, why do you want to hamper the possibility of keeping in the RDF graph the same behaviour (and meaning) that NULLs had in the original RDB?
>> --e.
>>
Received on Tuesday, 14 June 2011 15:27:46 UTC