- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Tue, 14 Jun 2011 08:39:28 -0500
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <BANLkTimRTApanLZq7DhWVhM7J97vUgTKPg@mail.gmail.com>
Why? Because, IMO, this is what the general RDF audience wants, and we should create a standard that people are going to use. That is why I disagree with your proposal, Enrico. We can't simply state that the direct mapping is not applicable 95% of the time; then we would have wasted 2 years of work on the direct mapping. Our task is to bridge the gap and make sure that everything works, and I believe that everything will work.

The main concern is that if we do not map the NULLs, our mapping will not be information preserving. In other words: "how to rebuild the correct answers with explicit NULLs using the direct mapping". So let me break this down.

I believe information preservation holds in the following way. Let S be a relational schema and Q a relational query over S. Then there exists a SPARQL query Q* such that for every instance I of S:

T(Q(S,I)) = Q*(M(S,I))

Any relational query Q can be broken down into a set of identity queries, each of which is essentially SELECT * FROM table, one for every table that is part of the query. This identity relational query is equal to the following SPARQL query:

(?x A1 ?A1) OPT ... OPT (?x An ?An)

where A1, ..., An are all the attributes of the table. This is where the schema comes in: we need to know all the attributes that are part of each table so we can build this SPARQL query.

So now what we are missing are the NULLs. In the SPARQL query Q*, the solution mapping does not output nulls, but the result of the relational query Q does. This is where function T comes in. Function T maps a relational query output to a SPARQL solution mapping, and all this function does is "not output the nulls". Given that we have the schema, we can reconstruct the nulls.

For example, if I have the following:

Q(S,I) = {id = 1, age = null}

then

T(Q(S,I)) = {id = 1}

This is going to be equal to the SPARQL solution mapping.

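To make the example above concrete, here is a minimal Python sketch of the OPT-chained identity query, of T, and of the reconstruction of nulls from the schema. It assumes that rows and solution mappings are plain dicts and that None stands in for NULL. The names `identity_sparql`, `T`, `T_inverse` and the `ex:` prefix are made up for illustration and are not part of the direct mapping.

```python
from typing import Dict, List, Optional

# A relational row: attribute name -> value, with None standing in for SQL NULL.
Row = Dict[str, Optional[str]]


def identity_sparql(attributes: List[str], prefix: str = "ex:") -> str:
    """Build (?x A1 ?A1) OPT ... OPT (?x An ?An) for one table.

    Assumes at least one attribute and that attribute names are valid
    SPARQL variable names. The "ex:" prefix is hypothetical.
    """
    patterns = [f"?x {prefix}{a} ?{a}" for a in attributes]
    first, rest = patterns[0], patterns[1:]
    # Every attribute after the first is wrapped in OPTIONAL.
    optionals = " ".join(f"OPTIONAL {{ {p} }}" for p in rest)
    return f"SELECT * WHERE {{ {first} . {optionals} }}"


def T(row: Row) -> Dict[str, str]:
    """Map a relational result row to a solution mapping: drop the NULLs."""
    return {attr: value for attr, value in row.items() if value is not None}


def T_inverse(solution: Dict[str, str], attributes: List[str]) -> Row:
    """Use the schema to map every attribute missing from the solution to NULL."""
    return {attr: solution.get(attr) for attr in attributes}


schema = ["id", "age"]
row: Row = {"id": "1", "age": None}

print(identity_sparql(schema))
print(T(row))
print(T_inverse(T(row), schema))
```

The round trip only works because the attribute list is passed in explicitly, which is the role the schema plays in the argument.
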
If we want to reverse this with T', then given the schema we know that the attributes consist of "id" and "age". Because the solution mapping consists only of "id", all the missing attributes are mapped to null.

In conclusion, we need to be explicit about this function T and state what it does. My proposal is that in the direct mapping we have this function T, which maps the null value to "nothing", and T', which maps the missing attributes to null.

I believe that with my proposal, everything should work. Enrico, where am I wrong?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Mon, Jun 13, 2011 at 3:05 PM, Enrico Franconi <franconi@inf.unibz.it> wrote:

> I have the impression that people are considering the presence of explicit
> NULL values in the data and in the answers as "polluting". In RDBs NULLs are
> everywhere, in the data and in the answers, since day one. You don't have an
> option not to see them in the data or in the answer. They are just there,
> and they have a specific meaning and behaviour (which is the same in Oracle,
> M$-SQL-server, etc). Why in mapping RDBs to RDF graphs you want to hide them
> as if they are bearing a chronic disease? And by doing that, why you want to
> hamper the possibility to keep in the RDF graph the same behaviour (and
> meaning) NULLs had in the original RDB?
> --e.
>
Received on Tuesday, 14 June 2011 13:40:20 UTC