- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Tue, 14 Jun 2011 11:36:31 -0500
- To: Enrico Franconi <franconi@inf.unibz.it>
- Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <BANLkTim1bQCz-PyBc0edCprv1rFx6L+GZQ@mail.gmail.com>
From a theoretical perspective and with my academic hat on... I don't care.
Actually, in our paper, we are doing both. These approaches have their own
interesting properties.

From a developer and reality perspective... people are not expecting nulls in
their RDF. They will be confused. They don't want to see nulls! See [1].

Let's try to make everybody happy. I think we can :)

[1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0062.html

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Tue, Jun 14, 2011 at 11:29 AM, Enrico Franconi <franconi@inf.unibz.it> wrote:

>
> On 14 Jun 2011, at 18:05, Juan Sequeda wrote:
>
> On Tue, Jun 14, 2011 at 10:39 AM, Enrico Franconi <franconi@inf.unibz.it> wrote:
>
>> I don't have an answer, and I'd be happy to work offline with you and
>> Marcelo on this; I have been working on the semantics of normative SQL
>> null values for 8 months.
>> Obviously, I do have an answer in the case you materialise the NULLs - you
>> still didn't say why you don't like my proposal.
>>
>
> As I previously mentioned, most if not all databases will have null values,
> so your proposal would not allow the direct mapping to be used.
>
>
> I meant: my proposal to materialise the NULL values. So I agree with your
> view below.
> --e.
>
>
> The following is my personal view and vision: I believe that the direct
> mapping will be crucial for the RDF/semantic web and data integration world
> for the following reasons:
>
> 1) Instead of a blank R2RML file, you will first direct map your database
> to generate a pre-populated R2RML file. D2R does this. And so does Revelytix
> (if I'm not wrong). They may or may not follow the direct mapping standard,
> but the general idea is to have *a* direct mapping so the author of the
> R2RML file doesn't start blank.
> 2) People may have really nicely modeled databases, or only want to export a
> few tables to RDF.
> Even though this is not part of the group's charter, I
> foresee people implementing Direct Mapping+ where you choose which tables
> you want to direct map. At the end, all you need to do is a string
> substitution from the automatically generated labels (ex:name) to the labels
> that you really want (foaf:name).
> 3) If you are comfortable with semantic web technologies, you can direct
> map your database and then customize the output with SPARQL CONSTRUCT and/or
> RIF.
> 4) If you are comfortable with the database, and have access to create
> views and have the Direct Mapping+, you can create the views for the data
> that you want to export and then only direct map.
>
> These are just 4 possible use cases that I see for the direct mapping. It
> was very encouraging to talk to people at Semtech about RDB2RDF because they
> are starting to realize the benefits of RDF and they are now thinking a bit
> ahead of themselves: "if I'm going to use RDF for data integration... how do
> I get my rdb data into RDF then?".
>
> So... let's not kill the direct mapping please.
>
>
>> But let's do it offline.
>>
>
> Sounds good.
>
>> --e.
>>
>>
>> On 14 Jun 2011, at 17:26, Juan Sequeda <juanfederico@gmail.com> wrote:
>>
>>
>> On Jun 14, 2011, at 10:19 AM, Enrico Franconi <franconi@inf.unibz.it> wrote:
>>
>> I am now seeing something concrete, finally.
>>
>>
>> Thanks. I've actually had this email in draft for a while.
>>
>> This is actually part of the paper that Marcelo Arenas, Dan Miranker and I
>> are writing.
>>
>> We need to double check the generality of the approach.
>>
>>
>> Now we can work on this together. I'm sure we can find the correct
>> solution instead of coming up with a dramatic proposal :)
>>
>> For example: what if you have a query asking just for the values of an
>> attribute which may contain NULL values (so you don't output the id)?
>> And: how do you solve my query (c) in the wiki?
>> It seems to me that you
>> also need to have a notion of the schema of the answer set.
>> These are the kind of questions I'd like to see answered :-)
>>
>>
>> I need to relook at this. Do you have an answer?
>>
>> Enrico, do you think we can answer your questions over the phone in the
>> next hour? Otherwise, I propose that we skip this topic on today's call so
>> we can make progress on other issues. It looks like we can make more
>> progress over email? Is this ok with you?
>>
>> If so, would this be ok with the chairs?
>>
>> Btw, I'll be in Chile next week with Marcelo and we could have a separate
>> call with interested parties to address this particular issue. My only
>> concern is to get things done for the Sept 1 deadline.
>>
>> Looking forward to your comments
>>
>>
>> On 14 Jun 2011, at 15:39, Juan Sequeda <juanfederico@gmail.com> wrote:
>>
>> Why?
>>
>> Because, IMO, this is what the general RDF audience wants, and we should
>> create a standard that people are going to use. That is why I disagree with
>> your proposal Enrico. We can't simply state that the direct mapping is not
>> applicable 95% of the time. Then we have wasted 2 years of work on the
>> direct mapping.
>>
>> Our task is to bridge the gap and make sure that everything works... and I
>> believe that everything will work. The main concern is that if we do not map
>> the NULLs, our mapping will not be information preserving. In other words,
>> "how to rebuild the correct answers with explicit NULLs using the direct
>> mapping". So let me break this down. I believe information preservation
>> holds in the following way:
>>
>> Let S be a relational schema and Q a relational query over S. Then there
>> exists a SPARQL query Q* such that for every instance I of S:
>>
>> T(Q(S,I)) = Q*(M(S,I))
>>
>> Any relational query Q can be broken down into a set of identity queries,
>> each of which is essentially SELECT * FROM table, for all tables that are
>> part of the query.
>> This identity relational query is equal to the following SPARQL query:
>>
>> (?x A1 ?A1) OPT ... OPT (?x An ?An)
>>
>> where each Ai is an attribute of the table. This is where the schema comes
>> in. We need to know all the attributes that are part of each table so we
>> can build this SPARQL query.
>>
>> So now what we are missing are the NULLs.
>>
>> In the SPARQL query Q*, the solution mapping does not output nulls. But
>> the result of the relational query Q does output nulls. This is where
>> function T comes in. Function T maps a relational query output to a SPARQL
>> solution mapping... and all this function does is "not output the nulls".
>> Given that we have the schema we can reconstruct the nulls. For example, if
>> I have the following:
>>
>> Q(S,I) = {id = 1, age = null}
>>
>> Then
>>
>> T(Q(S,I)) = {id = 1}
>>
>> This is going to be equal to the SPARQL solution mapping. If we want to
>> reverse this with T', given the schema we know that the attributes consist
>> of "id" and "age", and because the solution mapping only consists of "id",
>> all the missing attributes are mapped to null.
>>
>> In conclusion, we need to be explicit about this function T and state what
>> it does. My proposal is that in the direct mapping we have this function T
>> which maps the null value to "nothing", and T' will map the missing
>> attributes to null. I believe that with my proposal, everything should work.
>>
>> Enrico, where am I wrong?
>>
>>
>> Juan Sequeda
>> +1-575-SEQ-UEDA
>> www.juansequeda.com
>>
>>
>> On Mon, Jun 13, 2011 at 3:05 PM, Enrico Franconi <franconi@inf.unibz.it> wrote:
>>
>>> I have the impression that people are considering the presence of
>>> explicit NULL values in the data and in the answers as "polluting".
>>> In RDBs
>>> NULLs are everywhere, in the data and in the answers, since day one. You
>>> don't have an option not to see them in the data or in the answer. They are
>>> just there, and they have a specific meaning and behaviour (which is the
>>> same in Oracle, M$-SQL-server, etc). Why, in mapping RDBs to RDF graphs, do
>>> you want to hide them as if they are bearing a chronic disease? And by doing
>>> that, why do you want to hamper the possibility to keep in the RDF graph the
>>> same behaviour (and meaning) NULLs had in the original RDB?
>>> --e.
>>>
>>
>>
>
>
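The function T and its inverse T' described in the quoted proposal above can be sketched as follows. This is only an illustration under stated assumptions: a query result row is modeled as a Python dictionary, SQL NULL is modeled as None, and the names Row, t and t_inverse are hypothetical and do not come from the thread or from any specification.

```python
from typing import Dict, List, Optional

# Assumption: one result row is a dict from attribute name to value,
# with SQL NULL represented as None.
Row = Dict[str, Optional[object]]


def t(row: Row) -> Row:
    """T: keep only the attributes whose value is not NULL."""
    return {attr: value for attr, value in row.items() if value is not None}


def t_inverse(solution: Row, schema: List[str]) -> Row:
    """T': every schema attribute absent from the solution is mapped to NULL."""
    return {attr: solution.get(attr) for attr in schema}


schema = ["id", "age"]         # hypothetical table schema
row = {"id": 1, "age": None}   # Q(S,I) = {id = 1, age = null}

solution = t(row)                        # only "id" survives
restored = t_inverse(solution, schema)   # "age" comes back as None
print(solution)
print(restored)
```

Note that t_inverse needs the schema as an argument: without the list of attributes there would be no way to tell which ones are missing from a solution, which is the point made in the proposal about knowing all the attributes of each table.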
Received on Tuesday, 14 June 2011 16:37:20 UTC