Re: Information Preserving and ISSUE-42 from Ivan Herman on 2011-05-18 (public-rdb2rdf-wg@w3.org from May 2011)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 18 May 2011 08:50:32 +0200
To: Juan Sequeda <juanfederico@gmail.com>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-rdb2rdf-wg@w3.org
Message-Id: <561BE283-9DA7-463A-83EA-F3A9F52CD41F@w3.org>
On May 18, 2011, at 06:59 , Juan Sequeda wrote:

> 
> > My proposal would be to extend the direct mapping to consider the schema and translate it to RDFS/OWL. But I would like to know what others think.
> 
> The beauty of the direct mapping is its simplicity. You look at a table instance, you, sort of, 'know' what the direct mapping will generate and then you can massage the result if you want. Adding RDFS/OWL and the schema to the equation would jeopardize that.
> 
> As I said in the previous mail, defining an rdb2rdf:NULL URI for that case makes it clear, it does not impose any RDF semantics on that case (which is really an RDB feature), and still lets the user massage the results with that case in mind.
> 
> So does that mean that you would translate all NULL values to a triple with rdb2rdf:NULL? That makes sense and would make the current direct mapping information preserving. 
> 

That would be my favourite, yes.
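
To make the idea concrete, here is a minimal sketch in Python of a direct mapping that emits a marker for every NULL cell, and of rebuilding the table from the triples alone. Everything in it is an assumption for illustration: the thread does not fix an actual URI for rdb2rdf:NULL, and the row and predicate naming (row1, R#A) just follows the examples quoted below.

```python
# Placeholder for the proposed rdb2rdf:NULL URI (hypothetical; no URI is
# defined in this thread). In real RDF this would be a URI, not a literal,
# so it could not collide with a string value.
NULL_MARKER = "rdb2rdf:NULL"

def direct_map_with_nulls(rows, columns):
    """Emit one triple per cell, using NULL_MARKER for NULL cells."""
    triples = []
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            obj = NULL_MARKER if value is None else value
            triples.append((f"row{i}", f"R#{col}", obj))
    return triples

def rebuild(triples):
    """Recover columns and rows from the triples only, with no schema."""
    columns = sorted({p.split("#", 1)[1] for _, p, _ in triples})
    rows = {}
    for subject, predicate, obj in triples:
        col = predicate.split("#", 1)[1]
        rows.setdefault(subject, {})[col] = None if obj == NULL_MARKER else obj
    table = [tuple(cells[c] for c in columns) for _, cells in sorted(rows.items())]
    return columns, table

# Table R(A, B, C) where C is NULL in every row.
source_columns = ["A", "B", "C"]
source_rows = [("Bob", None, None), ("Alice", "25", None)]

triples = direct_map_with_nulls(source_rows, source_columns)
recovered_columns, recovered_rows = rebuild(triples)
```

In this sketch the all-NULL column C survives the round trip, at the cost of one extra triple per NULL cell.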


> However, I have the impression that the community in general is not going to like that [1]
> 
> [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0062.html

I would not commit suicide if this was the final consensus either:-)

Ivan


> 
> 
> 
> Ivan
> 
> 
> >
> >
> > Best,
> > Richard
> >
> >
> > On 17 May 2011, at 19:01, Juan Sequeda wrote:
> >
> > > Group,
> > >
> > > By information preserving, I mean that given the RDF data, I can reconstruct the relational table with all its values. Informally, given an identity SQL query (a query that outputs the whole table: SELECT * FROM table), there exists a SPARQL query which, when executed on the RDF data, returns the same results as the identity SQL query.
> > >
> > > There are two cases for information preserving
> > >
> > > 1) We have knowledge of the schema
> > >
> > > If the relational schema is directly mapped to RDFS/OWL, then we DO NOT need to translate nulls in order to preserve information. For example, consider the table R with attributes A and B and instances:
> > >
> > > R(Bob, NULL)
> > > R(Alice, 25)
> > >
> > >
> > > The ontology from this schema is
> > >
> > > <R> <type> <class>
> > > <A> <type> <property>
> > > <A> <domain> <R>
> > > <A> <range> <whatever datatype>
> > > <B> <type> <property>
> > > <B> <domain> <R>
> > > <B> <range> <whatever datatype>
> > >
> > > And the RDF data, without translating nulls:
> > >
> > > <row1> <R#A> "Bob"
> > > <row2> <R#A> "Alice"
> > > <row2> <R#B> "25"
> > >
> > > The identity SQL query is
> > >
> > > SELECT A, B FROM R
> > >
> > > Given that we know the schema, we can construct a SPARQL query:
> > >
> > > SELECT ?a ?b
> > > WHERE{
> > > ?x <R#A> ?a
> > > OPTIONAL{
> > >  ?x <R#B> ?b
> > > }
> > > }
> > >
> > > There we go... with that SPARQL query, we can reconstruct the original relational table. No need for nulls. If we generated triples for NULL values, then the SPARQL query wouldn't need OPTIONAL. The point here is that we don't need triples for NULL values.
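
The schema-aware reconstruction can also be sketched outside SPARQL. The following Python fragment is only an illustration, assuming the three triples of the example and a column list taken from the schema; the function name reconstruct is made up for this sketch. A missing triple is read back as None, which mirrors what OPTIONAL does with an unbound variable.

```python
# Triples from the example, with no triple for the NULL in row 1.
triples = [
    ("row1", "R#A", "Bob"),
    ("row2", "R#A", "Alice"),
    ("row2", "R#B", "25"),
]

def reconstruct(triples, columns):
    """Rebuild table rows from triples, given the column list from the schema.

    Cells without a triple come back as None, like an unbound OPTIONAL variable.
    """
    rows = {}
    for subject, predicate, value in triples:
        rows.setdefault(subject, {})[predicate] = value
    return [
        tuple(cells.get(f"R#{col}") for col in columns)
        for _, cells in sorted(rows.items())
    ]

# The column list ["A", "B"] is the schema knowledge that makes this work.
table = reconstruct(triples, ["A", "B"])
print(table)
```
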
> > >
> > > 2) We don't have knowledge of the schema
> > >
> > > If we do not have knowledge of the schema, then we can't create a SPARQL query like the previous example. Just imagine that you can only look at the RDF data. For example, consider the following RDF:
> > >
> > > <row1> <R#A> "Bob"
> > > <row2> <R#A> "Alice"
> > > <row2> <R#B> "25"
> > >
> > >
> > > Given that row 2 has <R#B> and row 1 doesn't, I could guess that the value of row 1 for attribute B is null. But what if the original table has a column C and every single row has a NULL value for that column? In this case, it would be necessary to explicitly translate NULL values into an RDF triple. Otherwise, the mapping would not be information preserving.
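
A small Python sketch of this failure mode, under assumed names: a table R(A, B, C) with C NULL everywhere is mapped while skipping NULL cells, and the columns are then guessed from the predicates that appear in the data. The helper names are hypothetical and the mapping is deliberately simplified.

```python
# Hypothetical table R(A, B, C) in which C is NULL in every row.
source_columns = ["A", "B", "C"]
source_rows = [("Bob", None, None), ("Alice", "25", None)]

def direct_map_skip_nulls(rows, columns):
    """Emit one triple per non-NULL cell; NULL cells produce nothing."""
    triples = []
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is not None:
                triples.append((f"row{i}", f"R#{col}", value))
    return triples

def infer_columns(triples):
    """Guess the columns from the predicates present in the data."""
    return sorted({p.split("#", 1)[1] for _, p, _ in triples})

triples = direct_map_skip_nulls(source_rows, source_columns)
recovered_columns = infer_columns(triples)
print(recovered_columns)
```

Column B can still be guessed because row 2 carries a value for it, but nothing in the triples hints that C ever existed.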
> > >
> > >
> > > CONCLUSION:
> > >
> > > - At this moment, neither the Direct Mapping nor R2RML considers the schema; therefore, in order for the mappings to be Information Preserving, we must explicitly translate NULL values to an RDF triple.
> > > - We need to figure out how this triple is going to show up.
> > > - From a theoretical side, if we do not generate triples for NULL values, the mapping is monotonic. On the other hand, generating triples for NULL values will make the mapping non-monotonic. Do we care? Not really. But implementation- and performance-wise, there can be some overhead when dealing with non-monotonicity.
> > >
> > >
> > > Juan Sequeda
> > > +1-575-SEQ-UEDA
> > > www.juansequeda.com
> >
> >
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
> 
> 


Received on Wednesday, 18 May 2011 06:48:46 UTC