Re: Information Preserving

Juan, All,


Just a procedural comment: if you post (especially when you open a new  
thread) please mention the related issue somewhere (subject or text or  
both) so that the tracker can, well, keep track of it ;)

Tracker, this is related to ISSUE-41.

Cheers,
	Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 17 May 2011, at 19:01, Juan Sequeda wrote:

> Group,
>
> By information preserving, I mean that given the RDF data, I can
> reconstruct the relational table with all its values. Informally,
> given an identity SQL query (a query that outputs the whole table:
> SELECT * FROM table), there exists a SPARQL query which, when executed
> on the RDF data, returns the same results as the identity SQL
> query.
>
> There are two cases for information preserving
>
> 1) We have knowledge of the schema
>
> If the relational schema is directly mapped to RDFS/OWL, then we DO  
> NOT need to translate nulls in order to preserve information. For  
> example, consider the table R with attributes A and B and instances:
>
> R(Bob, NULL)
> R(Alice, 25)
>
>
> The ontology from this schema is
>
> <R> <type> <class>
> <A> <type> <property>
> <A> <domain> <R>
> <A> <range> <whatever datatype>
> <B> <type> <property>
> <B> <domain> <R>
> <B> <range> <whatever datatype>
>
> And the RDF data, without translating nulls:
>
> <row1> <R#A> "Bob"
> <row2> <R#A> "Alice"
> <row2> <R#B> "25"
>
> The identity SQL query is
>
> SELECT A, B FROM R
>
> Given that we know the schema, we can construct a SPARQL query:
>
> SELECT ?a ?b
> WHERE{
> ?x <R#A> ?a
> OPTIONAL{
>  ?x <R#B> ?b
> }
> }
>
> There we go... with that SPARQL query, we can reconstruct the
> original relational table. No need for nulls. If we generated triples
> for NULL values, then the SPARQL query wouldn't need OPTIONAL. The
> issue here is that we don't need triples for NULL values.
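
The round trip in case 1 can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (table_to_triples, reconstruct) and the row1/row2 subject naming are hypothetical, the dictionary lookup stands in for the OPTIONAL clause rather than running real SPARQL, and B is stored as an integer here instead of the string "25".

```python
import sqlite3

def table_to_triples(rows, columns, table="R"):
    """Emit one (subject, predicate, object) triple per non-NULL cell."""
    triples = []
    for i, row in enumerate(rows, start=1):
        subject = f"row{i}"  # hypothetical row identifier
        for col, value in zip(columns, row):
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, f"{table}#{col}", value))
    return triples

def reconstruct(triples, columns, table="R"):
    """Rebuild rows from triples using the KNOWN column list.

    A missing triple becomes None, which is the role OPTIONAL plays
    in the SPARQL query above.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return [
        tuple(cells.get(f"{table}#{col}") for col in columns)
        for cells in by_subject.values()  # dicts keep insertion order
    ]

# Build the example table R(A, B) and run the identity query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("Bob", None), ("Alice", 25)])
original = conn.execute("SELECT A, B FROM R").fetchall()

triples = table_to_triples(original, ["A", "B"])
rebuilt = reconstruct(triples, ["A", "B"])
```

Here rebuilt matches the result of the identity SQL query even though no triple was generated for Bob's NULL, because the column list ["A", "B"] was supplied from the schema.
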
>
> 2) We don't have knowledge of the schema
>
> If we do not have knowledge of the schema, then we can't create a  
> SPARQL query like the previous example. Just imagine that you can  
> only look at the RDF data. For example, consider the following RDF:
>
> <row1> <R#A> "Bob"
> <row2> <R#A> "Alice"
> <row2> <R#B> "25"
>
>
> Given that row 2 has <R#B> and row 1 doesn't, I could
> guess that the value of row 1 for attribute B is null. But what if
> the original table has a column C and every single row has a NULL
> value for that column? In this case, it would be necessary to
> explicitly translate NULL values into RDF triples. Otherwise, the
> mapping would not be information preserving.
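
The all-NULL column problem in case 2 can be sketched the same way. Again this is a rough illustration, not part of any mapping spec: to_triples and infer_columns are hypothetical helpers, and the column C with only NULL values is the one imagined in the paragraph above.

```python
def to_triples(rows, columns, table="R"):
    """Emit one triple per non-NULL cell; NULL cells are skipped."""
    triples = []
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is not None:
                triples.append((f"row{i}", f"{table}#{col}", value))
    return triples

def infer_columns(triples):
    """Guess the column list from the predicates that occur in the data.

    This is all a consumer can do without knowledge of the schema.
    """
    seen = []
    for _, predicate, _ in triples:
        col = predicate.split("#", 1)[1]
        if col not in seen:
            seen.append(col)
    return seen

# Table R(A, B, C) where every row has NULL in column C.
columns = ["A", "B", "C"]
rows = [("Bob", None, None), ("Alice", 25, None)]

triples = to_triples(rows, columns)
guessed = infer_columns(triples)
```

The guessed column list contains A and B but not C, so nothing in the RDF data records that column C ever existed.
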
>
>
> CONCLUSION:
>
> - At this moment, neither the Direct Mapping nor R2RML considers the
> schema; therefore, in order for the mappings to be information
> preserving, we must explicitly translate NULL values to an RDF triple.
> - We need to figure out how this triple is going to show up.
> - From a theoretical side, if we do not generate triples for NULL
> values, the mapping is monotonic. On the other hand, generating
> triples for NULL values will make the mapping non-monotonic. Do we
> care? Not really. But implementation- and performance-wise, there can
> be some overhead when dealing with non-monotonicity.
>
>
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com

Received on Tuesday, 17 May 2011 18:16:20 UTC