Re: Information Preserving and ISSUE-42

On Wed, 2011-05-18 at 07:32 -0500, Juan Sequeda wrote:
> Alexandre,
> 
> 
> Please see [1] for an example.
> 
> 
> [1]
> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0049.html


Yes, I read that. RDB2RDF means going *from* RDB *to* RDF. Information
Preserving is about going backwards, which has never been a requirement.

Anyway, it's way more difficult than what you said in [1]. For
example, you don't say anything about the type mapping information. If
you start dealing with that, ⊥ (i.e. your denoted NULL) *must become* a
subtype of any class that can be generated (in our case, any optional
value in the RDB instance). So you'll never be able to go back anyway,
because you won't be able to distinguish between two ⊥.

Alexandre.

> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Wed, May 18, 2011 at 6:51 AM, Alexandre Bertails <bertails@w3.org>
> wrote:
>         On Wed, 2011-05-18 at 12:07 +0100, Richard Cyganiak wrote:
>         > Hi Juan,
>         >
>         > On 18 May 2011, at 05:44, Juan Sequeda wrote:
>         > > IF the direct mapping has knowledge of the schema, then
>         translating NULLs is not necessary for the mapping to be information preserving.
>         >
>         > Yes.
>         
>         
>         What do you guys mean by "the direct mapping has knowledge of
>         the schema"?
>         
>         Alexandre.
>         
>         >
>         > > However, the direct mapping in its current
>         version does not consider the schema at all.
>         >
>         > Correct.
>         >
>         > > It would be information preserving as-is, if we were to
>         also translate NULLs.
>         >
>         > And this is wrong. For the direct mapping to be information
>         preserving, we'd have to be able to reconstruct the schema of
>         an EMPTY TABLE after the table is translated to RDF via the
>         direct mapping. But an empty table produces NO TRIPLES, and
>         from no triples you cannot reconstruct the original relational
>         table!
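A minimal Python sketch of Richard's point, assuming a simplified direct mapping that emits one triple per non-NULL cell and nothing else (no type or schema triples). The function `direct_map` and the `<rowN>` / `<Table#Column>` IRI patterns are hypothetical, modelled loosely on the examples in this thread; literals are kept as plain Python strings.

```python
from typing import Optional

Triple = tuple[str, str, str]


def direct_map(table: str, columns: list[str],
               rows: list[tuple[Optional[str], ...]]) -> set[Triple]:
    """Hypothetical simplified direct mapping: one triple per non-NULL cell."""
    triples: set[Triple] = set()
    for i, row in enumerate(rows, start=1):
        for column, value in zip(columns, row):
            if value is not None:  # NULLs produce no triple
                triples.add((f"<row{i}>", f"<{table}#{column}>", value))
    return triples


# Two empty tables with different names and schemas...
empty_r = direct_map("R", ["A", "B"], [])
empty_s = direct_map("S", ["X", "Y", "Z"], [])

# ...both map to the empty graph, so nothing in the RDF output
# tells us which table (or which columns) it came from.
print(empty_r == empty_s == set())  # True
```

Since two different inputs produce the same output, no inverse function from graphs back to tables can exist for this mapping.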
>         >
>         > > My proposal would be to extend the direct mapping to
>         consider the schema and translate it to RDFS/OWL. But I would
>         like to know what others think.
>         >
>         > But can you capture all of the semantics of the SQL model?
>         PKs, FKs, data types, nullability,
>         > multiset semantics and so on? Or are you suggesting to do
>         just the minimal RDFS domain/range thing?
>         >
>         > Best,
>         > Richard
>         >
>         >
>         >
>         > >
>         > >
>         > > Best,
>         > > Richard
>         > >
>         > >
>         > > On 17 May 2011, at 19:01, Juan Sequeda wrote:
>         > >
>         > > > Group,
>         > > >
>         > > > By information preserving, I mean that given the RDF
>         data, I can reconstruct the relational table with all its
>         values. Informally, given an identity SQL query (a query that
>         outputs the whole table: SELECT * FROM table), there exists a
>         SPARQL query which, when executed on the RDF data, returns
>         the same results as the identity SQL query.
>         > > >
>         > > > There are two cases for information preservation:
>         > > >
>         > > > 1) We have knowledge of the schema
>         > > >
>         > > > If the relational schema is directly mapped to RDFS/OWL,
>         then we DO NOT need to translate nulls in order to preserve
>         information. For example, consider the table R with attributes
>         A and B and instances:
>         > > >
>         > > > R(Bob, NULL)
>         > > > R(Alice, 25)
>         > > >
>         > > >
>         > > > The ontology generated from this schema is:
>         > > >
>         > > > <R> <type> <class>
>         > > > <A> <type> <property>
>         > > > <A> <domain> <R>
>         > > > <A> <range> <whatever datatype>
>         > > > <B> <type> <property>
>         > > > <B> <domain> <R>
>         > > > <B> <range> <whatever datatype>
>         > > >
>         > > > And the RDF data, without translating nulls:
>         > > >
>         > > > <row1> <R#A> "Bob"
>         > > > <row2> <R#A> "Alice"
>         > > > <row2> <R#B> "25"
>         > > >
>         > > > The identity SQL query is:
>         > > >
>         > > > SELECT A, B FROM R
>         > > >
>         > > > Given that we know the schema, we can construct a SPARQL
>         query:
>         > > >
>         > > > SELECT ?a ?b
>         > > > WHERE {
>         > > >   ?x <R#A> ?a .
>         > > >   OPTIONAL {
>         > > >     ?x <R#B> ?b
>         > > >   }
>         > > > }
>         > > >
>         > > > There we go: with that SPARQL query, we can
>         reconstruct the original relational table. No need for
>         NULLs. If we generated triples for NULL values, then the SPARQL
>         query wouldn't need OPTIONALs. The point here is that we don't
>         need triples for NULL values.
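The reconstruction in case 1 can be emulated in a few lines of Python. This is a sketch under stated assumptions: the triples are held in memory, literals are plain strings, and, as in the example, column A is never NULL (it is the required pattern). The names `rdf_data` and `reconstruct` are hypothetical. A missing key plays the role of OPTIONAL, and the caller-supplied column list plays the role of the schema.

```python
from collections import Counter

# RDF data for R from the example (no triples for NULLs).
rdf_data = {
    ("<row1>", "<R#A>", "Bob"),
    ("<row2>", "<R#A>", "Alice"),
    ("<row2>", "<R#B>", "25"),
}


def reconstruct(triples, table, columns):
    """Rebuild the rows of `table` using the known schema `columns`.

    Mirrors the SPARQL query: the first column is required (?x <R#A> ?a);
    the others are OPTIONAL and come back as None (NULL) when absent.
    """
    by_subject: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    required = f"<{table}#{columns[0]}>"
    rows = []
    for props in by_subject.values():
        if required not in props:
            continue  # subject does not match the required pattern
        rows.append(tuple(props.get(f"<{table}#{c}>") for c in columns))
    return rows


# Identity query SELECT A, B FROM R, compared as multisets.
original = [("Bob", None), ("Alice", "25")]
assert Counter(reconstruct(rdf_data, "R", ["A", "B"])) == Counter(original)
```

Comparing with `Counter` rather than with sets keeps duplicate rows distinct, in line with SQL's multiset semantics.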
>         > > >
>         > > > 2) We don't have knowledge of the schema
>         > > >
>         > > > If we do not have knowledge of the schema, then we can't
>         create a SPARQL query like the one in the previous example. Just
>         imagine that you can only look at the RDF data. For example,
>         consider the following RDF:
>         > > >
>         > > > <row1> <R#A> "Bob"
>         > > > <row2> <R#A> "Alice"
>         > > > <row2> <R#B> "25"
>         > > >
>         > > >
>         > > > Given that row 2 has <R#B> and row 1 doesn't,
>         I could guess that the value of row 1 for attribute B is NULL.
>         But what if the original table has a column C and every single
>         row has a NULL value for that column? In this case, it would
>         be necessary to explicitly translate NULL values into RDF
>         triples; otherwise, the mapping would not be information
>         preserving.
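The column C scenario can be checked with a small Python sketch, again assuming a hypothetical simplified direct mapping (`direct_map`) that emits one triple per non-NULL cell and no schema triples. A table R(A, B) and a table R(A, B, C) whose C is NULL in every row produce exactly the same graph.

```python
from typing import Optional

Triple = tuple[str, str, str]


def direct_map(table: str, columns: list[str],
               rows: list[tuple[Optional[str], ...]]) -> set[Triple]:
    """Hypothetical simplified direct mapping: one triple per non-NULL cell."""
    triples: set[Triple] = set()
    for i, row in enumerate(rows, start=1):
        for column, value in zip(columns, row):
            if value is not None:  # NULLs produce no triple
                triples.add((f"<row{i}>", f"<{table}#{column}>", value))
    return triples


# R(A, B) versus R(A, B, C) where C is NULL in every row.
without_c = direct_map("R", ["A", "B"], [("Bob", None), ("Alice", "25")])
with_c = direct_map("R", ["A", "B", "C"],
                    [("Bob", None, None), ("Alice", "25", None)])

# Identical graphs: looking only at the data, column C cannot be recovered.
assert without_c == with_c
```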
>         > > >
>         > > >
>         > > > CONCLUSION:
>         > > >
>         > > > - At this moment, neither the Direct Mapping nor R2RML
>         considers the schema; therefore, for the mappings to be
>         Information Preserving, we must explicitly translate NULL
>         values into RDF triples.
>         > > > - We need to figure out how this triple is going to show
>         up.
>         > > > - From a theoretical side, if we do not generate triples
>         for NULL values, the mapping is monotonic. On the other hand,
>         generating triples for NULL values will make the mapping
>         non-monotonic. Do we care? Not really. But implementation- and
>         performance-wise, there can be some overhead when dealing with
>         non-monotonicity.
>         > > >
>         > > >
>         > > > Juan Sequeda
>         > > > +1-575-SEQ-UEDA
>         > > > www.juansequeda.com
>         > >
>         > >
>         >

Received on Wednesday, 18 May 2011 13:01:11 UTC