Re: Information Preserving and ISSUE-42 from Juan Sequeda on 2011-05-18 (public-rdb2rdf-wg@w3.org from May 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Wed, 18 May 2011 07:32:54 -0500
To: Alexandre Bertails <bertails@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-rdb2rdf-wg@w3.org
Message-ID: <BANLkTinc8JYCOmhNapbxhAyVRRhtaRERxg@mail.gmail.com>
Alexandre,

Please see [1] for an example.

[1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0049.html

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, May 18, 2011 at 6:51 AM, Alexandre Bertails <bertails@w3.org> wrote:

> On Wed, 2011-05-18 at 12:07 +0100, Richard Cyganiak wrote:
> > Hi Juan,
> >
> > On 18 May 2011, at 05:44, Juan Sequeda wrote:
> > > IF the direct mapping has knowledge of the schema then translating
> NULLs is not necessary for information preserving.
> >
> > Yes.
>
> What do you guys mean by "the direct mapping has knowledge of the
> schema"?
>
> Alexandre.
>
>
>
>
> >
> > > However, the direct mapping as it is in its current version does not
> consider the schema at all.
> >
> > Correct.
> >
> > > It would be information preserving as-is, if we were to also translate
> NULLs.
> >
> > And this is wrong. For the direct mapping to be information preserving,
> we'd have to be able to reconstruct the schema of an EMPTY TABLE after the
> table is translated to RDF via the direct mapping. But an empty table
> produces NO TRIPLES, and from no triples you cannot reconstruct the original
> relational table!
> >
> > > My proposal would be to extend the direct mapping to consider the
> schema and translate it to RDFS/OWL. But I would like to know what others
> think.
> >
> > But can you capture all of the semantics of the SQL model? PKs, FKs, data
> types, nullability,
> > multiset semantics and so on? Or are you suggesting to do just the
> minimal RDFS domain/range thing?
> >
> > Best,
> > Richard
> >
> >
> >
> > >
> > >
> > > Best,
> > > Richard
> > >
> > >
> > > On 17 May 2011, at 19:01, Juan Sequeda wrote:
> > >
> > > > Group,
> > > >
> > > > By information preserving, I mean that given the RDF data, I can
> reconstruct the relational table with all its values. Informally, given an
> identity SQL query (a query that outputs the whole table: SELECT * FROM
> table), there exists a SPARQL query which, executed on the RDF data,
> returns the same results as the identity SQL query.
> > > >
> > > > There are two cases for information preserving
> > > >
> > > > 1) We have knowledge of the schema
> > > >
> > > > If the relational schema is directly mapped to RDFS/OWL, then we DO
> NOT need to translate nulls in order to preserve information. For example,
> consider the table R with attributes A and B and instances:
> > > >
> > > > R(Bob, NULL)
> > > > R(Alice, 25)
> > > >
> > > >
> > > > The ontology from this schema is
> > > >
> > > > <R> <type> <class>
> > > > <A> <type> <property>
> > > > <A> <domain> <R>
> > > > <A> <range> <whatever datatype>
> > > > <B> <type> <property>
> > > > <B> <domain> <R>
> > > > <B> <range> <whatever datatype>
> > > >
> > > > And the RDF data, without translating nulls:
> > > >
> > > > <row1> <R#A> "Bob"
> > > > <row2> <R#A> "Alice"
> > > > <row2> <R#B> "25"
> > > >
> > > > The identity SQL query is
> > > >
> > > > SELECT A, B FROM R
> > > >
> > > > Given that we know the schema, we can construct a SPARQL query:
> > > >
> > > > SELECT ?a ?b
> > > > WHERE{
> > > > ?x <R#A> ?a
> > > > OPTIONAL{
> > > >  ?x <R#B> ?b
> > > > }
> > > > }
> > > >
> > > > There we go... with that SPARQL query, we can reconstruct the
> original relational table. No need for nulls. If we did generate triples for
> NULL values, then the SPARQL query wouldn't need OPTIONAL. The point here is
> that we don't need triples for NULL values.
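
The reconstruction described above can be sketched in plain Python. This is only an illustration under stated assumptions: the triples are held as simple tuples, the schema is assumed to be known out of band as a list of column names, and the helper name reconstruct is hypothetical. Reading a missing triple back as None plays the role that OPTIONAL plays in the SPARQL query.

```python
# Triples from the example: the direct mapping output without NULL triples.
TRIPLES = [
    ("row1", "R#A", "Bob"),
    ("row2", "R#A", "Alice"),
    ("row2", "R#B", "25"),
]

# Assumption: the schema of table R is known separately from the RDF data.
SCHEMA_COLUMNS = ["A", "B"]


def reconstruct(triples, table, columns):
    """Rebuild the rows of `table`; a missing triple is read back as None (NULL)."""
    prefix = table + "#"
    rows = {}
    for subject, predicate, value in triples:
        if predicate.startswith(prefix):
            # Group values by row subject, keyed by column name.
            rows.setdefault(subject, {})[predicate[len(prefix):]] = value
    # Because the column list is known, absent values become None.
    return [tuple(rows[s].get(c) for c in columns) for s in sorted(rows)]


table = reconstruct(TRIPLES, "R", SCHEMA_COLUMNS)
print(table)  # [('Bob', None), ('Alice', '25')]
```

Note that this sketch, like the query, only finds rows that produced at least one triple.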
> > > >
> > > > 2) We don't have knowledge of the schema
> > > >
> > > > If we do not have knowledge of the schema, then we can't create a
> SPARQL query like the previous example. Just imagine that you can only look
> at the RDF data. For example, consider the following RDF:
> > > >
> > > > <row1> <R#A> "Bob"
> > > > <row2> <R#A> "Alice"
> > > > <row2> <R#B> "25"
> > > >
> > > >
> > > > Given that row 2 has <R#B> and row 1 doesn't, I could
> guess that the value of row 1 for attribute B is null. But what if the
> original table has a column C and every single row has a NULL value for that
> column? Nothing in the RDF data would reveal that column C exists. In this
> case, it would be necessary to explicitly translate NULL values into RDF
> triples. Otherwise, the mapping would not be information preserving.
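
The all-NULL column case can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the table R(A, B, C) with C NULL in every row, the tuple representation of triples, and the hypothetical helper names. It shows that, looking at the RDF data alone, only columns A and B can be recovered.

```python
# Hypothetical original table R(A, B, C) where column C is NULL in every row.
ORIGINAL_COLUMNS = ["A", "B", "C"]
ORIGINAL_ROWS = {
    "row1": ("Bob", None, None),
    "row2": ("Alice", "25", None),
}


def direct_map_without_nulls(rows, table, columns):
    """Emit one triple per non-NULL cell; NULL cells produce nothing."""
    triples = []
    for subject, values in rows.items():
        for column, value in zip(columns, values):
            if value is not None:
                triples.append((subject, f"{table}#{column}", value))
    return triples


def columns_visible_in_rdf(triples, table):
    """Guess the columns of `table` from the predicates that occur in the data."""
    prefix = table + "#"
    return sorted({p[len(prefix):] for _, p, _ in triples if p.startswith(prefix)})


triples = direct_map_without_nulls(ORIGINAL_ROWS, "R", ORIGINAL_COLUMNS)
visible = columns_visible_in_rdf(triples, "R")
print(visible)  # ['A', 'B']
```

Column C never appears as a predicate, so it cannot be recovered without either schema knowledge or explicit triples for NULL values.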
> > > >
> > > >
> > > > CONCLUSION:
> > > >
> > > > - At this moment, neither the Direct Mapping nor R2RML considers the
> schema, therefore in order for the mappings to be Information Preserving we
> must explicitly translate NULL values to an RDF triple.
> > > > - We need to figure out how this triple is going to show up.
> > > > - From a theoretical side, if we do not generate triples for NULL
> values, the mapping is monotonic. On the other hand, generating triples for
> NULL values will make the mapping non-monotonic. Do we care? Not really. But
> implementation and performance-wise, there can be some overhead when dealing
> with non-monotonicity.
> > > >
> > > >
> > > > Juan Sequeda
> > > > +1-575-SEQ-UEDA
> > > > www.juansequeda.com
> > >
> > >
> >
> >
> >
>
>
>
Received on Wednesday, 18 May 2011 12:33:52 UTC