Re: Information Preserving and ISSUE-42 from Juan Sequeda on 2011-05-18 (public-rdb2rdf-wg@w3.org from May 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Tue, 17 May 2011 23:59:06 -0500
To: Ivan Herman <ivan@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-rdb2rdf-wg@w3.org
Message-ID: <BANLkTikgLNyijyECcFVo74h1vM9n0mNLWA@mail.gmail.com>
>
>
> > My proposal would be to extend the direct mapping to consider the schema
> > and translate it to RDFS/OWL. But I would like to know what others think.
>
> The beauty of the direct mapping is its simplicity. You look at a table
> instance, you, sort of, 'know' what the direct mapping will generate and
> then you can massage the result if you want. Bringing RDFS/OWL and the schema
> into the equation would jeopardize that.
>
> As I said in the previous mail, defining an rdb2rdf:NULL URI for that case
> makes it clear; it does not impose any RDF semantics on that case (which is
> really an RDB feature), and it still lets the user massage the results with
> that case in mind.
>

So does that mean that you would translate all NULL values to a triple
with rdb2rdf:NULL? That makes sense and would make the current direct
mapping information preserving.
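A minimal Python sketch of what translating every NULL into an rdb2rdf:NULL triple could look like. The `<rowN>` subjects, the `<R#col>` predicates and the plain string "rdb2rdf:NULL" standing in for the URI are illustrative assumptions, not the Direct Mapping's actual output. The point it shows: every cell yields a triple, so the table (including an all-NULL column C) can be rebuilt from the RDF alone, with no schema.

```python
# Hypothetical sentinel standing in for the proposed rdb2rdf:NULL URI.
NULL_IRI = "rdb2rdf:NULL"

def direct_map_with_nulls(table, columns, rows):
    """Emit one (subject, predicate, object) triple per cell, NULLs included."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        subject = f"<row{i}>"
        for col, value in zip(columns, row):
            obj = NULL_IRI if value is None else str(value)
            triples.add((subject, f"<{table}#{col}>", obj))
    return triples

def column_of(predicate):
    """'<R#A>' -> 'A' under the assumed predicate naming."""
    return predicate.split("#", 1)[1].rstrip(">")

def reconstruct_without_schema(triples):
    """Rebuild (columns, rows) by looking only at the triples.

    The RDF carries no column order, so columns come back sorted by name.
    Datatypes are also dropped in this sketch: 25 comes back as '25'.
    """
    columns = sorted({column_of(p) for _, p, _ in triples})
    cells = {}
    for s, p, o in triples:
        cells.setdefault(s, {})[column_of(p)] = None if o == NULL_IRI else o
    rows = [tuple(cells[s][c] for c in columns) for s in sorted(cells)]
    return columns, rows

triples = direct_map_with_nulls(
    "R", ["A", "B", "C"], [("Bob", None, None), ("Alice", 25, None)]
)
print(reconstruct_without_schema(triples))
# (['A', 'B', 'C'], [('Bob', None, None), ('Alice', '25', None)])
```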

However, I have the impression that the community in general is not going to
like that [1].

[1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0062.html



> Ivan
>
>
> >
> >
> > Best,
> > Richard
> >
> >
> > On 17 May 2011, at 19:01, Juan Sequeda wrote:
> >
> > > Group,
> > >
> > > By information preserving, I mean that given the RDF data, I can
> > > reconstruct the relational table with all its values. Informally, given an
> > > identity SQL query (a query that outputs the whole table: SELECT * FROM
> > > table), there exists a SPARQL query which, when executed on the RDF data,
> > > returns the same results as the identity SQL query.
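"The same results" can be made concrete as bag equality, since both SQL and SPARQL results are unordered and may contain duplicates. A minimal Python sketch using the standard `sqlite3` module for the identity query; the `same_results` helper and the example rows are illustrative assumptions, and the SPARQL side is only a list of tuples standing in for a real query result.

```python
import sqlite3
from collections import Counter

def identity_query(conn, table):
    """Run the identity query SELECT * FROM <table> and return its rows."""
    return conn.execute(f"SELECT * FROM {table}").fetchall()

def same_results(sql_rows, sparql_rows):
    """Bag (multiset) equality: order is ignored, duplicates are counted."""
    return Counter(map(tuple, sql_rows)) == Counter(map(tuple, sparql_rows))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("Bob", None), ("Alice", 25)])

sql_rows = identity_query(conn, "R")
# Stand-in for the rows a SPARQL query over the RDF data would return.
sparql_rows = [("Alice", 25), ("Bob", None)]
print(same_results(sql_rows, sparql_rows))  # True
```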
> > >
> > > There are two cases for information preservation:
> > >
> > > 1) We have knowledge of the schema
> > >
> > > If the relational schema is directly mapped to RDFS/OWL, then we DO NOT
> > > need to translate nulls in order to preserve information. For example,
> > > consider the table R with attributes A and B and instances:
> > >
> > > R(Bob, NULL)
> > > R(Alice, 25)
> > >
> > >
> > > The ontology from this schema is:
> > >
> > > <R> <type> <class>
> > > <A> <type> <property>
> > > <A> <domain> <R>
> > > <A> <range> <whatever datatype>
> > > <B> <type> <property>
> > > <B> <domain> <R>
> > > <B> <range> <whatever datatype>
> > >
> > > And the RDF data, without translating nulls:
> > >
> > > <row1> <R#A> "Bob"
> > > <row2> <R#A> "Alice"
> > > <row2> <R#B> "25"
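The three triples above are what a mapping that simply skips NULL cells would produce. A minimal Python sketch; the `<rowN>` subject naming and `<R#col>` predicates follow the informal notation of this message and are assumptions, not the normative Direct Mapping IRIs.

```python
def direct_map_skip_nulls(table, columns, rows):
    """One triple per non-NULL cell; NULL cells produce no triple at all."""
    triples = []
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is not None:
                triples.append((f"<row{i}>", f"<{table}#{col}>", str(value)))
    return triples

for triple in direct_map_skip_nulls("R", ["A", "B"], [("Bob", None), ("Alice", 25)]):
    print(*triple)
```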
> > >
> > > The identity SQL query is
> > >
> > > SELECT A, B FROM R
> > >
> > > Given that we know the schema, we can construct a SPARQL query:
> > >
> > > SELECT ?a ?b
> > > WHERE {
> > >   ?x <R#A> ?a
> > >   OPTIONAL {
> > >     ?x <R#B> ?b
> > >   }
> > > }
> > >
> > > There we go... with that SPARQL query, we can reconstruct the
> > > original relational table. No need for nulls. If we generated triples for
> > > NULL values, then the SPARQL query wouldn't need OPTIONALs. The point here
> > > is that we don't need triples for NULL values.
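To make the role of OPTIONAL concrete, here is a minimal Python emulation of the SPARQL query above over the three example triples (no SPARQL engine involved; the function name is hypothetical). OPTIONAL behaves as a left outer join, which turns the missing <R#B> triple back into an unbound value, shown as None. Knowing to ask for <R#A> and <R#B> at all is exactly the schema knowledge this case assumes.

```python
TRIPLES = [
    ("<row1>", "<R#A>", "Bob"),
    ("<row2>", "<R#A>", "Alice"),
    ("<row2>", "<R#B>", "25"),
]

def select_a_optional_b(triples):
    """Emulate: SELECT ?a ?b WHERE { ?x <R#A> ?a OPTIONAL { ?x <R#B> ?b } }"""
    results = []
    for x, p, a in triples:
        if p != "<R#A>":
            continue
        # Left outer join: keep the ?x binding even if no <R#B> triple matches.
        matches = [o for s, q, o in triples if s == x and q == "<R#B>"]
        for b in matches or [None]:
            results.append((a, b))
    return results

print(select_a_optional_b(TRIPLES))  # [('Bob', None), ('Alice', '25')]
```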
> > >
> > > 2) We don't have knowledge of the schema
> > >
> > > If we do not have knowledge of the schema, then we can't create a
> > > SPARQL query like the one in the previous example. Just imagine that you
> > > can only look at the RDF data. For example, consider the following RDF:
> > >
> > > <row1> <R#A> "Bob"
> > > <row2> <R#A> "Alice"
> > > <row2> <R#B> "25"
> > >
> > >
> > > Given that row 2 has <R#B> and row 1 doesn't, I could guess that the
> > > value of row 1 for attribute B is null. But what if the original table
> > > has a column C and every single row has a NULL value for that column?
> > > Then no triple mentions <R#C> at all, so nothing in the RDF reveals that
> > > C exists. In this case, it would be necessary to explicitly translate NULL
> > > values into an RDF triple. Otherwise, the mapping would not be information
> > > preserving.
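The column C argument can be checked mechanically: with a mapping that skips NULLs, a table with an extra all-NULL column yields exactly the same triples as the table without it, so no function of the RDF alone can tell which table it came from. A minimal Python sketch reusing the informal `<rowN>`/`<R#col>` naming (an assumption for illustration).

```python
def direct_map_skip_nulls(table, columns, rows):
    """One triple per non-NULL cell; NULL cells are dropped."""
    return {
        (f"<row{i}>", f"<{table}#{col}>", str(value))
        for i, row in enumerate(rows, start=1)
        for col, value in zip(columns, row)
        if value is not None
    }

without_c = direct_map_skip_nulls("R", ["A", "B"], [("Bob", None), ("Alice", 25)])
with_c = direct_map_skip_nulls(
    "R", ["A", "B", "C"], [("Bob", None, None), ("Alice", 25, None)]
)

# Two different source tables, one RDF graph: the mapping is not injective,
# so it cannot be inverted, i.e. it is not information preserving.
print(without_c == with_c)  # True
```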
> > >
> > >
> > > CONCLUSION:
> > >
> > > - At this moment, neither the Direct Mapping nor R2RML considers the
> > > schema; therefore, in order for the mappings to be Information Preserving,
> > > we must explicitly translate NULL values to an RDF triple.
> > > - We need to figure out how this triple is going to show up.
> > > - From a theoretical side, if we do not generate triples for NULL
> > > values, then the mapping is monotonic. On the other hand, generating
> > > triples for NULL values will make the mapping non-monotonic. Do we care?
> > > Not really. But implementation- and performance-wise, there can be some
> > > overhead when dealing with non-monotonicity.
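The monotonicity remark can be illustrated under one reading of it, which is an assumption since the message does not define the ordering: let I' be I with a NULL replaced by a value, and call a mapping M monotonic if M(I) is a subset of M(I'). A minimal Python sketch of both variants of a toy mapping; "rdb2rdf:NULL" is the sentinel proposed in this thread, written as a plain string.

```python
NULL_IRI = "rdb2rdf:NULL"  # hypothetical sentinel value

def direct_map(columns, rows, emit_nulls):
    """Toy direct mapping of table R; optionally emits a triple for each NULL."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is None and not emit_nulls:
                continue  # NULL cell: no triple
            obj = NULL_IRI if value is None else str(value)
            triples.add((f"<row{i}>", f"<R#{col}>", obj))
    return triples

cols = ["A", "B"]
inst = [("Bob", None), ("Alice", 25)]
inst_filled = [("Bob", 30), ("Alice", 25)]  # Bob's NULL replaced by a value

for emit_nulls in (False, True):
    before = direct_map(cols, inst, emit_nulls)
    after = direct_map(cols, inst_filled, emit_nulls)
    print(emit_nulls, before <= after)
# False True   (skipping NULLs: the old graph is contained in the new one)
# True False   (<row1> <R#B> rdb2rdf:NULL has to be retracted)
```

Under this reading, the retraction of the NULL triple is where the extra bookkeeping mentioned above would come from.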
> > >
> > >
> > > Juan Sequeda
> > > +1-575-SEQ-UEDA
> > > www.juansequeda.com
> >
> >
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 18 May 2011 04:59:56 UTC