On semantics-based approaches and still using full vendor-specific SQL from Harry Halpin on 2010-07-23 (public-rdb2rdf-wg@w3.org from July 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Fri, 23 Jul 2010 19:23:10 +0100 (BST)
To: public-rdb2rdf-wg@w3.org
Message-ID: <bc86a589da765d832cc3e4bd420826f3.squirrel@webmail-mit.w3.org>

I spent most of the day with the Database Research Group here at Edinburgh,
who kindly managed to read most of the proposals on the table. So, I'm
going to try to channel the results of the discussion to the group.

One way is to use a purely SQL-based approach (which I hope Souri will
present the week after this one) that allows the mapping to be done as a
view (that is isomorphic to the triples) using the full expressivity of
SQL. Then a very simple mapping construct can map the results of this SQL
to a graph, i.e. by generating URIs.
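
To make that concrete, here is a minimal sketch of what I mean, not taken
from any of the proposals. The table `emp`, the view `emp_triples`, and the
base URI are all made up for illustration, and SQLite merely stands in for
whatever vendor database is in use. The view does all the real work in SQL;
the mapping step only mints URIs from the three columns.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, an assumption


def view_to_triples(conn, view):
    """Turn each row of a three-column (s, p, o) view into a triple.

    The view name is interpolated directly, so it must come from a
    trusted source; this is a sketch, not production code.
    """
    rows = conn.execute(f"SELECT s, p, o FROM {view} ORDER BY s")
    for s, p, o in rows:
        # Subject and predicate become URIs, the object a plain literal.
        yield (f"<{BASE}{s}>", f"<{BASE}{p}>", f'"{o}"')


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO emp VALUES (1, 'Alice'), (2, 'Bob');
    -- The view is isomorphic to the triples; any SQL could go here.
    CREATE VIEW emp_triples AS
        SELECT 'emp/' || id AS s, 'name' AS p, name AS o FROM emp;
""")

triples = list(view_to_triples(conn, "emp_triples"))
for t in triples:
    print(" ".join(t), ".")
```

Everything vendor-specific stays inside the CREATE VIEW statement, so the
mapping step itself never needs to understand SQL.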

Another way is a purely SQL-based approach, but one that expects the
mapping language to provide a few easy-to-use basic constructs besides just
generating URIs in order to do common tasks, e.g. creating new nodes. I
think this is the approach that Marcelo and Juan have been advocating.

Now, I think these two approaches are compatible, as long as the few
easy-to-use basic constructs can be limited to a sensible amount that can
be translated into SQL and they do *not* preclude using full
vendor-specific SQL to create the mapping as well, i.e. in a view. This makes
sense, as SQL itself can be viewed using Datalog semantics.
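
A rough sketch of how the two could sit side by side, under my own
assumptions: `property_construct` is a hypothetical "column becomes
property" construct that I made up for illustration, and it is not from
Marcelo and Juan's proposal. It compiles down to plain SQL, while a
hand-written view (using `upper()` as a stand-in for a vendor-specific
feature) feeds the same three-column shape.

```python
import sqlite3


def property_construct(table, key, column):
    """Compile a hypothetical basic construct into a SQL SELECT.

    Identifiers are interpolated directly, so they must be trusted;
    this only illustrates that the construct is translatable to SQL.
    """
    return (
        f"SELECT '{table}/' || {key} AS s, '{column}' AS p, "
        f"{column} AS o FROM {table}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO emp VALUES (1, 'Alice', 'db'), (2, 'Bob', 'ai');
""")

# View generated from the easy-to-use construct.
conn.execute(
    "CREATE VIEW v_generated AS " + property_construct("emp", "id", "name")
)

# View written by hand by a SQL wizard, free to use any SQL feature.
conn.execute("""
    CREATE VIEW v_handwritten AS
        SELECT 'emp/' || id AS s, 'dept' AS p, upper(dept) AS o FROM emp
""")

generated = conn.execute(
    "SELECT s, p, o FROM v_generated ORDER BY s").fetchall()
handwritten = conn.execute(
    "SELECT s, p, o FROM v_handwritten ORDER BY s").fetchall()
print(generated)
print(handwritten)
```

Both views expose the same (s, p, o) columns, so whatever turns rows into
triples does not need to know which route produced them.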

Furthermore, for people who are SQL wizards, these basic constructs may
not be necessary, but some people (particularly people from an RDF
background) may find them easier to use than doing everything in pure SQL.
So, Marcelo and Juan's approach does not necessarily limit the
expressivity of SQL, as long as it does not preclude creating a view using
full vendor-specific SQL before some basic mapping functions are called.

Lastly, the differences between Eric's RIF-based approach and the Datalog
approach are negligible in practice, as RIF is essentially also based on
Datalog semantics, i.e. RIF *is* a syntax for Datalog (which does not
have its own syntax) plus some bells and whistles for extensibility. The
argument between using Datalog or a set-theoretic semantics for mapping is
not necessary, as Datalog also has a standard set-theoretic semantics
(although we do need to get the exact semantics of what we mean by
"Datalog" down).

Soeren's approach of mapping SPARQL to SQL is also useful,
and should be used as a test if there is enough time, as it still depends
on the first possibly non-trivial mapping of relational data to RDF to be
done (likely non-materialized).

Would like to hear opinions - just trying to build consensus in the group,
which, despite surface differences, is actually becoming closer, I think.

        cheers,
             harry

Received on Friday, 23 July 2010 18:23:13 UTC