The minimum we need to deliver


Here's what I think this group should deliver:

1) A mapping language R2ML according to the SQL-based approach, along  
the lines of what we see in Souri's proposal and in Sören's Triplify:  
The bulk of transformation happens in full unconstrained SQL; and the  
SQL queries are wrapped in some "glue" that specifies how each record  
in the SQL result set is turned into RDF triples. There is still a lot  
to be discussed about the syntax and exact expressivity of that  
"glue", but I believe that is the right general approach for R2ML.

2) A documented method of deriving URIs from the items in a relational
schema (tables, columns), so that we can name these items and talk
about them using RDF. This is helpful in itself for documenting,
managing and analyzing relational schemas. This is mostly covered in
Eric's direct mapping proposal (except that it doesn't generate URIs
to name tables by default).

3) A documented method of deriving a default RDF graph from the data  
inside a relational database. It seems reasonable to use the URIs from  
2) in this RDF graph, e.g., as property URIs. We already have several  
proposals that cover this point. We need this one to enable the use of  
RDF-to-RDF mapping technologies, see below.

4) Nice to have: A documented method of capturing the constraints of  
the relational schema in RDFS and OWL, to the extent possible within  
the expressivity of these languages. This amounts to providing
OWL/RDFS “definitions” for the URIs in 2). This is covered well by
Juan's and Marcelo's work on “Putative Ontologies”. It is worth pointing out
that OWL and RDFS are insufficient to capture all constraints of the  
relational schema, so these mappings are lossy.
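
To make items 1) to 3) a bit more concrete, here is a minimal Python
sketch. It is not taken from any of the proposals mentioned above. The
base URI, the URI patterns, the "people" table, and the function names
(table_uri, column_uri, row_uri, default_graph, map_query) are all
assumptions made up for illustration, and the "glue" is reduced to a
subject URI template plus a column-to-property map.

```python
import sqlite3
from urllib.parse import quote

BASE = "http://example.com/db/"  # hypothetical base URI, not from any proposal
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Item 2: derive URIs for schema items (the URI patterns are assumptions).
def table_uri(table):
    return BASE + quote(table)

def column_uri(table, column):
    return f"{BASE}{quote(table)}#{quote(column)}"

# Item 3: a default graph, one triple per non-NULL cell.
def row_uri(table, pk_column, pk_value):
    return f"{BASE}{quote(table)}/{quote(pk_column)}={quote(str(pk_value))}"

def default_graph(conn, table, pk_column):
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = row_uri(table, pk_column, record[pk_column])
        for column, value in record.items():
            if value is not None:
                # Column URIs from item 2 double as property URIs.
                yield (subject, column_uri(table, column), value)

# Item 1: unconstrained SQL wrapped in "glue" that turns each record
# of the result set into triples.
def map_query(conn, sql, subject_template, predicates):
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        escaped = {k: quote(str(v)) for k, v in record.items()}
        subject = subject_template.format(**escaped)
        for column, prop in predicates.items():
            if record[column] is not None:
                yield (subject, prop, record[column])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Alice", "Galway"), (2, "Bob", None)])

triples = list(default_graph(conn, "people", "id"))
custom = list(map_query(
    conn,
    "SELECT id, upper(name) AS label FROM people WHERE city IS NOT NULL",
    "http://example.com/person/{id}",
    {"label": RDFS_LABEL},
))
```

Here `triples` holds the default graph (five triples, since Bob's NULL
city produces none), while `custom` holds the single triple produced by
the SQL query plus glue. The point of the sketch is only that the SQL
does the transformation work and the glue stays very small.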

What about the RDF-based approach?

If we break it down, there are two different things that have received  
this label.

First, there are mapping languages in the style of D2RQ (Virtuoso's  
RDF Views, R2O, and the Revelytix language fall into this category).  
These languages can be expressed in the SQL-based R2ML. Their
implementations *may* not be able to deal with full SQL, at least not
efficiently. This will certainly be the case for D2RQ. I'm comfortable
stating in the D2RQ documentation that it only supports a subset of
SQL, and characterizing this subset; or stating that one should avoid
certain SQL constructs to keep performance up.

Second, there is the approach of using existing general RDF-to-RDF  
mapping technologies for the purpose of RDB-to-RDF translation. Eric  
has been championing this approach, and it has some appeal.

But all that we as a WG have to do to enable this approach is 3)
above. Given a default mapping, any general RDF-to-RDF transformation
approach, including SPARQL CONSTRUCT, RIF, SWRL, R2R, or XSLT over
RDF/XML, can be used to express mappings from the default mapping into
customized RDF representations. The semantics of these transformation
technologies is already defined in their respective specifications.
Whether the processor chooses to implement these transforms as
RDF-to-RDF transforms, or translates queries over the customized
representation directly to SQL (as Eric has demonstrated for SPARQL
CONSTRUCT), is again up to the implementer.
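
As a rough illustration of the RDF-to-RDF step, the following Python
sketch rewrites a tiny default graph into a customized vocabulary. It
only mimics what a simple SPARQL CONSTRUCT that renames properties
would do. It is not SPARQL, and the example URIs, the sample triples,
and the `construct` function are assumptions for illustration.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# A hypothetical default graph, as plain (subject, predicate, object) tuples.
default_triples = [
    ("http://example.com/db/people/id=1",
     "http://example.com/db/people#name", "Alice"),
    ("http://example.com/db/people/id=1",
     "http://example.com/db/people#city", "Galway"),
]

def construct(triples, property_map):
    """For each (old, new) pair in property_map, behave roughly like
    CONSTRUCT { ?s <new> ?o } WHERE { ?s <old> ?o }.
    Triples whose property is not in the map are dropped."""
    return [(s, property_map[p], o)
            for (s, p, o) in triples
            if p in property_map]

custom_triples = construct(
    default_triples,
    {"http://example.com/db/people#name": FOAF_NAME},
)
```

A real processor could equally translate queries over the customized
representation directly to SQL instead of materializing the default
graph first; the sketch shows only the simplest reading.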

So, I believe that all the variants of the RDF-based approach are  
sufficiently addressed by 1), 2) and 3) above.

As far as I can tell, the four items above are sufficient to fulfill
our charter and meet the Requirements, and they are the best way of
addressing the concerns of our main stakeholders.


Received on Wednesday, 21 July 2010 18:33:15 UTC