Re: slightly modified example from my rif slide... from Sebastian Hellmann on 2008-11-20 (public-xg-rdb2rdf@w3.org from November 2008)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 20 Nov 2008 17:59:52 +0100
To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
Message-ID: <49259788.9060600@informatik.uni-leipzig.de>

Hi all,

Some comments on RIF, the initial XG Recommendation draft, and D2RQ
Lessons Learned [1] (section 3.1, "People need weird mappings", and
section 3.2, "Mapping to RDF is not enough. People want data
integration").

I wondered whether it would make sense to redefine the term "mapping
language" so that there are two different languages: a "from/selection"
language, which is concerned with selecting parts of the RDB, and a
"to/target" language, which is concerned with assembling the RDF and
would thus be the actual "mapping language".
The idea I have in mind is to use a language native to the relational
data store, such as SQL or Datalog, for the selection, and then to start
from there with the mapping language to model the RDF. Triplify, of
course, uses this approach, but it has some limits regarding the
expressivity of the output.

The whole mapping process could look like this:
An SQL query yields a list of rows (or a list of named columns), as does
"mydb:Customers( ?ID ?Name ?Phone ?Address)" from the RIF example
(in fact, the SQL query could replace this line). The resulting columns
are bound to the variables and can then be transformed by the
standardized mapping language (the "to" language), such as RIF:
"External( pred:iri-string( ?T External( func:concat( "tel:" ?Phone ))))"
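The two steps above can be sketched in Python, with an in-memory SQLite
database standing in for the RDB. This is only an illustration under
assumptions: the table layout, the http://example.org/ namespace and the
function names select_customers and map_customer are made up, and plain
string concatenation merely mimics what func:concat and pred:iri-string
would do in RIF.

```python
import sqlite3

NS = "http://example.org/"  # hypothetical namespace, not from the thread


def select_customers(conn):
    """Selection ("from") step: plain SQL, one tuple per customer row."""
    return conn.execute(
        "SELECT id, name, phone, address FROM customers ORDER BY id"
    ).fetchall()


def map_customer(row):
    """Mapping ("to") step: sees only the bound variables, not the database."""
    cid, name, phone, address = row  # plays the role of ?ID ?Name ?Phone ?Address
    subject = f"{NS}customer/{cid}"
    return [
        (subject, NS + "name", name),
        (subject, NS + "address", address),
        # String concatenation stands in for func:concat / pred:iri-string.
        (subject, NS + "phone", "tel:" + phone),
    ]


# Assumed sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, phone TEXT, address TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', '+49-341-0000', 'Leipzig')")

triples = [t for row in select_customers(conn) for t in map_customer(row)]
for triple in triples:
    print(triple)
```

The point of the split is that map_customer never touches the database,
so the SQL can change without touching the mapping.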

So if selection and mapping are decoupled, some things become quite
easy:
1. Integration: two or more databases can be integrated by first
designing the mapping and then implementing a different SQL query for
each database (assuming the databases differ).
2. Reuse of mappings: as the mappings are now geared more towards the
creation of RDF, the same mappings can and should be used for different
RDBs. So, e.g., Drupal, Wordpress and Typo3 could use the same mappings
but different SQL queries to fill the variables (perhaps with some small
adjustments because of encoding or other problems).
3. Other input languages: there could be input languages other than SQL,
maybe XPath or the like. Anything that fills the variables would do, or
some other interface.
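Points 1 and 2 could look roughly like the following Python sketch: one
shared mapping function and one SQL query per database. Both schemas
(customers and kunde), the source labels "a" and "b", and the
http://example.org/ IRIs are invented for the example and do not
correspond to any real application schema.

```python
import sqlite3


def make_db(ddl, insert):
    """Create a throwaway in-memory database with a single row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.execute(insert)
    return conn


# Two hypothetical databases with different schemas.
db_a = make_db("CREATE TABLE customers (id INTEGER, name TEXT, phone TEXT)",
               "INSERT INTO customers VALUES (1, 'Alice', '111')")
db_b = make_db("CREATE TABLE kunde (nr INTEGER, vorname TEXT, nachname TEXT, telefon TEXT)",
               "INSERT INTO kunde VALUES (7, 'Bob', 'Meyer', '222')")

# One selection query per database; each must yield (id, name, phone).
SELECTIONS = [
    ("a", db_a, "SELECT id, name, phone FROM customers"),
    ("b", db_b, "SELECT nr, vorname || ' ' || nachname, telefon FROM kunde"),
]


def map_customer(source, cid, name, phone):
    """Shared mapping: identical for every database."""
    subject = f"http://example.org/{source}/customer/{cid}"
    return [
        (subject, "http://example.org/vocab#name", name),
        (subject, "http://example.org/vocab#tel", "tel:" + phone),
    ]


triples = []
for source, conn, query in SELECTIONS:
    for cid, name, phone in conn.execute(query):
        triples.extend(map_customer(source, cid, name, phone))
```

Adding a third database would then mean adding one entry to SELECTIONS,
with the mapping left unchanged.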


Regarding the XG Recommendation draft: since the selection step would be
plain SQL, the points "complete when compared to the relational algebra"
and "must expose vendor specific SQL features" could be ticked off.

As I'm not familiar with RIF, I'm not sure if something like that can be 
incorporated.

I recently talked to Michael Martin, who transformed a relational
database for a web application [2] (using ETL). As he needed to model
some complex domain semantics and (in line with D2RQ [1], section 3.1,
"People need weird mappings") really needed some weird mappings, he
used SQL and PHP as a mapping language, which is quite a powerful
combination. This also allowed him to correct encodings and filter out
some strange values.
From my point of view, producing a "clean" and good schema from an RDB
requires SQL plus a programming language (PHP/Java) as the selection
language, whose output is then handed to the mapping language/processor
as variables or via an interface. (I admit this last part is not easily
realized and maybe goes too far. But as many databases that have evolved
over time are a mess, it would be nice to have options rather than
workarounds around the mapping language.)
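As a rough sketch of such a combined selection step, here in Python
rather than PHP/Java: the SQL does the selecting, a few lines of code
repair mis-decoded text and drop placeholder values, and the mapping
only receives clean variable bindings. The PLACEHOLDERS set, the table
and all function names are assumptions for illustration; the encoding
repair handles only the common case of UTF-8 text that was wrongly
decoded as Latin-1.

```python
import sqlite3

PLACEHOLDERS = {"", "n/a", "unknown"}  # assumed examples of "strange values"


def fix_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Latin-1; otherwise return as-is."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return text


def select_clean(conn):
    """Selection step: SQL plus code, yielding one binding dict per usable row."""
    for cid, name, phone in conn.execute("SELECT id, name, phone FROM customers"):
        phone = (phone or "").strip()
        if phone.lower() in PLACEHOLDERS:
            continue  # filter rows without a usable phone number
        yield {"ID": cid, "Name": fix_mojibake(name), "Phone": phone}


def map_bindings(bindings):
    """Mapping step: the interface is just an iterable of variable bindings."""
    for b in bindings:
        subject = f"http://example.org/customer/{b['ID']}"
        yield (subject, "http://example.org/vocab#name", b["Name"])
        yield (subject, "http://example.org/vocab#tel", "tel:" + b["Phone"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    # "M\u00c3\u00bcller" is the mis-decoded form of "M\u00fcller".
    [(1, "M\u00c3\u00bcller", " 123 "), (2, "Bob", "n/a")],
)
triples = list(map_bindings(select_clean(conn)))
```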

Regards,
Sebastian Hellmann

[1] http://www.w3.org/2007/03/RdfRDB/papers/d2rq-positionpaper/
[2] http://www.ceur-ws.org/Vol-301/Poster_5_Martin.pdf

--
http://aksw.org/SebastianHellmann

Received on Thursday, 20 November 2008 17:00:33 UTC