Re: slightly modified example from my rif slide... from hellmann@informatik.uni-leipzig.de on 2008-11-21 (public-xg-rdb2rdf@w3.org from November 2008)

From: <hellmann@informatik.uni-leipzig.de>
Date: Fri, 21 Nov 2008 12:13:37 +0100
To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
Message-ID: <20081121121337.fzokv7c4bkkoc088@mail.uni-leipzig.de>
Hi,

Quoting Axel Polleres <axel.polleres@deri.org>:
>
> Disclaimer:
>
> I invite you to send your comment also to
>   "public-rif-comments@w3.org" <public-rif-comments@w3.org>
> in order to get official feedback from the RIF working group.
>
> What I write here is my *personal* opinion.
>
> Sebastian Hellmann wrote:
>>
>> Hi all,
>>
>> some comments on RIF, the Initial XG Recommendation draft and D2RQ
>> Lessons Learned [1] (section 3.1 People need weird mappings and 3.2
>> Mapping to RDF is not enough. People want data integration).
>>
>> I wondered if it made sense to redefine the term "mapping" language, so
>> there would be two different languages: a "from/selection" language,
>> which is concerned with selecting parts from the RDB, and a
>> "to/target" language, which is concerned with assembling the RDF
>> and thus would be the "mapping language".
>
> I haven't seen a convincing example as of yet, where I would need this
> separation.
>
>> The idea I have in mind is using a language native to the relational
>> data store for selecting such as SQL or Datalog and then start from
>> there with the mapping language to model RDF. Triplify, of course, uses
>> this approach, but it has some limits regarding the expressivity of the
>> output.
>>
>> The whole mapping process could look like this:
>> An SQL-query yields a list of rows (or a list of named columns) as   
>> does "mydb:Customers( ?ID ?Name ?Phone ?Address)" from the RIF   
>> example
>
> There are certain things which are not - as of yet - expressible in
> RIF, such as aggregates, however, there are datalog extensions which
> cover these and which could be used for a respective rules dialect.
>
> I see no reason preventing that such rules dialect has an SQL style
> syntax as well, if people prefer that and - quite the contrary - would
> be interested in contributing there.
>
>> (actually the SQL query could substitute for this line), the
>> resulting columns are bound to the variables and can then be
>> transformed by the standardized mapping language (the "to" language)
>> such as RIF: "External( pred:iri-string( ?T External( func:concat(
>> "tel:" ?Phone ))))"
>>
>> So if there is a decoupling of the selection and the mapping, some   
>> things become quite easy:
>
> again, I don't get your point why such separation is easier, it seems
> to be just a syntax issue, not one of expressivity.

Yes, with RIF/Datalog it is actually just a syntax issue. But that is not the
case with other mapping languages like D2RQ, which currently is one of the
starting-point languages in the XG Rec Draft. Also, I'm not quite sure about
Datalog. Is it used by any system? Is it possible to convert Datalog to SQL,
for example? Does it have any drawbacks?

I will try to give an example. Consider these tables (found via Google):
http://framework.zend.com/manual/en/figures/zend.db.adapter.example-database.png

One would probably want the RDF output to look like this (all values made up):

@prefix : <http://mydb.org/> .

:bug_1 a :Bug;
     :bug_description "not working";
     :bug_status "open";
     :reported_by :someUser1;
     :assigned_to :someUser2;
     :verified_by :someUser3;
     :forProduct :wifi_module5.

:wifi_module5 a :Product.

(:bug_status "open"; could also be rdf:type :Open;)

So with SQL the selection would look something like:

SELECT Bugs.id AS "?id",
Bugs.bug_description AS "?desc",
...,
acc1.account_name AS "?reported_by",
acc2.account_name AS "?assigned_to",
...,
Products.product_name AS "?product"
FROM Bugs, Accounts AS acc1, Accounts AS acc2 ...
WHERE (
Bugs.reported_by=acc1.account_name AND  Bugs.assigned_to=acc2.account_name
....

which basically results in a construct like:
Query(?id ?desc ... ?reported_by ?assigned_to ...)
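
To make the binding step concrete, here is a minimal sketch in Python using
the standard sqlite3 module. The schema is a simplified guess at the tables
in the figure (in particular, the product_id column on Bugs is an assumption
made for illustration), and the sample rows are made up. It only shows how
the column aliases turn each result row into a set of named variable
bindings.

```python
import sqlite3

# Hypothetical, simplified schema; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column alias
conn.executescript("""
CREATE TABLE Accounts (account_name TEXT PRIMARY KEY);
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE Bugs (
    id INTEGER PRIMARY KEY,
    bug_description TEXT,
    bug_status TEXT,
    reported_by TEXT REFERENCES Accounts(account_name),
    assigned_to TEXT REFERENCES Accounts(account_name),
    product_id INTEGER REFERENCES Products(product_id)
);
INSERT INTO Accounts VALUES ('someUser1'), ('someUser2');
INSERT INTO Products VALUES (5, 'wifi_module5');
INSERT INTO Bugs VALUES (1, 'not working', 'open', 'someUser1', 'someUser2', 5);
""")

# The selection part: the aliases are the variable names the mapping will see.
SELECTION = """
SELECT Bugs.id AS "?id",
       Bugs.bug_description AS "?desc",
       acc1.account_name AS "?reported_by",
       acc2.account_name AS "?assigned_to",
       Products.product_name AS "?product"
FROM Bugs, Accounts AS acc1, Accounts AS acc2, Products
WHERE Bugs.reported_by = acc1.account_name
  AND Bugs.assigned_to = acc2.account_name
  AND Bugs.product_id = Products.product_id
"""

# One dict per row, playing the role of one Query(...) fact.
bindings = [dict(row) for row in conn.execute(SELECTION)]
print(bindings)
```

Each dict in `bindings` corresponds to one instance of the Query(...)
construct, keyed by the variable names chosen in the SELECT clause.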

The Query(...) construct can then be used by any other language for "the
compositional part". The same can of course be achieved with RIF/Datalog, but
not with D2RQ (maybe with R2O or Virtuoso). I just wanted to point out two
things:
1. Some mapping languages have poor selection capabilities, which could be
   fixed by using SQL directly.
2. If you are aiming for integration, you could use the same mapping for a
   different database and just substitute the SQL query (or the related RIF
   part) while naming the columns the same, which would be an easier, more
   modular solution than many current approaches.
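
The reuse idea can be sketched as follows, again in Python with sqlite3. Both
database layouts, the `tickets`/`users` tables, the helper names `open_db`,
`map_row` and `triples`, and the sample data are hypothetical. The only thing
the two selection queries share is the set of column aliases, and the mapping
function is written once against those aliases. The literal handling is
deliberately naive (no escaping), so this is an illustration rather than a
usable RDF serializer.

```python
import sqlite3

BASE = "http://mydb.org/"

def open_db(schema_and_data):
    """Create an in-memory database from a SQL script (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(schema_and_data)
    return conn

# Two hypothetical databases with different layouts.
db_a = open_db("""
CREATE TABLE Bugs (id INTEGER, bug_description TEXT, reported_by TEXT);
INSERT INTO Bugs VALUES (1, 'not working', 'someUser1');
""")
db_b = open_db("""
CREATE TABLE users (uid INTEGER, login TEXT);
CREATE TABLE tickets (ticket_no INTEGER, summary TEXT, reporter_uid INTEGER);
INSERT INTO users VALUES (7, 'someUser9');
INSERT INTO tickets VALUES (42, 'crashes on start', 7);
""")

# Per-database selection part; only the column aliases are shared.
SELECT_A = """
SELECT id AS "?id", bug_description AS "?desc", reported_by AS "?reported_by"
FROM Bugs
"""
SELECT_B = """
SELECT t.ticket_no AS "?id", t.summary AS "?desc", u.login AS "?reported_by"
FROM tickets AS t JOIN users AS u ON t.reporter_uid = u.uid
"""

def map_row(row):
    """Shared compositional part: one row of bindings -> a list of triples."""
    subject = f"<{BASE}bug_{row['?id']}>"
    return [
        (subject, "a", f"<{BASE}Bug>"),
        (subject, f"<{BASE}bug_description>", f'"{row["?desc"]}"'),
        (subject, f"<{BASE}reported_by>", f"<{BASE}{row['?reported_by']}>"),
    ]

def triples(conn, query):
    """Run a selection query and apply the shared mapping to every row."""
    return [t for row in conn.execute(query) for t in map_row(row)]

for s, p, o in triples(db_a, SELECT_A) + triples(db_b, SELECT_B):
    print(s, p, o, ".")
```

Integrating a third database would then only require another SELECT with the
same aliases; `map_row` stays untouched.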

(one more inline comment below)

>
>> 1. Integration: 2 or more databases can be integrated by first   
>> designing the mapping and then by implementing different SQL   
>> queries for each database (assumed the databases are different)
>> 2. Reuse of mappings: As the mappings now are more geared towards   
>> the creation of RDF, the same mappings can and should be used for   
>> different RDBs. So e.g. Drupal, Wordpress, Typo3 use the same   
>> mappings, but different SQL-queries to fill the variables (maybe   
>> some more small adjustments, because of encoding or other problems).
>> 3. There could be other input languages than SQL. Maybe Xpath or   
>> so. Anything that fills the variables or some other interface.
>
> yes, in fact I presented a symbiosis of XQuery and SPARQL for the XML
> to RDF mapping already to the group, cf. http://xsparql.deri.org
>
>
>> Related to the XG Recommendation draft, the points "complete when   
>> compared to the relational algebra" and "must expose vendor   
>> specific SQL features" could be ticked off.
>>
>> As I'm not familiar with RIF, I'm not sure if something like that   
>> can be incorporated.
>>
>> I recently talked to Michael Martin, who transformed a relational
>> database for a web application [2] (using ETL). As he needed to
>> model some complex domain semantics and (in accordance with D2RQ [1]
>> 3.1 People need weird mappings) really needed some weird
>> mappings, he used SQL and PHP as a mapping language, which is quite
>> a powerful language combination. This also allowed him to correct
>> encodings and filter out some strange values.
>
> If what you say is that the "to/target"-language, let's call it "the
> compositional part", needs to be a procedural, Turing-complete language,
> then I would - personally - rather suggest something like extending
> the XSPARQL approach to incorporate XML in the body. XQuery is
> Turing-complete so it can express all that PHP/Java can.
>
> Nevertheless, what I say is that such a language would only be a means to
> *exchange* mappings in a semantically unambiguous way. How you
> really implement them is a different issue.

I meant it differently. "The compositional part" should be standardized; RIF
is fine, for example. But before it comes to the RIF predicates (such as
Query(?id ?desc ... ?reported_by ?assigned_to ...) or
mydb:Customers( ?ID ?Name ?Phone ?Address) ), there should be freedom of
choice (SQL, Datalog, something home-grown, PHP, whatever) in how the rows
are prepared, i.e. how the input data is selected.
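
As a rough sketch of that freedom of choice: the compositional part only
needs an iterable of variable bindings and does not care where they come
from. Below, the selection step is a plain Python generator that cleans and
filters values before handing them over. The function names, the sample
customers and the :name/:phone properties are all made up for illustration;
the "tel:" concatenation mirrors the func:concat call in the RIF example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Binding = Dict[str, str]
Triple = Tuple[str, str, str]

def customers_from_script() -> Iterator[Binding]:
    """Hypothetical selection step written in a general-purpose language.

    It cleans and filters raw values before they reach the mapping.
    """
    raw = [
        {"?ID": "1", "?Name": " Alice ", "?Phone": "+49 341 1234"},
        {"?ID": "2", "?Name": "Bob", "?Phone": ""},  # no usable phone number
    ]
    for row in raw:
        phone = row["?Phone"].replace(" ", "")
        if not phone:
            continue  # filter out rows with strange or empty values
        yield {"?ID": row["?ID"], "?Name": row["?Name"].strip(), "?Phone": phone}

def map_customers(bindings: Iterable[Binding]) -> List[Triple]:
    """Compositional part: independent of how the bindings were produced."""
    triples: List[Triple] = []
    for b in bindings:
        subject = f"<http://mydb.org/customer_{b['?ID']}>"
        triples.append((subject, "<http://mydb.org/name>", f'"{b["?Name"]}"'))
        # Same idea as func:concat("tel:" ?Phone) in the RIF example.
        triples.append((subject, "<http://mydb.org/phone>", f"<tel:{b['?Phone']}>"))
    return triples

for triple in map_customers(customers_from_script()):
    print(*triple, ".")
```

Swapping `customers_from_script` for a SQL-backed generator that yields the
same keys would leave `map_customers` unchanged.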


>
>> From my point of view, producing a "clean" and good schema from an
>> RDB requires SQL and a programming language (PHP/Java) as the
>> selection language, whose output is then handed to the mapping
>> language/processor as variables or via an interface. (I admit this
>> last part is not easily realized and maybe goes too far. But as many
>> evolved databases are a mess, it would be nice to have options
>> rather than workarounds around the mapping language.)
>>
>> Regards,
>> Sebastian Hellmann
>>
>> [1] http://www.w3.org/2007/03/RdfRDB/papers/d2rq-positionpaper/
>> [2] http://www.ceur-ws.org/Vol-301/Poster_5_Martin.pdf
>>
>> -- 
>> http://aksw.org/SebastianHellmann
>
>
> -- 
> Dr. Axel Polleres
> Digital Enterprise Research Institute, National University of Ireland, Galway
> email: axel.polleres@deri.org  url: http://www.polleres.net/



Received on Friday, 21 November 2008 11:14:25 UTC