The state of R2RML from Tomasz Pluskiewicz on 2020-07-31 (public-kg-construct@w3.org from July 2020)

From: Tomasz Pluskiewicz <tomasz@t-code.pl>
Date: Fri, 31 Jul 2020 21:17:59 +0200
To: public-kg-construct@w3.org
Message-ID: <etPan.5f246e6c.9055eeb.173ba@t-code.pl>

Hello

Last week I published a slightly emotional blog post on the state of R2RML and the available implementations (i.e. those I could easily find) [1].

Bottom line: although an RDB -> RDF bridge and/or conversion is, in my opinion, of topmost importance for transitioning legacy software in brownfield projects, the subject in general appears to be rather neglected. Granted, big players implement all kinds of integration in their integrated products, but simply converting a small-to-medium SQL database to triples as a one-time job is not an easy task.

In my blog post I look at some implementations, and I found that most are not on par with other technology stacks in terms of ease of use and quality of documentation, compared with what much of the software industry has accustomed us to. I selected a handful which look most promising in terms of active development and uncomplicated setup to evaluate further. I also decided to revive an old .NET project I started back in 2012/2014 to bring it up to my "modern expectations" [2].

On top of that, I am slightly disappointed with the R2RML spec, or rather with the lack of maintenance. The original Implementation Report [3] has not been updated since 2012 and, to be honest, with the original authors having moved to greener pastures, I would think that contributing through the W3C process is no longer viable. Also, the original repository of test cases uses Mercurial and is now frozen, as Mercurial repositories are no longer supported by W3C. This paints a rather grim picture for newcomers looking into R2RML and similar technologies.

It took me the last week to prepare what I would like to share with you today, and I invite you to contribute to a refreshed, and more collaborative, implementation report. I have just today deployed a first incarnation of a fully dynamic table [4], which for starters presents my refreshed .NET implementation. I intend to add the promising implementations I mention in my blog post, but if anyone has the bandwidth to beat me to it, then I will be more than happy to accept pull requests.

For obvious reasons, at the moment the report only presents the R2RML test cases (forked into the same GitHub repository). I also started a boilerplate repository to help others (and myself) run and evaluate tools [5]. Finally, if anyone already has a test runner which produces EARL reports, they can simply add their reports, published online, so that they are fetched. Because this is not a static page, I also plan to add some dynamic features, such as filtering on implementation language and tested database engines.

Best,
Tom

[1]: https://t-code.pl/blog/2020/07/rdf-struggling-case-of-r2rml/
[2]: http://r2rml.net/
[3]: https://www.w3.org/TR/rdb2rdf-implementations/
[4]: https://implementations.r2rml.net
[5]: https://github.com/r2rml4net/test-runner

Received on Friday, 31 July 2020 19:18:20 UTC