Re: R2RML, v 1.65 2011/06/15: implementation experience.

This is awesome, Ivan! It would be very interesting if you could report
on this next week...

Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730

On 15 Jun 2011, at 22:05, Ivan Mikhailov wrote:

> Hi all,
> I've made a translator from R2RML to declarations of OpenLink
> Virtuoso's RDF Views. It was interesting, and in some cases funny,
> because it was a nice sandbox for playing with SPARQL and for pushing
> as much processing as possible into SPARQL queries on R2RML
> resources. If I were only the initial developer of it and not the
> maintainer, I'd rather write an with SPARQL injections, just to make
> things even more interesting.
> The result is not bad: it took only 900 lines of code, so the source
> representation has proven to be convenient, again.
> The examples are accurate and can be used "as is" for first tests,
> except for a trivial missing semicolon after
> rr:usePredicateObjectMap
>    [
>      rr:usePredicateMap [ rr:predicate emp:job ];
>      rr:useObjectMap    [ rr:column "job" ]
>    ]
> in A 2.2.1, A 2.2.2 and A 2.2.3.
> Another minor problem hides in the obsolete figures 1b to 9. I'd be
> too lazy to patch figures frequently, so I'd label them "deprecated"
> for a while.
> The generated text of RDF Views is not perfect; it's rather a draft
> for review and for assigning meaningful names to individual mapping
> rules, so that future error diagnostics are readable. I'll probably
> extend the source R2RML files with rdfs:labels, comments etc. in
> order to make the output more readable.
> Further work: inverse expressions
> I ignore rr:inverseExpression-s, because most rr:template-s are
> compiled into format strings for Virtuoso's sprintf() function, and
> Virtuoso has a sprintf_inverse() string-parsing function that is
> smart enough to eliminate the need for 95% of handwritten URI
> parsing. Maybe I should detect the remaining 5% and process
> rr:inverseExpression-s for them, but this improvement has low
> priority.
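To illustrate the template-to-format-string idea, here is a minimal Python sketch. It is not Virtuoso code: Python's `%` formatting stands in for sprintf(), a regular expression stands in for sprintf_inverse(), and the helper names (`compile_template`, `apply_template`, `invert_template`) and the example URI are hypothetical. It assumes rr:template strings use `{column}` placeholders.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches one {column} placeholder in an rr:template-style string.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def compile_template(template: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: turn 'http://ex.org/emp/{EMPNO}' into a
    printf-style format string plus the ordered list of column names."""
    columns: List[str] = []

    def to_conversion(match: "re.Match[str]") -> str:
        columns.append(match.group(1))
        return "%s"

    # Escape literal '%' first so it survives printf-style formatting.
    fmt = PLACEHOLDER.sub(to_conversion, template.replace("%", "%%"))
    return fmt, columns


def apply_template(fmt: str, columns: List[str], row: Dict[str, object]) -> str:
    """Forward direction: build a URI from one table row (like sprintf)."""
    return fmt % tuple(str(row[c]) for c in columns)


def invert_template(template: str, uri: str) -> Optional[Dict[str, str]]:
    """Hypothetical stand-in for sprintf_inverse(): recover column values
    from a URI, or return None if the URI does not fit the template."""
    pieces = PLACEHOLDER.split(template)  # even: literals, odd: column names
    regex = "".join(
        re.escape(p) if i % 2 == 0 else "(.+?)" for i, p in enumerate(pieces)
    )
    match = re.fullmatch(regex, uri)
    if match is None:
        return None
    return dict(zip(pieces[1::2], match.groups()))


if __name__ == "__main__":
    tpl = "http://data.example.com/employee/{EMPNO}/dept/{DEPTNO}"
    fmt, cols = compile_template(tpl)
    uri = apply_template(fmt, cols, {"EMPNO": 7369, "DEPTNO": 20})
    print(uri)  # http://data.example.com/employee/7369/dept/20
    print(invert_template(tpl, uri))  # {'EMPNO': '7369', 'DEPTNO': '20'}
```

A sketch like this cannot reliably split two placeholders that are not separated by a literal (e.g. `{A}{B}`), which is the kind of case where an explicit rr:inverseExpression could still be useful.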
> Further work: validation
> The open issue for me is the validation of input. The examples use
> only rr:TriplesMap as an explicitly declared type; the types of the
> remaining (blank) nodes are defined implicitly as the ranges of the
> predicates in use. No doubt that's how people will write their own
> R2RML resources, especially if they write Turtle. However, I'm not
> sure what the best validation policy is. E.g., one may create a
> (supposedly rr:SubjectMap) node and use it as the value of both
> rr:useSubjectMap and rr:useObjectMap in different places; should I
> then warn about rr:graph in rr:useObjectMap?
> If types are not declared explicitly, should I first infer them and
> then warn about multiple types assigned to the same node? Which
> classes are supposed to be disjoint?
> Right now I've sabotaged the coding of the validator, eliminating the
> problem, but that's not a universal solution.
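One possible validation policy can be sketched in Python: infer each node's types from explicit rdf:type triples plus the ranges of the predicates that point at it, then warn when a node ends up with types from an assumed-disjoint pair. The `RANGES` table (in particular rr:PredicateMap and rr:PredicateObjectMap), the `DISJOINT` list and the tuple-based triple representation are assumptions for illustration, not part of the R2RML draft or of the Virtuoso translator.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# A triple as (subject, predicate, object) prefixed-name strings.
Triple = Tuple[str, str, str]

# Hypothetical range table: the class a node is assumed to have when it
# appears as the object of a given (draft) R2RML predicate.
RANGES: Dict[str, str] = {
    "rr:useSubjectMap": "rr:SubjectMap",
    "rr:usePredicateObjectMap": "rr:PredicateObjectMap",
    "rr:usePredicateMap": "rr:PredicateMap",
    "rr:useObjectMap": "rr:ObjectMap",
}

# Assumed disjointness policy; which pairs belong here is the open question.
DISJOINT: List[FrozenSet[str]] = [frozenset({"rr:SubjectMap", "rr:ObjectMap"})]


def infer_types(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Collect explicit rdf:type values plus types implied by predicate ranges."""
    types: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p in RANGES:
            types[o].add(RANGES[p])
    return types


def validate(triples: Iterable[Triple], strict: bool = False) -> List[str]:
    """Return warnings, one per offending node, in node-name order.
    strict=True flags any node with more than one inferred type;
    otherwise only pairs listed in DISJOINT are flagged."""
    warnings: List[str] = []
    for node, classes in sorted(infer_types(triples).items()):
        if strict and len(classes) > 1:
            warnings.append(f"{node}: multiple types {sorted(classes)}")
            continue
        for pair in DISJOINT:
            if pair <= classes:
                warnings.append(f"{node}: disjoint types {sorted(pair)}")
    return warnings


if __name__ == "__main__":
    # The same blank node used as both a subject map and an object map.
    mapping = [
        ("_:tm", "rdf:type", "rr:TriplesMap"),
        ("_:tm", "rr:useSubjectMap", "_:sm"),
        ("_:pom", "rr:useObjectMap", "_:sm"),
    ]
    for w in validate(mapping):
        print(w)
```

With `strict=True` every node that has more than one inferred type is reported; by default only the pairs listed in `DISJOINT` are, so the question of which classes are disjoint reduces to what goes into that list.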
> Further work: tests and tutorial examples
> OpenLink Software participates in the Linking Open Data - 2 project
> (FP7 LOD2), and we will provide an RDF "remake" of the TPC-H
> benchmark, codename RDF-H. I intend to write an R2RML file that maps
> the canonical TPC-H tables to the RDF-H graph and to report whether
> the mapping is adequate. That will be both a test for my R2RML
> translator and a half-real-life use case for R2RML itself. The same
> could be done for the BSBM benchmark.
> Further work: release
> The R2RML translator mentioned above will appear in a Virtuoso Open
> Source release, so it will be one of the "independent
> implementations" of the spec for its Candidate Recommendation phase.
> Best Regards,
> Ivan Mikhailov
> OpenLink Software

Received on Wednesday, 15 June 2011 21:08:55 UTC