- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Thu, 16 Jun 2011 04:05:40 +0700
- To: public-rdb2rdf-comments@w3.org
Hi all,

I've made a translator from R2RML to declarations of OpenLink Virtuoso's RDF Views. It was interesting, and in some cases fun, because it was a nice sandbox to play with SPARQL and to push as much processing as possible into SPARQL queries on R2RML resources. If I were only the initial developer of it and not also the maintainer, I'd rather write an XSLT with SPARQL injections, just to make things even more interesting.

The result is not bad. It took only 900 lines of code, so the source representation is proven to be convenient, again.

The examples are accurate and can be used "as is" for first tests, except for a trivial missing semicolon after

rr:usePredicateObjectMap [ rr:usePredicateMap [ rr:predicate emp:job ]; rr:useObjectMap [ rr:column "job" ] ]

in A 2.2.1, A 2.2.2 and A 2.2.3. Another minor problem hides in the obsolete fig 1b --- fig 9. I'd be too lazy to patch figures frequently, so I'd label them "deprecated" for a while.

The generated text of RDF Views is not perfect; it's rather a draft for review and for assigning some meaningful names to individual mapping rules, for readability of future error diagnostics etc. I'll probably extend the source R2RMLs with rdfs:labels, comments etc. in order to make the output more readable.

Further works: inverse expressions

I ignore rr:inverseExpression-s, because most rr:template-s are compiled into format strings for Virtuoso's sprintf() function, and Virtuoso has a sprintf_inverse() string parsing function that is smart enough to eliminate the need for 95% of handwritten URI parsing. Maybe I should detect the remaining 5% and process rr:inverseExpression-s for them, but the priority of this improvement is low.

Further works: validation

The open issue for me is the validation of input. The examples use only rr:TriplesMap as an explicitly declared type; the types of the rest of the (blank) nodes are defined implicitly as ranges of the predicates in use.
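
As a rough illustration of the implicit typing described above, a minimal Python sketch could infer node types from the ranges of the predicates that point at them. The range table is an assumption made for this example (only rr:TriplesMap and the rr:use... predicates are named in this message), the function names are hypothetical, and this is not how the translator itself works.

```python
# Assumed range table: the class implied for the object of each predicate.
# These ranges are illustrative guesses, not quoted from the R2RML draft.
ASSUMED_RANGES = {
    "rr:useSubjectMap": "rr:SubjectMap",
    "rr:useObjectMap": "rr:ObjectMap",
    "rr:usePredicateMap": "rr:PredicateMap",
    "rr:usePredicateObjectMap": "rr:PredicateObjectMap",
}


def infer_types(triples):
    """Collect explicit rdf:type values plus types implied by assumed ranges."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            # Explicit declaration: the type belongs to the subject.
            types.setdefault(subject, set()).add(obj)
        elif predicate in ASSUMED_RANGES:
            # Implicit declaration: the type belongs to the object.
            types.setdefault(obj, set()).add(ASSUMED_RANGES[predicate])
    return types


def multiply_typed(types):
    """Return only the nodes that ended up with more than one type."""
    return {node: found for node, found in types.items() if len(found) > 1}


# Hypothetical mapping fragment, written as (subject, predicate, object) tuples.
triples = [
    ("_:emp", "rdf:type", "rr:TriplesMap"),
    ("_:emp", "rr:useSubjectMap", "_:m1"),
    ("_:pom", "rr:useObjectMap", "_:m1"),
    ("_:pom", "rr:usePredicateMap", "_:p1"),
]
suspects = multiply_typed(infer_types(triples))
```

Here the node _:m1 is reached through two predicates with different assumed ranges, so it is the only entry in suspects. Whether such a node deserves a warning depends on which classes are meant to be disjoint.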
No doubt, that's how people will write their own R2RML resources, especially if they write Turtle. However, I'm not sure what the best policy for validation is. E.g., one may decide to create a (supposedly rr:SubjectMap) node and use it as the value of both rr:useSubjectMap and rr:useObjectMap predicates in different places; should I warn about rr:graph in rr:useObjectMap after that? If types are not declared explicitly, should I first infer them and then warn about multiple types assigned to the same node? Which classes are supposed to be disjoint? Right now I've sabotaged the coding of the validator, eliminating the problem, but that's not a universal solution.

Further works: tests and tutorial examples

OpenLink Software participates in the Linking Open Data - 2 project (FP7 LOD2), and we will provide an RDF "remake" of the TPC-H benchmark, codename RDF-H. I intend to write an R2RML file that will map canonical TPC-H tables to the RDF-H graph and report whether the mapping is adequate. That will be both a test for my R2RML translator and a half-real-life use case for R2RML itself. The same could be done for the BSBM benchmark.

Further works: release

The mentioned R2RML translator will appear in a Virtuoso Open Source release, so it will be one of the "independent implementations" of the spec for its Candidate Recommendation phase.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
Received on Wednesday, 15 June 2011 21:06:08 UTC