- From: Claus Stadler <cstadler@informatik.uni-leipzig.de>
- Date: Fri, 07 Jun 2013 12:05:58 +0200
- To: public-lod@w3.org
Hi,

I am the creator of Sparqlify[1], a SPARQL-to-SQL rewriter, which we are developing and using to publish the relational OpenStreetMap[2] database as RDF in the course of the LinkedGeoData (LGD) project[3], and which thus currently serves 20 billion virtual triples. We have also successfully applied the tool to other databases (Wortschatz, PanLex) and to numerous CSV files on CKAN (see [4]).

Currently, the latest snapshot of Sparqlify is packaged automatically on each successful build (which includes testing against the R2RML test suite) as a Debian package at [5]. This package contains the scripts 'sparqlify' and 'sparqlify-csv': the former is for databases (tested with Postgres and H2, but not yet with MySQL), the latter is for CSV files. Another script / WAR file that bundles a Linked Data interface and the HTML SPARQL interface will follow shortly.

Anecdotal evidence from myself and my students suggests that the mapping language used, SML (Sparqlification Mapping Language), is pretty straightforward to use and, in regard to expressivity, essentially equivalent to R2RML (except for a current lack of support for inverse expressions). I recommend looking at the mappings of LinkedGeoData [6] and judging for yourself. R2RML <-> SML conversion support is underway, but will probably take a couple of months before release. I have started writing the documentation of SML at [7].

An official Debian package will become part of the LOD2 stack[8] this month.
So if anyone is interested in trying out Sparqlify, feedback and suggestions for improvement are very welcome (please use the GitHub issue tracker for any bugs) ;)

Cheers,
Claus

Here are two quick examples for LGD:

- Number of triples contributed by user 666:
  http://linkedgeodata.org/vsnorql/?query=Select+%28Count%28*%29+As+%3Fc%29+{%0D%0A++++%3Fs+dcterms%3Acontributor+lgd%3Auser666+.%0D%0A++++%3Fs+%3Fp+%3Fo+.%0D%0A}

- A nice feature is the EXPLAIN keyword: it helps one review the generated SQL and spot performance bottlenecks.
  http://linkedgeodata.org/vsnorql/?query=Explain+Select+*+{%0D%0A++++%3Fs+dcterms%3Acontributor+lgd%3Auser666+.%0D%0A++++%3Fs+%3Fp+%3Fo+.%0D%0A}

[1] https://github.com/AKSW/Sparqlify
[2] http://www.openstreetmap.org/
[3] http://www.linkedgeodata.org/
[4] http://ld.panlex.org/rdf.html
    http://wiki.publicdata.eu/wiki/CSV2RDF_Application
[5] http://cstadler.aksw.org/repos/apt/pool/main/s/sparqlify/
[6] https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/sparqlify/LinkedGeoData-Triplify-IndividualViews.sparqlify
[7] http://sparqlify.org/wiki/SML
[8] http://stack.lod2.eu/

On 05/28/2013 10:18 AM, Luca Matteis wrote:
> Here's my scenario: I have several different datasets. Most in MySQL
> databases. Some in PostgreSQL. Others in MS Access. Many in CSV. Each
> one of these datasets is maintained by its own group of people.
>
> Now, my end goal is to have all these datasets published as 5 stars
> Linked Open Data. But I am in doubt between these two solutions:
>
> 1) Give a generic wrapper tool to each of these groups of people, that
> would basically convert their datasets to RDF, and allow them to
> publish this data as LOD automatically. This tool would allow them to
> publish LOD on their own, using their own server (does such a generic
> tool even exist? Can it even be built?).
>
> 2) Scrape these datasets, which are at times simply published on the
> Web as HTML paginated tables, or published as dumps on their server,
> for example a .CSV dump of their entire database. Then I would
> aggregate all these various datasets myself, and publish them as
> Linked Data.
>
> Pros and cons for each of these methods? Any other ideas?
>
> Thanks!

-- 
Dipl. Inf. Claus Stadler
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org/
Workpage & WebID: http://aksw.org/ClausStadler
Phone: +49 341 97-32260
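
The two LGD example links in the message carry their SPARQL text URL-encoded in the `query` parameter, which makes them hard to read. The following is a minimal sketch, using only the Python standard library, for recovering the readable query from the first link. The helper name `decode_query` is hypothetical and not part of Sparqlify; the URL is copied verbatim from the message.

```python
from urllib.parse import parse_qs, urlsplit

# First example link from the message, copied verbatim.
URL = (
    "http://linkedgeodata.org/vsnorql/?query=Select+%28Count%28*%29+As+%3Fc%29+"
    "{%0D%0A++++%3Fs+dcterms%3Acontributor+lgd%3Auser666+.%0D%0A"
    "++++%3Fs+%3Fp+%3Fo+.%0D%0A}"
)


def decode_query(url: str) -> str:
    """Return the SPARQL text carried in the 'query' parameter of url.

    Hypothetical helper for illustration only.
    """
    # parse_qs undoes both percent-escapes and '+'-for-space encoding.
    params = parse_qs(urlsplit(url).query)
    # The links use CRLF line breaks; normalise them to plain newlines.
    return params["query"][0].replace("\r\n", "\n")


print(decode_query(URL))
```

The same helper should work on the second link, whose decoded text starts with the Explain keyword mentioned in the message.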
Received on Friday, 7 June 2013 10:06:29 UTC