- From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Date: Tue, 18 Jul 2017 17:52:01 +0200
- To: public-rax@w3.org
- Cc: Maxime Lefrançois <maxime.lefrancois@emse.fr>
Dear RAX group,

I have not participated in the discussions of this group so far; I have only been silently following the mailing list. Yet I have something that could be of interest to you.

My colleague Maxime (in CC) and I have worked on a language for expressing transformations from documents in any format to RDF, with the idea that the language should be as close as possible to SPARQL (and can thus be implemented easily by extending a SPARQL engine). The result is SPARQL-Generate [1,2,3]. It generates RDF from any source format, so it is applicable to XML in particular (we do not deal with RDF-to-XML generation, though).

Here is roughly how it works:

- You use XPath to specify the pieces of the source document you want to extract, and you bind the result to a SPARQL variable. This is done by implementing a SPARQL custom function, thus making use of the standard extension mechanism of SPARQL together with its standard binding mechanism.
- Since the result of the XPath selection is bound to a variable, it can be used within a standard SPARQL expression.
- We allow extraction from multiple files at the same time, so information from several sources can be cross-referenced, combined, processed, etc., and the result is bound to a variable using a BIND clause.
- The variables are then injected into a graph pattern in a similar way as for CONSTRUCT queries. However, we do not reuse the CONSTRUCT clause, because we allow nested graph pattern generation.

Since the language extends SPARQL and the implementation extends a SPARQL engine, it is possible to include the XML extraction inside a normal SPARQL query pattern over a triple store (or over multiple triple stores with the SERVICE clause).

The tool is not limited to XML-to-RDF generation. Any combination of formats can be used as source files, thanks to a number of custom functions: JSONPath for JSON or CBOR, CSS selectors for HTML/XML, regex selectors for arbitrary text files, date and time conversion functions, and more.
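To make the steps described above a bit more concrete, here is a rough conceptual sketch in Python. It is not SPARQL-Generate syntax and not our implementation: the XML document, the element names, the `http://example.org/` namespace and the helper names `select` and `generate` are all made up for illustration. It only shows the general idea of selecting pieces of a source with a path expression, binding them to variables, and injecting those variables into a triple template.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element and attribute names are illustrative only.
XML_SOURCE = """
<people>
  <person id="p1"><name>Alice</name><city>Lyon</city></person>
  <person id="p2"><name>Bob</name><city>Paris</city></person>
</people>
"""

BASE = "http://example.org/"  # assumed namespace for the generated IRIs


def select(source, path):
    """Return the nodes matched by a path expression.

    ElementTree only supports a subset of XPath, which is enough here.
    """
    return ET.fromstring(source).findall(path)


def generate(source):
    """Bind extracted values to variables, then fill a triple template."""
    triples = []
    for person in select(source, "./person"):
        # Each selection result is bound to a named variable,
        # loosely analogous to a BIND clause in SPARQL.
        bindings = {
            "id": person.get("id"),
            "name": person.findtext("name"),
            "city": person.findtext("city"),
        }
        subject = f"<{BASE}person/{bindings['id']}>"
        # Template instantiation, similar in spirit to a CONSTRUCT template.
        triples.append((subject, f"<{BASE}name>", f'"{bindings["name"]}"'))
        triples.append((subject, f"<{BASE}city>", f'"{bindings["city"]}"'))
    return triples


if __name__ == "__main__":
    for s, p, o in generate(XML_SOURCE):
        print(s, p, o, ".")
```

In the actual language, the selection and the template are both written declaratively inside one query, which is what lets them be mixed with ordinary SPARQL patterns; the loop above only imitates that behaviour imperatively.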
The web site [1] provides an online interface for testing, many examples and test cases of various levels of complexity, a command line tool in the form of an executable jar, the source code of our implementation (extending Jena), and a little documentation/tutorial (to be improved).

We are working on improvements: syntactic sugar to make writing queries much easier, and support for data streams.

If you are interested in further information, please contact us. If you are using it, please let us know! We are of course eager to know who our users are.

Regards,
--AZ

[1] SPARQL-Generate official web site: http://ci.emse.fr/sparql-generate/
[2] Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. Flexible RDF generation from RDF and heterogeneous data sources with SPARQL-Generate. In Proc. of the 20th International Conference on Knowledge Engineering and Knowledge Management, EKAW, Nov 2016, Bologna, Italy (demo track). http://www.maxime-lefrancois.info/docs/LefrancoisZimmermannBakerally-EKAW2016-Flexible.pdf
[3] Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. of the Extended Semantic Web Conference, ESWC, May 2017, Portoroz, Slovenia. http://www.maxime-lefrancois.info/docs/LefrancoisZimmermannBakerally-ESWC2017-Generate.pdf

-- 
Antoine Zimmermann
Institut Henri Fayol
École des Mines de Saint-Étienne
158 cours Fauriel
CS 62362
42023 Saint-Étienne Cedex 2
France
Tél: +33(0)4 77 42 66 03
Fax: +33(0)4 77 42 66 66
http://www.emse.fr/~zimmermann/
Member of team Connected Intelligence, Laboratoire Hubert Curien
Received on Tuesday, 18 July 2017 15:52:29 UTC