Re: State of the art tools for rdf stream processing

Dear Javier,

Continuing the excellent summary from Peter, an important part of the tool selection is deciding what kind of stream processing you want to do:

1) Data stream processing, characterized by extracting windows from the input stream with a stream-to-relation operator and running queries over those windows. A typical application is the calculation of aggregate statistics (min, max, count, sum, average) over periods of time.

2) Event processing, characterized by layered processing of potentially heterogeneous events. Examples in the literature include stock trading, logistics (supply chain management) and computer network monitoring.
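To make the first category concrete, here is a minimal sketch of windowed aggregation in plain Python. It is not tied to any of the engines discussed in this thread; the function name `tumbling_window_stats` and the sample readings are assumptions made up for illustration, and a real engine would express the same idea declaratively in its query language.

```python
from collections import defaultdict

def tumbling_window_stats(readings, width):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows
    of `width` time units and compute aggregate statistics per window."""
    windows = defaultdict(list)
    for timestamp, value in readings:
        # All timestamps in [k*width, (k+1)*width) share window index k.
        windows[timestamp // width].append(value)
    stats = {}
    for index in sorted(windows):
        values = windows[index]
        # Key each result by the start time of its window.
        stats[index * width] = {
            "min": min(values),
            "max": max(values),
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
    return stats

# Hypothetical sensor readings: (seconds since start, measured value).
readings = [(0, 2.0), (3, 4.0), (9, 6.0), (10, 1.0), (14, 3.0)]
result = tumbling_window_stats(readings, width=10)
```

The grouping step plays the role of the stream-to-relation operator, and the aggregation is the query run over each resulting window.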

C-SPARQL, CQELS, SPARQLstream/morph-streams and Sparkwave focus on data stream processing with special extensions for window extraction. INSTANS focuses on event processing by supporting events in TriG, asynchronously interconnected query networks and intermediate storage of query results in graphs. EP-SPARQL/ETALIS implements sequence and time interval operators, but I'm unsure about layered event processing.

Data stream processing can be done with INSTANS, but you will need to write a lot more SPARQL than with the tools that have built-in extensions for that purpose. On the other hand, layered event processing tasks tend to be either very awkward or altogether impossible with data stream processing tools, because window extraction limits delay performance at every layer. Efforts to decrease detection delay by increasing window density force extra computations and produce multiple duplicate answers, which then need to be filtered out.
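The trade-off between detection delay and duplicate answers can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not the behaviour of any particular engine: a window of `width` time units is evaluated every `slide` units, and the function name `window_reports` and the sample events are hypothetical.

```python
def window_reports(events, width, slide, threshold):
    """Evaluate a window of `width` every `slide` time units and return
    (evaluation_time, event) for each above-threshold event it contains."""
    last = max(t for t, _ in events)
    reports = []
    end = slide
    while end - width <= last:
        # The window evaluated at time `end` covers [end - width, end).
        for t, value in events:
            if end - width <= t < end and value > threshold:
                reports.append((end, (t, value)))
        end += slide
    return reports

# Hypothetical events: (timestamp, value); only (4, 50) exceeds the threshold.
events = [(1, 5), (4, 50), (7, 8)]

# Sparse windows: one report, but only when the window closes at time 10.
sparse = window_reports(events, width=10, slide=10, threshold=10)

# Dense windows: first report already at time 6, then repeated in later windows.
dense = window_reports(events, width=10, slide=2, threshold=10)

# Duplicate filtering: keep only the first evaluation time per event.
first_seen = {}
for evaluated_at, event in dense:
    first_seen.setdefault(event, evaluated_at)
```

In this model the denser windows cut the detection delay from 6 to 2 time units, at the cost of reporting the same event in five overlapping windows, which is why a separate filtering step is needed.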

On the specific use case of GIS, I'm not aware of any of these tools currently offering special support for geographical computations. I have tested SERVICE queries against FactForge (Fig. 7<http://www.cs.hut.fi/~mjrinne/papers/odbase2014/Constructing%20Event%20Processing%20Systems%20of%20Layered%20and%20Heterogeneous%20Events%20with%20SPARQL%20(annotated%20author%20copy).pdf>), which supports e.g. omgeo:nearby in its database. INSTANS supports square root as an extension function<https://github.com/aaltodsg/instans/wiki/Extension-functions> if that helps with distance calculations. :-)
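As an illustration of why a square root function is useful here, the following Python sketch computes an approximate distance between two coordinates using the equirectangular approximation, which is reasonable for short distances. It is plain Python rather than anything INSTANS-specific; the function name, the mean Earth radius of 6371 km and the sample coordinates are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def approx_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance between two points
    given in decimal degrees; adequate for short distances only."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    # Scale the longitude difference by the cosine of the mean latitude.
    x = d_lambda * math.cos((phi1 + phi2) / 2.0)
    y = phi2 - phi1
    return EARTH_RADIUS_KM * math.sqrt(x * x + y * y)

# Hypothetical points one degree of latitude apart on the same meridian.
distance = approx_distance_km(60.0, 25.0, 61.0, 25.0)
```

Apart from the conversion to radians and one cosine, the only non-arithmetic operation needed is the square root, so the same formula could in principle be written as a query expression wherever those functions are available.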

All the best to your project!

Mikko

On 17. Apr 2015, at 11:59, Wetz Peter <peter.wetz@tuwien.ac.at<mailto:peter.wetz@tuwien.ac.at>> wrote:

Dear Javier,

I’ll try to come up with a concise and (of course) subjective answer :)

First of all, it’s great to hear that you want to explore rdf streaming implementations combined with a GIS use case. I think the combination with GIS is really interesting and relevant.

To answer your question, here are some hints based on my subjective impressions:
C-SPARQL seems quite mature to me in terms of rdf stream processing. It is also backed by many publications, which discuss its real-world application in different scenarios (social media monitoring, city sensing, etc.). Have a look at the webpage for more details [1]. I also got the impression that Emanuele Della Valle (initiator of C-SPARQL) is always willing to discuss issues and the like.

CQELS [2] is somewhat similar to C-SPARQL, yet it does some things differently. It is also backed by several publications and real-world applications. I would recommend taking a look at it. Word on the street is that there will be a new version soon-ish, which I am looking forward to.

Then there is EP-SPARQL/ETALIS which takes a more Complex Event Processing-like approach. However, I am not sure if it’s still maintained/updated. Source code [3] and several publications [4, 5] are available.

To do some more name-dropping, I’d like to mention a few other approaches. However, I have not had time to get my hands dirty with them yet, so I cannot provide you with more detailed information:
SPARQLstream/morph-streams [6, 7], INSTANS [8], Sparkwave [9].

Another good source of information on practical aspects is the tutorials given at ESWC/ISWC conferences. Luckily, you can access their contents and slides [10]. I think it’s really helpful to look at the slides and get an impression of the engines’ capabilities before going hands-on. Another good place to get information is the wiki of this very group, where we have collected many things. Even though it may still appear a bit unorganized, I’d recommend taking a look: [11].

One open question of yours is still the integration with OGC standards. I do not know precisely what you mean, but I think this is a topic that has not really been addressed by the RSP community yet. I am not sure how tight an integration with OGC standards you have in mind, but things like spatial queries are definitely doable right now.

Hope that helps!

Best regards,
Peter

[1] http://streamreasoning.org/
[2] https://code.google.com/p/cqels/
[3] https://code.google.com/p/etalis/
[4] http://iospress.metapress.com/content/t7284477156m77j1/?issue=4&genre=article&spage=397&issn=1570-0844&volume=3
[5] http://aifb.kit.edu/images/c/c0/Www29-anicic.pdf
[6] https://github.com/jpcik/morph-streams
[7] http://oa.upm.es/16330/1/corcho_enabling.pdf
[8] http://cse.aalto.fi/en/research/groups/distributed_systems/software/instans/
[9] http://sparkwave.sti2.at/index.html
[10] http://streamreasoning.org/events/sr4ld2014
[11] http://www.w3.org/community/rsp/wiki/Main_Page


--
DI (FH) Peter Wetz
PhD Candidate
Doctoral College Environmental Informatics
Vienna University of Technology
Favoritenstraße 9-11
1040 Vienna
Austria

M: +43-650-7954890
E: peter.wetz@tuwien.ac.at<mailto:peter.wetz@tuwien.ac.at>




From: belitre@gmail.com<mailto:belitre@gmail.com> [mailto:belitre@gmail.com] On Behalf Of Javier Ruiz Aranguren
Sent: Thursday, 16 April 2015 15:30
To: public-rsp@w3.org<mailto:public-rsp@w3.org>
Subject: State of the art tools for rdf stream processing

Hi, all:

In the GeoSmartCity project<http://www.geosmartcity.eu/> we aim at developing a framework in which Geo Open Data can be exploited towards the Smart City paradigm. One of the scenarios planned for our pilots is underground network management involving water and sewage network management<https://www.w3.org/community/rsp/wiki/Use_cases#Water_Supply_and_Sewage_Network_Management>. This includes GIS access to sensor data from water management SCADAs and the use of GIS and sensed data to improve the modeling and planning of water networks.

We would like to explore an rdf streaming implementation in order to:
- be able to define continuous and advanced queries.
- integrate sources, dynamic (weather) or static (type of sensors, geospatial features, etc.).
- integrate with OGC standards without friction.

Unfortunately, the number of different query languages and discontinued tools makes it somewhat discouraging to go in this direction.

I would like to ask you which of the tools that could accomplish these goals are under ongoing development and have some traction.

Thanks.

P.S. (Will all of these previous efforts go in the bin when RSP-QL becomes the single standard?)

Received on Saturday, 18 April 2015 05:41:26 UTC