RE: State of the art tools for rdf stream processing

Hi,
Nice analysis, although there are other systems that may also be considered (EP-SPARQL and INSTANS, for example).
From your requirements I would say that you could take a look at either CQELS or C-SPARQL; with either you should be able to achieve what you want.
SPARQLstream needs a bit of shepherding to get it to work, and its static RDF support may be problematic.
R4 seems achievable with all systems, although SPARQLstream has more explicit examples of it.
R5 is not supported by anyone. Believe me, mixing space and time semantics in a query language is not trivial. But I think it is nice material for a new paper (there are some journal special issue CFPs circulating, if anyone wants to give it a try...)
What we have usually done to work around this is to cheat: precompute the spatial regions in a 'static' RDF repository, and then run the RSP query engines on top.
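As a rough illustration of this workaround, here is a minimal Python sketch, assuming sensors have fixed coordinates and regions are plain bounding boxes. The namespace http://example.org/, the locatedIn property and all sensor and region names are hypothetical. The output is a list of N-Triples lines that could be loaded into the static repository, so that the streaming query only has to join on the region URI.

```python
from typing import Dict, List, Tuple

# (min_lon, min_lat, max_lon, max_lat); all regions here are hypothetical
BBox = Tuple[float, float, float, float]

EX = "http://example.org/"  # hypothetical namespace


def in_bbox(lon: float, lat: float, box: BBox) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def precompute_regions(
    sensors: Dict[str, Tuple[float, float]],
    regions: Dict[str, BBox],
) -> List[str]:
    """Emit one N-Triples line per (sensor, region) containment."""
    triples = []
    for sensor_id, (lon, lat) in sorted(sensors.items()):
        for region_id, box in sorted(regions.items()):
            if in_bbox(lon, lat, box):
                triples.append(
                    f"<{EX}sensor/{sensor_id}> <{EX}locatedIn> "
                    f"<{EX}region/{region_id}> ."
                )
    return triples


# Hypothetical sample data: only s1 falls inside the "oldtown" box.
sensors = {"s1": (-1.64, 42.81), "s2": (-1.70, 42.90)}
regions = {"oldtown": (-1.65, 42.80, -1.63, 42.82)}
static_triples = precompute_regions(sensors, regions)
```

The spatial test runs once, offline, which is why this only works for sensors that do not move.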
For R6, we are usually doomed to use some sort of wrapper and convert the incoming streams to RDF streams. It's a dirty job that is difficult to avoid.
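A wrapper of this kind could look roughly like the Python sketch below. It assumes the SCADA readings arrive as rows with sensor_id, ts (Unix seconds) and value columns; those column names, the http://example.org/ namespace and the observedBy/hasValue properties are all hypothetical, not taken from any of the systems discussed here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator, List, Tuple

EX = "http://example.org/"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def row_to_triples(row: Dict[str, Any]) -> List[str]:
    """Map one relational row to N-Triples lines for a single observation."""
    sensor_id = row["sensor_id"]
    ts = row["ts"]
    value = row["value"]
    obs = f"<{EX}obs/{sensor_id}/{ts}>"
    return [
        f"{obs} <{EX}observedBy> <{EX}sensor/{sensor_id}> .",
        f'{obs} <{EX}hasValue> "{value}"^^<{XSD}double> .',
    ]


def wrap_stream(
    rows: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[datetime, List[str]]]:
    """Yield (timestamp, triples) pairs, i.e. a timestamped RDF stream."""
    for row in rows:
        ts = datetime.fromtimestamp(int(row["ts"]), tz=timezone.utc)
        yield ts, row_to_triples(row)


# Hypothetical sample row, as it might come out of an RDBMS cursor.
rows = [{"sensor_id": "s1", "ts": 1429605120, "value": 3.2}]
stream = list(wrap_stream(rows))
```

Each yielded pair would then be pushed to whatever stream input the chosen engine exposes; that part is engine-specific and not shown.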

About RSP-QL and what its foundation is... Very tricky question. It really draws heavily from all the systems we mentioned: windows from C-SPARQL/CQELS, stream datasets from C-SPARQL, named streams from CQELS, *stream operators from SPARQLstream, even some ideas from EP-SPARQL. So it is really a mixture, and in some cases a break from all previous systems (e.g. the graph-based stream model).
The good thing is that if you already use one of these older systems, RSP-QL will most likely be very similar, and on the surface most of the changes will be minor (e.g. the syntax of window declarations).
After ESWC I hope we can finish the RSP-QL discussion, and then I would expect implementations to emerge. As in any query system, it is at the implementation level that you can put extra effort into optimizations and additional features. For instance, some implementations may opt to add spatial support to the language. Nothing prevents implementations from going beyond what RSP-QL defines.

cheers,
Jean-Paul 


Date: Tue, 21 Apr 2015 10:32:00 +0200
From: jruizaranguren@gmail.com
To: mikko.rinne@aalto.fi
CC: public-rsp@w3.org
Subject: Re: State of the art tools for rdf stream processing


Thank you all, @Wetz, @Rinne. After reviewing your links I think I can better specify my requirements. I write them down below and add a brief analysis (please correct me if I'm wrong).
Requirements:
1. Data stream kind of processing: I'm OK with windows and simple aggregate functions (C-SPARQL, CQELS, SPARQLstream).
2. Background RDF access (C-SPARQL, CQELS).
3. Be able to cross-link or layer streams (C-SPARQL, CQELS, SPARQLstream).
4. Ontology querying using SSN (SPARQLstream).
5. Spatial filtering: just bounding box or named location, nothing fancy (so regular SPARQL might suffice).
6. Able to integrate with the current SCADA (via RDBMS).
Analysis:
=> None of the approaches covers all of requirements 1-6.
=> It seems all approaches rely on an existing DSMS in order to execute the queries. Functionality is limited by the underlying DSMS. morph-streams perhaps provides a happier path to integrating new sources (RDBMS), but I still have to look at the code to see if this is feasible or whether it would be very costly.
=> SPARQLstream + morph-streams does not support access to background RDF, which is basic to our purpose of demoing data integration.
=> It is not clear which approach will be the foundation of RSP-QL.
Do you think that ESWC2015 will change this situation substantially?
2015-04-18 7:40 GMT+02:00 Rinne Mikko <mikko.rinne@aalto.fi>:

Dear Javier,



Continuing the excellent summary from Peter, an important part of the tool selection is deciding what kind of stream processing you want to do:



1) Data stream processing characterized by the extraction of windows from the input stream using a stream-to-relation operator, and running queries over those windows. A typical application is the calculation of aggregate statistics (min, max, count, sum, average) over periods of time.
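To illustrate the windowing idea outside any particular engine, here is a minimal Python sketch of a tumbling (non-overlapping) window with the aggregates listed above. It assumes readings are (timestamp, value) pairs with integer timestamps; the function name and sample data are hypothetical, and real RSP engines express this declaratively in the query rather than in application code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_stats(
    readings: Iterable[Tuple[int, float]], width: int
) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, value) pairs into fixed windows and aggregate each."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for ts, value in readings:
        # Stream-to-relation step: assign the reading to its window start.
        windows[ts - ts % width].append(value)
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "count": len(vals),
            "sum": sum(vals),
            "avg": sum(vals) / len(vals),
        }
        for start, vals in sorted(windows.items())
    }


# Hypothetical readings: two fall in window [0, 10), one in [10, 20).
readings = [(0, 1.0), (5, 3.0), (12, 4.0)]
stats = tumbling_window_stats(readings, width=10)
```

The result is keyed by window start time, so stats[0] holds the aggregates for the first ten time units and stats[10] those for the next ten.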



2) Event processing characterized by layered processing of potentially heterogeneous events. Examples in literature include stock trading, logistics (supply chain management) and computer network monitoring.



C-SPARQL, CQELS, SPARQLstream/morph-streams and Sparkwave focus on data stream processing with special extensions for window extraction. INSTANS focuses on event processing by supporting events in TriG, asynchronously interconnected query networks and intermediate storage of query results in graphs. EP-SPARQL/ETALIS implements sequence and time interval operators, but I'm unsure about layered event processing.



Data stream processing with INSTANS can be done, but you will need to write a lot more SPARQL than with the tools that have built-in extensions for that purpose. On the other hand, layered event processing tasks tend to be either very awkward or altogether impossible with data stream processing tools, because window extraction limits delay performance on all levels, and efforts to decrease detection delay by increasing window density force extra computations that produce multiple duplicate answers, which then need to be filtered out.
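The duplicate-answer effect can be seen in a small Python sketch. It assumes a simple threshold detector evaluated once per sliding window; the function names, window parameters and sample events are hypothetical and only meant to show why denser windows report the same event several times.

```python
from typing import List, Set, Tuple

Event = Tuple[int, float]  # (timestamp, value)


def sliding_detections(
    events: List[Event], width: int, step: int, threshold: float
) -> List[Event]:
    """Report events above the threshold once per window containing them."""
    if not events:
        return []
    detections = []
    last_ts = max(ts for ts, _ in events)
    start = 0
    while start <= last_ts:
        for ts, value in events:
            if start <= ts < start + width and value > threshold:
                detections.append((ts, value))
        start += step  # a smaller step means denser, more overlapping windows
    return detections


def deduplicate(detections: List[Event]) -> List[Event]:
    """Keep the first occurrence of each detection, preserving order."""
    seen: Set[Event] = set()
    unique = []
    for detection in detections:
        if detection not in seen:
            seen.add(detection)
            unique.append(detection)
    return unique


# Hypothetical events: only the first one exceeds the threshold.
events = [(3, 9.5), (7, 1.0)]
raw = sliding_detections(events, width=4, step=1, threshold=5.0)
unique = deduplicate(raw)
```

With a window width of 4 and a step of 1, the single event at timestamp 3 falls into four overlapping windows, so raw contains it four times and unique contains it once.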



On the specific use case of GIS, I'm not aware of any of these tools currently offering special support for geographical computations. I have tested SERVICE queries to FactForge (Fig. 7), which supports e.g. omgeo:nearby in its database. INSTANS supports square root as an extension function, if that helps with distance calculations. :-)



All the best to your project!



Mikko



On 17. Apr 2015, at 11:59, Wetz Peter <peter.wetz@tuwien.ac.at> wrote:






Dear Javier,

 

I’ll try to come up with a concise and (of course) subjective answer :)

 

First of all, it’s great to hear that you want to explore RDF streaming implementations combined with a GIS use case. I think the combination with GIS is really interesting and relevant.

 

To answer your question, I can give you some hints on what is my subjective impression:

C-SPARQL seems quite mature to me in terms of RDF stream processing. It is also backed by many publications, which discuss its real-world application in different scenarios (social media monitoring, city sensing, etc.). Have a look at the webpage for more details [1]. I also got the impression that Emanuele Della Valle (initiator of C-SPARQL) is always willing to discuss issues and the like.

 

CQELS [2] is somewhat similar to C-SPARQL, yet it does some things differently. It is also backed by several publications and real-world applications. I would recommend taking a look at it. Word on the street is that there will be a new version soon-ish, which I am looking forward to.

 

Then there is EP-SPARQL/ETALIS, which takes a more Complex Event Processing-like approach. However, I am not sure if it’s still maintained/updated. Source code [3] and several publications [4, 5] are available.

 

To do more namedropping, I’d like to mention some more approaches. However, I have not yet had time to get my hands dirty with them, so I cannot provide you with more detailed information:

SPARQLstream/morph-streams [6, 7], INSTANS [8], Sparkwave [9].

 

Another good place to get information on practical aspects is the tutorials given at ESWC/ISWC conferences. Luckily you can access their contents and slides [10]. I think it’s really helpful to look at the slides and get an impression of the engines’ capabilities before getting hands-on. Another good source of information is the wiki of this very group. We collected many things there. Even though it may still appear a bit unorganized, I’d recommend taking a look: [11].

 

One open question of yours is still the integration with OGC standards. I do not know what you mean precisely, but I think this is still a topic that has not really been addressed by the RSP community. I am not sure how tight an integration with OGC standards you imagine, but things like spatial queries are definitely doable right now.

 

Hope that helps!

 

Best regards,

Peter

 

[1] http://streamreasoning.org/

[2] https://code.google.com/p/cqels/

[3] https://code.google.com/p/etalis/

[4] http://iospress.metapress.com/content/t7284477156m77j1/?issue=4&genre=article&spage=397&issn=1570-0844&volume=3

[5] http://aifb.kit.edu/images/c/c0/Www29-anicic.pdf

[6] https://github.com/jpcik/morph-streams

[7] http://oa.upm.es/16330/1/corcho_enabling.pdf

[8] http://cse.aalto.fi/en/research/groups/distributed_systems/software/instans/

[9] http://sparkwave.sti2.at/index.html

[10] http://streamreasoning.org/events/sr4ld2014

[11] http://www.w3.org/community/rsp/wiki/Main_Page

 

 

--

DI (FH) Peter Wetz

PhD Candidate

Doctoral College Environmental Informatics

Vienna University of Technology

Favoritenstraße 9-11

1040 Vienna

Austria



M: +43-650-7954890

E: peter.wetz@tuwien.ac.at

 

 

 

 




From: belitre@gmail.com [mailto:belitre@gmail.com] On Behalf Of Javier Ruiz Aranguren

Sent: Thursday, 16 April 2015 15:30

To: public-rsp@w3.org

Subject: State of the art tools for rdf stream processing



 


Hi, all:

 

In the GeoSmartCity project we aim at developing a framework in which Geo Open Data can be exploited towards the Smart City paradigm. One of the scenarios planned for our pilots is underground network management, involving water and sewage network management. This includes GIS access to sensor data from water management SCADAs and the use of GIS and sensed data to improve modeling and planning of water networks.

 

We would like to explore an RDF streaming implementation in order to:

- be able to define continuous and advanced queries.

- integrate sources, dynamic (weather) or static (type of sensors, geospatial features, etc.).

- integrate frictionlessly with OGC standards.

 

Unfortunately, the number of different query languages and discontinued tools makes it a bit discouraging to go in this direction.

 

I would like to ask you which tools that could accomplish this goal have ongoing development and have some traction.

 

Thanks.

 

P.S. (Will all of these previous efforts go in the bin when RSP-QL becomes the single standard?)

Received on Tuesday, 21 April 2015 09:32:31 UTC