
Re: State of the art tools for rdf stream processing

From: Javier Ruiz Aranguren <jruizaranguren@gmail.com>
Date: Tue, 21 Apr 2015 11:43:05 +0200
Message-ID: <CAG_3m4JoKYZ3wraGMjRRbMF+BzV-ZrBBAP7FacYJTzD2kR3-0g@mail.gmail.com>
To: Jean Paul Calbimonte <jpcalbimonte@delicias.dia.fi.upm.es>
Cc: "public-rsp@w3.org" <public-rsp@w3.org>
@Jean-Paul, just a small clarification about R5. I was assuming that the
spatial query would be made over a static background model (a simple bounding
box or linked URI can suffice). I'm not interested in further spatial
reasoning so far (although it is nice to have this off the shelf in OGC-SOS
implementations such as the 52North one).
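
A minimal Python sketch of that assumption: the bounding box is evaluated
once against the static background data, and the stream is then filtered by
sensor identifier only. The sensor URIs, coordinates and function names are
hypothetical and are not taken from any of the engines discussed here.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical static background model: sensor URI -> (longitude, latitude).
SENSOR_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "http://example.org/sensor/a": (-1.64, 42.81),
    "http://example.org/sensor/b": (-1.70, 42.90),
    "http://example.org/sensor/c": (-3.70, 40.42),
}

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def sensors_in_bbox(locations: Dict[str, Tuple[float, float]], bbox: BBox) -> Set[str]:
    """Evaluate the bounding box once, over static data only."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return {
        uri
        for uri, (lon, lat) in locations.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    }


def filter_stream(
    observations: Iterable[Tuple[str, float]], allowed: Set[str]
) -> Iterator[Tuple[str, float]]:
    """Keep only (sensor_uri, value) observations from sensors inside the box."""
    for sensor_uri, value in observations:
        if sensor_uri in allowed:
            yield sensor_uri, value


allowed = sensors_in_bbox(SENSOR_LOCATIONS, (-2.0, 42.0, -1.0, 43.0))
stream = [("http://example.org/sensor/a", 3.2), ("http://example.org/sensor/c", 7.5)]
selected = list(filter_stream(stream, allowed))
```

Because the spatial test touches only static data, the streaming part needs
nothing beyond a set membership check.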

2015-04-21 11:32 GMT+02:00 Jean Paul Calbimonte <
jpcalbimonte@delicias.dia.fi.upm.es>:

> Hi,
> Nice analysis. Although there are other systems that may also be
> considered (ep-sparql and Instans for example)
> From your requirements I would say that you could take a look at either
> cqels or csparql, you should be able to achieve what you want.
> Sparqlstream needs a bit of shepherding to get it to work and the static
> RDF support may be problematic.
> R4 seems achievable with all systems, although sparqlstream has more
> explicit examples with it.
> R5 is not supported by anyone. Believe me, mixing space and time semantics
> in a query language is not trivial. But I think it is nice material for a
> new paper (there are some journal special issues cfps circulating, if
> anyone wants to give it a try...)
> What we have done to solve this is usually to cheat, precomputing spatial
> regions in an RDF 'static' repository, and then use the RSP query engines
> on top.
> For R6, we are usually doomed to use some sort of wrapper and convert the
> incoming streams to RDF streams. It's a dirty job that is difficult to
> avoid.
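
One way to picture such a wrapper is the Python sketch below, which turns
relational rows into timestamped groups of triples. The row layout
(id, sensor, value, ts) and the example.org vocabulary are hypothetical; a
real wrapper would use the actual SCADA schema and an ontology such as SSN.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(row: Dict[str, object]) -> List[Triple]:
    """Map one relational row to the triples describing one observation."""
    obs = "<" + EX + "observation/" + str(row["id"]) + ">"
    return [
        (obs, "<" + EX + "observedBy>", "<" + EX + "sensor/" + str(row["sensor"]) + ">"),
        (obs, "<" + EX + "hasValue>", '"' + str(row["value"]) + '"'),
    ]


def rows_to_stream(rows: Iterable[Dict[str, object]]) -> Iterator[Tuple[int, List[Triple]]]:
    """Yield (timestamp, triples) pairs, one timestamped group per row."""
    for row in rows:
        yield int(row["ts"]), row_to_triples(row)


rows = [{"id": 1, "sensor": "p7", "value": 2.5, "ts": 1429609385}]
stream = list(rows_to_stream(rows))
```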
>
> About RSP-QL, and what is the foundation of it... Very tricky question.
> But it really draws heavily from all systems we mentioned: windows from
> csparql/cqels, stream datasets from csparql, named streams from cqels,
> *stream operators from sparqlstream, even some ideas from ep-sparql. So it
> is really a mixture. And in some cases a break from all previous systems
> (e.g. graph based stream model).
> The good thing is that if you already use one of these older systems,
> rspql will most likely be very similar, and on the surface most of the
> changes will be minor (e.g. syntax on window declarations, etc).
> After Eswc I hope we can finish on the rspql discussion and then I would
> expect implementations to emerge. As in any query system, it is at the
> implementation level that you can put extra-effort in optimizations, and
> additional features. For instance some implementations may opt for adding
> the spatial support for the language. Nothing prevents implementations from
> going beyond anything that rspql defines.
>
> cheers,
> Jean-Paul
>
>
> ------------------------------
> Date: Tue, 21 Apr 2015 10:32:00 +0200
> From: jruizaranguren@gmail.com
> To: mikko.rinne@aalto.fi
> CC: public-rsp@w3.org
> Subject: Re: State of the art tools for rdf stream processing
>
>
>
> Thank you all, @Wetz, @Rinne.
> After reviewing your links I think I can better specify my requirements.
> I have written them down with a brief analysis (please correct me
> if I'm wrong).
>
> Requirements:
>
> 1. Data stream kind of processing: I'm ok with windows and simple
> aggregate functions.  (C-SPARQL, CQELS, SparqlStream)
> 2. Background RDF access (C-SPARQL, CQELS).
> 3. Be able to cross link or layer streams (C-SPARQL, CQELS, SparqlStream).
> 4. Ontology querying using SSN (SparqlStream)
> 5. Spatial filtering: just bounding box or named location, nothing fancy
> (so regular SPARQL might suffice).
> 6. Able to integrate with current SCADA (via RDBMS).
>
> Analysis:
>
> => None of the approaches covers all of requirements 1-6.
> => It seems all approaches rely on an existing DSMS in order to execute
> the queries. Functionality is limited by the underlying DSMS. morph-streams
> perhaps provides a happier path to integrate with new sources (RDBMS), but I
> still have to look at the code to see if this is feasible or whether it
> would be very costly.
> => SparqlStream+morph-streams does not support access to background RDF,
> which is basic to our purpose of demoing data integration.
> => It is not clear which approach will be the foundation of RSP-QL.
>
> Do you think that ESWC2015 will change this situation substantially?
>
> 2015-04-18 7:40 GMT+02:00 Rinne Mikko <mikko.rinne@aalto.fi>:
>
>
>  Dear Javier,
>
>  Continuing the excellent summary from Peter, an important part of the
> tool selection is deciding what kind of stream processing you want to do:
>
>  1) Data stream processing characterized by the extraction of windows
> from the input stream using a stream-to-relation operator, and running
> queries over those windows. A typical application is the calculation of
> aggregate statistics (min, max, count, sum, average) over periods of time.
>
>  2) Event processing characterized by layered processing of potentially
> heterogeneous events. Examples in literature include stock trading,
> logistics (supply chain management) and computer network monitoring.
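
The window-based processing described in item 1) can be sketched in plain
Python as follows. This is only an illustration with a fixed-width (tumbling)
time window; the function name and sample readings are hypothetical, and the
engines mentioned in this thread express the same idea declaratively in
their own query syntax.

```python
from typing import Dict, Iterable, List, Tuple


def tumbling_window_stats(
    readings: Iterable[Tuple[float, float]], width: float
) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, value) readings into fixed-width windows and aggregate each."""
    windows: Dict[int, List[float]] = {}
    for timestamp, value in readings:
        # Stream-to-relation step: assign each reading to exactly one window.
        windows.setdefault(int(timestamp // width), []).append(value)
    return {
        index: {
            "min": min(values),
            "max": max(values),
            "count": len(values),
            "sum": sum(values),
            "average": sum(values) / len(values),
        }
        for index, values in windows.items()
    }


stats = tumbling_window_stats([(0, 1.0), (4, 3.0), (11, 5.0)], width=10)
```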
>
>  C-SPARQL, CQELS, SPARQLstream/morph-streams and Sparkwave focus on data
> stream processing with special extensions for window extraction. INSTANS
> focuses on event processing by supporting events in TriG, asynchronously
> interconnected query networks and intermediate storage of query results in
> graphs. EP-SPARQL/ETALIS implements sequence and time interval operators,
> but I'm unsure about layered event processing.
>
>  Data stream processing with INSTANS can be done, but you will need to
> write a lot more SPARQL than with the tools having built-in extensions for
> that purpose. On the other hand, layered event processing tasks tend to be
> either very awkward or altogether impossible with data stream processing
> tools, because window extraction limits delay performance on all levels and
> efforts to decrease detection delay by increasing window density force
> extra computations producing multiple duplicate answers which need to be
> filtered out.
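
The duplicate-answer effect can be illustrated with a sliding window whose
slide is shorter than its width: each event then falls into several windows,
so a detection evaluated per window reports it several times. The Python
sketch below uses hypothetical event names and window parameters.

```python
from typing import Iterator, List, Set, Tuple

Event = Tuple[int, str]  # (timestamp, event id)


def sliding_windows(
    events: List[Event], width: int, slide: int, end: int
) -> Iterator[List[Event]]:
    """Yield the contents of each window [start, start + width)."""
    for start in range(0, end, slide):
        yield [e for e in events if start <= e[0] < start + width]


def detections(events: List[Event], width: int, slide: int, end: int) -> List[str]:
    """Report every event id seen in every window, duplicates included."""
    return [
        event_id
        for window in sliding_windows(events, width, slide, end)
        for _, event_id in window
    ]


def deduplicated(reported: List[str]) -> List[str]:
    """Filter out repeated answers while preserving first-seen order."""
    seen: Set[str] = set()
    unique: List[str] = []
    for event_id in reported:
        if event_id not in seen:
            seen.add(event_id)
            unique.append(event_id)
    return unique


events = [(5, "leak-1"), (12, "leak-2")]
raw = detections(events, width=10, slide=5, end=15)
unique = deduplicated(raw)
```

Halving the slide lowers the detection delay but doubles the number of
windows each event appears in, which is the extra computation referred to
above.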
>
>  On the specific use case of GIS, I'm not aware of any of these tools
> currently offering special support for geographical computations. I have
> tested SERVICE queries to factforge (Fig. 7
> <http://www.cs.hut.fi/~mjrinne/papers/odbase2014/Constructing%20Event%20Processing%20Systems%20of%20Layered%20and%20Heterogeneous%20Events%20with%20SPARQL%20%28annotated%20author%20copy%29.pdf>),
> which supports e.g. omgeo:nearby into their database. INSTANS supports
> square root as an extension function
> <https://github.com/aaltodsg/instans/wiki/Extension-functions> if that
> helps with distance calculations. :-)
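
As an illustration of why a square root is enough for simple cases: for
points in a projected coordinate system (for example metres), the planar
distance is the Euclidean formula. The sketch is plain Python, not INSTANS
syntax, and the coordinates are hypothetical.

```python
import math


def planar_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Euclidean distance between two points in projected coordinates."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


d = planar_distance(0.0, 0.0, 3.0, 4.0)
```

This holds only for projected coordinates; raw longitude/latitude pairs
would need a geodesic formula instead.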
>
>  All the best to your project!
>
>  Mikko
>
>  On 17. Apr 2015, at 11:59, Wetz Peter <peter.wetz@tuwien.ac.at> wrote:
>
>   Dear Javier,
>
>  I’ll try to come up with a concise and (of course) subjective answer :)
>
>  First of all, it’s great to hear that you want to explore rdf streaming
> implementations combined with a GIS use case. I think the combination with
> GIS is really interesting and relevant.
>
>  To answer your question, I can give you some hints on what is my
> subjective impression:
>  C-SPARQL seems quite mature to me in terms of rdf stream processing.
> It is also backed by many publications, which discuss its real-world
> application in different scenarios (social media monitoring, city sensing,
> etc.). Have a look at the webpage for more details [1]. I also got the
> impression that Emanuele Della Valle (initiator of C-SPARQL) is always
> willing to discuss issues and the like.
>
>  CQELS [2] is somewhat similar to C-SPARQL, yet, it does some things
> differently. It is also backed by several publications and real-world
> applications. I would recommend taking a look at it. Word on the street
> is that there will be a new version soon-ish, which I am looking forward
> to.
>
>  Then there is EP-SPARQL/ETALIS which takes a more Complex Event
> Processing-like approach. However, I am not sure if it’s still
> maintained/updated. Source code [3] and several publications [4, 5] are
> available.
>
>  To do more namedropping, I’d like to mention some more approaches.
> However, I did not have any time to get my hands dirty on them, yet, so I
> cannot provide you with more detailed information:
>  SPARQLstream/morph-streams [6, 7], INSTANS [8], Sparkwave [9].
>
>  Another good place to get information on practical aspects are the
> tutorials given at ESWC/ISWC conferences. Luckily you can access their
> contents and slides [10]. I think it’s really helpful to look at the slides
> and get an impression of the engines’ capabilities before getting your
> hands on. Another good place to get information is the wiki of this very
> group. We collected many things there. Even though it may still appear a
> bit unorganized I’d recommend to take a look: [11].
>
>  One open question of yours is still the integration with OGC standards.
> I do not know what you mean precisely, but I think this is still a topic,
> which has not been quite addressed by the RSP community. I am not sure how
> tight of an integration with OGC standards you  imagine, but things like
> spatial queries are definitely doable right now.
>
>  Hope that helps!
>
>  Best regards,
>  Peter
>
>  [1] http://streamreasoning.org/
>  [2] https://code.google.com/p/cqels/
>  [3] https://code.google.com/p/etalis/
>  [4]
> http://iospress.metapress.com/content/t7284477156m77j1/?issue=4&genre=article&spage=397&issn=1570-0844&volume=3
>  [5] http://aifb.kit.edu/images/c/c0/Www29-anicic.pdf
>  [6] https://github.com/jpcik/morph-streams
>  [7] http://oa.upm.es/16330/1/corcho_enabling.pdf
>  [8]
> http://cse.aalto.fi/en/research/groups/distributed_systems/software/instans/
>  [9] http://sparkwave.sti2.at/index.html
>  [10] http://streamreasoning.org/events/sr4ld2014
>  [11] http://www.w3.org/community/rsp/wiki/Main_Page
>
>
>  --
>  DI (FH) Peter Wetz
> PhD Candidate
>  Doctoral College Environmental Informatics
>  Vienna University of Technology
>  Favoritenstraße 9-11
>  1040 Vienna
>  Austria
>
> M: +43-650-7954890
>  E: peter.wetz@tuwien.ac.at
>
>
>
>
>   *From:* belitre@gmail.com [mailto:belitre@gmail.com <belitre@gmail.com>] *On
> Behalf Of *Javier Ruiz Aranguren
> *Sent:* Thursday, 16 April 2015 15:30
> *To:* public-rsp@w3.org
> *Subject:* State of the art tools for rdf stream processing
>
>  Hi, all:
>
>  In the GeoSmartCity project <http://www.geosmartcity.eu/> we aim at
> developing a framework in which Geo Open Data can be exploited towards the
> Smart City paradigm. One of the scenarios planned for our pilots is
> underground network management involving water and sewage network management
> <https://www.w3.org/community/rsp/wiki/Use_cases#Water_Supply_and_Sewage_Network_Management>
> . This includes GIS access to sensor data from Water management SCADAs
> and use of GIS and sensed data to improve modeling and planning of water
> networks.
>
>  We would like to explore an rdf streaming implementation in order to:
>  - be able to define continuous and advanced queries.
>  - integrate sources, dynamic (weather) or static (type of sensors,
> geospatial features, etc.).
>  - integrate frictionlessly with OGC standards.
>
>  Unfortunately, the number of different query languages and discontinued
> tools makes it a bit discouraging to go in this direction.
>
>  I would like to ask you which tools that could accomplish this goal have
> ongoing development and have some traction.
>
>  Thanks.
>
>  P.S. (Will all of these previous efforts go in the bin when RSP-QL
> becomes the single standard?)
>
>
>
>
Received on Tuesday, 21 April 2015 09:43:33 UTC
