Re: C-SPARQL interest and questions

Hi Mark,

please find my answers inline.

@RSP community: I found some of Mark's questions about the C-SPARQL Engine to be of general interest, so I thought it would be useful to CC the mailing list in providing my answers.

On 08 Oct 2014, at 21:16, Mark Feblowitz <MarkFeblowitz@comcast.net> wrote:


Hello, Emanuele (and Marco) -

Thank you for responding to my question regarding the "Spurious reports of InvalidPropertyURIException on seemingly good URIs."

As I mentioned in my reply, Andy had it right that the angle brackets ( < > ) needed to be stripped. I have code to do that, but it was accidentally corrupted.

I have been using C-SPARQL for a few months and it has been quite useful. As one of the founding members of the IBM InfoSphere Streams (IBM Streams) team, I am quite familiar with stream analysis. And, being a longtime user of semantic technologies, I was quite pleased to find such an elegant tryst between the two technologies :)

I am aware of the Dublin Extensions work done by your colleague Simone Tallevi-Diotallevi (now at IBM in Dublin) to build a similar capability on IBM Streams. While this work is highly relevant, I continue to use the more complete C-SPARQL implementation.

The goal of the C-SPARQL Engine is to be a reference implementation for the C-SPARQL Language. We made many naive choices, but if you submit triples at a low enough rate, the engine should always do what you expect of it.

I have a few questions about C-SPARQL, some mundane, some obvious, and some a little more advanced.

Let's see if I can answer them all :-)

First, the obvious: do you know of any efforts to maintain and evolve C-SPARQL?

Marco and I are continually maintaining and evolving it. In the coming days, we will release a new version with enhanced support for naive stream reasoning and for background knowledge updates. All the code is on GitHub (https://github.com/streamreasoning/) and it is open source.

I find it quite complete and usable, although at times I must do a good bit of trial-and-error to get some queries to parse and to run.

Please share those problems with Marco and me. We will be pleased to help you. The parser we released in May 2014 should fully support SPARQL 1.1, but bugs may be present.

It would be good to know if there will be any further releases and to be able to ask some of my questions. Is there an active C-SPARQL community? I'd love to be able to post my more frequently asked questions.

As you saw from me CCing public-rsp@w3.org, there is an active RDF Stream Processing (RSP) community at W3C. I invite you to write about C-SPARQL Engine problems and to ask for help there. There are a number of active users of the C-SPARQL Engine, and most of them are involved in the RSP community. Moreover, if I am not mistaken, GitHub also has facilities to report bugs and propose changes. Please do so!

As I mentioned in my reply, I was wondering whether the C-SPARQL Ready-to-go Pack version 0.9 supports use of a Fuseki Endpoint Server to access at-rest data. I have tried using the Fuseki Endpoint URI

http://localhost:3031/km4sp/sparql

where “km4sp” is the database name and “sparql” is the defined query service. This is the outcome when the C-SPARQL query attempts to use that URI:

[ERROR] StatementResultServiceImpl - Unexpected exception invoking listener update method on listener class 'QueryListener' : HttpException : 404 - Service Description: /km4sp/sparql <org.apache.jena.atlas.web.HttpException: 404 - Service Description: /km4sp/sparql>org.apache.jena.atlas.web.HttpException: 404 - Service Description: /km4sp/sparql
at org.apache.jena.riot.web.HttpOp.exec(HttpOp.java:1052)
at org.apache.jena.riot.web.HttpOp.execHttpGet(HttpOp.java:320)
at org.apache.jena.riot.web.HttpOp.execHttpGet(HttpOp.java:355)
at org.apache.jena.riot.stream.LocatorHTTP.performOpen(LocatorHTTP.java:41)
at org.apache.jena.riot.stream.LocatorURL.open(LocatorURL.java:45)
at org.apache.jena.riot.stream.StreamManager.openNoMapOrNull(StreamManager.java:139)
at org.apache.jena.riot.stream.StreamManager.open(StreamManager.java:100)
at org.apache.jena.riot.adapters.RDFReaderRIOT_Web.read(RDFReaderRIOT_Web.java:70)
at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:241)
at eu.larkc.csparql.sparql.jena.JenaEngine.evaluateQuery(JenaEngine.java:187)
at eu.larkc.csparql.core.engine.CsparqlEngineImpl.update(CsparqlEngineImpl.java:291)
at eu.larkc.csparql.core.engine.CsparqlEngineImpl.update(CsparqlEngineImpl.java:48)

Direct execution of a SPARQL query against that URI from within Jena (using com.hp.hpl.jena.query.Query QueryFactory queries) yields results without error.
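
For reference, this is roughly the kind of direct query that works for me; a minimal sketch where the endpoint URL and the query are just placeholders from my local setup:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class DirectFusekiQuery {
    public static void main(String[] args) {
        // Placeholder endpoint and query from my local Fuseki setup.
        String endpoint = "http://localhost:3031/km4sp/sparql";
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        // Send the SELECT query to the Fuseki query service over HTTP.
        QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
        try {
            ResultSet results = qe.execSelect();
            ResultSetFormatter.out(System.out, results); // print the bindings
        } finally {
            qe.close();
        }
    }
}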

I’m not sure I understand what you’re trying to do. The upcoming release of the C-SPARQL Engine includes support for changing the background knowledge through a SPARQL update endpoint managed by Fuseki. In two weeks from now, you will find on http://streamreasoning.org/events/sr4ld2014 all the slides of the ISWC 2014 tutorial on Stream Reasoning. The hands-on session will include a part on updating the background knowledge by issuing SPARQL UPDATE queries to the built-in Fuseki endpoint.
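
To give a concrete idea of the kind of interaction the new release enables: the background knowledge is changed by sending a plain SPARQL UPDATE to the Fuseki update service. A minimal sketch with Jena's ARQ update classes (the endpoint URL and the inserted data are hypothetical placeholders, not the C-SPARQL Engine API):

import com.hp.hpl.jena.update.UpdateExecutionFactory;
import com.hp.hpl.jena.update.UpdateFactory;
import com.hp.hpl.jena.update.UpdateProcessor;
import com.hp.hpl.jena.update.UpdateRequest;

public class UpdateBackgroundKnowledge {
    public static void main(String[] args) {
        // Hypothetical Fuseki update service for the background-knowledge dataset.
        String updateEndpoint = "http://localhost:3030/km4sp/update";

        // A plain SPARQL UPDATE that modifies the static (at-rest) data.
        UpdateRequest request = UpdateFactory.create(
            "PREFIX ex: <http://example.org/> "
          + "INSERT DATA { ex:sensor1 ex:locatedIn ex:Room1 . }");

        // Send the update to the remote endpoint over HTTP.
        UpdateProcessor processor = UpdateExecutionFactory.createRemote(request, updateEndpoint);
        processor.execute();
    }
}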

Marco and I also developed a set of REST services to interact with a generic RSP engine, and we provide an implementation of those services for the C-SPARQL Engine. Please check out http://streamreasoning.org/download/rsp-service4csparql (also an open source project, hosted on GitHub). The tutorial mentioned above uses the RSP REST services.

On to the next question: it would seem that one would need to be quite careful to keep the streaming data segregated from the at-rest data.

Indeed :-(

The streaming data, if recorded for provenance purposes, would need to be kept in a separate endpoint from the other at-rest data to be queried, or in a separate namespace or, if named graphs were functioning properly, in a separate named graph. I’m thinking that the query engine could otherwise satisfy a query with historical data, whenever the query is processed. Are there other ways for the query to distinguish the new from the historical?

No, this was a design decision. All information from the FROM clauses is merged into the default graph. The C-SPARQL language foresees the possibility of using NAMED streams and NAMED graphs, but the engine does not support them. We have this enhancement on our agenda, but it is a deep change that we have been postponing for three years :-(
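
To make the merge concrete, consider a sketch like the following (stream and graph IRIs are placeholders): the triples in the window over the stream and the triples loaded from the static FROM graph all end up in one default graph, and the WHERE clause matches over their union.

public class MergedGraphExample {
    // Everything named in FROM STREAM and FROM is merged into the single
    // default graph before the WHERE clause is evaluated (placeholder IRIs).
    static final String QUERY =
        "REGISTER QUERY MergeExample AS "
      + "PREFIX ex: <http://example.org/> "
      + "SELECT ?sensor ?room ?value "
      + "FROM STREAM <http://example.org/observations> [RANGE 30s STEP 10s] " // streaming data
      + "FROM <http://example.org/background.rdf> "                           // at-rest data
      + "WHERE { ?sensor ex:hasValue ?value . "                               // from the window
      + "        ?sensor ex:locatedIn ?room . }";                             // from the static graph
    // The string would then be passed to the engine's registerQuery(...) method.
}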

I would like to understand some specific semantics of windowed operations in C-SPARQL. Is there a reference that explains when tuples are ejected from the window, for each type of window?

The results are computed at window close, i.e., when the window slides or tumbles. The logical (time-based) windows work fine. The triple-based windows are buggy, sorry :-(
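
For reference, these are the window forms I am referring to, as sketches with placeholder IRIs; the triple-based form is the one that is currently buggy.

public class WindowExamples {
    // Sliding logical window: 60 seconds wide, sliding every 20 seconds.
    // The result is computed each time the window closes, i.e. every 20 seconds,
    // over the triples whose timestamps fall in the last 60 seconds.
    static final String SLIDING =
        "REGISTER QUERY SlidingExample AS "
      + "SELECT (COUNT(?s) AS ?n) "
      + "FROM STREAM <http://example.org/observations> [RANGE 60s STEP 20s] "
      + "WHERE { ?s ?p ?o }";

    // Tumbling logical window: 60 seconds wide, non-overlapping,
    // i.e. every triple belongs to exactly one window.
    static final String TUMBLING =
        "REGISTER QUERY TumblingExample AS "
      + "SELECT (COUNT(?s) AS ?n) "
      + "FROM STREAM <http://example.org/observations> [RANGE 60s TUMBLING] "
      + "WHERE { ?s ?p ?o }";

    // Triple-based (physical) window over the last 100 triples -- currently buggy.
    static final String TRIPLE_BASED =
        "REGISTER QUERY TripleBasedExample AS "
      + "SELECT (COUNT(?s) AS ?n) "
      + "FROM STREAM <http://example.org/observations> [TRIPLES 100] "
      + "WHERE { ?s ?p ?o }";
}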

Would I find the behavior by reading the Esper documentation?

No. We only support one of the many behaviours you can obtain from Esper, so reading about all the others does not help.

Would it be the obvious:


  *   in a tumbling window, all tuples are ejected at the tumble, and
  *   for a sliding window, all tuples in the “slide” portion of the window are ejected at the slide (based on timestamp)?

Yes, as long as you do not overload the system.


How about the processing criteria?

  *   Is the query processed for each tuple arrival?

No, at window close.


  *   Would all remaining tuples in the windows be considered only when processing the incoming tuple? How about when the slide occurs?

The full content of the window is processed when the window closes. In the case of sliding windows, the contents of two subsequent windows overlap, and so do the results.
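
As a small worked example of the overlap (the figures are arbitrary):

public class OverlapExample {
    // With a sliding window [RANGE 60s STEP 20s], the query is evaluated every
    // 20 seconds over the triples of the last 60 seconds:
    //
    //   evaluation at t = 60s  covers timestamps in [  0s,  60s)
    //   evaluation at t = 80s  covers timestamps in [ 20s,  80s)
    //   evaluation at t = 100s covers timestamps in [ 40s, 100s)
    //
    // A triple stamped at t = 30s falls into the first two evaluations and
    // therefore contributes to both of those results.
    static final String WINDOW_SPEC = "[RANGE 60s STEP 20s]";
}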


  *   For a count-based query, e.g., could a non-matching incoming tuple trigger an evaluation where the count would be achieved, simply by having enough matching tuples in the window?

It should not. But be aware that COUNT has a “strange” behaviour: if you inject the same triple 10 times, counting the matches of that triple will return 1. This is because we consider new triples to shadow old ones when they have the same subject, predicate, and object. This is consistent with the fact that the C-SPARQL Engine outputs results using only the R-Stream operator.
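
As a concrete illustration of this set semantics (the stream IRI and the values are placeholders):

public class CountShadowingExample {
    // Counting query over a 10-second tumbling window (placeholder IRIs).
    static final String QUERY =
        "REGISTER QUERY CountExample AS "
      + "PREFIX ex: <http://example.org/> "
      + "SELECT (COUNT(?o) AS ?n) "
      + "FROM STREAM <http://example.org/observations> [RANGE 10s TUMBLING] "
      + "WHERE { ex:sensor1 ex:hasValue ?o }";

    // If the same triple, e.g.
    //     ex:sensor1 ex:hasValue "42" .
    // is put on the stream 10 times within one window, the window content is an
    // RDF graph (a set of triples), so the 10 copies collapse into one and ?n is
    // reported as 1, not 10. Distinct triples, e.g. with values "41" and "42",
    // would contribute two matches.
}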

And finally: what is the purpose and the behavior of the “COMPUTED EVERY” clause in the query spec? What relationship does it have to the query processing?

When we designed the C-SPARQL Engine, we decided to include this clause to alter the default behaviour of computing a result every time a window closes. When you have multiple windows, it is sometimes useful to limit the number of reports. However, the C-SPARQL Engine implementation ignores this clause. In the slides of the tutorial mentioned above, you can see that we no longer mention the COMPUTED EVERY clause.
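
For completeness, and if I recall the grammar correctly, the clause goes right after the query name; the sketch below uses placeholder IRIs. In the current implementation it has no effect: results are still produced every time the window closes.

public class ComputedEveryExample {
    // The COMPUTED EVERY clause as foreseen by the C-SPARQL language
    // (placeholder IRIs). It was meant to limit how often results are
    // reported, but the current engine ignores it.
    static final String QUERY =
        "REGISTER QUERY ReportExample COMPUTED EVERY 60s AS "
      + "SELECT (COUNT(?s) AS ?n) "
      + "FROM STREAM <http://example.org/observations> [RANGE 30s STEP 10s] "
      + "WHERE { ?s ?p ?o }";
}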

Thanks for your time. I’d appreciate anything you or your colleagues could share.

Thank you for showing interest in the C-SPARQL Engine and in RDF Stream Processing in general.

Best Regards,

Emanuele
