Re: C-SPARQL Engine and handling of REGISTER, nested queries from Emanuele Della Valle on 2014-10-14 (public-rsp@w3.org from October 2014)

From: Emanuele Della Valle <emanuele.dellavalle@polimi.it>
Date: Tue, 14 Oct 2014 05:24:07 +0000
To: Mark Feblowitz <MarkFeblowitz@comcast.net>
CC: Emanuele Della Valle <emanuele.dellavalle@polimi.it>, Andy Seaborne <andy@apache.org>, Marco Balduini <marco.balduini@polimi.it>, "public-rsp@w3.org" <public-rsp@w3.org>
Message-ID: <FC215DF1-66F6-4C21-8C6D-FEA97195B7BA@polimi.it>

Dear Mark,

On 13 Oct 2014, at 23:42, Mark Feblowitz <MarkFeblowitz@comcast.net> wrote:

Emanuele, and @RSP Community -

I have some questions, based on a specific item I am trying to implement in my work at IBM Research.

My C-SPARQL questions are:

* Is there a means of performing simple filtration on each triple in a stream? (I’m thinking, PHYSICAL window of size 1)

Triple-based windows are a bit buggy.

* Can a query to FILTER a stream be composed with a subquery that is also windowed (using different windowing criteria)?

No. As in SPARQL, the FROM clause cannot appear in subqueries. You can achieve the same result by composing queries in a query network: you put the sub-query upstream of the query that would contain it.

* Using the C-SPARQL engine, how does one take the output of a REGISTERed STREAM query and use it as input to another? Is there a special URL? Or must the user-defined result processor post a new stream, identifying a new URL?

You need to use the RDFStreamFormatter. See slides 9 and 10 in http://www.streamreasoning.org/slides/2013/04/corso_dott_ifp_c-sparql.pdf

You can also check out the COMPOSABILITY test in the https://github.com/streamreasoning/CSPARQL-ReadyToGoPack

* For GROUPed windowed processing, is it correct to assume that the effect is as if there is a window per group?

The window creates the dataset that you evaluate the group on.

* How in general can one express a case where only one solution is emitted per group, in a GROUP BY … HAVING query?

I’m not sure this is possible in SPARQL. If it is not possible in SPARQL it is not possible in C-SPARQL.

Are you trying to implement a partitioned window? Please check out this link http://esper.codehaus.org/tutorials/solution_patterns/solution_patterns.html#expiry-3

C-SPARQL does not support this clause, but indeed it is very useful in many cases.

* That’s one solution only, not one solution per processing pass.

Here’s the simplified scenario:

* Examine a stream of arbitrary RDF triples, looking for Infectees — Persons infected by a particular virus. These infectees are grouped by Region.
* A triple set is to be CONSTRUCTed when there are 0 < N < threshold infectees in a given region (“SomeInfectees” alert)

* Another triple set is to be CONSTRUCTed when N >= threshold (“PossibleEpidemic” alert)

This appears doable in C-SPARQL. You need two queries registered on the same stream:

REGISTER STREAM someInfecteeAlert AS
CONSTRUCT { [ ] someInfecteeAlertIn ?region }
FROM STREAM …
WHERE {
?infectee a Infected .
?infectee livesIn ?region
} GROUP BY ?region
HAVING (COUNT(?infectee) > 0 && COUNT(?infectee) < %%threshold%%)

REGISTER STREAM PossibleEpidemic AS
CONSTRUCT { [ ] PossibleEpidemicIn ?region }
FROM STREAM …
WHERE {
?infectee a Infected .
?infectee livesIn ?region
} GROUP BY ?region
HAVING (COUNT(?infectee) >= %%threshold%%)

As far as I know, the IF function (http://www.w3.org/TR/sparql11-query/#func-if) cannot be used in a CONSTRUCT clause; otherwise it would have been possible to write just one query.
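To make the intent of the two HAVING conditions concrete, here is a minimal Python sketch (not C-SPARQL, and not the engine's implementation) that treats the contents of one window as the dataset, groups infectees by region, and picks the alert for each region. It follows the "N >= threshold" wording of the scenario. The function name, the tuple representation of triples, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for the unprefixed terms used in the queries above.
RDF_TYPE, INFECTED, LIVES_IN = "a", "Infected", "livesIn"

def alerts_for_window(triples, threshold):
    """Evaluate both HAVING conditions over the contents of one window."""
    # Subjects asserted to be infected within this window.
    infected = {s for s, p, o in triples if p == RDF_TYPE and o == INFECTED}
    # Join with livesIn and group the infectees by region.
    by_region = defaultdict(set)
    for s, p, o in triples:
        if p == LIVES_IN and s in infected:
            by_region[o].add(s)
    # A region only appears here with at least one infectee, so N > 0 holds.
    # Note: this counts distinct infectees, which matches COUNT(?infectee)
    # only if each infectee has a single livesIn triple in the window.
    return {
        region: "PossibleEpidemic" if len(people) >= threshold else "SomeInfectees"
        for region, people in by_region.items()
    }

window = [
    ("alice", "a", "Infected"), ("alice", "livesIn", "north"),
    ("bob", "a", "Infected"), ("bob", "livesIn", "north"),
    ("carol", "a", "Infected"), ("carol", "livesIn", "south"),
    ("dave", "livesIn", "south"),  # not infected, so not counted
]
result = alerts_for_window(window, threshold=2)
print(result)
```

The two conditions are mutually exclusive for a given region and window, which is why a single conditional is enough in procedural code while the declarative version needs two registered queries.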

The goal here is to process an arbitrary stream of triples and to emit just a single alert - ever - per group.

So, there are two issues here:

1. filtering a stream and then applying windowed (?) match criteria for each expression for groups in the filtered result
2. ensuring that only one answer per expression per group results in a CONSTRUCT

Let me try to understand. The following query should pick up exactly one infectee per region with a PossibleEpidemic alert.

REGISTER QUERY infecteeInRegionWithPossibleEpidemic AS
SELECT ?region ?infectee
FROM STREAM …
WHERE {
  { SELECT ?region
    WHERE {
      ?infectee a Infected .
      ?infectee livesIn ?region
    } GROUP BY ?region
    HAVING (COUNT(?infectee) >= %%threshold%%) }
  { SELECT ?region ?infectee
    WHERE {
      ?infectee a Infected .
      ?infectee livesIn ?region
    } LIMIT 1 }
}

is this what you want?

As for item #1, I’ve tried a few things and now understand that I need to view this as a filter part and an aggregate or join part. I am thinking about these ways to handle this:

* register a C-SPARQL stream query (window size = 1, slide = 1) to perform the filtering, feeding it to another (window size and slide TBD); the latter query notifies by emitting a CONSTRUCTed result.

arbitrary triple stream --> [ PHYSICAL WINDOWed FILTER ] --> filtered triple stream --> [ PHYSICAL WINDOWed JOIN and/or AGGREGATE ] --> CONSTRUCTed triple stream
or

* compose a query whereby windowing is performed for the initial filtering and a subquery with separate windowing is performed for the aggregation/join (is this possible?)

In either case, the first part filters, e.g., down to a stream consisting only of infectees and the second part groups the infectees by region, counts them and emits the single respective notification. (SomeInfecteesInRegion and PotentialEpidemic). Thus, the questions above about composition of queries.
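The first option (a per-triple filter feeding a separately windowed aggregation) can be sketched in plain Python as two chained stages. This is only an illustration of the data flow, not of the C-SPARQL engine: the function names are hypothetical, the second stage assumes a tumbling physical window (slide equal to size), and the sample triples are made up.

```python
from collections import Counter
from itertools import islice

def keep_relevant(triples):
    """Stage 1: look at one triple at a time and keep only infectee-related ones."""
    for s, p, o in triples:
        if (p == "a" and o == "Infected") or p == "livesIn":
            yield (s, p, o)

def tumbling_windows(stream, size):
    """Cut a stream into consecutive physical windows of `size` triples."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window

def count_infectees_by_region(window):
    """Stage 2: join 'a Infected' with 'livesIn' inside one window and count."""
    infected = {s for s, p, o in window if p == "a" and o == "Infected"}
    return Counter(o for s, p, o in window if p == "livesIn" and s in infected)

raw = [
    ("alice", "a", "Infected"),
    ("alice", "knows", "bob"),
    ("alice", "livesIn", "north"),
    ("bob", "a", "Infected"),
    ("bob", "livesIn", "north"),
    ("carol", "likes", "tea"),
    ("carol", "a", "Infected"),
    ("carol", "livesIn", "south"),
]
counts = [
    dict(count_infectees_by_region(w))
    for w in tumbling_windows(keep_relevant(raw), size=4)
]
print(counts)
```

Because stage 1 sees a single triple at a time, it cannot join the type triple with the livesIn triple; it can only discard irrelevant predicates. The join has to happen in the windowed second stage, and it will miss pairs that fall into different windows.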

I believe I have already given you too many options. I will leave this one to answer once I have read your replies.

As for item 2 above, the obvious question: Will "LIMIT 1" limit the query to being matched one time only (per group) or does it mean that only one CONSTRUCT will be emitted each time the processing criteria are met (that is, when the window closes)?

See my query with two subqueries above.

If it’s the former, I’m done. If the latter, I don’t see a way to my goal.

I may have misunderstood :-(

It’s easy to think of this procedurally: look at non-finite data until an expression is matched and stop there. Or in a stream-ish approach, match the expression and deduplicate the output stream. Or, less cleanly, asserting the notifications to an RDF store and then including in the join expression a check for a prior alert before emitting one? Only the last one seems obvious with C-SPARQL (albeit “dirty”).

Is there a clean way of doing this?

Let’s see. I’m curious too. Indeed, this may require extending the language, which is something I’m looking for :-)
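For reference, the "match, then deduplicate the output stream" option from the question can be sketched outside the query language, for example in whatever component consumes the results of the registered query. The Python below is a hypothetical illustration; the function name and the (region, alert) pair representation are assumptions, and nothing here is part of C-SPARQL itself.

```python
def emit_once_per_group(alert_batches):
    """Pass each (region, alert kind) pair downstream the first time only.

    `alert_batches` is an iterable with one batch of alerts per window
    evaluation; the same alert may reappear in every batch.
    """
    seen = set()  # grows with the number of distinct (region, kind) pairs
    for batch in alert_batches:
        for region, kind in batch:
            key = (region, kind)
            if key not in seen:
                seen.add(key)
                yield key

batches = [
    [("north", "SomeInfectees")],
    [("north", "SomeInfectees"), ("south", "SomeInfectees")],
    [("north", "PossibleEpidemic"), ("south", "SomeInfectees")],
]
emitted = list(emit_once_per_group(batches))
print(emitted)
```

The state kept in `seen` plays the same role as the RDF store of prior alerts in the "dirty" variant; the difference is only where that memory lives.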

Best Regards,

Emanuele

Received on Tuesday, 14 October 2014 05:24:39 UTC