- From: Gray, Alasdair <A.J.G.Gray@hw.ac.uk>
- Date: Fri, 14 Nov 2014 13:15:31 +0000
- To: Mark Feblowitz <MarkFeblowitz@comcast.net>
- CC: Emanuele Della Valle <emanuele.dellavalle@polimi.it>, Andy Seaborne <andy@apache.org>, Marco Balduini <marco.balduini@polimi.it>, "public-rsp@w3.org" <public-rsp@w3.org>
- Message-ID: <606E368C-72B6-442C-B4F1-88812B4CE48A@hw.ac.uk>
On 3 Nov 2014, at 17:31, Mark Feblowitz <MarkFeblowitz@comcast.net<mailto:MarkFeblowitz@comcast.net>> wrote: [snip] So, on to my wish list: First, it seems that the biggest problem is that the processing criteria (when to do the match) seem currently to be inseparable from the ejection criteria (when to eject triples from the window). Without the “COMPUTED EVERY” or some other separation of processing criteria, it seems that one must use the slide or the tumble as the processing criteria. With COMPUTED EVERY being implemented, at least one might have a shot at being able to match the first instance of, e.g., a threshold crossing, before triple ejection occurs. What would have worked better for my case is being able to specify general processing criteria separate from specifying ejection criteria, i.e., to test every second within tumbling one hour window. In general, it can be helpful to one of a number of different ways for triple arrival to trigger processing, independent of when triples are being evicted from the window. There are other examples of various combinations of processing and eviction criteria: the ability to process every minute a snapshot average of sales receipts for one hour tumbling windows of sales (that’s different to having a 10 minute slide in a one hour window). Could you illustrate this with a concrete example? Or perhaps you’d want to compute the average of each set of 20 triples in a tumbling one hour period. The point here is that there can be interesting combinations of count-based and time based triggering and/or eviction. Some combinations only make sense when you have the need to use them :-) Windows based on the count of triples is very problematic as it is non-deterministic behaviour dependent upon the order and rate of the stream. Also quite helpful would be the prospect of being able to invoke a policy of one output per match. So, rather than having each invocation of processing result in (re)producing every match, instead have some means of sensibly limiting matches. In the example above, it would make sense to have the arrival of a triple (specifically one that is included in the query) trigger the processing of the query against the window contents, but only in the context of that triple (so, “fix" that triple in the query and then match around it); the next triple would be different, and only emit a result if it, too, participated in the match. I hope this makes some sense, and that it corresponds to some vocabulary that’s familiar to you. Much of this separation of processing criteria from ejection criteria derives from experience in using Streams’ windowing capabilities. The semantics of windowed stream operators (aggregate, join, …) address various combinations of applying each. I’m looking around for good references for these windowing semantics. The most concise and technical (and least hyped) I’ve found so far is IBM InfoSphere Streams Version 3.2.1 Window handling<http://www-01.ibm.com/support/knowledgecenter/SSCRJU_3.2.1/com.ibm.swg.im.infosphere.streams.dev.doc/doc/windowhandling.html> It’s written in the context of people writing their own Windowed operators (in C++) to extend SPL (Streams Processing Language), but it captures the semantics of all of Streams’ built-in windowed operators. I hope this isn’t off-point for this forum. And I hope that at least some of this has already been addressed in your discussions. These sort of detailed requirements for windows is very much on topic and up for discussion. Best regards Alasdair Thanks for your indulgence and, again, for your helpful suggestions. Regards, Mark On Oct 14, 2014, at 1:24 AM, Emanuele Della Valle <emanuele.dellavalle@polimi.it<mailto:emanuele.dellavalle@polimi.it>> wrote: Dear Mark, On 13 Oct 2014, at 23:42, Mark Feblowitz <MarkFeblowitz@comcast.net<mailto:MarkFeblowitz@comcast.net>> wrote: Emanuele, and @RSP Community - I have some questions, based on a specific item I am trying to implement in my work at IBM Research. My C-SPARQL questions are: * Is there a means of performing simple filtration on each triple in a stream? (I’m thinking, PHYSICAL window of size 1) Triple based windows are a buggy. * Can a query to FILTER a stream be composed with a subquery that is also windowed (using different windowing criteria)? no, as in SPARQL the FROM clause cannot appear in subqueries. You can achieve the same result by composing queries in a query network. You put the sub-query upstream to the query that contains it. * Using the C-SPARQL engine, how does one take the output of a REGISTERed STREAM query and use it as input to another? Is there a special URL? Or must the user-defined result processor post a new stream, identifying a new URL? You need to use the RDFStreamFormatter. See slide 9 and 10 in http://www.streamreasoning.org/slides/2013/04/corso_dott_ifp_c-sparql.pdf You can also check out the COMPOSABILITY test in the https://github.com/streamreasoning/CSPARQL-ReadyToGoPack * For GROUPed windowed processing, is it correct to assume that the effect is as if there is a window per group? The window creates the dataset that you evaluate the group on. * How in general can one express a case where only one solution is emitted per group, in a GROUP BY … HAVING query? I’m not sure this is possible in SPARQL. If it is not possible in SPARQL it is not possible in C-SPARQL. Are you trying to implement a partitioned window? Please check out this link http://esper.codehaus.org/tutorials/solution_patterns/solution_patterns.html#expiry-3 C-SPARQL does not support this clause, but indeed it is very useful in many cases. * That’s one solution only, not one solution per processing pass. Here’s the simplified scenario: * Examine a stream of arbitrary RDF triples, looking for Infectees — Persons infected by a particular virus. These infectees are grouped by Region. * A triple set is to be CONSTRUCTed when there are 0 < N < threshold infectees in a given region (“SomeInfectees” alert) * Another triple set is to be CONSTRUCTed when N >= threshold (“PossibleEpidemic” alert) this appears doable in C-SPARQL. You need to queries registered on the some stream REGISTER STREAM someInfecteeAlert AS CONSTRUCT { [ ] someInfecteeAlertIn ?region } FROM STREAM … WHERE { ?infectee a Infected . ?infectee livesIn ?region } GROUP BY ?region HAVING (COUNT(?infectee) > 0 && COUNT(?infectee) < %%threshold%%) REGISTER STREAM PossibleEpidemic AS CONSTRUCT { [ ] PossibleEpidemicIn ?region } FROM STREAM … WHERE { ?infectee a Infected . ?infectee livesIn ?region } GROUP BY ?region HAVING (COUNT(?infectee) > %%threshold%%) As far as I know the IF function (http://www.w3.org/TR/sparql11-query/#func-if) cannot be used in a CONSTRUCT clause, otherwise it could have been possible to write just one query. The goal here is to process an arbitrary stream of triples and to emit just a single alert - ever - per group. So, there are two issues here: 1. filtering a stream and then applying windowed (?) match criteria for each expression for groups in the filtered result 2. ensuring that only one answer per expression per group results in a CONSTRUCT Let me try to understand. The following query should pickup exactly one infectee per region with Possible Epidemic alert. REGISTER QUERY infecteeInRegionWithPossibleEpidemic AS SELECT ?region ?infectee FROM STREAM … WHERE { { SELECT ?region { WHERE { ?infectee a Infected . ?infectee livesIn ?region } GROUP BY ?region HAVING (COUNT(?infectee) > %%threshold%%) } { SELECT ?region { WHERE { ?infectee a Infected . ?infectee livesIn ?region } LIMIT 1 } } is this what you want? As for item #1, I’ve tried a few things and now understand that I need to view this as a filter part and a aggregate or join part. I am thinking about these ways to handle this: * register a C-SPARQL stream query (window size = 1, slide = 1) to perform the filtering, feeding it to a another (window size and slide TBD); the latter query notifies by emitting a CONSTRUCTed result. arbitrary triple stream —> [ PHYSICAL WINDOWed FILTER ] —> filtered triple stream —> [ PHYSICAL WINDOWED JOIN and or AGGREGATE ] —> CONSTRUCTed triple stream or * compose a query whereby windowing is performed for the initial filtering and a subquery with separate windowing is performed for the aggregation/join (is this possible?) In either case, the first part filters, e.g., down to a stream consisting only of infectees and the second part groups the infectees by region, counts them and emits the single respective notification. (SomeInfecteesInRegion and PotentialEpidemic). Thus, the questions above about composition of queries. I believe I gave you already too many option. I leave this to answer once I read your answers. As for item 2 above, the obvious question: Will "LIMIT 1" limit the query to being matched one time only (per group) or does it mean that only one CONSTRUCT will be emitted each time the processing criteria are met (that is, when the window closes)? See my query with two subqueries above. If it’s the former, I’m done. If the latter, I don’t see a way to my goal. I may have misunderstood :-( It’s easy to think of this procedurally: look at non-finite data until an expression is matched and stop there. Or in a stream-ish approach, match the expression and deduplicate the output stream. Or, less cleanly, asserting the notifications to an RDF store and then including in the join expression a check for a prior alert before emitting one? Only the last one seems obvious with C-SPARQL (albeit “dirty”). Is there a clean way of doing this? Let’s see. I’m curious too. Indeed this may require to extend the language and this is something I’m looking for :-) Best Regards, Emanuele Alasdair J G Gray Lecturer in Computer Science, Heriot-Watt University, UK. Email: A.J.G.Gray@hw.ac.uk<mailto:A.J.G.Gray@hw.ac.uk> Web: http://www.alasdairjggray.co.uk ORCID: http://orcid.org/0000-0002-5711-4872 Telephone: +44 131 451 3429 Twitter: @gray_alasdair ----- We invite research leaders and ambitious early career researchers to join us in leading and driving research in key inter-disciplinary themes. Please see www.hw.ac.uk/researchleaders for further information and how to apply. Heriot-Watt University is a Scottish charity registered under charity number SC000278.
Received on Friday, 14 November 2014 13:16:18 UTC