Re: [RSP CG] Interval streams?

I think that Avi makes a very interesting point here, which we should not
forget when we are considering whether something is an event or a fact.

Stream processors need not be isolated systems that process input streams
and (possibly) produce output streams. In more general cases such processors
will be organized in chains/workflows, and what we consider a stream of facts
generated as the output of one processor can be considered a stream of events
consumed as the input of another. Hence we should also consider stream
processors that generate streams of facts.
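A minimal sketch of this idea follows. It assumes a hypothetical `IntervalFact` record and hypothetical `as_event`/`chain` helpers (none of them come from any RSP proposal). The first processor emits facts valid over [start, end). The next processor in the chain consumes each one as a point event stamped with the time the fact became available, i.e. its end time.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class IntervalFact:
    """Hypothetical record: a triple that holds over [start, end)."""
    subject: str
    predicate: str
    obj: str
    start: int
    end: int


# A point event as seen by a downstream processor: (timestamp, payload).
Event = Tuple[int, IntervalFact]


def as_event(fact: IntervalFact) -> Event:
    # A fact can only be emitted once its end time is known, so the
    # downstream consumer sees it as an event occurring at fact.end.
    return (fact.end, fact)


def chain(facts: Iterable[IntervalFact]) -> Iterator[Event]:
    """Turn the output facts of one processor into input events for the next."""
    for fact in facts:
        yield as_event(fact)


facts = [IntervalFact(":cafeteria", ":avgOccupancy", "42", 0, 600)]
events = list(chain(facts))
```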

I am now thinking of a couple of examples that could illustrate this, which
we may also want to write down as part of the documentation that we are
generating.

Oscar


-- 

Oscar Corcho
Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
Facultad de Informática
Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, España
Tel. (+34) 91 336 66 05
Fax  (+34) 91 352 48 19

From:  Abraham Bernstein <bernstein@ifi.uzh.ch>
Date:  Thursday, 26 September 2013 09:06
To:  Rinne Mikko <mikko.rinne@aalto.fi>
CC:  RDF Stream Processing <public-rsp@w3.org>
Subject:  Re: [RSP CG] Interval streams?
Resent-From:  RDF Stream Processing <public-rsp@w3.org>
Resent-Date:  Thu, 26 Sep 2013 07:06:42 +0000

Hi Rinne, all

I think that there are many scenarios in which one would want to stream
temporally valid facts. Tom here at UZH can give a number of examples, but
let me offer just one.

Assume that you have a set of sensor observations (e.g., the number of people
entering and leaving the cafeteria) and that your streaming system may want to
compute the average number of people in the cafeteria for every 10-minute
interval. Obviously, this task could easily be accomplished without
intervals on the input side, but the output is actually a temporally valid
fact. So far I am in total agreement with your example.

It is easy to imagine that you would want that output streamed to a second
component of the system that takes decisions based on that input. For
example, if the number of people in the cafeteria goes above a certain
threshold, then the entrance must be closed (sorry for the somewhat
simplistic example, but I am sitting in a university cafeteria right
now...). Clearly, this second component takes an interval-delimited fact as
its input.
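A minimal sketch of the cafeteria example as a two-stage chain follows. It assumes the sensor reports one `(timestamp, +1)` event per person entering and one `(timestamp, -1)` per person leaving, with timestamps in seconds starting at 0. It also takes "average" to mean a time-weighted average over tumbling 10-minute windows. The names `average_occupancy` and `entrance_decisions` and the `limit` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

WINDOW = 600  # seconds: the 10-minute interval from the example


@dataclass(frozen=True)
class OccupancyFact:
    """Interval-valid fact: the average occupancy holds over [start, end)."""
    start: int
    end: int
    average: float


def average_occupancy(observations: Iterable[Tuple[int, int]],
                      until: int, window: int = WINDOW) -> Iterator[OccupancyFact]:
    """Processor 1: turn (timestamp, +1 enter / -1 leave) sensor events with
    timestamps >= 0 into one time-weighted average per tumbling window."""
    obs = sorted(observations)
    count = 0  # people currently inside
    i = 0
    for w_start in range(0, until, window):
        w_end = w_start + window
        area = 0  # person-seconds accumulated in this window
        t = w_start
        while i < len(obs) and obs[i][0] < w_end:
            ts, delta = obs[i]
            area += count * (ts - t)
            t, count = ts, count + delta
            i += 1
        area += count * (w_end - t)
        # The fact can only be emitted once the window has closed (at w_end).
        yield OccupancyFact(w_start, w_end, area / window)


def entrance_decisions(facts: Iterable[OccupancyFact],
                       limit: float) -> List[Tuple[int, str]]:
    """Processor 2: consume the interval facts as input and decide, at the
    end of each window, whether the entrance should be closed."""
    return [(f.end, "close" if f.average > limit else "open") for f in facts]


observations = [(0, 1), (0, 1), (300, 1), (900, -1), (900, -1)]
facts = list(average_occupancy(observations, until=1200))
decisions = entrance_decisions(facts, limit=2.2)
```

With these sample observations the first window averages 2.5 people and the second 2.0. With `limit=2.2` the second component therefore closes the entrance at t = 600 s and opens it again at t = 1200 s. Each decision is taken at a window's end, which is exactly when the corresponding interval fact becomes available.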

In real life, many measurements are really interval-based facts. The speed of
a car is almost always an interval-based fact (I will leave aside the
philosophical question of whether speed actually is an interval-based fact or
only the way we measure it is), as is any measure of throughput, etc. As
such, almost all our use cases involve interval-based facts. Given that you
can never assume that you will not have to consume such facts downstream
(even most sensors already deliver them), I believe that a stream processing
system will have to be able to process such facts.

I look forward to your opinions.

Cheers

Avi


On 26.09.2013, at 08:21, Rinne Mikko <mikko.rinne@aalto.fi> wrote:

> 
> Hi All!
> 
> Thank you very much for the good discussion yesterday! Since Jean-Paul invited
> discussion on the email reflector, I'll give it a try. To me this is also
> related to the joint action point of timestamps and their relation to use
> cases.
> 
> In the wiki <http://www.w3.org/community/rsp/wiki/RDF_Stream_Models>  under
> RDF Stream Models / Temporal Graphs we are listing "variant 2 - interval
> based" graphs. These can be described as temporally valid facts, where
> validity is indicated by the specified time interval. It is clear to me that
> such temporally valid facts exist and that there must be a method to archive
> and query them.
> 
> However, what I cannot easily motivate is when we would want to stream
> temporally valid facts.
> 
> The problem is that a fact can be added to a stream only when its ending
> time is known. In many cases this will be the time when the fact is no
> longer valid.
> 
> Taking the example on slide 8 (second to last) of Daniele's presentation
> <http://www.dellaglio.org/uploads/rsp-phone-call-0925.pdf>, if we create a
> stream of meeting durations, we can only stream the information on each
> meeting *after* the meeting has finished. We can do historical queries on
> how long meetings lasted, or which meetings had conflicts, but we cannot
> query which meetings are in progress *now* or generate an alert about a
> resource conflict when it happens. That, to me, is pretty much the essence
> of why we process real-time streams instead of archived datasets.
> 
> It is certainly true that in many cases the duration of a temporally valid
> fact can be anticipated at the time when it is initiated. But if the
> duration can be anticipated, then there is quite often also a rule for the
> duration. In many such cases we could probably store such rules either in a
> dataset or as event processing rules on our event processing platform and
> apply them on-the-fly, without explicitly streaming the duration with each
> fact.
> 
> To me the more flexible and "real-time" solution to the said example would be
> to generate a stream of events signalling the starting and ending points of
> meetings:
> :alice :meets :bob @t1
> :alice :meets :carl @t3
> :alice :finishesmeetingwith :bob @t5
> :bob :meets :diana @t6
> :alice :finishesmeetingwith :carl @t7
> ...
> 
> The status (which meetings are in progress now?) would be maintained on the
> event processing platform and the historical data (a graph of meeting starting
> and ending times) could be saved to a dataset for later querying.
> 
> Of course this might all be just my lack of imagination, but it would be
> helpful to me if we could get a use case motivating the streaming of
> temporally valid facts. Anyone?
> 
> BR,
> 
> Mikko
> 
> 
> 
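Mikko's start/end alternative can be sketched as follows. The sketch assumes the timestamps t1, t3, t5, t6 and t7 are plain integers (1, 3, 5, 6, 7). It also treats two ongoing meetings that share a participant as a stand-in for a "resource conflict". `MeetingTracker` and its methods are hypothetical names, not part of any RSP proposal.

```python
from typing import Dict, List, Tuple

# (subject, predicate, object, timestamp), mirroring the example stream.
Event = Tuple[str, str, str, int]
# (subject, object, start, end): interval-valid fact for a finished meeting.
Meeting = Tuple[str, str, int, int]


class MeetingTracker:
    """Keeps "which meetings are in progress now?" as platform state and
    archives each finished meeting as an interval-valid fact."""

    def __init__(self) -> None:
        self.in_progress: Dict[Tuple[str, str], int] = {}  # (s, o) -> start
        self.archive: List[Meeting] = []

    def on_event(self, event: Event) -> List[Tuple[str, str]]:
        """Process one event. For a start event, return the meetings already
        in progress that share a participant (a possible conflict)."""
        s, p, o, t = event
        conflicts: List[Tuple[str, str]] = []
        if p == ":meets":
            conflicts = [pair for pair in self.in_progress if s in pair or o in pair]
            self.in_progress[(s, o)] = t
        elif p == ":finishesmeetingwith":
            start = self.in_progress.pop((s, o), None)
            if start is not None:  # ignore an end with no matching start
                self.archive.append((s, o, start, t))
        return conflicts

    def current(self) -> List[Tuple[str, str]]:
        """Meetings in progress right now, in sorted order."""
        return sorted(self.in_progress)


tracker = MeetingTracker()
stream = [(":alice", ":meets", ":bob", 1), (":alice", ":meets", ":carl", 3),
          (":alice", ":finishesmeetingwith", ":bob", 5),
          (":bob", ":meets", ":diana", 6),
          (":alice", ":finishesmeetingwith", ":carl", 7)]
alerts = [tracker.on_event(e) for e in stream]
```

Here the tracker flags the overlap between :alice's two meetings at t = 3. After t = 7 it reports that only :bob and :diana are meeting, and it has archived the two finished meetings as the intervals (1, 5) and (3, 7). This reflects the trade-off under discussion: the conflict is detected when it happens, while each interval fact only becomes available when its meeting ends.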

-----------------------------------------------------------------
|  Professor Abraham Bernstein, PhD
|  University of Zürich, Department of Informatics
|  web: http://www.ifi.uzh.ch/ddis/bernstein.html

Received on Thursday, 26 September 2013 08:12:08 UTC