Re: [RSP CG] Interval streams?

Hi Avi!

Thank you for the quick feedback! The example of periodic averages nicely illustrates a case where a result is tied to a historical interval. I agree it is a valid case for streaming information with intervals.

I do, however, have some other observations on the mentioned cases:

1) If the average is computed every 10 minutes over 10 minutes, it is actually pretty redundant to send that interval with every measurement. For this case it would be nice to have a way of stating the interval once for the entire stream without sending it with every sample. (However, here I already have my own counter-example: We may want to calculate the averages more densely when approaching a limit => flexible window-size => easier to send with each sample.)

2) Closing the gate due to overloading based on past averages is actually exactly the thing we shouldn't do. Depending on the rate of increase, we might be well past the actual limit before we observe that the average went over. For this case I would definitely observe the people entering and exiting directly and toggle the gate open/closed based on the instantaneous number of people inside.
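
To make the lag in point 2 concrete, here is a minimal Python sketch. The arrival pattern (one person entering per minute, nobody leaving), the limit of 10 and the function names are all illustrative assumptions, not part of the cafeteria example as stated.

```python
def first_time_at_limit(events, limit):
    """Return the time at which the instantaneous count first reaches the limit."""
    count = 0
    for t, delta in events:
        count += delta
        if count >= limit:
            return t
    return None


def time_weighted_average(events, start, end):
    """Average occupancy over [start, end); events are (time, delta) sorted by time."""
    count = 0
    area = 0.0
    last = start
    for t, delta in events:
        if t >= end:
            break
        if t > start:
            # Accumulate the occupancy held since the previous change.
            area += count * (t - last)
            last = t
        count += delta
    area += count * (end - last)
    return area / (end - start)


# Assumed input: one person enters every minute from t=0 to t=19.
events = [(t, +1) for t in range(20)]
LIMIT = 10

instant = first_time_at_limit(events, LIMIT)      # reacts to each entry event
avg_first = time_weighted_average(events, 0, 10)  # available only at t=10
avg_second = time_weighted_average(events, 10, 20)  # available only at t=20
```

Under these assumptions the instantaneous count reaches the limit during the first window, while the average of that first window is still below the limit. The average only exceeds the limit in the second window, whose result becomes available at t=20, when twice the allowed number of people are already inside.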

Anyway, I agree that there are cases where we want to indicate time intervals in a stream. If not as durations of facts initiated by events, then as historical observation periods used to derive data.

Point taken!

Mikko





On 26. Sep 2013, at 10:06 AM, Abraham Bernstein wrote:

Hi Rinne, all

I think that there are many scenarios in which one would want to stream valid facts. Tom here at UZH can give a number of examples but let me just offer one possible case.

Assume that you have a set of sensory observations (e.g., the number of people entering and leaving the cafeteria) and your streaming system may want to compute the average number of people in the cafeteria for every 10-minute interval. Obviously, this task could easily be accomplished without intervals on the input side, but the output is actually a temporally valid fact. So far I am in total agreement with your example.

It is easily imaginable that you would want that output streamed to a second component of a system that takes decisions based on the input. For example, if the number of people in the cafeteria goes above a certain number then the entrance must be closed (sorry for the somewhat simplistic example, but I am just sitting in a university cafeteria...). Obviously, this system processes an interval-delimited fact as its input.

In real life, many measurements are really interval-based facts. The speed of a car is almost always an interval-based fact (I will leave aside the philosophical question of whether speed is actually an interval-based fact or only the way we measure it), as is any measure of throughput, etc. As such, almost all our use cases have interval-based facts. Given that you can never assume that you will not consume any downstream facts (as even most sensors actually already deliver them), I believe that a stream processing system will have to be able to process such facts.

I look forward to your opinions.

Cheers

Avi


On 26.09.2013, at 08:21, Rinne Mikko <mikko.rinne@aalto.fi> wrote:


Hi All!

Thank you very much for the good discussion yesterday! Since Jean-Paul invited discussion on the email reflector, I'll give it a try. To me this is also related to the joint action point of timestamps and their relation to use cases.

In the wiki<http://www.w3.org/community/rsp/wiki/RDF_Stream_Models> under RDF Stream Models / Temporal Graphs we are listing "variant 2 - interval based" graphs. These can be described as temporally valid facts, where validity is indicated by the specified time interval. It is clear to me that such temporally valid facts exist and that there must be a method to archive and query them.

However, what I cannot easily motivate is when we would want to stream temporally valid facts.

The problem is that a fact can be added to a stream only when the ending time is known. In many cases this will be the time when the fact is no longer valid.

Taking the example on Daniele's slide<http://www.dellaglio.org/uploads/rsp-phone-call-0925.pdf> 8 (second last), if we create a stream of meeting durations, we can only stream the information on each meeting *after* the meeting has finished. We can do historical queries on how long meetings lasted, or which meetings had conflicts, but we cannot query which meetings are in progress *now* or generate an alert about a resource conflict when it happens. Which, to me, is pretty much the essence of why we process real-time streams instead of archived datasets.

It is certainly valid that in many cases the duration of a temporally valid fact can be anticipated at the time when it is initiated. But if the duration can be anticipated, then there is quite often also a rule for the duration. In many such cases we could probably save such rules either in a dataset or as event processing rules on our event processing platform and add them on-the-fly without explicitly streaming the duration with each fact.
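
As a rough illustration of adding anticipated durations on-the-fly, a minimal Python sketch follows. The predicates, the durations in minutes and the function name are hypothetical placeholders; they only show a rule table kept on the processing platform instead of a duration sent with every fact.

```python
# Hypothetical rule table: predicate -> anticipated duration in minutes.
DURATION_RULES = {
    ":hasLecture": 45,
    ":hasCoffeeBreak": 15,
}


def with_anticipated_end(event, rules=DURATION_RULES):
    """Attach an anticipated end time to a (subject, predicate, object, start) event.

    The end time is None when no rule is known for the predicate.
    """
    subject, predicate, obj, start = event
    duration = rules.get(predicate)
    end = None if duration is None else start + duration
    return (subject, predicate, obj, start, end)


# The stream itself carries only the starting event.
lecture = with_anticipated_end((":room1", ":hasLecture", ":algebra", 600))
unknown = with_anticipated_end((":alice", ":meets", ":bob", 600))
```

Events without a matching rule, such as the meeting here, keep an open end and would still need an explicit ending event.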

To me the more flexible and "real-time" solution to the said example would be to generate a stream of events signalling the starting and ending points of meetings:
:alice :meets :bob @t1
:alice :meets :carl @t3
:alice :finishesmeetingwith :bob @t5
:bob :meets :diana @t6
:alice :finishesmeetingwith :carl @t7
...

The status (which meetings are in progress now?) would be maintained on the event processing platform and the historical data (a graph of meeting starting and ending times) could be saved to a dataset for later querying.
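
A minimal Python sketch of this idea, replaying the example events above. The integer timestamps standing in for t1, t3, ... and the names Event and replay are assumptions made for illustration; a real event processing platform would do this incrementally rather than over a list.

```python
from typing import NamedTuple


class Event(NamedTuple):
    subject: str
    predicate: str
    obj: str
    time: int  # assumption: t1, t3, ... mapped to the integers 1, 3, ...


def replay(events):
    """Consume start/end events in time order; return (in_progress, history, alerts)."""
    in_progress = {}  # (subject, obj) -> start time of a meeting still running
    history = []      # (subject, obj, start, end) for finished meetings
    alerts = []       # (person, time): someone already in a meeting starts another
    for ev in events:
        key = (ev.subject, ev.obj)
        if ev.predicate == ":meets":
            busy = {person for pair in in_progress for person in pair}
            for person in key:
                if person in busy:
                    alerts.append((person, ev.time))
            in_progress[key] = ev.time
        elif ev.predicate == ":finishesmeetingwith":
            start = in_progress.pop(key, None)
            if start is not None:
                history.append((ev.subject, ev.obj, start, ev.time))
    return in_progress, history, alerts


events = [
    Event(":alice", ":meets", ":bob", 1),
    Event(":alice", ":meets", ":carl", 3),
    Event(":alice", ":finishesmeetingwith", ":bob", 5),
    Event(":bob", ":meets", ":diana", 6),
    Event(":alice", ":finishesmeetingwith", ":carl", 7),
]
in_progress, history, alerts = replay(events)
```

With these five events, the :bob/:diana meeting is still in progress at the end, two finished meetings land in the history with their start and end times, and one conflict alert is raised for :alice at t3, at the moment the conflict occurs.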

Of course this might all be just my lack of imagination, but it would be helpful for me if we could get a use case motivating the streaming of temporally valid facts. Anyone?

BR,

Mikko




-----------------------------------------------------------------
|  Professor Abraham Bernstein, PhD
|  University of Zürich, Department of Informatics
|  web: http://www.ifi.uzh.ch/ddis/bernstein.html

Received on Thursday, 26 September 2013 08:05:11 UTC