Re: [RSP CG] Interval streams?

Hi Darko & al!

Some definitions are starting to emerge, and I don't remember seeing them all in the same place, so we should probably start working on our own. My current compilation and list of issues would be:

Facts: Static knowledge, which can be expressed as RDF triples. A special case of situations / temporally valid facts: validity from -infinity to +infinity.

Situations / states / temporally valid facts: The difference between "situations", "states" and "temporally valid facts" is a bit fuzzy to me. Do we need all three, and if so, what is the difference? Would it be that "situations" or "states" are inferred inside the stream processing platform, while "temporally valid facts" are something encoded in RDF? With only this distinction, my vote would go for having just one definition; the concept of a temporally valid fact could still be the same, independent of the form in which it exists.
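To make the special-case relationship concrete, here is a minimal Python sketch (all names hypothetical, not from any proposal on the table) of a temporally valid fact whose validity interval defaults to (-infinity, +infinity), i.e., a plain fact:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TemporallyValidFact:
    """An RDF-style triple with a validity interval [start, end]."""
    subject: str
    predicate: str
    obj: str
    start: float = -math.inf  # validity begins
    end: float = math.inf     # validity ends

    def valid_at(self, t: float) -> bool:
        return self.start <= t <= self.end

# A plain fact is the special case valid from -inf to +inf:
fact = TemporallyValidFact(":alice", ":worksAt", ":aalto")
# A situation / state has a bounded interval:
state = TemporallyValidFact(":alice", ":meets", ":bob", start=1.0, end=5.0)

print(fact.valid_at(42.0))   # True for any t
print(state.valid_at(3.0))   # True: inside [1, 5]
print(state.valid_at(7.0))   # False: outside the interval
```

The same concept would then cover both forms, whether materialized in RDF or held inside a processing platform.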

Events, event objects, simple events, synthetic events, composite events, event producers, event consumers: For these I'm still ok with the definitions in the Event Processing Glossary v. 2.0<http://www.complexevents.com/2011/08/23/event-processing-glossary-version-2-0/>. I would be inclined to keep a clean distinction between "events" as instantaneously detected occurrences and "situations" or "temporally valid facts" for everything with a duration.

In CEP, Situations are sometimes known as Complex Events, but normally Complex Events are derived from Events (and not throughout a reasoning process from Events and SK).

True, it is a mess. I just searched EPGv2, and neither "static" nor "fact" occurs anywhere in the document. For us, however, I believe it is very important that static knowledge (e.g., Linked Data), temporally valid facts and reasoning can all be part of the process triggering higher-level abstractions. It looks like we either have to extend the definition of complex events or come up with a new term. Extending might be difficult, because the original concept is widely used and was defined by another body. In other words, a great opportunity for those of us (myself included :-) who think that the term "complex event processing" is both misleading and unnecessarily negative.

In addition to "complex", the terms "derived", "virtual", "synthesized", "synthetic" and "composite" events are already taken by EPGv2. Summarizing events? Compiled events? Reasoned events? Or just overloading "Derived Events"? Note that here I'm using "events" exclusively, because for triggered abstractions with a known duration "fact", "state" or "situation" would be more appropriate.

Now going back to the current content from the wiki, IMHO the interval-based and graph-oriented proposal looks good in the sense that:
- it is general (it conveniently captures more than just a triple, and at the same time subsumes the triple-oriented approach)

There is a need to express a time interval with the scope of a graph, I agree. However, in terms of generality, a single interval misses the case of multiple timestamps or intervals (depending, e.g., on who derived the time). It is also redundant for events, where only an instantaneous timestamp is known.
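To illustrate the missing generality, a hypothetical annotation store (all names invented for illustration) could attach several intervals to one named graph, e.g. one per deriver, while an event graph carries only a single point in time:

```python
# Hypothetical annotation store: one named graph may carry several
# time annotations, differing e.g. by who derived the timestamp.
# An event graph (:g2) has only an instantaneous timestamp (t, t).
annotations = {
    ":g1": [("sensor", (10.0, 20.0)),      # interval per the sensor
            ("processor", (12.0, 21.0))],  # interval per the stream processor
    ":g2": [("sensor", (15.0, 15.0))],     # event: a single point in time
}

def intervals_for(graph, source=None):
    """Return the annotated intervals, optionally filtered by deriver."""
    return [iv for src, iv in annotations.get(graph, []) if source in (None, src)]

print(intervals_for(":g1"))            # both annotated intervals
print(intervals_for(":g1", "sensor"))  # [(10.0, 20.0)]
print(intervals_for(":g2"))            # [(15.0, 15.0)]
```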

BR,

Mikko


On 1. Oct 2013, at 6:54 PM, Anicic, Darko wrote:

Hi everyone,

Thank you all for the very interesting points. I'd like to add to this conversation a few design decisions taken when developing ETALIS/EP-SPARQL.

In ETALIS there is a notion of Temporal Knowledge (TK) and Static or Slowly Evolving Knowledge (SK). TK consists of Events and Situations; SK consists of facts.

Events (atomic events) are instantaneous, i.e., defined at a single point in time. Situations are derived through a reasoning process based on events and facts. ETALIS processes events as they occur and may consult SK as domain knowledge (background knowledge). It evaluates the background knowledge on the fly, possibly inferring new implicit knowledge, known as Situations. Situations are defined on time intervals. In this way it is possible to do temporal reasoning (e.g., reasoning about the 13 possible relations from Allen's Interval Algebra and SK), and possible anomalies w.r.t. semantics can be avoided. In CEP, Situations are sometimes known as Complex Events, but normally Complex Events are derived from Events (and not through a reasoning process from Events and SK).

In general, for a Situation S[t1,t2] one can use the timestamp data (t1 and t2) to manipulate it, e.g., use t1 as soon as the detection of S has started, to calculate something or to trigger something (before S has been fully detected, i.e., before t2 is known). The notion of a sliding window is, however, orthogonal to the notion of the interval on which a Situation is defined. E.g., from the cafeteria example, we can calculate the average number of people in the cafeteria within the last 10-minute interval. For every change, a new S will be triggered with its own pair [t1,t2]. t2 - t1 is always smaller than 10 min, but it is in general a different number each time, hence it is important that S carries this info (t1 and t2; of course S does not carry the size of the window, i.e., 10 min). Situations, like other events, are part of a stream, hence in a more complex workflow they can be used to form more complex situations (again possibly derived from both TK and SK).

SK consists of facts that are valid from the time they are inserted into a knowledge base until they are deleted. Therefore we say they don't belong to Temporal Knowledge; instead they are Static or Slowly Evolving Knowledge.
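A rough Python sketch of the cafeteria example may help (hypothetical names, not the actual ETALIS machinery): the 10-minute window size is fixed and orthogonal, while each derived Situation carries its own pair [t1, t2]:

```python
from collections import deque

WINDOW = 600.0  # 10-minute window in seconds (orthogonal to situation intervals)

class CafeteriaAverage:
    """Hypothetical sketch: derive a Situation S[t1, t2] for each change
    of the average occupancy over the last 10 minutes."""
    def __init__(self):
        self.samples = deque()  # (timestamp, occupancy) samples

    def on_event(self, t, occupancy):
        self.samples.append((t, occupancy))
        # Drop samples that fell out of the sliding window.
        while self.samples and self.samples[0][0] < t - WINDOW:
            self.samples.popleft()
        avg = sum(n for _, n in self.samples) / len(self.samples)
        t1 = self.samples[0][0]  # start of the interval S is defined on
        t2 = t                   # detection time; t2 - t1 <= 10 min
        return (t1, t2, avg)     # the Situation carries its own interval

cep = CafeteriaAverage()
print(cep.on_event(0.0, 5))      # (0.0, 0.0, 5.0)
print(cep.on_event(300.0, 15))   # (0.0, 300.0, 10.0)
print(cep.on_event(900.0, 10))   # sample at t=0 expired: (300.0, 900.0, 12.5)
```

Note how t2 - t1 differs from one derived Situation to the next, while the window size never appears in the output.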

Now going back to the current content from the wiki, IMHO the interval-based and graph-oriented proposal looks good in the sense that:
- it is general (it conveniently captures more than just a triple, and at the same time subsumes the triple-oriented approach)
- it enables semantically correct computation (based on intervals, in general)
- it enables richer semantics (e.g., temporal reasoning over intervals)
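On the temporal-reasoning point, a small sketch (assuming closed intervals, and not tied to any particular engine) of classifying some of Allen's 13 interval relations:

```python
def allen_relation(a, b):
    """Classify two closed intervals by a subset of Allen's 13 relations;
    inverse relations are lumped together for brevity."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a == b:
        return "equals"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 == b1 and a2 < b2:
        return "starts"
    if a1 > b1 and a2 == b2:
        return "finishes"
    if a1 < b1 and b1 < a2 < b2:
        return "overlaps"
    return "other (an inverse relation, or after)"

print(allen_relation((1, 2), (3, 6)))  # before
print(allen_relation((1, 3), (3, 6)))  # meets
print(allen_relation((2, 4), (1, 6)))  # during
print(allen_relation((1, 4), (3, 6)))  # overlaps
```

With Situations carrying [t1, t2], such relations can be evaluated directly between derived Situations and temporally scoped background knowledge.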

Cheers,
Darko Anicic

Kind regards
Dr. Darko Anicic

Siemens AG
Corporate Technology
CT RTC NEC ITH-DE
Otto-Hahn-Ring 6
81739 München, Deutschland
Tel: +49 89 636-28283
Fax: +49 89 636-51115
mailto:darko.anicic@siemens.com

Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Gerhard Cromme; Vorstand: Joe Kaeser, Vorsitzender; Roland Busch, Klaus Helmrich, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Süß, Ralf P. Thomas; Sitz der Gesellschaft: Berlin und München, Deutschland; Registergericht: Berlin Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322


From: Oscar Corcho [mailto:ocorcho@fi.upm.es]
Sent: Thursday, 26 September 2013 10:12
To: Abraham Bernstein; Rinne Mikko
Cc: public-rsp@w3.org<mailto:public-rsp@w3.org>
Subject: Re: [RSP CG] Interval streams?

I think that Avi makes a very interesting point here, which we should not forget when we are considering whether something is an event or a fact.

Stream processors do not need to be just isolated systems that process input streams and (possibly) produce output streams. There are more general cases where such processors will be organized in chains/workflows, and what we consider a stream of facts generated as the output of one processor can be considered a stream of events consumed as the input by another one. Hence there is room for considering that stream processors will also be generating streams of facts.

I am thinking now of a couple of examples that could illustrate this and that we may want to write down as well as part of the documentation that we are generating.

Oscar


--

Oscar Corcho
Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
Facultad de Informática
Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, España
Tel. (+34) 91 336 66 05
Fax  (+34) 91 352 48 19

From: Abraham Bernstein <bernstein@ifi.uzh.ch<mailto:bernstein@ifi.uzh.ch>>
Date: Thursday, 26 September 2013 09:06
To: Rinne Mikko <mikko.rinne@aalto.fi<mailto:mikko.rinne@aalto.fi>>
CC: RDF Stream Processing <public-rsp@w3.org<mailto:public-rsp@w3.org>>
Subject: Re: [RSP CG] Interval streams?
Resent from: RDF Stream Processing <public-rsp@w3.org<mailto:public-rsp@w3.org>>
Resent date: Thu, 26 Sep 2013 07:06:42 +0000

Hi Rinne, all

I think that there are many scenarios in which one would want to stream valid facts. Tom here at UZH can give a number of examples but let me just offer one possible case.

Assume that you have a set of sensory observations (e.g., the number of people entering and leaving the cafeteria) and your streaming system may want to compute the average number of people in the cafeteria for every 10-minute interval. Obviously, this task could easily be accomplished without intervals on the input side, but the output is actually a temporally valid fact. So far I am in total agreement with your example.

It is easily imaginable that you would want that output streamed to a second component of a system that makes decisions based on the input. For example, if the number of people in the cafeteria goes above a certain number, then the entrance must be closed (sorry for the somewhat simplistic example, but I am just sitting in a university cafeteria...). Obviously, this system processes as its input an interval-delimited fact.
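A trivial sketch of such a downstream consumer (threshold and names invented for illustration), taking an interval-delimited fact (t1, t2, average occupancy) as its input:

```python
CAPACITY = 80  # hypothetical maximum occupancy

def decide(interval_fact):
    """Consume an interval-delimited fact (t1, t2, avg_people)
    and decide whether the entrance must be closed."""
    t1, t2, avg_people = interval_fact
    return "close entrance" if avg_people > CAPACITY else "keep open"

# Facts as they might arrive from the averaging component upstream:
print(decide((0.0, 600.0, 95.0)))     # close entrance
print(decide((600.0, 1200.0, 40.0)))  # keep open
```

The point is that this consumer's input stream is itself a stream of temporally valid facts, exactly the case questioned earlier in the thread.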

In real life many measurements are really interval-based facts. The speed of a car is almost always an interval-based fact (I will leave aside the philosophical question of whether speed itself is interval-based or only the way we measure it), as is any measure of throughput, etc. As such, almost all our use cases have interval-based facts. Given that you can never assume you will not consume such facts downstream (as even most sensors actually already deliver them), I believe that a stream processing system will have to be able to process them.

I look forward to your opinions.

Cheers

Avi


On 26.09.2013, at 08:21, Rinne Mikko <mikko.rinne@aalto.fi<mailto:mikko.rinne@aalto.fi>> wrote:



Hi All!

Thank you very much for the good discussion yesterday! Since Jean-Paul invited discussion on the email reflector, I'll give it a try. To me this is also related to the joint action point of timestamps and their relation to use cases.

In the wiki<http://www.w3.org/community/rsp/wiki/RDF_Stream_Models> under RDF Stream Models / Temporal Graphs we are listing "variant 2 - interval based" graphs. These can be described as temporally valid facts, where validity is indicated by the specified time interval. It is clear to me that such temporally valid facts exist and that there must be a method to archive and query them.

However, what I cannot easily motivate is when we would want to stream temporally valid facts.

The problem is that a fact can be added to a stream only when the ending time is known. In many cases this will be the time when the fact is no longer valid.

Taking the example on Daniele's slide<http://www.dellaglio.org/uploads/rsp-phone-call-0925.pdf> 8 (second to last), if we create a stream of meeting durations, we can only stream the information on each meeting *after* the meeting has finished. We can do historical queries on how long meetings lasted, or which meetings had conflicts, but we cannot query which meetings are in progress *now* or generate an alert about a resource conflict when it happens. Which, to me, is pretty much the essence of why we process real-time streams instead of archived datasets.

It is certainly valid that in many cases the duration of a temporally valid fact can be anticipated at the time when it is initiated. But if the duration can be anticipated, then there is quite often also a rule for the duration. In many such cases we could probably store such rules either in a dataset or as event processing rules on our event processing platform and apply them on the fly, without explicitly streaming the duration with each fact.

To me the more flexible and "real-time" solution to the said example would be to generate a stream of events signalling the starting and ending points of meetings:
:alice :meets :bob @t1
:alice :meets :carl @t3
:alice :finishesmeetingwith :bob @t5
:bob :meets :diana @t6
:alice :finishesmeetingwith :carl @t7
...

The status (which meetings are in progress now?) would be maintained on the event processing platform and the historical data (a graph of meeting starting and ending times) could be saved to a dataset for later querying.
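A minimal sketch (hypothetical names) of how such an event processing platform could maintain the current state from the start/end events above, while archiving completed intervals for later querying:

```python
# Current state: meetings in progress, as (pair, start_time) entries.
in_progress = set()
# Historical data: completed meetings as (pair, start, end) records.
history = []

def on_event(t, subj, pred, obj):
    """Process one start/end event from the stream."""
    pair = (subj, obj)
    if pred == ":meets":
        in_progress.add((pair, t))
    elif pred == ":finishesmeetingwith":
        for p, start in list(in_progress):
            if p == pair:
                in_progress.discard((p, start))
                # The full interval is known only now, at the ending event.
                history.append((pair, start, t))

on_event(1, ":alice", ":meets", ":bob")
on_event(3, ":alice", ":meets", ":carl")
on_event(5, ":alice", ":finishesmeetingwith", ":bob")

print([p for p, _ in in_progress])  # [(':alice', ':carl')] — in progress *now*
print(history)                      # [((':alice', ':bob'), 1, 5)]
```

The interval-based record appears only in the historical store; the live state is answerable at any moment, which is the point of the event-based approach.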

Of course this might all be just my lack of imagination, but it would be helpful for me, if we could get a use case motivating the streaming of temporally valid facts. Anyone?

BR,

Mikko




-----------------------------------------------------------------
|  Professor Abraham Bernstein, PhD
|  University of Zürich, Department of Informatics
|  web: http://www.ifi.uzh.ch/ddis/bernstein.html

Received on Wednesday, 2 October 2013 08:47:24 UTC