Re: RSP next calls from Oscar Corcho on 2013-11-22 (public-rsp@w3.org from November 2013)

From: Oscar Corcho <ocorcho@fi.upm.es>
Date: Fri, 22 Nov 2013 10:13:06 +0100
To: Rinne Mikko <mikko.rinne@aalto.fi>, "public-rsp@w3.org" <public-rsp@w3.org>
Message-ID: <CEB4DE8C.8926E%ocorcho@fi.upm.es>

Hi Mikko,

Thanks for this very detailed e-mail. I think that this gives us a very good
outline for today's discussion.

Oscar

-- 

Oscar Corcho
Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
Facultad de Informática
Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, España
Tel. (+34) 91 336 66 05
Fax  (+34) 91 352 48 19

From:  Rinne Mikko <mikko.rinne@aalto.fi>
Date:  Friday, 22 November 2013 09:00
To:  RDF Stream Processing <public-rsp@w3.org>
Subject:  Re: RSP next calls
Resent-From:  RDF Stream Processing <public-rsp@w3.org>
Resent-Date:  Fri, 22 Nov 2013 08:01:10 +0000


Hi Emanuele et al.!

In anticipation of today's meeting I tried to go through the points
currently under discussion. Based on Emanuele's and my comments there should
be a couple of fronts where we can still make some further progress even
before the meeting. I'll list those points here and also preserve the
current status in the Wiki, so that anything that was overlooked can still
be reverted without surprises.

1. Scope of the community group

I would be happy to say that we first concentrate on streaming of
time-varying objects, either annotated by a minimum of one timestamp or
interval, or having an implicit timestamp based on reception of the stream.
I have reservations with the current proposal, because:

a) We have no definition of "time-series". The problem with the one given in
email ("a non decreasing timestamp") is that it effectively hides all the
problems of a distributed system, where streaming objects may have
timestamps assigned by non-synchronized clocks and may also arrive at
the stream processor out-of-order. Restricting to non-decreasing timestamps
would effectively limit us to cases where timestamps are assigned in a
single place per stream (a stream processing agent). The ability to properly
support distributed, heterogeneous environments is a big motivator for using
RDF in the first place. Therefore I fear that by restricting ourselves to
non-decreasing timestamps we lose more than may first appear.
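
As a minimal sketch of the concern above: two producers each stamp their own
elements in order, yet the sequence seen by an aggregator is no longer
non-decreasing. The tuple layout, the sample statements, the timestamps and
the arrival order are all illustrative assumptions, not anything agreed in
the wiki.

```python
from typing import List, Tuple

# A stream element: (RDF statement as a plain string, timestamp in seconds).
# Both the tuple layout and the sample data are illustrative assumptions.
Element = Tuple[str, float]


def is_non_decreasing(stream: List[Element]) -> bool:
    """Return True if every timestamp is >= the one before it."""
    return all(a[1] <= b[1] for a, b in zip(stream, stream[1:]))


# Each producer stamps its own elements, so each stream is ordered locally.
sensor_a = [(":a :temp 20", 10.0), (":a :temp 21", 12.0)]
sensor_b = [(":b :temp 18", 9.5), (":b :temp 19", 11.0)]

# Hypothetical arrival order at an aggregator: sensor_b's elements are delayed.
arrival = [sensor_a[0], sensor_b[0], sensor_a[1], sensor_b[1]]

print(is_non_decreasing(sensor_a))  # True
print(is_non_decreasing(sensor_b))  # True
print(is_non_decreasing(arrival))   # False
```

A definition that demands non-decreasing timestamps would accept each
producer's stream but reject the merged one, unless the aggregator re-stamps
or re-sorts the elements.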

I'm still uncertain about whether to include implicit timestamps in the
scope. On one hand it seems overkill to require every low-level streaming
object to have an explicit timestamp, especially if the next stream
processing agent on the path is going to assign it a new one anyway, but on
the other hand it is hard to imagine a device capable of producing RDF
without any kind of clock circuit. Perhaps the issue is not so much the
capability of the devices, but rather the extra trouble of keeping the clocks
of e.g. 5,000 sensors in sync when the transmission network to the stream
aggregator has very low delay variation anyway. As a result I still believe
that implicit timestamps should be in the scope, but it would be fairly easy
to convince me otherwise.

b) "Infinite" rules out recorded streams. Since recorded streams are the
most verifiable part of this work, it seems strange to proceed without them.
Also, I'm not sure how the inclusion of recorded streams would slow down the
work, because a segment of a live stream can always be recorded and played
back. The biggest difference is that a recorded stream requires explicit
knowledge of timestamps, but if our live stream work acknowledges the
presence of explicit timestamps, the "feature" is already included by
default.

2. Ordered / not ordered

We had some discussion with Alasdair on the wiki, but there's no proposal
on the table (afaik) for what to write into the characterization of a
recorded stream. For me the issue of ordering itself is relative: Any stream
processing agent (including stream producers) can insert a counter, which
can be used to restore the order observed at that point in time. Whether that
is the "correct" order is a much more philosophical question. In theory the
"correct" order of observations is the one in which the observations were
made, but due to system inaccuracies we may not have the data to restore
that order. Is our stream ordered if the timestamps are properly increasing,
but the sound of lightning is observed before the flash?

Without other proposals on the table my suggestion would be to remove
"ordered / not ordered" from the definition of recorded streams (it doesn't
differentiate recorded from live - live can also arrive out-of-order), and
instead write something about the capability of the stream to support order
restoration once we get to requirements.
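
To illustrate the counter idea, here is a minimal sketch in which an agent
attaches a running sequence number and a downstream consumer uses it to
restore the order seen at that agent. The function names, the tuple layout
and the payloads are hypothetical, and the sketch restores only the order
observed at the stamping agent, not any "correct" real-world order.

```python
from itertools import count
from typing import Iterable, Iterator, List, Tuple

# (sequence number, payload); this layout is an assumption for illustration.
Stamped = Tuple[int, str]


def stamp(elements: Iterable[str]) -> Iterator[Stamped]:
    """Attach a running counter recording the order seen at this agent."""
    return zip(count(), elements)


def restore(received: Iterable[Stamped]) -> List[str]:
    """Recover the order that was observed at the stamping agent."""
    return [payload for _, payload in sorted(received, key=lambda e: e[0])]


observed = ["obs-1", "obs-2", "obs-3"]
stamped = list(stamp(observed))

# Simulated reordering in transit between the agent and the consumer.
shuffled = [stamped[1], stamped[2], stamped[0]]

print(restore(shuffled))  # ['obs-1', 'obs-2', 'obs-3']
```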

3. Streams and stream elements lacking reference to RDF

Emanuele was commenting on this. I can, of course, do the following
replacements:

Streams => RDF Streams

(Live) Data Stream: An unbounded sequence of time-varying data elements. =>
(Live) RDF Data Stream: An unbounded sequence of time-varying data elements
encoded in RDF.

Recorded Data stream: A stream saved e.g. to a computer file. => Recorded
RDF Data stream: An RDF stream saved e.g. to a computer file.

Elements in A Stream => Elements in An RDF Stream

...etc. etc. But is there something more substantial that we need to say?
Does it add value to insert RDF everywhere, or is it enough that the group
scope is RDF streams?

4. Streaming and background information, hybrid objects

First off, this morning I finally managed to answer "yes" to my old question
in the wiki on whether we are going to have hybrid objects, i.e. objects
that are both state objects and event objects at the same time. It is
actually rather easy to come up with a scenario where two or more stream
producers send state objects (temporally valid data), which get
single timestamps from an aggregator stream processing agent when it merges
the streams, and perhaps another set of timestamps assigned by a stream
consumer upon reception. At this point every streaming object will contain
both single timestamps and intervals, and whether a streaming object is seen
as a state object or event object no longer depends on the object itself,
but rather on what a stream processing agent wants to do with it.
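
A minimal sketch of such a hybrid object, assuming a single record that
carries a validity interval from the producer plus single timestamps from an
aggregator and a consumer. The class, its field names and the sample values
are hypothetical and only meant to show that the state view and the event
view can be read from the same object.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamingObject:
    """Hypothetical element carrying both kinds of temporal annotation."""
    payload: str                                     # stand-in for RDF content
    validity: Optional[Tuple[float, float]] = None   # interval from the producer
    merged_at: Optional[float] = None                # timestamp from an aggregator
    received_at: Optional[float] = None              # timestamp from the consumer


def valid_at(obj: StreamingObject, t: float) -> bool:
    """State view: is the payload valid at time t?"""
    if obj.validity is None:
        return False
    start, end = obj.validity
    return start <= t <= end


def event_time(obj: StreamingObject) -> Optional[float]:
    """Event view: the single instant assigned when the streams were merged."""
    return obj.merged_at


obj = StreamingObject(
    payload=":room1 :temperature 21",
    validity=(100.0, 160.0),
    merged_at=101.5,
    received_at=102.0,
)

print(valid_at(obj, 130.0))  # True
print(event_time(obj))       # 101.5
```

Which of the two views applies is decided by the caller, not by the object.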

This is what I was looking for to break my earlier proposal, which was to
have state objects and event objects as special cases of streaming objects.
I'll try to merge the current state object and event object definitions
somehow under the streaming object definition. As this is a somewhat bigger
repair, I'll do it by deprecating the old "Elements in A Stream" section,
copying it into a new version of the section and editing that. We will then
have the option of reverting to the old one if the meeting thinks it was a
bad idea or didn't come out right. I hope this change at least aligns with
half of the comment from Emanuele requesting to only distinguish data as
streaming and background.

On the other half of Emanuele's comment, "background information", I
unfortunately somewhat disagree. To me "background information" stands for
static datasets, which are typically retrieved all at once and can be
processed with "normal" SPARQL semantics. I have no problem adding a
definition for "background information" (we are contribution-driven! :-),
but I wouldn't think of that as "an element in a stream". "Background" to me
refers only to the data, not the method of delivery. The streaming of
background data is possible, of course. Also, I would still keep the "static
object" at least to:
a) indicate that we understand the difference
b) indicate that static objects can theoretically also be sent as a stream
c) be able to say what we don't do in the first phase.
As to problems with the word "static" I'm happy to discuss other proposals.
My requirement would just be to keep it compact and understandable. To me
"static" is fine with the interpretation "true until stated otherwise",
which also differentiates it nicely from facts with temporal or
instantaneous validity. "static or slowly changing" is too long.
"semi-static" is ok, but in my opinion doesn't really add value in this
case.

5. Time as annotation vs. time as data

The intention with defining the streaming objects was to encapsulate both.
I'll try to include the annotation aspect of time when updating the
streaming object definition. At the same time I'm trying to avoid building
assumptions of the solution into the definitions, because these definitions
are only there to help us write requirements and should not prematurely fix
solutions.

--------------------------

Those were the things I had in mind; next I'll try to work a bit on the
Wiki. One observation is that we haven't had many proposals for new
definitions since the initial set. It's nice if we can manage with these,
but more likely we will just have to revisit this document once we start
work on the requirements and see what we really need.

BR,

Mikko

On 14. Nov 2013, at 4:41 PM, Emanuele Della Valle wrote:

> Hello Jean-Paul, Eva, Avi and all,
> 
> What Eva says and Jean-Paul confirms is also what I think; sorry for
> creating confusion.
> 
> My answer to Avi was putting emphasis on the fact that by "time-series" I do
> not mean a *sequence of numbers ordered by recency*, but a *sequence of RDF
> triples ordered by recency*.
> 
> Concerning how to describe time from the application perspective, my position
> is the following one:
> - 0 timestamps (i.e., relying on the temporal distance between the received
> triples) makes compatibility with RDF straightforward, but it may hide
> problems (e.g., the temporal distance between two triples may be influenced by
> network delays)
> - 1 (point in time semantics for application time) allows for handling
> out-of-order arrivals and for basic temporal operators (e.g., follows,
> precedes, contemporaryWith)
> - 2 (interval based semantics for events) allows for expressive temporal
> operators, but, at least in many scenarios I target, it is overkill
> 
> Most commercial DSMS/CEP systems take either 0 or 1. The only commercial CEP
> system that I know of supporting 2 is Microsoft StreamInsight.
> 
> Time from the system perspective is a different issue. Whether system time
> should be externalised is something I still wonder.
> 
> Cheers,
> 
> Emanuele
> 
> 
> On Nov 14, 2013, at 1:58 AM, Jean-Paul <jp.calbimonte@upm.es>
>  wrote:
> 
>> Hello all,
>> 
>> Yes, I think Oscar's diagram (check it here:
>> http://www.w3.org/community/rsp/wiki/Meeting_22.10.2013) more or less
>> reflects part of the discussion we had about the scope.
>> 
>> We seem to agree that ordered streams of elements (infinite or 'recorded'
>> streams as well) are in scope (green ticks in the diagram). In these cases
>> the order might be of different natures but we agreed to focus on time-based
>> order. I don't think we agreed yet on focusing only on point-in-time
>> timestamps or intervals. For the moment it is just time-based order, I
>> believe. 
>> 
>> Then there are datasets which may not be streaming in nature but might
>> need to be processed in a streaming fashion (e.g. a very large dataset). I
>> understood we are not ruling this case out, but might not focus on it in a
>> first stage.
>> 
>> Thanks to Emanuele for the input about the scope. As Eva pointed out, there
>> are some discrepancies that we can fix in the wiki. I am also a bit
>> uncomfortable with calling the streams in our scope 'time-series'; I think
>> this term has other connotations in related areas.
>> 
>> Well, this is just a personal comment as well, but I'm happy to continue this
>> discussion. We can also continue modifying the wiki until we have the Telco,
>> and afterwards.
>> 
>> best
>> jp
>> 
>> 
>> 2013/11/13 Eva Blomqvist <evabl444@gmail.com>
>>> Hi!
>>> I think that some of those who were in the meeting also might have slightly
>>> differing interpretations of what was said... I agree that there were two
>>> alternative interpretations of "data stream" discussed, but as far as I
>>> understood those differed in the sense that 1) was an *infinite* stream,
>>> where the elements of the stream could somehow be *associated with time*
>>> (whether a timestamped triple, a timestamped graph, or just a stream where
>>> time is implicit from the arrival times of elements etc), and 2) was a
>>> *finite* stream of elements where *time is not necessarily an aspect*, e.g.
>>> triples from a data store being processed in a streaming fashion.
>>> 
>>> I would be reluctant to at this stage limit ourselves to a specific model,
>>> e.g. RDF statements with a single timestamp each.
>>> Just my 2c..
>>> /Eva
>>> 
>>> 
>>> On 12/11/2013 17:33 , Emanuele Della Valle wrote:
>>>> Hi Abram, 
>>>> 
>>>> I mean a list of tuples <s,t> where s is an RDF statement and t is a non
>>>> decreasing timestamp.
>>>> 
>>>> Cheers,
>>>> 
>>>> Emanuele
>>>> 
>>>> --
>>>> prof. Emanuele Della Valle
>>>> DEIB - Politecnico di Milano
>>>> m. +393389375810
>>>> w. http://emanueledellavalle.org
>>>> 
>>>> On Nov 12, 2013, at 12:27 PM, Abraham Bernstein <bernstein@ifi.uzh.ch>
>>>>  wrote:
>>>> 
>>>>> Emanuele, all
>>>>> 
>>>>> I am slightly confused... so just to clarify: when you talk about
>>>>> time-series, do you mean a series of numbers (expressed in triples) or a
>>>>> time-ordered series of triples?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Avi
>>>>> 
>>>>> 
>>>>> On 12.11.2013, at 03:05, Jean-Paul <jp.calbimonte@upm.es> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Thanks for your input. The 4th Telco will be on Nov 22 at 15:00 CET.
>>>>>> We will be discussing the Streams concepts and definitions that we
>>>>>> have started drafting in the wiki.
>>>>>> Please feel free to provide your input there already:
>>>>>> 
>>>>>> http://www.w3.org/community/rsp/wiki/Concepts_and_Definitions
>>>>>> 
>>>>>> ...especially if there is a key concept missing that you consider we
>>>>>> should include.
>>>>>> 
>>>>>> Cheers,
>>>>>> jp
>>>>>> 
>>>>>> 
>>>>>> PS 
>>>>>> Please, if Danh or Manfred can help us again with Webex, we will be very
>>>>>> thankful.
>>>>>> 
>>>>>> 2013/11/6 Jean-Paul <jp.calbimonte@upm.es>
>>>>>>> Yes, I see. That will make everyone's life easier.
>>>>>>> We'll discuss it.
>>>>>>> 
>>>>>>> thanks
>>>>>>> 
>>>>>>> jp
>>>>>>> 
>>>>>>> 
>>>>>>> 2013/11/6 Axel Polleres <axel@polleres.net>
>>>>>>> Thanks, BTW, may I suggest that instead of a single doodle per Telco, to
>>>>>>> doodle for one fixed timeslot per week, e.g. "Tue 15:00" or alike, as
>>>>>>> usual in other WGs? I think this should make planning easier. Maybe we
>>>>>>> can discuss this in the Telco.
>>>>>>> 
>>>>>>> thanks & best regards,
>>>>>>> Axel
>>>>>>> 
>>>>>>> --
>>>>>>> Prof. Dr. Axel Polleres
>>>>>>> Institute for Information Business, WU Vienna
>>>>>>> url: http://www.polleres.net/  twitter: @AxelPolleres
>>>>>>> 
>>>>>>> On Nov 2, 2013, at 11:40 PM, Jean-Paul <jp.calbimonte@upm.es> wrote:
>>>>>>> 
>>>>>>>> > Hello All,
>>>>>>>> >
>>>>>>>> > Thanks to all who could attend the meeting at ISWC, and especially to
>>>>>>>> > those who made it through WebEx (although they couldn't interact too
>>>>>>>> > much, unfortunately).
>>>>>>>> >
>>>>>>>> > The meeting went quite well, and we received input from people of
>>>>>>>> > other sub-communities and with different backgrounds. Others showed
>>>>>>>> > interest, at least as 'observers' of what we are trying to do.
>>>>>>>> >
>>>>>>>> > One result of the meeting is the intention of clarifying the scope of
>>>>>>>> > our work. A first step to do this is to write down some of the key
>>>>>>>> > concepts and definitions that we should agree on. Mikko has already
>>>>>>>> > provided a first version, as he already commented, and the purpose of
>>>>>>>> > the next telecon will be to discuss them:
>>>>>>>> >
>>>>>>>> > http://www.w3.org/community/rsp/wiki/Concepts_and_Definitions
>>>>>>>> >
>>>>>>>> > Until then, I invite you all to contribute to that (I see some have
>>>>>>>> > already started, great!) so that we can have material for discussion.
>>>>>>>> >
>>>>>>>> > Please, also indicate your preferences for the next calls:
>>>>>>>> >
>>>>>>>> > http://doodle.com/a8ggni2v4su7c88b
>>>>>>>> >
>>>>>>>> > http://doodle.com/6i97qvmaqiwnwvsa
>>>>>>>> >
>>>>>>>> > http://doodle.com/hixgfbv9drxbu4in
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Thanks to all,
>>>>>>>> >
>>>>>>>> > jp
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Jean-Paul Calbimonte
>>>>>>>> > Ontology Engineering Group
>>>>>>>> > Universidad Politécnica de Madrid
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Jean-Paul Calbimonte
>>>>>>> Ontology Engineering Group
>>>>>>> Universidad Politécnica de Madrid
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Jean-Paul Calbimonte
>>>>>> Ontology Engineering Group
>>>>>> Universidad Politécnica de Madrid
>>>>> 
>>>>> -----------------------------------------------------------------
>>>>> |  Professor Abraham Bernstein, PhD
>>>>> |  University of Zürich, Department of Informatics
>>>>> |  web: http://www.ifi.uzh.ch/ddis/bernstein.html
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Jean-Paul Calbimonte
>> Ontology Engineering Group
>> Universidad Politécnica de Madrid
>
Received on Friday, 22 November 2013 09:30:19 UTC