Re: Streams in an unreliable world.

Hi Andy et al.!

Excellent points! I would look at this from a couple of angles:

1) Imperfections known at the stream producer

These could be e.g. due to aggregating inputs from multiple sensors. I did some initial drafting on a stream description vocabulary and example datasets earlier this week. Testing different ideas, I also looked at some quality-related information, which could be stream-specific and known a priori:

a) Ordering of events or facts in the stream
- strict / approximate / unordered
- if not strictly ordered, possibility to define some "maximum out-of-order delay"?

b) Sequence information fields (timestamps, counters etc.)
- ordering reliability estimate
- sequence information field x present: sometimes / usually / always (e.g. it is known that some sensors include it while some don't)
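
As a rough illustration of how a receiver might use a declared "maximum out-of-order delay" from (a), here is a minimal sketch. It assumes numeric timestamps and that no event arrives after an event whose timestamp is more than max_delay newer. The names ReorderBuffer, max_delay, push and flush are made up for this example and do not come from any vocabulary draft or library.

```python
import heapq
from itertools import count


class ReorderBuffer:
    """Restores timestamp order, assuming a known maximum out-of-order delay.

    Hypothetical helper: max_delay is taken from the stream description.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self._heap = []          # min-heap ordered by timestamp
        self._tie = count()      # tie-breaker so payloads are never compared
        self._latest = None      # largest timestamp seen so far

    def push(self, timestamp, payload):
        """Add one event; return the events that are now safe to release."""
        heapq.heappush(self._heap, (timestamp, next(self._tie), payload))
        if self._latest is None or timestamp > self._latest:
            self._latest = timestamp
        # Anything at or below the watermark can no longer be overtaken
        # by an earlier event, given the declared bound.
        watermark = self._latest - self.max_delay
        released = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self._heap)
            released.append((ts, item))
        return released

    def flush(self):
        """Release everything still buffered, e.g. when the stream ends."""
        released = []
        while self._heap:
            ts, _, item = heapq.heappop(self._heap)
            released.append((ts, item))
        return released
```

With "strict" ordering the buffer is unnecessary, and with "unordered" streams no finite max_delay exists, so this only applies to the "approximate" case.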

I think the next telco is dedicated to querying, but perhaps we could resume the stream discussion in Sydney?

2) Robustness of the stream against errors in transmission

The stream can be made more robust against transmission errors, e.g. by transmitting multiple copies of the same events / facts in cases where losses are more critical than ordering. If this is needed, it could also be part of the stream description, to help the receiver understand what is going on.
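
If copies are sent redundantly, the receiver has to drop the duplicates. A minimal sketch of that, assuming every event carries some unique identifier (which is my assumption, not something we have agreed on) and that remembering a bounded number of recent identifiers is enough. Deduplicator, capacity and accept are hypothetical names.

```python
from collections import OrderedDict


class Deduplicator:
    """Drops repeated copies of an event, remembering only recent ids."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._seen = OrderedDict()   # insertion-ordered set of recent ids

    def accept(self, event_id):
        """Return True the first time an id is seen, False for a copy."""
        if event_id in self._seen:
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.capacity:
            # Forget the oldest id to keep memory bounded.
            self._seen.popitem(last=False)
        return True
```

The capacity would have to cover the largest gap between copies of the same event, which is another value the stream description could state.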

3) Imperfections created during transmission

This information cannot be explicitly included in the stream or the events in it, as it is not known at stream generation time. The protocols to transmit RDF streams were mentioned in the first telco, but we haven't discussed them since. Are transmission protocols expected to affect how streams are generated, beyond introducing generic imperfections (loss, jitter, out-of-order)? If yes, they should definitely be in scope and discussed together with stream construction. If not, I don't have a strong opinion on scope, but would prefer to make progress with stream formats first and then move on to transmission.

4) Fault tolerance in the consuming platform (= receiver)

I wouldn't immediately count this as in scope for the group. If the group can provide the tools, everyone can build fault tolerance into their platforms and / or queries.

5) Producer and consumer variations

Stream descriptions can help with both producer and consumer variations:
- can be referenced by catalogues, which can help to manage changes in producers
- can contain all format and prefix information to help consumers to join streams at any time.
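
To show what I mean by a description carrying format and prefix information, here is a sketch of a consumer that joins mid-stream and uses the description to expand prefixed names. The class and all field names (serialization, ordering, max_out_of_order_delay, prefixes) are illustrative only and are not taken from the vocabulary draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StreamDescription:
    """Hypothetical stream description; field names are illustrative."""
    serialization: str                              # e.g. "text/turtle"
    ordering: str = "unordered"                     # strict / approximate / unordered
    max_out_of_order_delay: Optional[float] = None  # only for "approximate"
    prefixes: Dict[str, str] = field(default_factory=dict)

    def expand(self, name):
        """Expand a prefixed name such as 'ex:sensor1' into a full IRI.

        Raises KeyError if the prefix is not declared in the description.
        """
        prefix, _, local = name.partition(":")
        return self.prefixes[prefix] + local


# A consumer joining late never saw the prefix declarations sent at the
# start of the stream, but can recover them from the description.
desc = StreamDescription(
    serialization="text/turtle",
    ordering="approximate",
    max_out_of_order_delay=2.0,
    prefixes={"ex": "http://example.org/"},
)
full_iri = desc.expand("ex:sensor1")
```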


Just some initial thoughts on this.

Cheers,

Mikko


On 5. Oct 2013, at 7:48 PM, Andy Seaborne wrote:

> On the web, and indeed in any system of more than modest size and in one management domain, issues of
> 
> * out of order delivery, including arbitrarily late arrival
> * new stream producers coming online, old stream producers ending
>     (discovery, joining, leaving)
> * consumers joining and leaving
> * streams becoming unavailable
> * ... then restarting (with or without loss of potential events)
> 
> leading to design points on
> 
> + choice of timestamps
> + delivery ordering semantics
> + delivery guarantees (at least once, exactly once, at most once)
> + persistence, and for how long
>   (forwards, for guaranteed delivery and backwards for consumers to
>    catch up).
> 
> What are your thoughts on these issues?  In-scope or out-of-scope of the CG? Necessary or optimal to consider?
> 
> 	Andy
> 
> 

Received on Saturday, 5 October 2013 19:41:21 UTC