Re: Streams in an unreliable world.

Hi Mikko,

> If the group can give the tools, everyone can build their fault
> tolerance into the platforms and / or the queries.

There is a lot behind that statement!  I picked that out as a 
representative point and wanted to ask: is there behind your comments a 
sense that the work of the CG is defining the processing model in a 
single place?

That's not a trivial scope, but I don't think that doing it in 
isolation, assuming that the characteristics of the web (unreliability 
in many forms) are taken care of elsewhere, will lead to take-up.  These 
characteristics do show through and affect the processing model (e.g. 
[1] but also the stream systems it references).

In deploying a real system, much of the effort is going to be in dealing 
with the imperfections arising from scale.  Different applications will 
want different tradeoffs (timeliness vs in-order delivery, for example, 
or synchronization between two streams).
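To make the timeliness vs in-order tradeoff concrete, here is a small illustrative sketch (my own, not anything from the CG; names are invented) of two delivery policies a consumer might choose between: emit each event as it arrives (timely, but possibly out of timestamp order), or buffer events until a maximum out-of-order delay has passed (ordered, but delayed).

```python
import heapq

def timely(events):
    """Emit each (timestamp, payload) event as it arrives:
    minimal latency, no ordering guarantee."""
    for ts, payload in events:
        yield ts, payload

def in_order(events, max_delay):
    """Hold events in a buffer and release them in timestamp order
    once they are older than max_delay relative to the newest arrival."""
    buf, newest = [], None
    for ts, payload in events:
        heapq.heappush(buf, (ts, payload))
        newest = ts if newest is None else max(newest, ts)
        while buf and buf[0][0] <= newest - max_delay:
            yield heapq.heappop(buf)
    while buf:                       # flush remaining events at end of stream
        yield heapq.heappop(buf)

arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "d"), (5, "e")]
print(list(timely(arrivals)))        # arrival order, immediately
print(list(in_order(arrivals, 2)))   # timestamp order, delayed by up to max_delay
```

The choice of max_delay is exactly the kind of per-application knob the email is talking about: a monitoring dashboard might set it to zero, while a billing consumer might accept seconds of delay for correct ordering.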

 Andy

[1] http://research.google.com/pubs/pub41378.html

On 05/10/13 20:40, Rinne Mikko wrote:
>
> Hi Andy & al!
>
> Excellent points! I would look at this from a couple of angles:
>
> 1) Imperfections known at the stream producer
>
> These could arise e.g. from aggregating inputs from multiple sensors. I did some initial drafting on a stream description vocabulary and example datasets earlier this week. Testing different ideas, I also looked at some quality-related information that could be stream-specific and known a priori:
>
> a) Ordering of events or facts in the stream
> - strict / approximate / unordered
> - if not strictly ordered, possibility to define some "maximum out-of-order delay"?
>
> b) Sequence information fields (timestamps, counters etc.)
> - ordering reliability estimate
> - sequence information field x present: sometimes / usually / always (e.g. it is known that some sensors include it while some don't)
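As an illustration of what such a priori quality information might look like if carried as machine-readable stream metadata, here is a minimal sketch; the field names are invented for the example, not a proposed vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamDescription:
    # a) ordering of events / facts in the stream
    ordering: str                                   # "strict" | "approximate" | "unordered"
    max_out_of_order_delay: Optional[float] = None  # seconds, if not strictly ordered
    # b) sequence information fields
    timestamp_presence: str = "always"              # "sometimes" | "usually" | "always"
    ordering_reliability: float = 1.0               # estimate in [0, 1]

# A stream aggregated from several sensors: mostly ordered, timestamps
# usually (but not always) present.
sensor_feed = StreamDescription(ordering="approximate",
                                max_out_of_order_delay=5.0,
                                timestamp_presence="usually",
                                ordering_reliability=0.9)
```

A consumer could, for example, size its reordering buffer directly from max_out_of_order_delay instead of guessing.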
>
> I think the next telco was dedicated to querying, but perhaps we could resume the stream discussion in Sydney?
>
> 2) Robustness of the stream against errors in transmission
>
> The stream can be made more robust against transmission errors, e.g. by transmitting multiple copies of the same events / facts in cases where losses are more critical than ordering. If this is needed, it could also be part of the stream description, to help the receiver understand what is going on?
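For illustration, a receiver of such a redundant stream could discard the extra copies by event identifier. This sketch assumes each event carries a unique id, which is itself the kind of detail a stream description could declare; none of the names below come from any existing proposal.

```python
def deduplicate(events):
    """Yield each event once, dropping repeated copies by their id."""
    seen = set()
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            yield event

# The producer sent every event twice to tolerate loss in transit.
stream = [{"id": 1, "fact": "s p o"}, {"id": 1, "fact": "s p o"},
          {"id": 2, "fact": "s p q"}, {"id": 2, "fact": "s p q"}]
print([e["id"] for e in deduplicate(stream)])   # [1, 2]
```

In a long-running system the `seen` set would need to be bounded (e.g. expired after the stream's maximum out-of-order delay), which again ties robustness back to the stream description.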
>
> 3) Imperfections created during transmission
>
> This information cannot be explicitly included in the stream or the events in it, as it is not known at stream generation time. The protocols to transmit RDF streams were mentioned in the first telco, but we haven't discussed them after that. Are transmission protocols expected to impact how streams are generated, beyond the generic imperfections (loss, jitter, out-of-order)? If yes, they should definitely be in scope and discussed together with stream construction. If not, I don't have a strong opinion on scope, but would prefer to make progress with stream formats first and then move on to transmission.
>
> 4) Fault tolerance in the consuming platform (= receiver)
>
> This I wouldn't immediately count as within the group's scope. If the group can give the tools, everyone can build their fault tolerance into the platforms and / or the queries.
>
> 5) Producer and consumer variations
>
> Stream descriptions can help with both producer and consumer variations:
> - they can be referenced by catalogues, which helps to manage changes in producers
> - they can contain all format and prefix information, helping consumers to join streams at any time.
>
>
> Just some initial thoughts on this.
>
> Cheers,
>
> Mikko
>
>
> On 5. Oct 2013, at 7:48 PM, Andy Seaborne wrote:
>
>> On the web, and indeed in any system of more than modest size and in one management domain, issues of
>>
>> * out of order delivery, including arbitrarily late arrival
>> * new stream producers coming online, old stream producers ending
>>      (discovery, joining, leaving)
>> * consumers joining and leaving
>> * streams becoming unavailable
>> * ... then restarting (with or without loss of potential events)
>>
>> leading to design points on
>>
>> + choice of timestamps
>> + delivery ordering semantics
>> + delivery guarantees (at least once, exactly once, at most once)
>> + persistence, and for how long
>>    (forwards, for guaranteed delivery and backwards for consumers to
>>     catch up).
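For the persistence point, a common pattern (sketched below with invented names, not anything the CG has specified) is to keep a bounded log of recent events so that a consumer that joins late, or restarts, can read forward from an offset to catch up:

```python
from collections import deque

class BoundedLog:
    """Retain the last `capacity` events; consumers read by absolute offset."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.next_offset = 0            # offset the next appended event will get

    def append(self, event):
        self.buf.append(event)          # oldest event is evicted when full
        self.next_offset += 1

    def read_from(self, offset):
        """Return events from `offset` onward that are still retained."""
        oldest = self.next_offset - len(self.buf)
        start = max(offset, oldest)     # events before `oldest` have expired
        return list(self.buf)[start - oldest:]

log = BoundedLog(capacity=3)
for e in ["e0", "e1", "e2", "e3"]:
    log.append(e)
print(log.read_from(0))   # ['e1', 'e2', 'e3']  (e0 has already expired)
print(log.read_from(2))   # ['e2', 'e3']
```

The capacity (or a retention time) is exactly the "persistence, and for how long" design point: it bounds how late a consumer may be and still catch up without loss.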
>>
>> What are your thoughts on these issues?  In-scope or out-of-scope for the CG? Necessary or optional to consider?
>>
>>  Andy
>>
>>
>

Received on Sunday, 6 October 2013 17:00:35 UTC