Re: Streams in an unreliable world. from Andy Seaborne on 2013-10-09 (public-rsp@w3.org from October 2013)

From: Andy Seaborne <andy@apache.org>
Date: Wed, 09 Oct 2013 14:46:13 +0100
To: public-rsp@w3.org
Message-ID: <52555E25.8080403@apache.org>
On 08/10/13 09:02, Emanuele Della Valle wrote:
> Dear Jim and all,
>
> you find my comments in-line.
>
> On Oct 7, 2013, at 8:31 PM, James Smith <jgsmith@gmail.com 
> <mailto:jgsmith@gmail.com>>
>  wrote:
>
>> I'm interjecting a few of my opinions interlinearly.
>>
>> -- Jim
>>
>> On Mon, Oct 7, 2013 at 2:09 PM, Emanuele Della Valle 
>> <emanuele.dellavalle@polimi.it 
>> <mailto:emanuele.dellavalle@polimi.it>> wrote:
>>
>>     Dear Andy,
>>
>>     thank you for this post. I answer in line below.
>>
>>     On Oct 5, 2013, at 6:48 PM, Andy Seaborne <andy@apache.org
>>     <mailto:andy@apache.org>>
>>      wrote:
>>
>>     > On the web, and indeed in any system of more than modest size
>>     and in one management domain, issues of
>>     >
>>     > * out of order delivery, including arbitrarily late arrival
>>
>>     I believe that out of order is in scope, but not arbitrarily late
>>     arrival. We should allow for a maximum delay that guarantees for
>>     a minimum quality of answers. In many relevant use cases I have
>>     in mind reactiveness is more important than correctness, because
>>     either an answer arrives within a given time limit or it has no
>>     value. If we allow for arbitrarily late arrival, we cannot meet
>>     the reactivity requirement even if enough information arrived for
>>     a good enough answer.
>>
>>
>>
>> Designing for arbitrarily late arrival should be okay as long as we 
>> point out the tradeoffs between late arrival and timeliness. By 
>> "arbitrary," I mean that the recommendation coming out of the CG 
>> should not have explicit limits, but discuss how limits can be 
>> implemented, recognized, etc., with the cutoff being left up to the 
>> stream consumer.
>
> Ok, then I agree.
>
>> Timeliness and delivery order are, of course, within the context of 
>> the standard web protocols since this is a W3C group. I'd consider 
>> RDF streams over UDP to be out of scope for the W3C (even if RDF 
>> streams are constructed from data coming in via UDP).
>
> I would also consider UDP transport out of scope.
>
>>
>>
>>     > * new stream producers coming online, old stream producers ending
>>     >     (discovery, joining, leaving)
>>
>>     this is in scope and I believe it should be discussed under the
>>     topic RSP services.
>>
>>     > * consumers joining and leaving
>>
>>     same as above
>>
>>     > * streams becoming unavailable
>>
>>     this appears to be a difficult point to handle. What's the
>>     difference between a silent stream and an unavailable stream?
>>     However, I'd like to discuss this requirement.
>>
>>
>>
>> I'd say that an unavailable stream should return a status of 400 or 
>> above. A silent stream is one that exists (status < 400), but does 
>> not have data at the present time.


Yes, if a pull stream, or heartbeats if a push stream.

A silent stream can be differentiated from an unavailable stream by a 
heartbeat in the stream.  In fact the difference could be very important 
to some applications e.g. sensors (e.g. smoke detectors!).

Or there may be other reasons for a control channel and a data channel 
and then it does not matter whether the heartbeat is one or the other.
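
As a rough sketch of the heartbeat idea (Python; the class name, the 
timeouts and the three state labels are all made up for illustration, 
nothing here is agreed by the group), a consumer could track the last 
heartbeat and the last data item separately:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamMonitor:
    """Hypothetical consumer-side monitor; timeouts are in seconds."""
    heartbeat_timeout: float          # assumed cutoff for "unavailable"
    data_timeout: float               # assumed cutoff for "silent"
    last_heartbeat: Optional[float] = None
    last_data: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def on_data(self, now: float) -> None:
        # A data item also proves the producer is alive.
        self.last_data = now
        self.last_heartbeat = now

    def state(self, now: float) -> str:
        # No recent heartbeat: we cannot tell the producer is there at all.
        if (self.last_heartbeat is None
                or now - self.last_heartbeat > self.heartbeat_timeout):
            return "unavailable"
        # Heartbeats arrive but no recent data: the stream is merely quiet.
        if self.last_data is None or now - self.last_data > self.data_timeout:
            return "silent"
        return "active"
```

For the smoke detector case, "silent" means nothing is burning while 
"unavailable" means nobody knows, which is why the two must not be 
conflated.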


> mmm, I believe we are in a push architecture. RSP should not *pull* 
> data from RDF stream sources, but rather the RDF stream source should 
> *push* data to the RSP. This issue may have been discussed for 
> WebSockets already, but I'm not familiar with such a technology.

Pull vs push: both.

There are issues either way and it affects application design.  For 
example, a processor may effectively be sampling a high frequency stream 
(current temperature, consumed on a once-an-hour basis), or a processor 
may want alerts (event of interest occurred).

It would be possible to cover only one of the two styles but I think 
it is important to explicitly decide that and not slip into it 
implicitly.  Implicit decisions cause problems later when communicating 
the work.
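
To make the two styles concrete, here is a minimal sketch (Python; the 
function names, the reading format and the threshold are assumptions for 
illustration only). The first function samples a stream at a coarse 
interval, the second passes on only the events of interest:

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical reading: (timestamp in seconds, temperature).
Reading = Tuple[float, float]


def sample(readings: Iterable[Reading], interval: float) -> Iterator[Reading]:
    """Sampling style: keep at most one reading per interval."""
    next_due = None
    for ts, value in readings:
        if next_due is None or ts >= next_due:
            yield (ts, value)
            next_due = ts + interval


def alerts(readings: Iterable[Reading],
           is_interesting: Callable[[Reading], bool]) -> Iterator[Reading]:
    """Alert style: forward every reading that matches the condition."""
    for reading in readings:
        if is_interesting(reading):
            yield reading


readings = [(0, 20.0), (600, 20.5), (3600, 21.0), (4000, 80.0), (7200, 22.0)]
hourly = list(sample(readings, 3600))
hot = list(alerts(readings, lambda r: r[1] > 50.0))
```

With these made-up readings the hourly sampler never sees the spike at 
t=4000 while the alert consumer sees nothing else, so a design that 
silently assumes one style will not serve the other.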

>>     > * ... then restarting (with or without loss of potential events)
>>
>>     in scope, even if, as for out of order delivery, I would suggest
>>     to remember how central to RDF stream processing is the
>>     reactivity requirement. Making sure to deliver an answer, which
>>     has no longer any value, may be irrelevant. Systems may even
>>     decide to drop events without processing them.
>>
>>
>>
>> I'd suggest we look at some of the other streaming or stream-like 
>> protocols already in use on the web, such as WebSocket, for ideas on 
>> how to approach stream interruption and restarting. Or we could build 
>> on top of WebSocket since we're working within the web platform.
>
> Thanks for this proposal. I will have a look.
>
>>
>>     > + delivery guarantees (at least once, exactly once, at most once)
>>
>>     interesting...
>>
>>
>>
>> In other words, is deliver idempotent like a GET, PUT, or DELETE, or 
>> not, like a POST? My vote is for deliver to be idempotent, mirroring 
>> the behavior of asserting the same triple multiple times.
>
>
> I agree, it will be nice, but I'm not sure it is possible. In our 
> recent work on the topic (see attached poster paper), we used POST to 
> stream RDF graphs on an RDF stream. A PUT sounded less natural when we 
> designed the protocol, but if you have other ideas, I will be happy to 
> hear them and understand their rationale.
>
> Bests,
>
> Emanuele
>
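
On the idempotent delivery point quoted above, a minimal sketch (Python; 
the GraphSink class and the example triple are hypothetical, and this is 
only one reading of "mirroring the behavior of asserting the same triple 
multiple times"). If the receiving side has set semantics, an 
at-least-once transport can redeliver a batch without changing the 
result:

```python
from typing import Iterable, Set, Tuple

# Hypothetical triple representation: (subject, predicate, object).
Triple = Tuple[str, str, str]


class GraphSink:
    """Hypothetical receiver that stores triples with set semantics."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.deliveries = 0

    def deliver(self, batch: Iterable[Triple]) -> None:
        # Counting deliveries shows the transport may repeat itself,
        # while the stored graph stays the same.
        self.deliveries += 1
        self.triples.update(batch)


sink = GraphSink()
batch = [("ex:sensor1", "ex:temperature", "21.0")]
sink.deliver(batch)
sink.deliver(batch)  # redelivery after a suspected loss
```

This does not settle the POST vs PUT question; it only shows that 
duplicates are harmless when the payload is a set of triples. Events 
that must be counted would need an identifier to deduplicate on.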

     Andy
Received on Wednesday, 9 October 2013 13:46:45 UTC