Re: Streams in an unreliable world.

I'm interjecting a few of my opinions interlinearly.

-- Jim

On Mon, Oct 7, 2013 at 2:09 PM, Emanuele Della Valle <
emanuele.dellavalle@polimi.it> wrote:

> Dear Andy,
>
> thank you for this post. I answer in line below.
>
> On Oct 5, 2013, at 6:48 PM, Andy Seaborne <andy@apache.org>
>  wrote:
>
> > On the web, and indeed in any system of more than modest size and in one
> management domain, issues of
> >
> > * out of order delivery, including arbitrarily late arrival
>
> I believe that out-of-order delivery is in scope, but not arbitrarily late
> arrival. We should allow for a maximum delay that guarantees a minimum
> quality of answers. In many relevant use cases I have in mind, reactiveness
> is more important than correctness, because either an answer arrives within
> a given time limit or it has no value. If we allow for arbitrarily late
> arrival, we cannot meet the reactivity requirement even if enough
> information arrived for a good enough answer.
>


Designing for arbitrarily late arrival should be okay as long as we point
out the tradeoffs between late arrival and timeliness. By "arbitrary," I
mean that the recommendation coming out of the CG should not have explicit
limits, but discuss how limits can be implemented, recognized, etc., with
the cutoff being left up to the stream consumer.
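
To make the consumer-side cutoff concrete, here is a minimal Python sketch.
The names (Event, accept_within, max_delay) are mine and purely
illustrative; nothing here is proposed wording for the CG. It assumes each
event carries both a producer timestamp and an arrival timestamp, and that
a consumer with no cutoff passes None.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stream element; field names are assumptions."""
    event_time: float    # when the producer says the event happened (seconds)
    arrival_time: float  # when the consumer received it (seconds)
    payload: str


def accept_within(events: Iterable[Event],
                  max_delay: Optional[float]) -> Iterator[Event]:
    """Yield events whose delay is at most max_delay.

    max_delay=None means the consumer accepts arbitrarily late arrivals.
    """
    for ev in events:
        delay = ev.arrival_time - ev.event_time
        if max_delay is None or delay <= max_delay:
            yield ev
```

A reactive consumer would pass a small max_delay and drop stragglers; a
consumer that values completeness would pass None and keep everything.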

Timeliness and delivery order are, of course, within the context of the
standard web protocols since this is a W3C group. I'd consider RDF streams
over UDP to be out of scope for the W3C (even if RDF streams are
constructed from data coming in via UDP).



>
> > * new stream producers coming online, old stream producers ending
> >     (discovery, joining, leaving)
>
> this is in scope and I believe it should be discussed under the topic RSP
> services.
>
> > * consumers joining and leaving
>
> same as above
>
> > * streams becoming unavailable
>
> this appears to be a difficult point to handle. What's the difference
> between a silent stream and an unavailable stream? However, I'd like to
> discuss this requirement.
>
>

I'd say that an unavailable stream should return an HTTP status of 400 or
above. A silent stream is one that exists (status < 400) but does not have
data at the present time.
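
As a rough Python sketch of that distinction (stream_state and its two
parameters are hypothetical names, and I'm assuming the consumer already
knows the response status and whether any data is currently pending):

```python
def stream_state(status: int, has_data: bool) -> str:
    """Classify a stream from its response status and pending data.

    Follows the rule of thumb above: 400 or higher means unavailable;
    anything lower means the stream exists, with or without data.
    """
    if status >= 400:
        return "unavailable"
    return "active" if has_data else "silent"
```

For example, stream_state(200, False) gives "silent" and
stream_state(503, False) gives "unavailable".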



> > * ... then restarting (with or without loss of potential events)
>
> in scope, although, as with out-of-order delivery, I would suggest
> remembering how central the reactivity requirement is to RDF stream
> processing. Making sure to deliver an answer that no longer has any value
> may be pointless. Systems may even decide to drop events without
> processing them.
>


I'd suggest we look at some of the other streaming or stream-like protocols
already in use on the web, such as WebSocket, for ideas on how to approach
stream interruption and restarting. Or we could build on top of WebSocket
since we're working within the web platform.



> > + delivery guarantees (at least once, exactly once, at most once)
>
> interesting...
>


In other words, is delivery idempotent, like a GET, PUT, or DELETE, or not,
like a POST? My vote is for delivery to be idempotent, mirroring the
behavior of asserting the same triple multiple times.
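
A small Python sketch of what I mean, modeling a graph as a set of triples
so that repeated delivery leaves the state unchanged. The Graph class and
its deliver method are illustrative names only, not an API from any RDF
library.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class Graph:
    """Toy graph: a set of triples, so duplicates collapse."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def deliver(self, triple: Triple) -> bool:
        """Assert a triple; return True only if it was not already present."""
        is_new = triple not in self._triples
        self._triples.add(triple)
        return is_new

    def __len__(self) -> int:
        return len(self._triples)
```

With this model, at-least-once delivery is harmless to the consumer:
delivering the same triple twice leaves the graph exactly as it was after
the first delivery.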

Received on Monday, 7 October 2013 18:31:32 UTC