Re: Streams in an unreliable world.

Dear Jim and all,

You will find my comments inline.

On Oct 7, 2013, at 8:31 PM, James Smith <jgsmith@gmail.com> wrote:

I'm interjecting a few of my opinions interlinearly.

-- Jim

On Mon, Oct 7, 2013 at 2:09 PM, Emanuele Della Valle <emanuele.dellavalle@polimi.it> wrote:
Dear Andy,

thank you for this post. I answer in line below.

On Oct 5, 2013, at 6:48 PM, Andy Seaborne <andy@apache.org> wrote:

> On the web, and indeed in any system of more than modest size and in one management domain, issues of
>
> * out of order delivery, including arbitrarily late arrival

I believe that out-of-order delivery is in scope, but not arbitrarily late arrival. We should allow for a maximum delay that guarantees a minimum quality of answers. In many of the relevant use cases I have in mind, reactiveness is more important than correctness, because either an answer arrives within a given time limit or it has no value. If we allow for arbitrarily late arrival, we cannot meet the reactivity requirement, even when enough information has arrived for a good-enough answer.


Designing for arbitrarily late arrival should be okay as long as we point out the tradeoffs between late arrival and timeliness. By "arbitrary," I mean that the recommendation coming out of the CG should not have explicit limits, but discuss how limits can be implemented, recognized, etc., with the cutoff being left up to the stream consumer.
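
To make the tradeoff concrete, here is a minimal sketch of a consumer-side cutoff: a buffer that re-orders timestamped items and drops anything older than a delay chosen by the consumer. The names `ReorderBuffer`, `max_delay`, `push`, and `pop_ready` are hypothetical and are not part of any proposal in this thread; the sketch also assumes every item carries a numeric timestamp.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class ReorderBuffer:
    """Re-orders timestamped items; drops items older than max_delay."""
    max_delay: float                      # cutoff chosen by the consumer (assumption)
    _heap: List[Tuple[float, int, Any]] = field(default_factory=list)
    _seq: int = 0                         # tie-breaker so payloads are never compared
    _max_seen: float = float("-inf")      # highest timestamp observed so far

    def push(self, timestamp: float, item: Any) -> bool:
        """Accept an item; return False if it arrived too late and was dropped."""
        watermark = self._max_seen - self.max_delay
        if timestamp < watermark:
            return False
        self._max_seen = max(self._max_seen, timestamp)
        heapq.heappush(self._heap, (timestamp, self._seq, item))
        self._seq += 1
        return True

    def pop_ready(self) -> List[Tuple[float, Any]]:
        """Release, in timestamp order, items at or below the current watermark."""
        watermark = self._max_seen - self.max_delay
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready
```

A small `max_delay` favours reactivity, while a large one tolerates more disorder at the cost of latency. A recommendation without explicit limits would correspond to leaving `max_delay` entirely up to the consumer.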

Ok, then I agree.

Timeliness and delivery order are, of course, within the context of the standard web protocols since this is a W3C group. I'd consider RDF streams over UDP to be out of scope for the W3C (even if RDF streams are constructed from data coming in via UDP).

I would also consider UDP transport out of scope.




> * new stream producers coming online, old stream producers ending
>     (discovery, joining, leaving)

This is in scope, and I believe it should be discussed under the topic of RSP services.

> * consumers joining and leaving

same as above

> * streams becoming unavailable

This appears to be a difficult point to handle. What's the difference between a silent stream and an unavailable stream? However, I'd like to discuss this requirement.



I'd say that an unavailable stream should return a status of 400 or above. A silent stream is one that exists (status < 400), but does not have data at the present time.
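
As a minimal sketch of that distinction, assuming the consumer reaches the stream over HTTP and can tell whether a response carried any data (the names `StreamState` and `classify_stream` are hypothetical):

```python
from enum import Enum


class StreamState(Enum):
    ACTIVE = "active"            # stream exists and has data right now
    SILENT = "silent"            # stream exists (status < 400) but has no data
    UNAVAILABLE = "unavailable"  # status of 400 or above


def classify_stream(status: int, has_data: bool) -> StreamState:
    """Map an HTTP status code and a data flag to a stream state."""
    if status >= 400:
        return StreamState.UNAVAILABLE
    return StreamState.ACTIVE if has_data else StreamState.SILENT
```

This only applies when the consumer actually receives a status code, which presupposes a request/response exchange with the stream source.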


Mmm, I believe we are in a push architecture. RSP should not *pull* data from RDF stream sources, but rather the RDF stream source should *push* data to the RSP. This issue may have been discussed for WebSockets already, but I'm not familiar with that technology.



> * ... then restarting (with or without loss of potential events)

In scope, although, as with out-of-order delivery, I would suggest remembering how central the reactivity requirement is to RDF stream processing. Making sure to deliver an answer that no longer has any value may be irrelevant. Systems may even decide to drop events without processing them.


I'd suggest we look at some of the other streaming or stream-like protocols already in use on the web, such as WebSocket, for ideas on how to approach stream interruption and restarting. Or we could build on top of WebSocket since we're working within the web platform.

Thanks for this proposal. I will have a look.



> + delivery guarantees (at least once, exactly once, at most once)

interesting...


In other words, is delivery idempotent, like a GET, PUT, or DELETE, or not, like a POST? My vote is for delivery to be idempotent, mirroring the behavior of asserting the same triple multiple times.
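
A minimal sketch of what idempotent delivery could look like on the receiving side, assuming triples are modelled as plain string tuples held in a set (the names `Triple`, `IdempotentSink`, and `deliver` are hypothetical):

```python
from typing import Iterable, Set, Tuple

# Simplifying assumption: a triple is (subject, predicate, object) as strings.
Triple = Tuple[str, str, str]


class IdempotentSink:
    """Receiver whose state is a set of triples, so redelivery has no effect."""

    def __init__(self) -> None:
        self.graph: Set[Triple] = set()

    def deliver(self, triples: Iterable[Triple]) -> int:
        """Assert the triples; return how many were not already present."""
        before = len(self.graph)
        self.graph.update(triples)
        return len(self.graph) - before
```

With a receiver like this, an at-least-once transport is sufficient: a retransmitted batch leaves the graph unchanged, just as asserting the same triple twice does. The sketch ignores timestamps, so it does not cover the case where the same triple legitimately occurs at two different times in a stream.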


I agree, it would be nice, but I'm not sure it is possible. In our recent work on the topic (see attached poster paper), we used POST to stream RDF graphs on an RDF stream. A PUT sounded less natural when we designed the protocol, but if you have other ideas, I will be happy to hear them and understand their rationale.

Best,

Emanuele

Received on Tuesday, 8 October 2013 08:03:09 UTC