Re: RTCDataChannel characteristics and failures from Michael Tuexen on 2014-01-03 (public-webrtc@w3.org from January 2014)

From: Michael Tuexen <Michael.Tuexen@lurchi.franken.de>
Date: Fri, 3 Jan 2014 10:34:36 +0100
To: Gunnar Hellstrom <gunnar.hellstrom@omnitor.se>
Cc: public-webrtc@w3.org
Message-Id: <F20D3403-BE99-4D7B-8EBF-AC18F52930DC@lurchi.franken.de>
On Jan 2, 2014, at 11:25 PM, Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> wrote:

> On 2014-01-02 21:53, Michael Tuexen wrote:
>> On Jan 2, 2014, at 9:31 PM, Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> wrote:
>> 
>>> On 2014-01-02 19:21, Michael Tuexen wrote:
>>>> On Jan 2, 2014, at 1:36 PM, piranna@gmail.com wrote:
>>>> 
>>>>> For example, highly coupled realtime tasks like P2P video distribution. You would want a timeout so that you don't send information that is already out of date, but also a maximum number of retransmits so that you don't waste bandwidth and can try another source.
>>>> The source would apply the PR-SCTP policy, right? How does the source know that messages
>>>> are abandoned? The socket API for SCTP can provide this information, but how do you know
>>>> this in the JS API? Any idea? Who would switch to another source, since it can't be the source itself?
>>>> 
>>>> Currently SCTP uses an enumeration of policies. To do the above, we would need to
>>>> extend it to handle logical ands and ors of policies. That would require some
>>>> changes to
>>>> https://tools.ietf.org/search/draft-ietf-tsvwg-sctp-prpolicies-00
>>>> 
>>>> What do others think?
>>> I think the application can do the combination, by asking the channel to use the retry limit and the application having a timeout when it looks at the transmit queue, and if it has not moved the application can give up.
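
A minimal sketch of the application-side half of that combination, assuming the channel itself is configured with a retry limit and the application polls the size of the transmit queue (in the browser API this reading would presumably come from bufferedAmount). The class name StallWatchdog and its parameters are hypothetical, and the clock is passed in explicitly so the logic can be followed without a real channel.

```python
class StallWatchdog:
    """Signals give-up when the send queue has not shrunk for `timeout` seconds."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_amount = 0
        self.last_progress = now

    def observe(self, buffered_amount, now):
        """Record a queue reading; return True if the application should give up."""
        if buffered_amount == 0 or buffered_amount < self.last_amount:
            # Queue is empty or has shrunk since the last reading: data is moving.
            self.last_progress = now
        self.last_amount = buffered_amount
        return now - self.last_progress >= self.timeout


watchdog = StallWatchdog(timeout=2.0, now=0.0)
print(watchdog.observe(900, now=0.5))  # False
print(watchdog.observe(400, now=1.0))  # False (queue shrank)
print(watchdog.observe(400, now=2.5))  # False (stalled for 1.5 s)
print(watchdog.observe(400, now=3.0))  # True  (stalled for 2.0 s)
```

The sketch only compares consecutive readings, so progress that is masked by newly queued data between two polls would go unnoticed; a real application would have to account for what it has added to the queue.
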
>>> 
>>> There is sufficient complication in the protocols and APIs to handle the current policies, and we are lacking information in the API description for what happens in different failure and abandoning situations.
>>> 
>>> When I analyzed the data channel, I came to the conclusion that the application needs to handle its own sequence numbers and sequence number checking. It sounds like overkill at first, but I think it is not. The reliable and semi-reliable channel types may break by reaching the retransmission limit of SCTP, and the logical action of the application is to reconnect and continue. If detection of loss is of importance, then a sequence number that spans over a break and reconnect
>> Hmmm. Interesting. SCTP gives up when it thinks the peer can't be reached anymore.
>> By tuning the parameters, it will try pretty hard. Why do you think retrying the
>> connection makes things easier? Doesn't ICE also do connectivity checks? Why should
>> ICE report that the peer is reachable, but SCTP can't?
> I read the SCTP specification as saying it will do one original transmission and a maximum of 5 retransmissions. In bad network conditions, it will not be an eternity between failures of all 6 transmissions.
Well, there is a parameter called Association.Max.Retrans, which is the number of retransmissions
before an association is considered dead. The parameter you are referring to is Path.Max.Retrans.
After that many timeouts, the particular path is considered non-working. Using these parameters
in a single homed case results in a state where all paths (since you have only one) are dead, but
the association isn't yet. This is sometimes called the dormant state. Some SCTP implementations
give up the association when it enters the dormant state, others don't. I'll add some text that
in RTCWeb we should use a setting with Association.Max.Retrans = Path.Max.Retrans. We can
argue if it is 5, or 10 (which is more the intention of RFC 4960) or whatever.
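
To make the choice between 5 and 10 concrete, here is a rough sketch of how long a sender keeps trying before the error counter exceeds the limit. It assumes every transmission is lost, that the RTO starts at 1 second (the RFC 4960 default for RTO.Min) and doubles after each timeout up to 60 seconds (the RFC 4960 default for RTO.Max). The function name is made up for illustration, and real stacks may be tuned differently.

```python
def time_until_give_up(max_retrans, rto_initial=1.0, rto_max=60.0):
    """Seconds from the first transmission until the error counter exceeds max_retrans.

    Simplified model: every transmission is lost and the RTO doubles after
    each timeout, capped at rto_max.
    """
    rto = rto_initial
    elapsed = 0.0
    # One timeout for the original transmission plus one per retransmission.
    for _ in range(max_retrans + 1):
        elapsed += rto
        rto = min(rto * 2, rto_max)
    return elapsed


for limit in (5, 10):
    print(limit, time_until_give_up(limit))
# 5 63.0
# 10 363.0
```

Under these assumptions a limit of 5 gives up after about a minute, while a limit of 10 keeps trying for about six minutes.
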
> A factor limiting the likelihood of success is that you rely on both the forward transmission and the feedback transmission to work in order to regard the transmission successful. So, you can get stuck retrying even if the receiver got the data.
> I know you said that it might not be the path limit of 5 retries that should be used but possibly more towards the 10 Association retries. But that also increases the stall time during retransmissions so that at least the real-time text data I am thinking of would be severely stale when the channel finally succeeded or broke.
This is the normal tradeoff. However, I'm not sure if this is relevant at all. According to
http://tools.ietf.org/html/draft-ietf-rtcweb-stun-consent-freshness-00
we will tear down the peer connection after 15 seconds.
> 
> Even the 5 retries could span 30 to 60 seconds on fast networks, so I do not think it is realistic to extend the retry number beyond 5.
> 
> It all depends, of course, on what packet loss rate you need to plan for, if you have a type of communication that needs to work in catastrophe situations too.
> 
> My brief calculations indicate that in a bad situation with 15% packet loss, a reliable channel used for real-time text, sampled and transmitted at 300 ms intervals, would experience more than 10 seconds of stalling once per minute, and a break of the channel after 30 seconds of stalling for retries would happen at a mean interval of 11 minutes.
> 
> What else can an application do than reconnect and offer the user to continue?
> 
> At up to about 3 % packet loss it seems to perform excellently and can really be called reliable.
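
The mean interval between channel breaks quoted above can be approximated with a simple model. This is only a sketch under stated assumptions, not necessarily the calculation that was actually done: losses are independent, an attempt succeeds only if both the data packet and its acknowledgement arrive, a message breaks the channel when the original transmission and all 5 retransmissions fail, and one message is sent every 300 ms. The function names are hypothetical.

```python
def attempt_failure_probability(loss_rate):
    # An attempt counts as successful only if the data packet and its
    # acknowledgement both get through (independent losses assumed).
    return 1.0 - (1.0 - loss_rate) ** 2


def mean_seconds_between_breaks(loss_rate, attempts=6, send_interval=0.3):
    # Probability that the original transmission and all retransmissions fail.
    p_break = attempt_failure_probability(loss_rate) ** attempts
    # Expected number of messages until one breaks the channel, times the interval.
    return send_interval / p_break


for loss in (0.15, 0.03):
    minutes = mean_seconds_between_breaks(loss) / 60
    print(f"{loss:.0%} loss: about {minutes:,.0f} minutes between breaks")
```

With 15% loss this model gives roughly 11 minutes between breaks. With 3% loss the interval grows to roughly 80 days. The model does not reproduce the stalling figures, which depend on the retransmission timers.
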
> 
> A first action would be to include information in the API about how various kinds of failures are signaled.
> 
> And for the application of real-time text we still need to consider all the alternatives discussed about a year ago:
> RTP
> Reliable data channel
> Semi-reliable data channel with a timeout that the user can accept and reconnection for continuing.
> Unreliable data channel and forward error correction applied.
> Transmission external to webrtc, through WebSocket or so and proper action against long stalling in failure situations.
I'm not familiar with real-time text. But to me it looks like you create messages (lines of text)
which you want to transmit reliably, but each message (line of text) changes while
it is being written. So you have multiple generations of a message and a newer generation of the message
replaces any older one. This sounds like another PR-SCTP policy, where you send each message
with an ID and any message within the SCTP stack with the same ID would be abandoned. That
way you would only try to send the latest version of each line of text.
The receiver would always display the latest information it has.

Wouldn't that work?
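
A minimal sketch of how such a policy could behave, modelled as a plain send queue rather than anything inside a real SCTP stack. The class LatestGenerationQueue and its methods are hypothetical names for illustration. No existing PR-SCTP policy or socket API is implied.

```python
from collections import OrderedDict


class LatestGenerationQueue:
    """Send queue where a new message replaces any queued one with the same ID."""

    def __init__(self):
        self._pending = OrderedDict()  # message ID -> latest payload
        self.abandoned = 0

    def enqueue(self, msg_id, payload):
        if msg_id in self._pending:
            # An older generation is still waiting: abandon it.
            self.abandoned += 1
        # Assigning to an existing key keeps its position in the queue.
        self._pending[msg_id] = payload

    def next_to_send(self):
        # Pop the oldest pending ID; returns None when nothing is queued.
        if not self._pending:
            return None
        return self._pending.popitem(last=False)


queue = LatestGenerationQueue()
queue.enqueue(1, "Hel")
queue.enqueue(1, "Hello wor")
queue.enqueue(2, "How")
queue.enqueue(1, "Hello world")
print(queue.next_to_send())  # (1, 'Hello world')
print(queue.next_to_send())  # (2, 'How')
print(queue.abandoned)       # 2
```

Only the latest version of each line is ever handed to the transport, so the receiver always ends up displaying the newest generation it has received.
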

Best regards
Michael
> 
> 
> A first action would be to include information in the API about how various kinds of failures are signaled, as I have made an initial proposal earlier in this thread.
> 
> Regards,
> 
> Gunnar
> 
> 
> 
>> 
>> Best regards
>> Michael
>>> seems needed.
>>> Or, is there any built-in support for that?
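
As an illustration of the application-level sequence number suggested in the quoted text (one that spans a break and reconnect), the receiver-side check could be sketched as follows. The class and method names are hypothetical, and the sketch assumes the sender keeps counting across reconnects instead of restarting at zero.

```python
class SequenceChecker:
    """Receiver-side check for application sequence numbers that survive reconnects."""

    def __init__(self):
        self.expected = 0

    def receive(self, seq):
        """Return the sequence numbers that were skipped before this message."""
        if seq < self.expected:
            return []  # duplicate or late message; nothing new is missing
        missing = list(range(self.expected, seq))
        self.expected = seq + 1
        return missing


checker = SequenceChecker()
print(checker.receive(0))  # []
print(checker.receive(1))  # []
# The channel breaks, messages 2 and 3 are lost, and a new channel delivers 4.
print(checker.receive(4))  # [2, 3]
```
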
>>> 
>>> Gunnar
>>> 
>>> 
>>> Best regards Michael
>>>>> Sent from my Samsung Galaxy Note II
>>>>> 
>>>>> On 02/01/2014 09:36, "Harald Alvestrand" <harald@alvestrand.no> wrote:
>>>>> On 12/30/2013 12:09 AM, piranna@gmail.com wrote:
>>>>> 
>>>>>>> for some use cases...
>>>>>> Any concrete example?
>>>>>> 
>>>>> Application-based fine-grain control of the transmissions.
>>>>> 
>>>>> When asked for a concrete example, please give a concrete example of an application that needs it.
>>>>> 
>>>>> It's always possible to send in unreliable mode and implement your own mechanisms for reliability; any extension to the spec needs to be backed with a use case that makes it clear why it's worth changing.
>>>>> 
>>>>> 
>>> 
>>> 
>> 
> 
> 
>
Received on Friday, 3 January 2014 09:35:06 UTC