Re: RTCDataChannel characteristics and failures -API description - from Michael Tuexen on 2014-09-06 (public-webrtc@w3.org from September 2014)

From: Michael Tuexen <Michael.Tuexen@lurchi.franken.de>
Date: Sat, 6 Sep 2014 16:51:03 +0200
To: Gunnar Hellstrom <gunnar.hellstrom@omnitor.se>
Cc: public-webrtc@w3.org
Message-Id: <90357724-B23D-47D9-BF85-E443294A393D@lurchi.franken.de>
On 06 Sep 2014, at 08:45, Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> wrote:

> On 2014-09-05 23:47, Martin Thomson wrote:
>> On 5 September 2014 11:14, Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> wrote:
>>> So, you expect that in most cases the retries for a reliable channel will
>>> spread over 30 seconds, and if still unsuccessful, the Association and all
>>> its channels will be aborted.
>> No, my point was that the association timers run longer than the
>> timers that govern liveness of a single path.  That means that an
>> association can survive a path failure if an alternative path is
>> found, and it seems like we have ample time to do that.  Some
>> implementations already attempt that process, though I can't speak to
>> the efficacy.
>> 
>>> Can you explain how you got that 30 seconds figure?
>> https://tools.ietf.org/html/draft-ietf-rtcweb-stun-consent-freshness
>> 
> Thanks.
> So, the normal case with default values for an SCTP reliable channel seems to be:
> 
> 1. SCTP retransmissions.
> The transmission retries will be done at 1, 3, 7, 15, 31, 63, 127, ... seconds after initial transmission
Correct.
> 2. SCTP heartbeats
> The SCTP heartbeats will be done asynchronously with the data transmission with 30 second intervals. Let us assume that it will be at 12, 42, 72, 102 seconds after data transmission.
Incorrect:
1. You are making the assumption that HEARTBEATs are sent while you have outstanding data
   on that path. This is true for the FreeBSD implementation, but it is implementation
   specific and not the best choice. It will be changed such that you don't send HEARTBEATs
   when you have outstanding data.
2. HEARTBEATs are sent every HB.Interval + RTO. So it would be 12, 43, 75, 109, and so on
   if there is no connectivity anymore.
3. If quick failover is used, the HEARTBEATs would not take HB.Interval into account if
   the path is potentially failed. This makes failure detection for idle paths similar
   to the failure detection for non-idle paths. Something good, I think.
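
The schedule in point 2 can be reproduced with a small Python sketch. It assumes
HB.Interval = 30 s and an RTO that starts at 1 s, doubles after every unanswered
HEARTBEAT and is capped at 60 s. The function name and its parameters are purely
illustrative and are not taken from any SCTP implementation.

```python
def heartbeat_times(first, count, hb_interval=30, rto_min=1, rto_max=60):
    """Send times (seconds) of `count` unanswered HEARTBEATs, the first at `first`."""
    times, t, rto = [], first, rto_min
    for _ in range(count):
        times.append(t)
        t += hb_interval + rto        # next HEARTBEAT after HB.Interval + RTO
        rto = min(rto * 2, rto_max)   # assumed exponential backoff, capped at rto_max
    return times


print(heartbeat_times(12, 4))  # [12, 43, 75, 109]
```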
> 3. SCTP Max.Retrans
> We have max 10 retransmissions for WebRTC use of SCTP. This value includes both failed data transmissions and failed heartbeats.
> Thus this reason to consider data transmission failed would happen after 127 seconds with 7 data retransmissions and 3 missed heartbeats. (My earlier calculation of 500 seconds did not take the heartbeat into consideration.)
> When this happens, the association breaks and all data channels within it are closed.
Assuming that you don't send HEARTBEATs when you have outstanding data, you get (if I computed correctly):

Case 1: Non-idle path
Unanswered transmissions at 0, 1, 3, 7, 15, 31, 63, 123, 183, 243, 303 and the association will fail after 363 seconds.

Case 2a: Idle path, no quick failover
Unanswered HEARTBEAT transmissions at 0, 31, 63, 97, 135, 181, 243, 333, 423, 513, 603 and the association will fail after 693 seconds.

Case 2b: Idle path, quick failover
Unanswered HEARTBEAT transmissions at 0, 31, 33, 37, 45, 61, 93, 153, 213, 273, 333 and the association will fail after 393 seconds.

As you see, when using quick failover, the difference in the detection time of communication loss between an idle path
and a non-idle path is one HB.Interval. If quick failover is not used, the difference is (Max.assoc.retrans + 1) * HB.Interval.
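
As a rough cross-check of the three cases, the following Python sketch recomputes the
timelines. It assumes the values implied by the figures above: RTO starting at 1 s,
doubling on each timeout and capped at 60 s, HB.Interval = 30 s, and 10 retransmissions
allowed. It also assumes that with quick failover only the first HEARTBEAT waits for
HB.Interval. The function name, the `kind` labels and the parameters are hypothetical.

```python
def timeline(kind, hb_interval=30, rto_min=1, rto_max=60, max_retrans=10):
    """Return (send_times, failure_time) in seconds for an unreachable peer.

    kind is one of "non_idle", "idle" or "idle_quick_failover".
    """
    times, t, rto = [], 0, rto_min
    for attempt in range(max_retrans + 1):   # first transmission plus retransmissions
        times.append(t)
        wait = rto
        # Idle path: every HEARTBEAT waits HB.Interval + RTO.
        # Quick failover (assumed): only the first HEARTBEAT adds HB.Interval.
        if kind == "idle" or (kind == "idle_quick_failover" and attempt == 0):
            wait += hb_interval
        t += wait
        rto = min(rto * 2, rto_max)           # exponential backoff, capped
    return times, t


for kind in ("non_idle", "idle", "idle_quick_failover"):
    sends, failed_at = timeline(kind)
    print(kind, sends, failed_at)
```

With these assumptions the failure times come out as 363, 693 and 393 seconds, so the
differences are 30 = HB.Interval and 330 = 11 * HB.Interval.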

Quick Failover is specified in
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-failover-05

Best regards
Michael
> 4. ICE connectivity
> In parallel with that, the ICE consent connectivity check is performed with a mean interval of 5 seconds, and if all of them fail within 30 seconds, the connection shall close. So, that is 6 consecutive transmissions of the ICE consent that would have failed.
> 
> It is quite likely that in this case the network is so bad that the ICE liveness test will fail before the combined retransmission and heartbeat failure. You say that ICE connectivity failure does not directly cause association abort. It will depend on the implementation whether the path is refreshed. Would the remaining RTCDataChannel retransmissions and heartbeats then be done over that new path? Will the normal behavior for the RTCDataChannel not be to close all channels on that connection at the ICE liveness check failure?
> 
> 
> I think that this information is piling up to something that would be of value to insert as an informational box in the API specification with a title: Example of retransmissions and connectivity checks for RTCDataChannel.
> 
> 
> /Gunnar
> 
>
Received on Saturday, 6 September 2014 14:51:31 UTC