Issue 174: When is an IceGatherer allowed to prune host, reflexive and relay candidates?

Issue:
An IceGatherer is responsible for gathering potential host, reflexive and relay candidates, but it's unclear when those candidates can or should be pruned, because outstanding "offers" to connect might exist at any point in time.

Scenario A:
1) Alice sends out her IceGatherer's candidates, and anywhere from 1 to N remote sessions might reply (i.e. forking)
2) Alice begins connectivity checks with the first remote session, during which local candidates are eliminated when checks against remote candidates fail (or are deemed lower priority).

Scenario B:
1) Alice has a single IceGatherer feeding multiple IceTransports, each connected to a different remote IceTransport (i.e. forking)
2) Another fork appears (or an existing fork goes down and needs a restart), so the IceGatherer needs to warm up again and gather a fresh set of host, reflexive and relay candidates to exchange with the new fork, for an unknown period of time. A sketch of this shared-gatherer setup follows.
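
For concreteness, here's a minimal sketch of the shared-gatherer forking setup, using the RTCIceGatherer/RTCIceTransport shapes from the current ORTC draft (signalToAllForks and remoteForks are hypothetical application helpers, and the STUN server URL is made up):

  // One gatherer produces the candidates shared by every fork.
  const gatherer = new RTCIceGatherer({
    gatherPolicy: "all",
    iceServers: [{ urls: "stun:stun.example.net" }]  // placeholder server
  });
  gatherer.onlocalcandidate = (evt) => signalToAllForks(evt.candidate);

  // One IceTransport per fork, all attached to the same gatherer.
  const transports = remoteForks.map((fork) => {
    const transport = new RTCIceTransport(gatherer);
    transport.start(gatherer, fork.remoteIceParameters, "controlling");
    fork.remoteCandidates.forEach((c) => transport.addRemoteCandidate(c));
    return transport;
  });

  // The open question: at what point may `gatherer` prune candidates that
  // none of these transports happens to be using right now?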

Concerns:
a) Alice cannot prune candidates from the IceGatherer based on the elimination of candidates in the first session, because a second session might succeed with those same local candidates;
b) Alice cannot even reasonably rule out that new remote candidates will arrive a bit later on an existing IceTransport, and her local host, reflexive or relay candidates might be required for connectivity against those new remote candidates (i.e. there is no clear "end of remote candidates (for now)" signal);
c) Keeping all host, reflexive and relay candidates "hot" continuously is not desirable, because it causes firewall pinholes to remain needlessly open, TURN resources to be consumed, and expensive interfaces like 3G/4G to drain mobile batteries for the sake of STUN/TURN maintenance packets;
d) Creating a new IceGatherer per forked IceTransport is needlessly wasteful when a single IceGatherer would suffice;

Possible solutions (the API surface each implies is sketched after this list):
(a) The IceGatherer is told to "gather()" with a prune timeout, and candidates remain warm until the timeout has elapsed (the window can optionally be extended by calling gather() again with another future timeout before the last one expires).
(b) The IceGatherer keeps all host, reflexive and relay candidates "hot" until an "IceGatherer.prune()" method is called, at which point it is allowed to prune any non-warm candidates (i.e. candidates that have not sent or received checks / data within a standard ICE consent timeout).
(c) The IceGatherer keeps all host, reflexive and relay candidates "hot" so long as at least one incomplete IceTransport is attached to it; i.e. the IceGatherer could be kept warm by keeping an unused IceTransport around until it is disposed.
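
To make the options concrete, here is a rough sketch (in TypeScript declaration form) of the surface that (a) and (b) would add; (c) adds nothing. The member names and the options-bag shape are illustrative only, not proposed spec text:

  interface RTCIceGatherer {
    // (a): candidates stay warm until pruneTimeout (milliseconds) elapses;
    // calling gather() again before expiry extends the window.
    gather(options?: { pruneTimeout?: number }): void;

    // (b): explicitly permit pruning of any non-warm candidates.
    prune(): void;
  }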

I personally prefer solution (a) or (b). The good thing about (a) is that the IceGatherer will eventually prune if nothing is done, the window can be extended, and the remote side can be guaranteed a timeframe for which received candidates remain valid. The bad point of (a) is that pruning cannot be triggered at an absolute point in time (although calling gather() with an immediate timeout would accomplish this). The API surface would require IceGatherer.gather() to accept an optional timeout parameter.
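
In usage, (a) might look like the following, assuming the hypothetical pruneTimeout option sketched above (the values and the newForkSignaled promise are made up):

  gatherer.gather({ pruneTimeout: 30000 });  // candidates stay warm ~30 s

  // Extend the window before it expires, e.g. when another fork appears:
  newForkSignaled.then(() => gatherer.gather({ pruneTimeout: 30000 }));

  // Pruning "now" falls out of the same API:
  gatherer.gather({ pruneTimeout: 0 });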

Solution (b) allows absolute control over when IceGatherer candidates are pruned, but pruning requires active intervention or candidates will remain needlessly "hot". The API surface would need an additional IceGatherer.prune() method.
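
A usage sketch of (b), again with hypothetical names (noMoreForksExpected is a stand-in for whatever application signal says no further forks are coming):

  gatherer.gather();
  // ... transports come and go; everything stays hot until the
  // application explicitly decides otherwise:
  noMoreForksExpected.then(() => {
    gatherer.prune();  // releases candidates idle past the consent timeout
  });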

Solution (c) has the benefit that the API surface is not expanded. The drawback is that it feels like a bit of a hack to keep an IceTransport alive for the sake of keeping candidates "hot" [although that is debatable]. The larger drawback is that it's unclear when an IceTransport is truly complete unless we add a "this is the final remote candidate expected [for now]" notification to the IceTransport so it can know it is safe to prune candidates. Another drawback is that an IceTransport object might not be disposed promptly because of garbage collection, nor would IceTransport.stop() necessarily be called if IceTransport.start() was never called in the first place.
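
For completeness, the (c) workaround would look roughly like this; the pitfalls called out in the comments are exactly the ones above:

  // Keep one unused IceTransport attached purely to hold the gatherer's
  // candidates warm; it is never start()ed.
  const keepAlive = new RTCIceTransport(gatherer);

  // ... forks come and go; the gatherer stays hot ...

  // Pruning only becomes possible once this object goes away, which
  // relies on the application remembering to call stop() on a transport
  // it never started (or on the garbage collector, eventually).
  keepAlive.stop();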


-Robin

Received on Friday, 6 February 2015 17:15:31 UTC