Re: [webrtc-encoded-transform] Remove restriction on streams being limited to only one PC (#201)

> I thought about this approach a while ago. It is cleaner in the sense that it clearly states to the UA that (focusing on sender side) the JS application is responsible to implement the source+encoder part (which late fanout is clearly about). As an example, it will help UA provide meaningful WebRTC stats. Setting a track on such a sender could throw...

Can this be addressed by passing an extra optional parameter to the RTCRtpScriptTransform constructor?
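
For illustration, the extra flag could flow through the options argument that the UA already forwards to the worker as transformer.options. A minimal sketch, where the option name `jsOwnsSource` is purely hypothetical:

```ts
// Hypothetical sketch: "jsOwnsSource" is not in any spec; it only illustrates
// an extra flag passed through the existing RTCRtpScriptTransform options
// argument, which the UA forwards to the worker as transformer.options.
const worker = new Worker('encoded-transform-worker.js');
const pc = new RTCPeerConnection();
const sender = pc.addTransceiver('video').sender;

// Signal that the JS application implements the source+encoder part itself
// (late fanout); under the quoted proposal, setting a track on such a sender
// could then throw.
sender.transform = new RTCRtpScriptTransform(worker, { jsOwnsSource: true });
```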

> Exposing such API requires us to refine the WebRTC media pipeline model as we would expose things that are fully internal right now. This might not be unrelated to the TPAC Media/WebRTC WG discussion meeting I missed.

Harald's congestion control proposal is in line with this: https://github.com/w3c/webrtc-encoded-transform/pull/207
Also, depending on the use case, this might or might not be an issue. There is a lot of value in optionally lifting the restriction to support valid use cases.
 
> Related to this approach, my first questions would be:

> The frame level vs. packet level question. It would be great to see whether we can get consensus within the WG. For instance, if we have a packet level API, do we also need a frame level API? Do we need both?
Also, @jan-ivar's question: should fanout be done in JS, or by the UA with the web app tuning it via knobs?

Some advantages of the frame-level approach:
1. The changes required to solve the problem (including glitch-free failover across multiple input PCs) at the frame level are minimal: add a setMetadata() method to RTCEncodedVideoFrame/RTCEncodedAudioFrame and remove the 1PC restriction (see the sketch after this list). We have experimented with this in Chrome (which implements the pre-standard version without the 1PC restriction), and the results are very good in some real-world scenarios.
2. The programming model is a small extension to the well-understood model of encoded transform.
3. It makes it easier to support other transport protocols if necessary.
4. The concerns about exposing internal aspects of the WebRTC pipeline do not apply to all use cases and can be addressed with specific APIs for that purpose.
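
For concreteness, here is a minimal worker-side sketch of the frame-level forwarding model described in point 1, assuming the proposed setMetadata() method and the lifted 1PC restriction. The role option, the outgoingSsrc parameter, and the metadata fields rewritten are illustrative assumptions rather than settled API:

```ts
// Minimal sketch of frame-level forwarding between two PeerConnections in a
// worker. Assumes the proposed setMetadata() and the lifted 1PC restriction;
// the option names and rewritten metadata fields are illustrative only.
let outgoingWriter: WritableStreamDefaultWriter | undefined;

addEventListener('rtctransform', (event: any) => {
  const transformer = event.transformer;
  const options = transformer.options ?? {};

  if (options.role === 'sender') {
    // Transform attached to a sender on the outgoing PeerConnection; frames
    // written here go to its packetizer.
    outgoingWriter = transformer.writable.getWriter();
  } else if (options.role === 'receiver') {
    // Transform attached to a receiver on the incoming PeerConnection;
    // rewrite each frame's metadata and forward it to the outgoing sender.
    transformer.readable.pipeTo(new WritableStream({
      async write(frame: any) {
        frame.setMetadata({
          ...frame.getMetadata(),
          synchronizationSource: options.outgoingSsrc,
        });
        await outgoingWriter?.write(frame);
      },
    }));
  }
});
```

Glitch-free failover across multiple input PCs would then amount to choosing, per frame, which receiver's frames get written to the outgoing writer.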

A packet-level API, depending on its design, has the advantages that forwarding can start before the whole frame has arrived and that packet loss can be handled directly, but it has other disadvantages for this use case:
1. It is a new model, and supporting glitch-free failover with it is likely to be less straightforward.
2. It is possible to use a frame-to-packetizer model for sending (as in [1]), in which case the early-forwarding advantage disappears.
3. It is a major API intended to support multiple use cases, which would require considerable time to design, build consensus on, implement, and deploy. It is not clear that the resulting solution for this use case would be better than the frame-based one, and it would certainly be less timely.
4. It is tied to RTP, which is fine in many cases, but some users might prefer a solution not tied to the transport protocol, since they might consider migrating to alternative protocols (e.g., QUIC) in the future.


WDYT?

[1] https://docs.google.com/presentation/d/1vqlz0vbF1JFmKxVfTrhpBwomwu1aiSJz5p-Q0PXNDKA/edit#slide=id.g27b84bfbaa3_13_374

-- 
GitHub Notification of comment by guidou
Please view or discuss this issue at https://github.com/w3c/webrtc-encoded-transform/pull/201#issuecomment-1775141757 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 23 October 2023 13:01:10 UTC