[webrtc-nv-use-cases] Introducing app content-decisions at the encoded video level isn't webby (#101)

jan-ivar has just created a new issue for https://github.com/w3c/webrtc-nv-use-cases:

== Introducing app content-decisions at the encoded video level isn't webby ==
> [§ 3.10.2](https://w3c.github.io/webrtc-nv-use-cases/#stored-encoded-media) Transmitting stored encoded media ...
> - "Wait signals" or "your message is important to us, please stay on the line", inserted prior to switching to a live interactive stream.
> - Insertion of announcements or alarm signals in otherwise live streams.
> - Insertion of static content (such as profile pictures) when the sender temporarily disables a camera.

These kinds of features already exist today and are implemented client-side using existing web technology.
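
For instance, a "your message is important to us" card is a few lines of canvas today (a minimal, illustrative sketch; the dimensions and text are placeholders):

```js
// Draw a static "hold" card and expose it as a live video track.
const canvas = document.createElement('canvas');
canvas.width = 1280;
canvas.height = 720;
// frameRate 0: frames are only captured when requestFrame() is called.
const holdTrack = canvas.captureStream(0).getVideoTracks()[0];

const ctx = canvas.getContext('2d');
ctx.fillStyle = '#222';
ctx.fillRect(0, 0, canvas.width, canvas.height);
ctx.fillStyle = '#fff';
ctx.font = '48px sans-serif';
ctx.textAlign = 'center';
ctx.fillText('Your message is important to us…', canvas.width / 2, canvas.height / 2);
holdTrack.requestFrame(); // push the drawn frame into the track
```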

Moving such app decisions to the sender seems like a step backwards, reminiscent of dial tone and [DTMF](https://w3c.github.io/webrtc-pc/#dom-RTCDTMFSender-insertDTMF), whose justifications were non-web-client recipients (and from another era).

Even if we can produce compelling use cases that hinge on video being switched out sender-side, we already have [replaceTrack](https://w3c.github.io/webrtc-pc/#dom-rtcrtpsender-replacetrack) for that, and [VideoTrackGenerator](https://w3c.github.io/mediacapture-transform/#video-track-generator) to bring in other sources of media. Operating on decoded media is simple, well supported, and allows local playback (self-view) of what is being sent. Re-encoding also ensures uniformity and scalability (e.g. SVC encoding).
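
A sketch of that pattern, assuming `pc` is the RTCPeerConnection, `cameraTrack` the camera track, `holdTrack` the canvas track from above, and `selfView` a local `<video>` element:

```js
// Swap what is sent without renegotiation; the same track drives a local
// self-view, so users see exactly what they are sending.
const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
selfView.srcObject = new MediaStream([holdTrack]);
await sender.replaceTrack(holdTrack);   // e.g. while the camera is disabled
// ...later, back to live video:
selfView.srcObject = new MediaStream([cameraTrack]);
await sender.replaceTrack(cameraTrack);
```

VideoTrackGenerator (per mediacapture-transform, constructed in a worker, its track transferable to the main thread) similarly turns any sequence of VideoFrames, e.g. WebCodecs decoder output, into a MediaStreamTrack that replaceTrack accepts.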

Moving this logic to the encoded layer would require significant API changes, which don't seem justified merely to optimize away an occasional decode/re-encode step.

E.g. I can imagine a use case of an online teacher wishing the audience to see a training video instead of the teacher. But in this case, the better app IMHO is the one that sends the video using state-of-the-art tech for this (e.g. MSE, giving audience members/end-users the ability to pause/resume), not the one that encodes it into the WebRTC camera stream to simplify life for the web developer dealing with a unified RTCPeerConnection API, at a cost to end-users.
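
A minimal MSE sketch, assuming a fragmented MP4 at a placeholder URL with a placeholder codec string:

```js
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer =
      mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  const data = await (await fetch('/training-video.mp4')).arrayBuffer();
  sourceBuffer.addEventListener('updateend', () => mediaSource.endOfStream());
  sourceBuffer.appendBuffer(data); // assumes a fragmented MP4
});
// End-users keep control: video.pause() / video.play() work as usual.
```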

> [§ 3.10.3](https://w3c.github.io/webrtc-nv-use-cases/#decode-encoded-media) Decoding pre-encoded media ... we have pre-encoded media (either dynamically generated or stored) that we wish to process in the same way as one processes media coming in over a PeerConnection

The "we" here seems to refer to web developers, not end-users. It therefore infers no benefits to end-users, and is not an acceptable use case to me. Moreover, even for web developers, it's not clear what benefits, if any, come from treating non-WebRTC media as WebRTC media. An RTCPeerConnection outputs a [MediaStreamTrack](https://www.w3.org/TR/mediacapture-streams/#dom-mediastreamtrack) which seems a versatile enough integration point.

Please view or discuss this issue at https://github.com/w3c/webrtc-nv-use-cases/issues/101 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 24 January 2023 23:33:39 UTC