Re: [webrtc-encoded-transform] Add description of an API for controlling SDP codec negotiation (#186) from youennf via GitHub on 2023-10-16 (public-webrtc-logs@w3.org from October 2023)

From: youennf via GitHub <sysbot+gh@w3.org>
Date: Mon, 16 Oct 2023 09:42:31 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-1764102256-1697449348-sysbot+gh@w3.org>
> > What's the advantage or use case for choosing this per frame?

Fair question.

> I believe Youenn described this case in details on the call; I added this FAQ question specifically to address his use case.
> 
> From the explainer:
> 
> 1. Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

The use case I know of is enabling/disabling SFrame dynamically. Otherwise, we just want to stick to whatever packetization is associated with the encoded media content, and we can already change the media content via `setParameters`.

This change can be done either at the frame level or at the transform level (similarly to `setParameters` really).
If at the transform level, you need to call `sender.transform = newTransform` to change whether SFrame packetization is used or not. This is slightly less flexible but might be more convenient and less error-prone for web developers, and could allow some optimisations on the UA side.
The lost flexibility only matters if you plan to switch packetization at a very specific frame. I am unsure whether we have a strong use case here.
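
To make the two options concrete, here is a minimal sketch that models them with mock objects. Everything in it is hypothetical (`MockSender`, `makeTransform`, the per-frame `packetization` field); it is not the real WebRTC API, only an illustration of how a transform-level default and a frame-level override could interact.

```typescript
// Hypothetical model, not the real WebRTC API: all names are illustrative only.
type Packetization = "native" | "sframe";

interface Frame {
  id: number;
  // Hypothetical per-frame field; assumed here for the frame-level option.
  packetization?: Packetization;
}

interface Transform {
  // Transform-level choice: applies to every frame going through this transform.
  readonly packetization: Packetization;
  process(frame: Frame): Frame;
}

function makeTransform(packetization: Packetization): Transform {
  // Pass-through transform; a real one would modify the frame data.
  return { packetization, process: (frame) => frame };
}

class MockSender {
  transform: Transform = makeTransform("native");

  // A frame-level value wins if set; otherwise use the transform-level one.
  send(frame: Frame): Packetization {
    const processed = this.transform.process(frame);
    return processed.packetization ?? this.transform.packetization;
  }
}

const sender = new MockSender();
const sent: Packetization[] = [];
sent.push(sender.send({ id: 1 }));
sender.transform = makeTransform("sframe"); // transform-level switch
sent.push(sender.send({ id: 2 }));
sent.push(sender.send({ id: 3, packetization: "native" })); // frame-level override
```

In this model the transform-level switch takes effect for whichever frame happens to be sent next, while the frame-level field pins the choice to one specific frame.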

It is interesting to look at the receiver side though, in case a UA implements the SFrame packetization format and processing happens in a script transform. The web application might want to know:
- Whether packets were processed by the SFrame depacketizer.
- Which underlying media decoder will be used (info provided by the SFrame depacketizer from the RTP payload content).

It might be convenient to expose an API for both things, and it makes sense to expose this at the frame level, since it might potentially change for every frame.
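
As a rough sketch of what frame-level receiver information could look like, the mock below attaches both pieces of information to each frame. The `ReceivedFrameInfo` shape, its field names, and the XOR "decryption" are assumptions made for illustration; nothing here is taken from a specification.

```typescript
// Hypothetical shapes for illustration; none of these names come from a spec.
interface ReceivedFrameInfo {
  sframeDepacketized: boolean; // was the SFrame depacketizer used for this frame?
  mediaCodec: string;          // decoder the UA would use, as reported by the depacketizer
}

interface ReceivedFrame {
  data: Uint8Array;
  info: ReceivedFrameInfo;
}

// Receiver-side transform: only frames that came through the SFrame
// depacketizer are decrypted; all other frames pass through untouched.
function receiverTransform(
  frame: ReceivedFrame,
  decrypt: (data: Uint8Array) => Uint8Array,
): ReceivedFrame {
  if (!frame.info.sframeDepacketized) return frame;
  return { ...frame, data: decrypt(frame.data) };
}

// Toy stand-in for real SFrame processing: XOR every byte with 0xff.
const xorDecrypt = (data: Uint8Array): Uint8Array => data.map((b) => b ^ 0xff);

const plainFrame = receiverTransform(
  { data: new Uint8Array([1, 2]), info: { sframeDepacketized: false, mediaCodec: "H264" } },
  xorDecrypt,
);
const sframeFrame = receiverTransform(
  { data: new Uint8Array([1, 2]), info: { sframeDepacketized: true, mediaCodec: "H264" } },
  xorDecrypt,
);
```

Because the information travels with each frame, the script can handle a stream in which SFrame is switched on or off between consecutive frames.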

I would tend to expose the same API on the receiver and sender sides, which is why I would lean toward the frame level.

> Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect.
Script transforms have several potential use cases:
- Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am OK with the SFrame packetization format spec exposing an extension point in the SDP, and we would expose this in our API (say `setCodecPreferences` for instance).
- Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.
- Plug a new codec. It seems best to solve this with a separate API; I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and produces an EncodedVideoChunk as output, whereas a script transform takes an EncodedVideoChunk as both input and output. Also, these two should be usable together: it makes sense to use an SFrame transform on a plugged-in codec simply with `sender.transform = sframeTransform`.
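
The difference in shape between an encoder and an encoded transform, and the way they compose, can be sketched with plain function types. `RawFrame` and `EncodedChunk` are simplified stand-ins for VideoFrame and EncodedVideoChunk, and `customEncoder`, `sframeLikeTransform`, and `pipeline` are hypothetical names used only for this illustration.

```typescript
// Simplified stand-ins for VideoFrame / EncodedVideoChunk (assumed shapes).
interface RawFrame { pixels: number[]; }
interface EncodedChunk { codec: string; payload: number[]; }

// An encoder changes the type of the data: raw frame in, encoded chunk out.
type Encoder = (frame: RawFrame) => EncodedChunk;
// An encoded transform keeps the type: encoded chunk in, encoded chunk out.
type EncodedTransform = (chunk: EncodedChunk) => EncodedChunk;

// Toy "codec": truncate every pixel value to one byte.
const customEncoder: Encoder = (frame) => ({
  codec: "x-custom",
  payload: frame.pixels.map((p) => p & 0xff),
});

// Toy stand-in for an SFrame transform: XOR the payload, leave the codec alone.
const sframeLikeTransform: EncodedTransform = (chunk) => ({
  ...chunk,
  payload: chunk.payload.map((b) => b ^ 0x5a),
});

// The two compose without knowing about each other.
function pipeline(encode: Encoder, transform: EncodedTransform): Encoder {
  return (frame) => transform(encode(frame));
}

const out = pipeline(customEncoder, sframeLikeTransform)({ pixels: [0, 1, 255] });
```

Since the transform never needs to understand the payload, the same transform can sit behind a built-in codec or a plugged-in one.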

This `plug a new codec` API would probably need to let the web application handle both the encoding/decoding (UA does not know the content format) and the associated packetization (UA does not know the format so does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this `plug a new codec` work, I would tend to start with a `plug your own encoder` API that could be used without packetization/SDP handling, for instance to let web apps fine-tune a WebCodecs encoder setup (e.g. setting QP per frame or per macroblock). We could then address new formats on top of this API to handle SDP negotiation/RTP packetization.
It could also be used for the `metadata to media content` use case since web applications may often carry VideoFrame and associated metadata together.

-- 
GitHub Notification of comment by youennf
Please view or discuss this issue at https://github.com/w3c/webrtc-encoded-transform/pull/186#issuecomment-1764102256 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 16 October 2023 09:42:32 UTC