Re: WebCodecs - Incoming liaison letter from 3GPP SA4

Thanks François and Dale.

I have drafted a reply based on the discussion here:

https://docs.google.com/document/d/1PHiiol3vp_mARqYk1EYsVA7G2aGyUwFOZ7dxW4ZZQxg/edit

Paul and Youenn, please review, I'm happy to update this as needed.

We also have a PR open against the WebCodecs Codec Registry: https://github.com/w3c/webcodecs/pull/921


I'd like to merge this, if the group agrees, before we reply to 3GPP.

The proposed change also clarifies that adding a registry entry requires implementer interest as well as WG consensus. This seems a useful clarification to add to all our registries (WebCodecs, MSE, EME), so I'd like the WG's feedback on that suggestion.

Thanks,

Chris


________________________________
From: Dale Curtis <dalecurtis@google.com>
Sent: 22 January 2026 00:28
To: Francois Daoust <fd@w3.org>
Cc: public-media-wg@w3.org <public-media-wg@w3.org>
Subject: Re: WebCodecs - Incoming liaison letter from 3GPP SA4


On Thu, Jan 15, 2026 at 4:11 AM Francois Daoust <fd@w3.org> wrote:


On 2026-01-07 00:52, Dale Curtis wrote:
Comments inline. Caveat: These are just Chrome's opinions and are not binding. Paul, Youenn feel free to jump in :)

tl;dr: Let's be consistent across our interfaces if such functionality is needed.

The registry entry requirements can currently be interpreted to say that codec-specific extensions to dictionaries are restricted to the encoder config bits (that may have partly triggered their first question?):

"Where applicable, a registration specification may include a section describing extensions to VideoEncoderConfig or AudioEncoderConfig dictionaries."
https://www.w3.org/TR/webcodecs-codec-registry/#registration-entry-requirements


This thread suggests that, in the future, there could perhaps be a need for codec-specific extensions to AudioDecoderConfig or VideoDecoderConfig as well. Also, the requirement seems a bit off as we already have registrations that extend VideoEncoderEncodeOptions (e.g., VP9).

To make the requirements more future proof, would it make sense to reformulate that requirement as:

"Where applicable, a registration specification may include a section describing extensions to the dictionaries used in the configure(), decode() and encode() methods of the decoder and encoder interfaces (e.g., AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig, VideoEncoderConfig, VideoEncoderEncodeOptions)."

... or something like that :)

Yes, something to this effect makes sense to me. I'm not sure how much value there is in pre-emptively future proofing though. YAGNI and all that :)


François.


- dale

On Tue, Jan 6, 2026 at 8:00 AM Francois Daoust <fd@w3.org> wrote:
Hello Media Working Group participants,

W3C received a liaison letter for the Media Working Group from the 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API. See attached file for the full letter. I'm reproducing the questions in text format below to ease review.

[[
IVAS codec is designed to handle multiple spatial input formats (Multi-channel / Objects / Ambisonics / Parametric (Metadata Assisted Spatial Audio, MASA)) and the IVAS decoder integrates a renderer which is capable of rendering to various output formats like binaural and loud-speaker layouts. Time aligned metadata serves a critical function in both the encoder and decoder. In IVAS Encoder, time aligned metadata is needed for encoding object-based audio, MASA and combined formats. Similarly, IVAS Decoder can ingest head orientation sensor metadata to render a fully immersive head-tracked binaural audio experience. The decoder may also need to manage Processing Information metadata when available in an RTP payload to configure the rendering.

AudioDecoder Interface
-----
- Currently the configure() API defines AudioDecoderConfig to only allow for the description field, a sequence of codec specific bytes, which commonly maps to the initial one-time setup of various codecs (e.g. Audio Specific Config in AAC, header/codebook data in Vorbis, etc). However, it does not provide a mechanism for any run-time controls for the decoder (e.g. an output format to configure decoded output). The AudioEncoder interface allows for codec-specific extensions to AudioEncoderConfig as a dictionary of configurations, while AudioDecoderConfig doesn’t.
Question: What is the correct way to implement run-time controls like the output format (e.g., BINAURAL, Stereo (2.0), 5.1.2, 7.1.4) in the decoder?

For set-once type controls, adding codec-specific extensions to the config object seems like the correct approach. Technically you could continue to stuff everything in the description. It's already an opaque codec specific blob, but that's not very usable if clients are expecting to tweak it.
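As a rough illustration of the set-once approach, the sketch below models a codec-specific member on a decoder config dictionary. Everything IVAS-specific here is hypothetical: the `ivas` member, the `"ivas"` codec string, the `IvasOutputFormat` values (borrowed from the examples in the letter) and the `makeIvasDecoderConfig` helper do not exist in WebCodecs or in the codec registry. A local stand-in type is used instead of the real AudioDecoderConfig so the sketch runs outside a browser.

```typescript
// Hypothetical output formats, taken from the examples in the SA4 letter.
type IvasOutputFormat = "binaural" | "stereo" | "5.1.2" | "7.1.4";

// Local stand-in for AudioDecoderConfig (not the real WebCodecs dictionary).
interface SketchAudioDecoderConfig {
  codec: string;
  sampleRate: number;
  numberOfChannels: number;
  description?: Uint8Array;
  // Hypothetical codec-specific extension member; not defined in any spec.
  ivas?: { outputFormat: IvasOutputFormat };
}

// Channel count of each rendered layout (e.g. 5.1.2 = 5 + 1 + 2 = 8).
const OUTPUT_CHANNELS: Record<IvasOutputFormat, number> = {
  binaural: 2,
  stereo: 2,
  "5.1.2": 8,
  "7.1.4": 12,
};

// Builds a config whose set-once control lives in a typed member rather than
// inside the opaque description bytes. Assumption: numberOfChannels is taken
// here to describe the rendered output layout.
function makeIvasDecoderConfig(
  outputFormat: IvasOutputFormat,
  sampleRate: number = 48000,
): SketchAudioDecoderConfig {
  return {
    codec: "ivas", // hypothetical codec string
    sampleRate,
    numberOfChannels: OUTPUT_CHANNELS[outputFormat],
    ivas: { outputFormat },
  };
}
```

Compared with packing the same setting into the description, a typed member like this is something a client can read back and change before calling configure() again.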


- Time-varying metadata input for the decoder, either from the device sensors (e.g. user head orientation) or with out-of-band signalling (e.g. from RTP / Media Container), may be needed for the proper integration of IVAS’s immersive features, however currently there is no direct way to provide this to the decode() API.
Questions: How can time-aligned codec-specific metadata be injected into the decode() call without adding multiple configure() calls per frame?  If multiple configure() calls are used per frame, would the configure and decode calls be synchronously processed?

That depends on whether the decoder itself uses the metadata or whether it's just something that gets mapped to an output. If the decoder needs the metadata, adding something equivalent to the VideoEncoderEncodeOptions passed to encode(), but on the decoder, seems reasonable. If instead it's just for passing to outputs, the client can already handle that mapping based on timestamps (assuming 1:1 input/output).
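A minimal sketch of that client-side mapping, assuming each decode() input produces exactly one output with the same timestamp. The `MetadataByTimestamp` class and its method names are made up for illustration; the only WebCodecs behaviour relied on is that chunks and outputs both carry a `timestamp`.

```typescript
// Holds per-chunk metadata until the matching decoder output arrives.
class MetadataByTimestamp<T> {
  private pending = new Map<number, T>();

  // Call just before decode(chunk), keyed by chunk.timestamp.
  remember(timestamp: number, metadata: T): void {
    this.pending.set(timestamp, metadata);
  }

  // Call from the output callback with output.timestamp. Returns the stored
  // metadata and forgets it, or undefined if nothing was stored.
  take(timestamp: number): T | undefined {
    const metadata = this.pending.get(timestamp);
    this.pending.delete(timestamp);
    return metadata;
  }

  // Number of inputs still waiting for an output.
  get size(): number {
    return this.pending.size;
  }
}
```

In use, the client would call `remember(chunk.timestamp, meta)` before each `decode(chunk)` and `take(audioData.timestamp)` inside the output callback. If the codec does not preserve timestamps 1:1, this lookup no longer works and the metadata would have to travel through the decoder instead.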


- A decoder might need to produce additional time-aligned codec-specific metadata (e.g. object metadata) as output when an external rendering is desired. Currently the AudioDataOutputCallback only allows for an AudioData interface, but there is no way for the decode() API to produce any additional metadata output. This contrasts with the EncodedAudioChunkOutputCallback which implements an optional EncodedAudioChunkMetadata field.
Question: How can the AudioDecoder interface provide time-aligned metadata and PCM audio output?

It would depend on whether this time-aligned metadata should also be given to a hypothetical encoder. If so, adding metadata fields to the encoded chunk would make sense from a symmetry perspective. Otherwise, some form of metadata struct like the one the encoders use seems reasonable.


AudioEncoder Interface
-----
- Unlike the decoder, the encoder configure() API allows for codec-specific extensions to AudioEncoderConfig; however, for certain immersive formats like object-based audio or MASA, the encoder requires additional time varying metadata that is time-aligned with the input audio.
Question: How can such metadata input be injected into the Audio Encoder in a synchronous manner to ensure time-aligned application within the encoder?
]]

Adding an equivalent of VideoEncoderEncodeOptions to AudioEncoder::encode() seems reasonable for this type of thing.
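To make the shape of that idea concrete, here is a sketch of what a per-call options dictionary on the audio side could look like, modelled on how VideoEncoderEncodeOptions carries codec-specific members. All names are hypothetical (`SketchAudioEncoderEncodeOptions`, the `ivas` member, `ObjectPosition`, `SketchAudioEncoder`); today AudioEncoder.encode() takes only the AudioData argument. The mock encoder just records what it was given.

```typescript
// Hypothetical per-object metadata for one frame of object-based audio.
interface ObjectPosition {
  azimuth: number;
  elevation: number;
}

// Hypothetical analogue of VideoEncoderEncodeOptions for audio.
interface SketchAudioEncoderEncodeOptions {
  ivas?: { objects: ObjectPosition[] };
}

// Stand-in for AudioData so the sketch runs outside a browser.
interface SketchFrame {
  timestamp: number; // microseconds
  samples: Float32Array;
}

// Mock encoder: metadata arrives in the same call as the frame it applies to,
// so the two cannot drift apart in the encoder's queue.
class SketchAudioEncoder {
  readonly log: string[] = [];

  encode(frame: SketchFrame, options: SketchAudioEncoderEncodeOptions = {}): void {
    const objectCount = options.ivas?.objects.length ?? 0;
    this.log.push(`${frame.timestamp}:${objectCount}`);
  }
}
```

Passing the metadata alongside the frame avoids interleaving extra configure() calls, which is the synchronisation concern raised in the question.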

Received on Thursday, 29 January 2026 11:09:50 UTC