Re: WebCodecs - Incoming liaison letter from 3GPP SA4

On Thu, Jan 15, 2026 at 4:11 AM Francois Daoust <fd@w3.org> wrote:

>
>
> On 2026-01-07 00:52, Dale Curtis wrote:
>
> Comments inline. Caveat: These are just Chrome's opinions and are not
> binding. Paul, Youenn feel free to jump in :)
>
> tl;dr: Let's be consistent across our interfaces if such functionality is
> needed.
>
>
> The registry entry requirements can currently be interpreted to say that
> codec-specific extensions to dictionaries are restricted to the encoder
> config bits (that may have partly triggered their first question?):
>
> "Where applicable, a registration specification may include a section
> describing extensions to VideoEncoderConfig or AudioEncoderConfig
> dictionaries."
>
> https://www.w3.org/TR/webcodecs-codec-registry/#registration-entry-requirements
>
> This thread suggests that, in the future, there could perhaps be a need
> for codec-specific extensions to AudioDecoderConfig or VideoDecoderConfig
> as well. Also, the requirement seems a bit off as we already have
> registrations that extend VideoEncoderEncodeOptions (e.g., VP9).
>
> To make the requirements more future-proof, would it make sense to
> reformulate that requirement as:
>
> "Where applicable, a registration specification may include a section
> describing extensions to the dictionaries used in the configure(), decode()
> and encode() methods of the decoder and encoder interfaces (e.g.,
> AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig,
> VideoEncoderConfig, VideoEncoderEncodeOptions)."
>
> ... or something like that :)
>

Yes, something to this effect makes sense to me. I'm not sure how much
value there is in preemptively future-proofing, though. YAGNI and all that
:)
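
For illustration only, here is a rough TypeScript sketch of what a codec-specific
extension on the decoder side could look like if it followed the pattern existing
registrations use (a codec-named member on the dictionary, as VP9 does for
VideoEncoderEncodeOptions). Everything IVAS-related is an assumption made up for
the sketch: the "ivas" codec string, the `ivas` members, `outputFormat`,
`headOrientation`, and the `AudioDecoderDecodeOptions` dictionary do not exist in
WebCodecs or in any registration today.

```typescript
// Local stand-in for the standard AudioDecoderConfig members, so the sketch
// compiles without DOM typings.
interface BaseAudioDecoderConfig {
  codec: string;
  sampleRate: number;
  numberOfChannels: number;
  description?: Uint8Array;
}

// HYPOTHETICAL: set-once, codec-specific decoder options.
type IvasOutputFormat = "binaural" | "stereo" | "5.1.2" | "7.1.4";
interface IvasDecoderConfig {
  outputFormat?: IvasOutputFormat;
}

// HYPOTHETICAL: the shape a registration's dictionary extension could take.
interface ExtendedAudioDecoderConfig extends BaseAudioDecoderConfig {
  ivas?: IvasDecoderConfig;
}

// HYPOTHETICAL: per-call options for decode(), analogous in spirit to
// VideoEncoderEncodeOptions on the encoder side.
interface HeadOrientation {
  yaw: number;
  pitch: number;
  roll: number;
}
interface AudioDecoderDecodeOptions {
  ivas?: { headOrientation?: HeadOrientation };
}

// Resolve the requested output format. The "stereo" default is an
// assumption chosen for this sketch, not specified anywhere.
function resolveOutputFormat(config: ExtendedAudioDecoderConfig): IvasOutputFormat {
  return config.ivas?.outputFormat ?? "stereo";
}

const exampleConfig: ExtendedAudioDecoderConfig = {
  codec: "ivas", // hypothetical codec string
  sampleRate: 48000,
  numberOfChannels: 2,
  ivas: { outputFormat: "binaural" },
};

// Time-varying data that would accompany a single decode() call.
const exampleDecodeOptions: AudioDecoderDecodeOptions = {
  ivas: { headOrientation: { yaw: 15, pitch: 0, roll: 0 } },
};
```

The split mirrors the distinction drawn in the quoted answers: set-once controls
sit on the config dictionary, time-varying ones travel with each call.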


>
> François.
>
>
> - dale
>
> On Tue, Jan 6, 2026 at 8:00 AM Francois Daoust <fd@w3.org> wrote:
>
>> Hello Media Working Group participants,
>>
>> W3C received a liaison letter for the Media Working Group from the
>> 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API. See attached
>> file for the full letter. I'm reproducing the questions in text format
>> below to ease review.
>>
>> [[
>> IVAS codec is designed to handle multiple spatial input formats
>> (Multi-channel / Objects / Ambisonics / Parametric (Metadata Assisted
>> Spatial Audio, MASA)) and the IVAS decoder integrates a renderer which is
>> capable of rendering to various output formats like binaural and
>> loud-speaker layouts. Time aligned metadata serves a critical function in
>> both the encoder and decoder. In IVAS Encoder, time aligned metadata is
>> needed for encoding object-based audio, MASA and combined formats.
>> Similarly, IVAS Decoder can ingest head orientation sensor metadata to
>> render a fully immersive head-tracked binaural audio experience. The
>> decoder may also need to manage Processing Information metadata when
>> available in an RTP payload to configure the rendering.
>>
>> AudioDecoder Interface
>> -----
>> - Currently the configure() API defines AudioDecoderConfig to only allow
>> for the description field, a sequence of codec specific bytes, which
>> commonly maps to the initial one-time setup of various codecs (e.g. Audio
>> Specific Config in AAC, header/codebook data in Vorbis, etc). However, it
>> does not provide a mechanism for any run-time controls for the decoder
>> (e.g. an output format to configure decoded output). The AudioEncoder
>> interface allows for codec-specific extensions to AudioEncoderConfig as a
>> dictionary of configurations, while AudioDecoderConfig doesn’t.
>> Question: What is the correct way to implement run-time controls like the
>> output format (e.g., BINAURAL, Stereo (2.0), 5.1.2, 7.1.4) in the decoder?
>>
>
> For set-once controls, adding codec-specific extensions to the config
> object seems like the correct approach. Technically you could continue to
> stuff everything in the description, since it's already an opaque
> codec-specific blob, but that's not very usable if clients expect to tweak it.
>
>
>>
>> - Time-varying metadata input for the decoder, either from the device
>> sensors (e.g. user head orientation) or with out-of-band signalling (e.g.
>> from RTP / Media Container), may be needed for the proper integration of
>> IVAS’s immersive features, however currently there is no direct way to
>> provide this to the decode() API.
>> Questions: How can time-aligned codec-specific metadata be injected into
>> the decode() call without adding multiple configure() calls per frame?  If
>> multiple configure() calls are used per frame, would the configure and
>> decode calls be synchronously processed?
>>
>
> That depends on whether the decoder itself uses the metadata or whether it's
> just something that gets mapped to an output. If the decoder needs the
> metadata, adding something equivalent to the VideoEncoderEncodeOptions
> passed to encode(), but on the decoder, seems reasonable. If instead it's
> just for passing to outputs, the client can already handle that mapping
> based on timestamps (assuming 1:1 input/output).
>
>
>>
>> - A decoder might need to produce additional time-aligned codec-specific
>> metadata (e.g. object metadata) as output when an external rendering is
>> desired. Currently the AudioDataOutputCallback only allows for an AudioData
>> interface, but there is no way for the decode() API to produce any
>> additional metadata output. This contrasts with the
>> EncodedAudioChunkOutputCallback which implements an optional
>> EncodedAudioChunkMetadata field.
>> Question: How can the AudioDecoder interface provide time-aligned
>> metadata and PCM audio output?
>>
>
> It would depend on whether this time-aligned metadata should also be given to a
> hypothetical encoder. If so, then adding metadata fields to the encoded
> chunk would make sense from a symmetry perspective. Otherwise, some form of
> metadata struct like the encoders use seems reasonable.
>
>
>>
>> AudioEncoder Interface
>> -----
>> -  Unlike the decoder, the encoder configure() API allows for
>> codec-specific extensions to AudioEncoderConfig, however, for certain
>> immersive formats like object-based audio or MASA, the encoder requires
>> additional time varying metadata that is time-aligned with the input audio.
>> Question: How can such metadata input be injected into the Audio Encoder
>> in a synchronous manner to ensure time-aligned application within the
>> encoder?
>> ]]
>>
>
> Adding an equivalent of VideoEncoderEncodeOptions to
> AudioEncoder::encode() seems reasonable for this type of thing.
>
>
>
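
On the point above about clients mapping pass-through metadata by timestamp,
a minimal TypeScript sketch of that bookkeeping might look like the following.
It assumes 1:1 input/output and that each output carries the same timestamp as
the chunk that produced it. `TimestampMetadataMap` and `FrameMetadata` are
hypothetical names invented for the example, not part of WebCodecs.

```typescript
// HYPOTHETICAL per-frame metadata the client wants to re-attach to outputs.
interface FrameMetadata {
  yaw: number;
  pitch: number;
  roll: number;
}

// Holds metadata for chunks that have been submitted but not yet output.
class TimestampMetadataMap<T> {
  private pending = new Map<number, T>();

  // Call just before decode(chunk), keyed by the chunk's timestamp.
  remember(timestamp: number, metadata: T): void {
    this.pending.set(timestamp, metadata);
  }

  // Call from the output callback with the output's timestamp.
  // Returns the stored metadata (if any) and forgets the entry.
  take(timestamp: number): T | undefined {
    const metadata = this.pending.get(timestamp);
    this.pending.delete(timestamp);
    return metadata;
  }

  // Number of submitted chunks still waiting for an output.
  get size(): number {
    return this.pending.size;
  }
}

// Stand-in for the decode/output flow, without a real decoder:
const metadataMap = new TimestampMetadataMap<FrameMetadata>();
metadataMap.remember(0, { yaw: 10, pitch: 0, roll: 0 });     // before decode()
metadataMap.remember(20000, { yaw: 12, pitch: 1, roll: 0 }); // before decode()
const firstOutputMetadata = metadataMap.take(0);             // in output callback
```

With a real decoder, remember() would be keyed by the chunk's timestamp and
take() by the output AudioData's timestamp. If a decoder adjusts timestamps
(for example, to account for codec delay), the 1:1 assumption no longer holds
and the lookup would need a different key.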

Received on Thursday, 22 January 2026 00:28:29 UTC