- From: Francois Daoust <fd@w3.org>
- Date: Tue, 6 Jan 2026 16:59:56 +0100
- To: public-media-wg@w3.org
- Message-ID: <12d7b7d8-ad1b-4e3b-ac58-1e118a4c8adc@w3.org>
Hello Media Working Group participants,

W3C received a liaison letter for the Media Working Group from the 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API. See the attached file for the full letter. I'm reproducing the questions in text format below to ease review.

[[
The IVAS codec is designed to handle multiple spatial input formats (multi-channel, objects, Ambisonics, and parametric (Metadata-Assisted Spatial Audio, MASA)), and the IVAS decoder integrates a renderer capable of rendering to various output formats such as binaural and loudspeaker layouts. Time-aligned metadata serves a critical function in both the encoder and the decoder. In the IVAS encoder, time-aligned metadata is needed for encoding object-based audio, MASA, and combined formats. Similarly, the IVAS decoder can ingest head-orientation sensor metadata to render a fully immersive, head-tracked binaural audio experience. The decoder may also need to manage Processing Information metadata, when available in an RTP payload, to configure the rendering.

AudioDecoder Interface
-----

- Currently the configure() API defines AudioDecoderConfig to allow only the description field, a sequence of codec-specific bytes, which commonly maps to the initial one-time setup of various codecs (e.g. the Audio Specific Config in AAC, or header/codebook data in Vorbis). However, it provides no mechanism for run-time controls for the decoder (e.g. an output format to configure the decoded output). The AudioEncoder interface allows codec-specific extensions to AudioEncoderConfig as a dictionary of configurations, while AudioDecoderConfig does not.

  Question: What is the correct way to implement run-time controls such as the output format (e.g. binaural, stereo (2.0), 5.1.2, 7.1.4) in the decoder?

- Time-varying metadata input for the decoder, either from device sensors (e.g. user head orientation) or from out-of-band signalling (e.g. from RTP / a media container), may be needed for the proper integration of IVAS's immersive features; however, there is currently no direct way to provide this to the decode() API.

  Questions: How can time-aligned codec-specific metadata be injected into the decode() call without adding multiple configure() calls per frame? If multiple configure() calls are used per frame, would the configure and decode calls be processed synchronously?

- A decoder might need to produce additional time-aligned codec-specific metadata (e.g. object metadata) as output when external rendering is desired. Currently the AudioDataOutputCallback allows only an AudioData interface; there is no way for the decode() API to produce any additional metadata output. This contrasts with the EncodedAudioChunkOutputCallback, which includes an optional EncodedAudioChunkMetadata field.

  Question: How can the AudioDecoder interface provide both time-aligned metadata and PCM audio output?

AudioEncoder Interface
-----

- Unlike the decoder, the encoder configure() API allows codec-specific extensions to AudioEncoderConfig. However, for certain immersive formats like object-based audio or MASA, the encoder requires additional time-varying metadata that is time-aligned with the input audio.

  Question: How can such metadata input be injected into the AudioEncoder in a synchronous manner to ensure time-aligned application within the encoder?
]]
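To make the "time-aligned metadata" problem concrete for reviewers: since decode() today takes only an EncodedAudioChunk, an application would have to associate sensor metadata with chunks itself, outside the codec. The sketch below is purely illustrative (the helper name alignMetadata and the orientation-sample shape are hypothetical, not part of WebCodecs); it pairs each chunk with the most recent metadata sample by timestamp, which is roughly what the letter is asking the API to do internally.

```javascript
// Hypothetical sketch: pairing time-varying metadata (e.g. head
// orientation) with encoded audio chunks by timestamp, since the
// current decode() API accepts no per-frame metadata. The names
// alignMetadata / yaw are illustrative only, not WebCodecs API.
function alignMetadata(chunks, metadataTrack) {
  // Sort metadata samples by timestamp (microseconds, as in WebCodecs).
  const sorted = [...metadataTrack].sort((a, b) => a.timestamp - b.timestamp);
  return chunks.map((chunk) => {
    // Pick the latest sample not later than the chunk's timestamp.
    let current = null;
    for (const sample of sorted) {
      if (sample.timestamp <= chunk.timestamp) current = sample;
      else break;
    }
    return { chunk, metadata: current };
  });
}

// Example: chunks at 0 µs and 20000 µs, orientation samples at 0 µs
// and 15000 µs; the second chunk picks up the 15000 µs sample.
const paired = alignMetadata(
  [{ timestamp: 0 }, { timestamp: 20000 }],
  [{ timestamp: 0, yaw: 0 }, { timestamp: 15000, yaw: 30 }]
);
```

Note that this application-side pairing cannot reach inside the decoder, which is exactly why the letter asks for a first-class mechanism in decode() itself.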
Attachments
- application/zip attachment: S4-252109.zip
Received on Tuesday, 6 January 2026 15:59:58 UTC