- From: Dale Curtis <dalecurtis@google.com>
- Date: Tue, 6 Jan 2026 15:52:20 -0800
- To: Francois Daoust <fd@w3.org>
- Cc: public-media-wg@w3.org
- Message-ID: <CAPUDrwe49YPA51q+r9jEvEpPziD0UZ2hsrcdCo2T4PPJax_ZEQ@mail.gmail.com>
Comments inline. Caveat: these are just Chrome's opinions and are not
binding. Paul, Youenn, feel free to jump in :)

tl;dr: Let's be consistent across our interfaces if such functionality is
needed.

- dale

On Tue, Jan 6, 2026 at 8:00 AM Francois Daoust <fd@w3.org> wrote:

> Hello Media Working Group participants,
>
> W3C received a liaison letter for the Media Working Group from the
> 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API. See the
> attached file for the full letter. I'm reproducing the questions in
> text format below to ease review.
>
> [[
> The IVAS codec is designed to handle multiple spatial input formats
> (multi-channel / objects / Ambisonics / parametric (Metadata-Assisted
> Spatial Audio, MASA)), and the IVAS decoder integrates a renderer
> capable of rendering to various output formats like binaural and
> loudspeaker layouts. Time-aligned metadata serves a critical function
> in both the encoder and the decoder. In the IVAS encoder, time-aligned
> metadata is needed for encoding object-based audio, MASA, and combined
> formats. Similarly, the IVAS decoder can ingest head-orientation
> sensor metadata to render a fully immersive, head-tracked binaural
> audio experience. The decoder may also need to manage Processing
> Information metadata, when available in an RTP payload, to configure
> the rendering.
>
> AudioDecoder Interface
> -----
> - Currently the configure() API defines AudioDecoderConfig to only
> allow for the description field, a sequence of codec-specific bytes,
> which commonly maps to the initial one-time setup of various codecs
> (e.g. the Audio Specific Config in AAC, header/codebook data in
> Vorbis, etc.). However, it does not provide a mechanism for any
> run-time controls for the decoder (e.g. an output format to configure
> the decoded output). The AudioEncoder interface allows for
> codec-specific extensions to AudioEncoderConfig as a dictionary of
> configurations, while AudioDecoderConfig doesn't.
> Question: What is the correct way to implement run-time controls like
> the output format (e.g. BINAURAL, stereo (2.0), 5.1.2, 7.1.4) in the
> decoder?

For set-once controls, adding codec-specific extensions to the config
object seems like the correct approach. Technically you could continue to
stuff everything into the description, since it's already an opaque,
codec-specific blob, but that's not very usable if clients are expecting
to tweak it.

> - Time-varying metadata input for the decoder, either from device
> sensors (e.g. user head orientation) or via out-of-band signalling
> (e.g. from RTP / the media container), may be needed for the proper
> integration of IVAS's immersive features; however, there is currently
> no direct way to provide this to the decode() API.
> Questions: How can time-aligned, codec-specific metadata be injected
> into the decode() call without adding multiple configure() calls per
> frame? If multiple configure() calls are used per frame, would the
> configure and decode calls be processed synchronously?

That depends on whether the decoder itself uses the metadata or whether
it's just something that gets mapped to an output. If the decoder needs
the metadata, adding something equivalent to the VideoEncoderEncodeOptions
passed to encode(), but on the decoder, seems reasonable. If instead it's
just for passing along to outputs, the client can already handle that
mapping based on timestamps (assuming 1:1 input/output).
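To make that concrete, here's a rough, non-normative sketch of both
decoder-side shapes. Everything IVAS-specific below is hypothetical: there
is no registered 'ivas' codec string today, and neither the `ivas` config
member nor an options bag on decode() exists in WebCodecs.

```js
// HYPOTHETICAL sketch only. The 'ivas' codec string, the ivas config
// extension, and the second argument to decode() are invented here to
// illustrate the shape; none of them exist in WebCodecs today.
const decoder = new AudioDecoder({
  output: (audioData) => { /* hand PCM to the renderer */ audioData.close(); },
  error: (e) => console.error(e),
});

// Set-once control via a codec-specific extension of AudioDecoderConfig,
// mirroring how AudioEncoderConfig already takes per-codec dictionaries
// (e.g. the 'opus' member from the Opus codec registration).
decoder.configure({
  codec: 'ivas',                       // hypothetical codec string
  sampleRate: 48000,
  numberOfChannels: 2,
  ivas: { outputFormat: 'binaural' },  // hypothetical extension member
});

// Time-varying input metadata via an options bag on decode(), analogous
// to the VideoEncoderEncodeOptions argument of VideoEncoder.encode().
const chunk = new EncodedAudioChunk({
  type: 'key',
  timestamp: 0,
  data: new Uint8Array([/* IVAS payload bytes */]),
});
decoder.decode(chunk, {
  ivas: {                              // hypothetical per-call options
    headOrientation: { yaw: 0.1, pitch: 0.0, roll: 0.0 },
  },
});
```

Keeping per-call metadata under a codec-keyed sub-dictionary would keep
the base dictionary codec-agnostic, same as the existing config
extensions.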
> - A decoder might need to produce additional time-aligned,
> codec-specific metadata (e.g. object metadata) as output when external
> rendering is desired. Currently the AudioDataOutputCallback only
> allows for an AudioData interface, and there is no way for the
> decode() API to produce any additional metadata output. This contrasts
> with the EncodedAudioChunkOutputCallback, which includes an optional
> EncodedAudioChunkMetadata field.
> Question: How can the AudioDecoder interface provide time-aligned
> metadata alongside the PCM audio output?

It would depend on whether this time-aligned metadata should also be given
to a hypothetical encoder. If it is, then adding metadata fields to the
encoded chunk would make sense from a symmetry perspective. Otherwise,
some form of metadata struct like the one the encoders use seems
reasonable.

> AudioEncoder Interface
> -----
> - Unlike the decoder, the encoder configure() API allows for
> codec-specific extensions to AudioEncoderConfig; however, for certain
> immersive formats like object-based audio or MASA, the encoder
> requires additional time-varying metadata that is time-aligned with
> the input audio.
> Question: How can such metadata input be injected into the
> AudioEncoder in a synchronous manner to ensure time-aligned
> application within the encoder?
> ]]

Adding an equivalent of VideoEncoderEncodeOptions to
AudioEncoder::encode() seems reasonable for this type of thing.
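Again, a rough sketch of the output-metadata and encoder-options shapes
discussed above; the metadata argument on the decoder output callback and
the options bag on AudioEncoder.encode() are hypothetical, as are all the
IVAS-specific fields:

```js
// HYPOTHETICAL sketch only. Neither the metadata argument on the decoder
// output callback nor the options bag on AudioEncoder.encode() exists in
// WebCodecs today; they mirror the existing (chunk, metadata) encoder
// output callback and VideoEncoder.encode(frame, options) respectively.
const decoder = new AudioDecoder({
  output: (audioData, metadata) => {
    // metadata?.ivas?.objectMetadata would carry time-aligned data to an
    // external renderer (hypothetical field).
    audioData.close();
  },
  error: (e) => console.error(e),
});

const encoder = new AudioEncoder({
  // The metadata argument here is real: EncodedAudioChunkMetadata.
  output: (chunk, metadata) => { /* packetize for RTP */ },
  error: (e) => console.error(e),
});
encoder.configure({
  codec: 'ivas',  // hypothetical codec string
  sampleRate: 48000,
  numberOfChannels: 2,
});

// One 20 ms frame of silent stereo PCM as stand-in input.
const frames = 960;
const input = new AudioData({
  format: 'f32-planar',
  sampleRate: 48000,
  numberOfFrames: frames,
  numberOfChannels: 2,
  timestamp: 0,
  data: new Float32Array(frames * 2),
});
encoder.encode(input, {
  ivas: {  // hypothetical per-call, time-aligned input metadata
    objectPositions: [{ azimuth: 30, elevation: 0, distance: 1.0 }],
  },
});
```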
Received on Tuesday, 6 January 2026 23:52:37 UTC