- From: Francois Daoust <fd@w3.org>
- Date: Thu, 15 Jan 2026 13:11:51 +0100
- To: Dale Curtis <dalecurtis@google.com>
- Cc: public-media-wg@w3.org
- Message-ID: <0d498d9b-09cc-4949-8ee8-601049a6df83@w3.org>
On 2026-01-07 00:52, Dale Curtis wrote:
> Comments inline. Caveat: These are just Chrome's opinions and are not
> binding. Paul, Youenn feel free to jump in :)
>
> tl;dr: Lets be consistent across our interfaces if such functionality
> is needed.

The registry entry requirements can currently be interpreted to say that codec-specific extensions to dictionaries are restricted to the encoder config bits (that may have partly triggered their first question?):

"Where applicable, a registration specification may include a section describing extensions to VideoEncoderConfig or AudioEncoderConfig dictionaries."
https://www.w3.org/TR/webcodecs-codec-registry/#registration-entry-requirements

This thread suggests that, in the future, there could perhaps be a need for codec-specific extensions to AudioDecoderConfig or VideoDecoderConfig as well. Also, the requirement seems a bit off, as we already have registrations that extend VideoEncoderEncodeOptions (e.g., VP9).

To make the requirements more future-proof, would it make sense to reformulate that requirement as:

"Where applicable, a registration specification may include a section describing extensions to the dictionaries used in the configure(), decode() and encode() methods of the decoder and encoder interfaces (e.g., AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig, VideoEncoderConfig, VideoEncoderEncodeOptions)."

... or something like that :)

François.

>
> - dale
>
> On Tue, Jan 6, 2026 at 8:00 AM Francois Daoust <fd@w3.org> wrote:
>
>> Hello Media Working Group participants,
>>
>> W3C received a liaison letter for the Media Working Group from the
>> 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API. See
>> attached file for the full letter. I'm reproducing the questions
>> in text format below to ease review.
>>
>> [[
>> IVAS codec is designed to handle multiple spatial input formats
>> (Multi-channel / Objects / Ambisonics / Parametric (Metadata
>> Assisted Spatial Audio, MASA)) and the IVAS decoder integrates a
>> renderer which is capable of rendering to various output formats
>> like binaural and loud-speaker layouts. Time aligned metadata
>> serves a critical function in both the encoder and decoder. In
>> IVAS Encoder, time aligned metadata is needed for encoding
>> object-based audio, MASA and combined formats. Similarly, IVAS
>> Decoder can ingest head orientation sensor metadata to render a
>> fully immersive head-tracked binaural audio experience. The
>> decoder may also need to manage Processing Information metadata
>> when available in an RTP payload to configure the rendering.
>>
>> AudioDecoder Interface
>> -----
>> - Currently the configure() API defines AudioDecoderConfig to only
>> allow for the description field, a sequence of codec specific
>> bytes, which commonly maps to the initial one-time setup of
>> various codecs (e.g. Audio Specific Config in AAC, header/codebook
>> data in Vorbis, etc). However, it does not provide a mechanism for
>> any run-time controls for the decoder (e.g. an output format to
>> configure decoded output). The AudioEncoder interface allows for
>> codec-specific extensions to AudioEncoderConfig as a dictionary of
>> configurations, while AudioDecoderConfig doesn't.
>> Question: What is the correct way to implement run-time controls
>> like the output format (e.g., BINAURAL, Stereo (2.0), 5.1.2,
>> 7.1.4) in the decoder?
>
> For set-once type controls, adding codec-specific extensions to the
> config object seems like the correct approach. Technically you could
> continue to stuff everything in the description. It's already an
> opaque codec specific blob, but that's not very usable if clients are
> expecting to tweak it.
>
>> - Time-varying metadata input for the decoder, either from the
>> device sensors (e.g. user head orientation) or with out-of-band
>> signalling (e.g. from RTP / Media Container), may be needed for
>> the proper integration of IVAS's immersive features, however
>> currently there is no direct way to provide this to the decode() API.
>> Questions: How can time-aligned codec-specific metadata be
>> injected into the decode() call without adding multiple
>> configure() calls per frame? If multiple configure() calls are
>> used per frame, would the configure and decode calls be
>> synchronously processed?
>
> That depends on if the decoder itself is using the metadata or if it's
> just something that gets mapped to an output. If the decoder needs the
> metadata, adding something equivalent to the VideoEncoderEncodeOptions
> passed to encode() but on the decoder seems reasonable. If instead,
> it's just for passing to outputs the client can already handle that
> mapping based on timestamps (assuming 1:1 input/output).
>
>> - A decoder might need to produce additional time-aligned
>> codec-specific metadata (e.g. object metadata) as output when an
>> external rendering is desired. Currently the
>> AudioDataOutputCallback only allows for an AudioData interface,
>> but there is no way for the decode() API to produce any additional
>> metadata output. This contrasts with the
>> EncodedAudioChunkOutputCallback which implements an optional the
>> EncodedAudioChunkMetadata field.
>> Question: How can the AudioDecoder interface provide time-aligned
>> metadata and PCM audio output?
>
> It would depend on if this time aligned metadata should also be given
> to a hypothetical encoder. If it is, then adding metadata fields to
> the encoded chunk would make sense from a symmetry perspective.
> Otherwise, some form of metadata struct like the encoders use seems
> reasonable.
>
>> AudioEncoder Interface
>> -----
>> - Unlike the decoder, the encoder configure() API allows for
>> codec-specific extensions to AudioEncoderConfig, however, for
>> certain immersive formats like object-based audio or MASA, the
>> encoder requires additional time varying metadata that is
>> time-aligned with the input audio.
>> Question: How can such metadata input be injected into the Audio
>> Encoder in a synchronous manner to ensure time-aligned application
>> within the encoder?
>> ]]
>
> Adding an equivalent of VideoEncoderEncodeOptions to
> AudioEncoder::encode() seems reasonable for this type of thing.
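A minimal TypeScript sketch of the decoder-side pattern suggested in the thread: set-once controls (such as the IVAS output format) as a codec-specific extension of the config passed to configure(), and time-varying metadata (such as head orientation) as a per-call options argument to decode(), in the same spirit as the VideoEncoderEncodeOptions argument of VideoEncoder.encode(). Everything here is hypothetical: WebCodecs currently defines no options argument for AudioDecoder.decode(), and the `ivas` member, the `IvasOutputFormat` values (taken from the letter's examples), the quaternion head pose, the placeholder codec string and the `IvasDecoderLike` interface are illustrative assumptions, not registered or specified names. `BaseAudioDecoderConfig` only mirrors a simplified subset of AudioDecoderConfig so that the sketch runs outside a browser.

```typescript
// Simplified stand-in for the standard AudioDecoderConfig (subset of fields).
interface BaseAudioDecoderConfig {
  codec: string;
  sampleRate: number;
  numberOfChannels: number;
  description?: Uint8Array;
}

// HYPOTHETICAL: output layouts taken from the examples in the liaison letter.
type IvasOutputFormat = "binaural" | "stereo" | "5.1.2" | "7.1.4";

// HYPOTHETICAL codec-specific config extension: a set-once control applied
// when configure() is called.
interface IvasAudioDecoderConfig extends BaseAudioDecoderConfig {
  ivas?: { outputFormat: IvasOutputFormat };
}

// HYPOTHETICAL per-call options for decode(), analogous to the
// VideoEncoderEncodeOptions argument of VideoEncoder.encode().
interface IvasDecodeOptions {
  ivas?: { headOrientation: [number, number, number, number] }; // quaternion (w, x, y, z)
}

interface ChunkLike {
  readonly timestamp: number; // microseconds, as in EncodedAudioChunk
}

// HYPOTHETICAL decoder shape: configure() once, then decode(chunk, options).
interface IvasDecoderLike {
  configure(config: IvasAudioDecoderConfig): void;
  decode(chunk: ChunkLike, options?: IvasDecodeOptions): void;
}

// Passes each chunk together with the head pose sampled for its timestamp, so
// the time-varying metadata travels with the frame instead of requiring one
// configure() call per frame.
function decodeWithHeadTracking(
  decoder: IvasDecoderLike,
  chunks: ChunkLike[],
  poseAt: (timestampUs: number) => [number, number, number, number],
): void {
  for (const chunk of chunks) {
    decoder.decode(chunk, { ivas: { headOrientation: poseAt(chunk.timestamp) } });
  }
}

// Example set-once config. "ivas" is a placeholder, not a registered codec string.
const exampleConfig: IvasAudioDecoderConfig = {
  codec: "ivas",
  sampleRate: 48000,
  numberOfChannels: 2,
  ivas: { outputFormat: "binaural" },
};
```

The same shape would carry over to the encoder side, with a hypothetical options argument to AudioEncoder.encode() holding object or MASA metadata alongside each AudioData, which is the symmetry the thread argues for.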
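When the metadata is not consumed by the decoder and only needs to follow the outputs, the thread notes that the client can already do the mapping itself based on timestamps, assuming 1:1 input/output. A minimal TypeScript sketch of that bookkeeping, assuming the decoder preserves each chunk's timestamp on the AudioData it produces: in WebCodecs, both EncodedAudioChunk and AudioData expose a `timestamp` in microseconds, while `HasTimestamp`, `FrameMetadata` and `TimestampMetadataMap` are hypothetical names used only for illustration.

```typescript
// Structural stand-in: EncodedAudioChunk and AudioData both have `timestamp`.
interface HasTimestamp {
  readonly timestamp: number;
}

// HYPOTHETICAL per-frame metadata record (e.g. object positions).
interface FrameMetadata {
  objectPositions: number[];
}

// Side table keyed by timestamp. Assumes one output per input and that each
// output carries the same timestamp as the chunk that produced it.
class TimestampMetadataMap<M> {
  private readonly pending = new Map<number, M>();

  // Call just before decoder.decode(chunk).
  remember(chunk: HasTimestamp, metadata: M): void {
    this.pending.set(chunk.timestamp, metadata);
  }

  // Call from the output callback; deletes the entry so the map does not grow.
  take(output: HasTimestamp): M | undefined {
    const metadata = this.pending.get(output.timestamp);
    this.pending.delete(output.timestamp);
    return metadata;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Example: two chunks 20 ms apart; the first output arrives.
const table = new TimestampMetadataMap<FrameMetadata>();
table.remember({ timestamp: 0 }, { objectPositions: [0, 1, 0] });
table.remember({ timestamp: 20000 }, { objectPositions: [1, 0, 0] });
const firstOutputMetadata = table.take({ timestamp: 0 });
```

In a browser, `remember()` would be called right before `decoder.decode(chunk)` and `take()` from the `output` callback given to the `AudioDecoder` constructor. Keying by timestamp tolerates outputs arriving in any order, but it relies on timestamps being unique and preserved, and it does not cover metadata that the decoder itself would have to produce.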
Received on Thursday, 15 January 2026 12:11:54 UTC