- From: Dale Curtis <dalecurtis@google.com>
- Date: Wed, 21 Jan 2026 16:28:12 -0800
- To: Francois Daoust <fd@w3.org>
- Cc: public-media-wg@w3.org
- Message-ID: <CAPUDrwckyxKvEmindRob1JMM+bHCy=Z8nZno3h6BfxVPKATwpw@mail.gmail.com>
On Thu, Jan 15, 2026 at 4:11 AM Francois Daoust <fd@w3.org> wrote:

> On 2026-01-07 00:52, Dale Curtis wrote:
> > Comments inline. Caveat: These are just Chrome's opinions and are not
> > binding. Paul, Youenn, feel free to jump in :)
> >
> > tl;dr: Let's be consistent across our interfaces if such functionality is
> > needed.
>
> The registry entry requirements can currently be interpreted to say that
> codec-specific extensions to dictionaries are restricted to the encoder
> config bits (that may have partly triggered their first question?):
>
> "Where applicable, a registration specification may include a section
> describing extensions to VideoEncoderConfig or AudioEncoderConfig
> dictionaries."
>
> https://www.w3.org/TR/webcodecs-codec-registry/#registration-entry-requirements
>
> This thread suggests that, in the future, there could perhaps be a need
> for codec-specific extensions to AudioDecoderConfig or VideoDecoderConfig
> as well. Also, the requirement seems a bit off as we already have
> registrations that extend VideoEncoderEncodeOptions (e.g., VP9).
>
> To make the requirements more future-proof, would it make sense to
> reformulate that requirement as:
>
> "Where applicable, a registration specification may include a section
> describing extensions to the dictionaries used in the configure(), decode()
> and encode() methods of the decoder and encoder interfaces (e.g.,
> AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig,
> VideoEncoderConfig, VideoEncoderEncodeOptions)."
>
> ... or something like that :)

Yes, something to this effect makes sense to me. I'm not sure how much
value there is in pre-emptively future-proofing, though. YAGNI and all
that :)

> François.
>
> > - dale
> >
> > On Tue, Jan 6, 2026 at 8:00 AM Francois Daoust <fd@w3.org> wrote:
> >> Hello Media Working Group participants,
> >>
> >> W3C received a liaison letter for the Media Working Group from the
> >> 3GPP TSG SA WG4 (SA4) with questions on the WebCodecs API.
> >> See attached file for the full letter. I'm reproducing the questions in
> >> text format below to ease review.
> >>
> >> [[
> >> IVAS codec is designed to handle multiple spatial input formats
> >> (Multi-channel / Objects / Ambisonics / Parametric (Metadata Assisted
> >> Spatial Audio, MASA)) and the IVAS decoder integrates a renderer which is
> >> capable of rendering to various output formats like binaural and
> >> loud-speaker layouts. Time aligned metadata serves a critical function in
> >> both the encoder and decoder. In IVAS Encoder, time aligned metadata is
> >> needed for encoding object-based audio, MASA and combined formats.
> >> Similarly, IVAS Decoder can ingest head orientation sensor metadata to
> >> render a fully immersive head-tracked binaural audio experience. The
> >> decoder may also need to manage Processing Information metadata when
> >> available in an RTP payload to configure the rendering.
> >>
> >> AudioDecoder Interface
> >> -----
> >> - Currently the configure() API defines AudioDecoderConfig to only allow
> >> for the description field, a sequence of codec specific bytes, which
> >> commonly maps to the initial one-time setup of various codecs (e.g. Audio
> >> Specific Config in AAC, header/codebook data in Vorbis, etc). However, it
> >> does not provide a mechanism for any run-time controls for the decoder
> >> (e.g. an output format to configure decoded output). The AudioEncoder
> >> interface allows for codec-specific extensions to AudioEncoderConfig as a
> >> dictionary of configurations, while AudioDecoderConfig doesn’t.
> >> Question: What is the correct way to implement run-time controls like the
> >> output format (e.g., BINAURAL, Stereo (2.0), 5.1.2, 7.1.4) in the decoder?
> >>
> >
> > For set-once type controls, adding codec-specific extensions to the config
> > object seems like the correct approach. Technically you could continue to
> > stuff everything in the description. It's already an opaque codec specific
> > blob, but that's not very usable if clients are expecting to tweak it.
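For the set-once case, a codec registration could define an extra member on the decoder config instead of hiding the choice inside `description`. The TypeScript sketch below illustrates that shape. Everything in it is hypothetical. The codec string `"ivas"`, the `ivasOutputFormat` member and all type names are placeholders. The base interface only mirrors a few standard `AudioDecoderConfig` members so that the sketch runs outside a browser. The helper shows why a typed member is friendlier than opaque bytes: a client can read the requested layout back and check decoded output against it.

```typescript
// Hypothetical output layouts taken from the SA4 letter's examples.
type IvasOutputFormatSketch = "binaural" | "stereo" | "5.1.2" | "7.1.4";

// Mirrors a few standard AudioDecoderConfig members so this runs outside a
// browser; the real dictionary is defined by the WebCodecs spec.
interface BaseAudioDecoderConfigSketch {
  codec: string;
  sampleRate: number;
  numberOfChannels: number;
  description?: Uint8Array; // opaque codec-specific setup bytes
}

// Hypothetical codec-specific extension: a set-once control exposed as a
// typed member instead of being packed into `description`.
interface IvasAudioDecoderConfigSketch extends BaseAudioDecoderConfigSketch {
  ivasOutputFormat?: IvasOutputFormatSketch;
}

// Channel count of each output layout (main + LFE + height channels).
const OUTPUT_CHANNELS: Record<IvasOutputFormatSketch, number> = {
  binaural: 2, // left and right ear signals
  stereo: 2, // 2.0
  "5.1.2": 8, // 5 + 1 + 2
  "7.1.4": 12, // 7 + 1 + 4
};

// Returns the channel count a client should expect in decoded output, or
// undefined when no output format was requested (decoder default unknown).
function expectedOutputChannels(
  config: IvasAudioDecoderConfigSketch,
): number | undefined {
  return config.ivasOutputFormat === undefined
    ? undefined
    : OUTPUT_CHANNELS[config.ivasOutputFormat];
}

const ivasConfig: IvasAudioDecoderConfigSketch = {
  codec: "ivas", // placeholder, not a registered codec string
  sampleRate: 48000,
  numberOfChannels: 4, // coded channel count (illustrative value)
  ivasOutputFormat: "7.1.4",
};

console.log(expectedOutputChannels(ivasConfig)); // 12
```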
> >>
> >> - Time-varying metadata input for the decoder, either from the device
> >> sensors (e.g. user head orientation) or with out-of-band signalling (e.g.
> >> from RTP / Media Container), may be needed for the proper integration of
> >> IVAS’s immersive features, however currently there is no direct way to
> >> provide this to the decode() API.
> >> Questions: How can time-aligned codec-specific metadata be injected into
> >> the decode() call without adding multiple configure() calls per frame? If
> >> multiple configure() calls are used per frame, would the configure and
> >> decode calls be synchronously processed?
> >>
> >
> > That depends on whether the decoder itself is using the metadata or
> > whether it's just something that gets mapped to an output. If the decoder
> > needs the metadata, adding something equivalent to the
> > VideoEncoderEncodeOptions passed to encode() but on the decoder seems
> > reasonable. If instead it's just for passing to outputs, the client can
> > already handle that mapping based on timestamps (assuming 1:1
> > input/output).
> >
> >>
> >> - A decoder might need to produce additional time-aligned codec-specific
> >> metadata (e.g. object metadata) as output when an external rendering is
> >> desired. Currently the AudioDataOutputCallback only allows for an AudioData
> >> interface, but there is no way for the decode() API to produce any
> >> additional metadata output. This contrasts with the
> >> EncodedAudioChunkOutputCallback which implements an optional
> >> EncodedAudioChunkMetadata field.
> >> Question: How can the AudioDecoder interface provide time-aligned
> >> metadata and PCM audio output?
> >>
> >
> > It would depend on whether this time-aligned metadata should also be given
> > to a hypothetical encoder. If it is, then adding metadata fields to the
> > encoded chunk would make sense from a symmetry perspective. Otherwise, some
> > form of metadata struct like the encoders use seems reasonable.
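When the metadata is only needed by the application and not by the decoder itself, the timestamp mapping mentioned above can stay entirely on the client side. The following is a minimal sketch. It assumes exactly one output per input, carrying an identical timestamp (the 1:1 case). `HeadOrientationSketch`, `TimestampSideTable` and the 20 ms frame spacing are illustrative assumptions, and the loop stands in for real `decode()` calls and output callbacks. The sketch does not cover metadata that the decoder's internal renderer must consume, such as head tracking for binaural rendering. That case still needs a decode-time options dictionary.

```typescript
// Hypothetical per-frame metadata, e.g. head orientation read from sensors.
interface HeadOrientationSketch {
  yawDeg: number;
  pitchDeg: number;
  rollDeg: number;
}

// Side table keyed by timestamp in microseconds (the unit WebCodecs uses).
// Assumes 1:1 input/output with identical timestamps.
class TimestampSideTable<T> {
  private readonly pending = new Map<number, T>();

  // Call just before decoder.decode(chunk), using chunk.timestamp.
  remember(timestamp: number, metadata: T): void {
    this.pending.set(timestamp, metadata);
  }

  // Call from the output callback with audioData.timestamp. The entry is
  // removed so the table does not grow without bound.
  take(timestamp: number): T | undefined {
    const metadata = this.pending.get(timestamp);
    this.pending.delete(timestamp);
    return metadata;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Simulated round trip: three 20 ms frames are "decoded", then their outputs
// arrive in order carrying the same timestamps.
const sideTable = new TimestampSideTable<HeadOrientationSketch>();
const frameDurationUs = 20000;
for (let i = 0; i < 3; i++) {
  sideTable.remember(i * frameDurationUs, { yawDeg: 10 * i, pitchDeg: 0, rollDeg: 0 });
}
const recovered = [0, 20000, 40000].map((ts) => sideTable.take(ts));

console.log(sideTable.size); // 0
```

If a codec ever produced outputs whose timestamps differ from their inputs, exact-key lookup would fail. A nearest-timestamp search would then be needed, and that is outside what this thread assumes.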
> >>
> >> AudioEncoder Interface
> >> -----
> >> - Unlike the decoder, the encoder configure() API allows for
> >> codec-specific extensions to AudioEncoderConfig, however, for certain
> >> immersive formats like object-based audio or MASA, the encoder requires
> >> additional time varying metadata that is time-aligned with the input audio.
> >> Question: How can such metadata input be injected into the Audio Encoder
> >> in a synchronous manner to ensure time-aligned application within the
> >> encoder?
> >> ]]
> >>
> >
> > Adding an equivalent of VideoEncoderEncodeOptions to
> > AudioEncoder::encode() seems reasonable for this type of thing.
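An encode-options dictionary for `AudioEncoder.encode()` could follow the pattern of `VideoEncoderEncodeOptions`, where codec registrations such as VP9 add a codec-keyed member. The sketch below uses a mock encoder, not the real WebCodecs interface. `AudioEncoderEncodeOptionsSketch`, the `ivas.objectPositions` member and the azimuth/elevation fields are hypothetical names invented for illustration. It shows the split suggested in this thread:
- The object count is fixed once, as a config extension would be.
- Per-frame positions travel in the same call as the audio they describe. No extra `configure()` calls are needed, and the pairing is unambiguous.

```typescript
// Stand-in for the AudioData passed to encode(); only the members used here.
interface AudioFrameSketch {
  timestamp: number; // microseconds
  numberOfFrames: number;
}

// Hypothetical per-object position for object-based input.
interface ObjectPositionSketch {
  azimuthDeg: number;
  elevationDeg: number;
}

// Hypothetical analogue of VideoEncoderEncodeOptions for AudioEncoder,
// with a codec-keyed member in the style of the VP9 registration's `vp9`.
interface AudioEncoderEncodeOptionsSketch {
  ivas?: { objectPositions: ObjectPositionSketch[] };
}

// Mock encoder (not the WebCodecs AudioEncoder): records which metadata was
// applied to which frame.
class MockIvasEncoderSketch {
  readonly applied: Array<{ timestamp: number; positions: ObjectPositionSketch[] }> = [];
  private readonly numObjects: number;

  // Set once, as a codec-specific AudioEncoderConfig extension would be.
  constructor(numObjects: number) {
    this.numObjects = numObjects;
  }

  // Per-frame metadata arrives in the same call as the audio it describes,
  // so no configure() call per frame is needed.
  encode(frame: AudioFrameSketch, options: AudioEncoderEncodeOptionsSketch = {}): void {
    const positions = options.ivas?.objectPositions ?? [];
    if (positions.length !== this.numObjects) {
      throw new TypeError(
        `expected ${this.numObjects} object positions, got ${positions.length}`,
      );
    }
    this.applied.push({ timestamp: frame.timestamp, positions });
  }
}

const mockEncoder = new MockIvasEncoderSketch(2);
mockEncoder.encode(
  { timestamp: 0, numberOfFrames: 960 }, // 20 ms at 48 kHz (illustrative)
  {
    ivas: {
      objectPositions: [
        { azimuthDeg: 30, elevationDeg: 0 },
        { azimuthDeg: -30, elevationDeg: 0 },
      ],
    },
  },
);
console.log(mockEncoder.applied.length); // 1
```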
Received on Thursday, 22 January 2026 00:28:29 UTC