Re: [mediacapture-transform] Is MediaStreamTrackProcessor for audio necessary? (#29)

For MSTG, I made https://guidou.github.io/mstg-audio/mstg-single/ and https://guidou.github.io/mstg-audio/mstg-dual/.
There you can see that using the shim (in Chrome, Safari and Firefox) for sender-side processing results in glitches and permanent high latency every time there is a processing delay. With MSTG there are no glitches and the high latency is temporary (only during slow-processing events).

These demos show that the shim is not good for sender-side audio processing where processing time is real-time on average but has a high variance.
The reason is simple: AudioWorklet sends data on a strict real-time schedule, while MediaStreamTrackGenerator sends data on demand:
* When processing is slow, the shim produces glitches because it runs out of data, but AudioWorklet still needs to send something on its schedule.
* Once the delayed data and the new data arrive, permanent delay is introduced because AudioWorklet cannot catch up by sending data faster than its strict rate.

MediaStreamTrackGenerator doesn't have this problem because it can send data as quickly as it's produced, so it can catch up once processing speed recovers. The resulting variance is handled by the jitter buffer on the receiver side, which is necessary anyway to handle network jitter.
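To make the argument above concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions, not a model of any browser's implementation: one frame is captured per tick, processing is instantaneous except for a single stall, and the names `readyTimes`, `fixedRateSink` and `onDemandSink` are hypothetical helpers invented for this illustration.

```typescript
// Toy model: one audio frame is captured per tick. All names are illustrative.
interface SinkStats {
  glitches: number;          // ticks on which the sink had nothing to output
  peakLatencyTicks: number;  // worst capture-to-output delay observed
  finalLatencyTicks: number; // delay of the last frame, long after the stall
}

// Tick at which processed frame i becomes available. Processing is instant,
// except that frames captured during the stall are all released when it ends.
function readyTimes(frames: number, stallStart: number, stallTicks: number): number[] {
  const stallEnd = stallStart + stallTicks;
  return Array.from({ length: frames }, (_, i) =>
    i >= stallStart && i < stallEnd ? stallEnd : i);
}

// AudioWorklet-like sink (assumption): outputs exactly one frame per tick,
// or silence (a glitch) when no processed frame is ready.
function fixedRateSink(readyAt: number[]): SinkStats {
  const stats: SinkStats = { glitches: 0, peakLatencyTicks: 0, finalLatencyTicks: 0 };
  let next = 0; // index of the next frame to output
  for (let tick = 0; next < readyAt.length; tick++) {
    if ((readyAt[next] as number) <= tick) {
      stats.finalLatencyTicks = tick - next;
      stats.peakLatencyTicks = Math.max(stats.peakLatencyTicks, tick - next);
      next++;
    } else {
      stats.glitches++;
    }
  }
  return stats;
}

// MSTG-like sink (assumption): forwards every frame the moment it is ready,
// so a backlog is flushed in a burst and left to the receiver's jitter buffer.
function onDemandSink(readyAt: number[]): SinkStats {
  const stats: SinkStats = { glitches: 0, peakLatencyTicks: 0, finalLatencyTicks: 0 };
  readyAt.forEach((readyTick, i) => {
    stats.finalLatencyTicks = readyTick - i;
    stats.peakLatencyTicks = Math.max(stats.peakLatencyTicks, readyTick - i);
  });
  return stats;
}

// 100 frames, with a 5-tick processing stall starting at frame 20.
const readyAt = readyTimes(100, 20, 5);
console.log("fixed rate:", fixedRateSink(readyAt)); // 5 glitches, final latency 5
console.log("on demand: ", onDemandSink(readyAt));  // 0 glitches, final latency 0
```

In this model both sinks see the same 5-tick peak delay, but only the fixed-rate sink glitches and keeps the delay for the rest of the stream.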

To avoid glitches with AudioWorklet, there are two choices:
* Have a buffer large enough to handle even the slowest processing, but this results in permanent high latency.
* Implement a more sophisticated buffer that can accelerate audio via advanced algorithms when processing is fast. This basically means implementing the receiver-side platform jitter buffer (e.g., [NetEq](https://chromium.googlesource.com/external/webrtc/+/master/modules/audio_coding/neteq/g3doc/index.md)), but on the sender side. The result is still more latency and complexity.

AudioWorklet introduces a tradeoff between glitches (due to slow processing) and delay (due to buffering) which is difficult to solve at the application level. A strategy that sounds reasonable in principle, such as a dynamic queue (as in the polyfill), results in both glitches and high delay.
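The tradeoff can be sketched with a small toy model. This is an illustration under simplifying assumptions (one frame per tick, a single processing stall, a fixed-rate sink that waits `prebufferTicks` before starting output), and `simulatePrebuffer` is a hypothetical helper, not part of any real API.

```typescript
// Toy model of a fixed-rate sink fed through a front buffer of prebufferTicks.
// Frames captured during the stall only become available when the stall ends.
function simulatePrebuffer(
  prebufferTicks: number,
  stallStart: number,
  stallTicks: number,
  frames: number,
): { glitches: number; finalLatencyTicks: number } {
  const stallEnd = stallStart + stallTicks;
  const readyAt = (i: number): number =>
    i >= stallStart && i < stallEnd ? stallEnd : i;

  let next = 0; // index of the next frame to output
  let glitches = 0;
  let finalLatencyTicks = 0;
  // The sink starts after the prebuffer delay, then outputs once per tick.
  for (let tick = prebufferTicks; next < frames; tick++) {
    if (readyAt(next) <= tick) {
      finalLatencyTicks = tick - next;
      next++;
    } else {
      glitches++; // nothing ready: the sink must output silence
    }
  }
  return { glitches, finalLatencyTicks };
}

// Sweep the prebuffer size against a 5-tick stall at frame 20.
for (const prebuffer of [0, 3, 5, 8]) {
  console.log(prebuffer, simulatePrebuffer(prebuffer, 20, 5, 100));
}
```

In this model the number of glitches is `max(0, stall - prebuffer)` and the final latency is `max(prebuffer, stall)`: a buffer smaller than the worst stall yields both glitches and lasting delay, while a buffer large enough to avoid glitches costs its full size in latency at all times.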

The tradeoff in MediaStreamTrackGenerator is between glitches (due to slow processing or non-realtime scheduling) and the max queue size parameter. This is easily solved by having a large max queue size, which is usually inexpensive since audio data is small in size. The max queue size does not introduce permanent delay in the same way as a similar front buffer for AudioWorklet because MSTG can drain its queue faster than real-time when processing is fast.
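As a rough illustration of why a large max queue size is cheap here, the following toy model simulates a bounded input queue whose consumer stalls and then drains faster than real time. It is a sketch under assumptions: `simulateQueue` and its parameters are hypothetical, the drain rate of 4 frames per tick is arbitrary, and the drop policy (discard the incoming frame when full) is a simplification that may differ from what real implementations do.

```typescript
// Toy model: one frame is captured per tick into a queue bounded by maxQueue.
// The consumer is stalled for stallTicks, and otherwise removes up to
// drainPerTick frames per tick (faster than real time when drainPerTick > 1).
function simulateQueue(
  maxQueue: number,
  drainPerTick: number,
  stallStart: number,
  stallTicks: number,
  ticks: number,
): { dropped: number; peakQueue: number; finalQueue: number } {
  let queue = 0;
  let dropped = 0;
  let peakQueue = 0;
  for (let tick = 0; tick < ticks; tick++) {
    // Capture: a full queue means a lost frame, which is heard as a glitch.
    if (queue === maxQueue) dropped++;
    else queue++;
    peakQueue = Math.max(peakQueue, queue);

    const stalled = tick >= stallStart && tick < stallStart + stallTicks;
    if (!stalled) queue -= Math.min(queue, drainPerTick);
  }
  return { dropped, peakQueue, finalQueue: queue };
}

// A 5-tick stall at tick 20, with a consumer that can drain 4 frames per tick.
console.log("maxQueue 10:", simulateQueue(10, 4, 20, 5, 100)); // no drops, queue empties again
console.log("maxQueue 3: ", simulateQueue(3, 4, 20, 5, 100));  // 3 frames dropped
```

In both runs the queue is empty again shortly after the stall, so the queue bound only determines whether frames are lost, not how much delay remains afterwards.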


-- 
GitHub Notification of comment by guidou
Please view or discuss this issue at https://github.com/w3c/mediacapture-transform/issues/29#issuecomment-3627229819 using your GitHub account



Received on Monday, 8 December 2025 14:35:07 UTC