Re: Requirements for Web audio APIs

On Thu, Apr 14, 2011 at 10:11 AM, Chris Rogers <> wrote:

> On Wed, Apr 13, 2011 at 10:44 AM, Robert O'Callahan <> wrote:
>> Some of these requirements have arisen very recently.
>> 1) Integrate with media capture and peer-to-peer streaming APIs
>> There's a lot of energy right now around APIs and protocols for real-time
>> communication in Web browsers, in particular proposed WHATWG APIs for media
>> capture and peer-to-peer streaming:
>> Ian Hickson's proposed API creates a "Stream" abstraction representing a
>> stream of audio and video data. Many use-cases require integration of media
>> capture and/or peer-to-peer streaming with audio effects processing.
> To a small extent, I've been involved with some of the Google engineers
> working on this.  I would like to make sure the API is coherent with an
> overall web audio architecture.  I believe it should be possible to design
> the API in such a way that it's scalable to work with my graph-based
> proposal (AudioContext and AudioNodes).
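For readers unfamiliar with the graph-based proposal, the idea is that sources connect through processing nodes to a destination, and audio is pulled through the graph. The following is a toy model of that shape in plain JavaScript — the class names and `pull()` rendering method here are illustrative stand-ins, not the actual AudioContext/AudioNode interfaces:

```javascript
// Toy model of a graph-based audio API in the spirit of the
// AudioContext/AudioNode proposal. Names are illustrative only.
class Node {
  constructor(name, process) {
    this.name = name;
    this.process = process;   // per-block sample transform (or null)
    this.inputs = [];
  }
  connect(dest) {
    dest.inputs.push(this);
    return dest;              // allow chaining: src.connect(fx).connect(out)
  }
  // Pull-based rendering: mix all inputs, then apply this node's transform.
  pull(frames) {
    const mixed = new Float32Array(frames);
    for (const input of this.inputs) {
      const buf = input.pull(frames);
      for (let i = 0; i < frames; i++) mixed[i] += buf[i];
    }
    return this.process ? this.process(mixed) : mixed;
  }
}

// source -> gain -> destination, the basic Web-Audio-style topology
const source = new Node('source', buf => buf.fill(1.0));
const gain   = new Node('gain',   buf => buf.map(s => s * 0.5));
const dest   = new Node('destination', null);
source.connect(gain).connect(dest);

const out = dest.pull(4);   // four frames of 1.0, attenuated to 0.5
```

The point of the graph shape is that any stream — file playback, microphone capture, or a peer-to-peer feed — could in principle be a source node in the same topology.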

Have you made any progress on that?

My concern is that having multiple abstractions representing streams of
media data --- AudioNodes and Streams --- would be redundant.

>> 2) Need to handle streams containing synchronized audio and video
>> Many use-cases require effects to be applied to an audio stream which is
>> then played back alongside a video track with synchronization. This can
>> require the video to be delayed, so we need a framework that handles both
>> audio and video. Also, the WHATWG Stream abstraction contains video as well
>> as audio, so integrating with it will mean pulling in video.
> I assume you mean dealing with latency compensation here?  In other words,
> some audio processing may create a delay which needs to be compensated for
> by an equivalent delay in presenting the video stream.

> This is a topic which came up in my early discussions with Apple, as they
> were also interested in this.  We talked about having a .latency attribute
> on every processing node (AudioNode) in the rendering graph.  That way the
> graph can be queried and the appropriate delay can be factored into the
> video presentation.  A .latency attribute is also useful for synchronizing
> two audio streams, each of which may have different latency characteristics.
>  In modern digital audio workstation software, this kind of compensation is
> very important.
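To make the .latency idea concrete: each node would report its own delay, and the graph could be walked from a source to the destination to find the total delay the video presentation would need to absorb. A hypothetical sketch (none of these property names are spec'd):

```javascript
// Hypothetical sketch of querying a graph via per-node .latency values.
// Each node reports its own delay in seconds; the total along the
// worst-case input branch is what the video would need to match.
function totalLatency(node) {
  const upstream = node.inputs.length
    ? Math.max(...node.inputs.map(totalLatency))
    : 0;
  return upstream + node.latency;
}

const graph = {
  latency: 0,                               // destination adds nothing
  inputs: [
    { latency: 0.02,                        // e.g. a convolver: 20 ms
      inputs: [
        { latency: 0.005, inputs: [] },     // e.g. look-ahead: 5 ms
      ] },
  ],
};

const videoDelay = totalLatency(graph);     // 0.025 s in this sketch
```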

You seem to be suggesting exposing latency information to Web apps, which
then adjust the video presentation somehow ... but how? HTML media elements
have no API that allows the author to introduce extra buffering of video
output. Even if there were such an API, it would be clumsy to use for this
purpose and I'm pretty sure the quality of A/V sync would be reduced. Media
engines currently work hard to make sure that video frames are presented at
the right moment, based on the audio hardware clock, and a fixed latency
parameter would interfere with that.

I would like to see an API that integrates video and audio into a single
processing architecture so that we can get high-quality A/V sync with audio
processing, and authors don't have to manage latency explicitly.

Another use-case to think about is the Xbox 360 chat "voice distortion"
feature: the user's voice is captured via a microphone, and a distortion
effect is applied before it's sent over the network. Perhaps video is also
being captured and we want to send it in sync with that processed audio.
Having authors manually manage latency in that scenario sounds very
difficult.
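For concreteness, the distortion step in that use-case is just a per-sample transform applied between capture and the network. A toy version, with the capture and network ends stubbed out (only the DSP is real, and the function name is mine):

```javascript
// Toy "voice distortion" step: soft-clip captured samples before they
// would be handed to the peer-to-peer connection. Capture and network
// ends are stubbed; only the waveshaping is real.
function distort(samples, drive = 4) {
  // tanh soft clipping: loud input saturates, quiet input passes through
  return samples.map(s => Math.tanh(drive * s));
}

const captured = Float32Array.from([0.0, 0.1, 0.5, -0.9]);
const processed = distort(captured);
// silence stays silent, and every output sample stays within [-1, 1]
```

The sync question is exactly that this transform (and any look-ahead it needs) delays the audio relative to the captured video, and someone has to account for that.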

>> 3) Need to handle synchronization of streams from multiple sources
>> There's ongoing work to define APIs for playing multiple media resources
>> with synchronization, including a WHATWG proposal:
>> Many use-cases require audio effects to be applied to some of those
>> streams while maintaining synchronization.
> I admit that I haven't been closely following this particular proposal.
>  But, I'll try to present my understanding of the problem as it relates to
> <audio> and <video> right now.  Both the play() and pause() methods of
> HTMLMediaElement don't allow a way to specify a time when the event should
> occur.  Ideally, the web platform would have a high-resolution clock,
> similar to the Date class, with its getTime() method, but higher-resolution.
>  This clock can be used as a universal reference time.  Then, for example,
> the play() method could be extended to something like play(time), where
> |time| is based on this clock.  That way, multiple <audio> and <video>
> elements could be synchronized precisely.
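In spirit, the play(time) idea reduces to each element converting a shared-clock target time into a local start delay. A sketch of that arithmetic — the shared clock and the play(time) signature are hypothetical, per the discussion above:

```javascript
// Sketch of scheduling against a shared high-resolution clock: each
// element computes how long to wait so that all of them start on the
// same universal tick. The clock values here are made up.
function startDelayMs(clockNowMs, startAtMs) {
  // Negative means the requested start is already past: begin at once.
  return Math.max(0, startAtMs - clockNowMs);
}

// Two elements aiming at the same universal start time:
const now = 1000.25;                 // shared clock reading, in ms
const startAt = now + 150;           // "play 150 ms from now"
const delayA = startDelayMs(now, startAt);        // 150
const delayB = startDelayMs(now + 2, startAt);    // 148: B read the clock later
```

The precision of the scheme is bounded by how accurately each element can actually honor the computed delay, which is part of why a shared clock alone may not be enough.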

That sounds good, but I was thinking of other sorts of problems. Consider
for example the use-case of a <video> movie with a regular audio track, and
an auxiliary <audio> element referencing a commentary track, where we apply
an audio ducking effect to overlay the commentary over the regular audio.
How would you combine audio from both streams and keep everything in sync
(including the video), especially in the face of issues such as one of the
streams temporarily pausing to buffer due to a network glitch?
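The ducking part of that use-case alone is simple to state: attenuate the main track while the commentary is active, with some smoothing so the gain does not jump. A toy gain computation (everything here is illustrative, not an API proposal):

```javascript
// Toy ducking gain for the commentary use-case: the main track is
// attenuated whenever the commentary level is "loud", with a one-pole
// smoother so the gain glides rather than jumps. Thresholds and
// coefficients are arbitrary illustrative values.
function duckGains(commentaryLevels, duckTo = 0.3, smooth = 0.5) {
  let g = 1.0;
  return commentaryLevels.map(level => {
    const target = level > 0.05 ? duckTo : 1.0;  // duck while commentary speaks
    g += (target - g) * smooth;                  // glide toward the target
    return g;
  });
}

const gains = duckGains([0, 0, 0.8, 0.8, 0.8, 0, 0]);
// the gain dips toward 0.3 while the commentary is active, then recovers
```

The hard part is not this arithmetic but keeping the ducked mix, the untouched video, and a stream that may stall to buffer all on one timeline — which is the argument for a unified processing architecture rather than author-side glue.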

> Rob, thanks for thinking about these ideas.  I've only had a short look at
> your proposal, but will get back when I've had more time to read through it.
>  From my own perspective, I hope we can also spend a bit of time looking
> carefully at my current API proposal to see how it might be extended, as
> necessary, to address the use cases you've brought up.

Yep. Thanks!

"Now the Bereans were of more noble character than the Thessalonians, for
they received the message with great eagerness and examined the Scriptures
every day to see if what Paul said was true." [Acts 17:11]

Received on Thursday, 19 May 2011 09:58:30 UTC