Re: Requirements for Web audio APIs

Hi Robert,

Thanks for bringing up these points.  I've also been thinking about some of
these issues recently and agree that we should start some discussions,
especially given these recent developments.

On Wed, Apr 13, 2011 at 10:44 AM, Robert O'Callahan <robert@ocallahan.org> wrote:

> There are a few important requirements for a Web audio API that aren't
> satisfied by the current Mozilla and Chrome proposals.


This is just a minor nit, but I'd prefer not to call the audio API proposal
I've been working on 'Chrome'.  The specification was developed with
significant input from Apple WebKit engineers, and the implementation lives
in WebKit.  Also, the proposal has received a bit of review over the past
year from people contributing their ideas on the audio incubator mailing list.


> Some of these requirements have arisen very recently.
>
> 1) Integrate with media capture and peer-to-peer streaming APIs
> There's a lot of energy right now around APIs and protocols for real-time
> communication in Web browsers, in particular proposed WHATWG APIs for media
> capture and peer-to-peer streaming:
> http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html
> Ian Hickson's proposed API creates a "Stream" abstraction representing a
> stream of audio and video data. Many use-cases require integration of media
> capture and/or peer-to-peer streaming with audio effects processing.
>

To a small extent, I've been involved with some of the Google engineers
working on this.  I would like to make sure the API is coherent with an
overall web audio architecture.  I believe it should be possible to design
the API in such a way that it can plug into my graph-based proposal
(AudioContext and AudioNodes).
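
Just to sketch what I mean, here's roughly how a captured Stream could be
treated as one more source node in the graph.  None of this is specified:
the exact shape of getUserMedia() is still in flux in the WHATWG draft, and
the createMediaStreamSource() method is purely my own assumption for
illustration.

    // Hypothetical sketch -- integration of capture with the audio graph.
    var context = new AudioContext();

    // Assume the WHATWG capture API hands us a Stream of live audio
    // (exact getUserMedia() signature shown loosely here).
    navigator.getUserMedia('audio', function (stream) {
      // Assumed method: wrap the Stream as a source AudioNode.
      var micSource = context.createMediaStreamSource(stream);

      // From here it's an ordinary node in the processing graph.
      var gain = context.createGainNode();   // gain node from my proposal
      micSource.connect(gain);
      gain.connect(context.destination);     // out to the speakers
    });

The same wrapping could presumably work in the other direction, routing the
output of a graph back into a Stream for peer-to-peer transmission.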



>
> 2) Need to handle streams containing synchronized audio and video
> Many use-cases require effects to be applied to an audio stream which is
> then played back alongside a video track with synchronization. This can
> require the video to be delayed, so we need a framework that handles both
> audio and video. Also, the WHATWG Stream abstraction contains video as well
> as audio, so integrating with it will mean pulling in video.
>

I assume you mean dealing with latency compensation here?  In other words,
some audio processing may create a delay which needs to be compensated for
by an equivalent delay in presenting the video stream.  This is a topic
which came up in my early discussions with Apple, as they were also
interested in this.  We talked about having a .latency attribute on every
processing node (AudioNode) in the rendering graph.  That way the graph can
be queried and the appropriate delay can be factored into the video
presentation.  A .latency attribute is also useful for synchronizing two
audio streams, each of which may have different latency characteristics.  In
modern digital audio workstation software, this kind of compensation is very
important.
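
As a rough illustration of how that might be used -- keeping in mind that
the .latency attribute is only something we discussed, not part of the
current spec, and that the video-side delay attribute below is equally
hypothetical:

    // Hypothetical: each AudioNode exposes a .latency attribute (in seconds).
    // For a simple chain, total processing latency is the sum along the path;
    // a real implementation would take the maximum over all parallel paths.
    var nodes = [source, compressor, convolver];   // whatever the graph contains
    var audioLatency = 0;
    for (var i = 0; i < nodes.length; i++)
      audioLatency += nodes[i].latency;            // assumed attribute

    // Delay the video presentation by the same amount to stay in sync.
    videoElement.presentationDelay = audioLatency; // assumed attribute, illustration only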


>
> 3) Need to handle synchronization of streams from multiple sources
> There's ongoing work to define APIs for playing multiple media resources
> with synchronization, including a WHATWG proposal:
> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#mediacontroller
> Many use-cases require audio effects to be applied to some of those streams
> while maintaining synchronization.
>

I admit that I haven't been closely following this particular proposal.
But I'll try to present my understanding of the problem as it relates to
<audio> and <video> right now.  Neither the play() nor the pause() method of
HTMLMediaElement allows a way to specify a time when the event should occur.
Ideally, the web platform would have a high-resolution clock, similar to
Date.getTime() but with higher resolution, which could be used as a
universal reference time.  Then, for example, the play() method could be
extended to something like play(time), where |time| is based on this clock.
That way, multiple <audio> and <video> elements could be synchronized
precisely.  I have a somewhat similar concept in my Web Audio API proposal,
where in-memory (non-streamed) audio resources can be scheduled to play back
at precise times based on the AudioContext .currentTime attribute.

I would go so far as to suggest that input events, such as mouse-down and
key-down, also carry time-stamps based on this high-resolution clock.  This
would be especially important for MIDI events, if we ever get so far as to
spec that out.  On Mac OS X, the CoreMIDI API timestamps all incoming MIDI
events using a clock shared with the CoreAudio system, allowing precise
synchronization.  I believe Mac OS X CoreVideo also deals in these
timestamps for presentation time.  I'm only mentioning these Mac OS X APIs
as examples of a media architecture where consideration has been given to
precise synchronization of different media events.
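
For what it's worth, here's roughly how the in-memory scheduling already
works in my current proposal, together with the kind of play(time)
extension I'm imagining for media elements (the latter is purely
hypothetical and doesn't exist today):

    var context = new AudioContext();

    // Schedule an in-memory buffer to start exactly 2 seconds from now,
    // measured on the AudioContext's own clock.
    var source = context.createBufferSource();
    source.buffer = drumLoopBuffer;              // a previously decoded AudioBuffer
    source.connect(context.destination);

    var startTime = context.currentTime + 2.0;
    source.noteOn(startTime);

    // Hypothetical extension: start a media element at the same reference time.
    // play(time) does not exist today -- this is the kind of thing I'm suggesting.
    // videoElement.play(startTime);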



>
> 4) Worker-based Javascript audio processing
> Authors will always need custom audio effects or synthesis not supported
> directly in an audio spec. We need a way to produce such effects
> conveniently in Javascript with best possible performance, especially
> latency. Processing audio in Web Workers would insulate the effects code
> from latency caused by tasks on the HTML event loop. Workers have logically
> separate heaps so garbage-collection latency can also be minimized.
>

I agree that we need to support JavaScript-based audio processing, and have
the JavaScriptAudioNode in my proposal.  In theory, this AudioNode could
dispatch to an event listener running in a Web Worker.  Admittedly, my
knowledge of the current Web Workers specification is limited, and I know
there are a number of restrictions on sharing objects between the main
thread and the worker thread.  But it might be possible to pass
Float32Arrays between the threads and get the API working correctly...
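
To make that a bit more concrete, here is a very rough sketch.
createJavaScriptNode() is in my current proposal, but everything about the
worker hookup (the setWorker() call and the message format) is an
assumption on my part, just to show the general shape:

    // Main thread.
    var context = new AudioContext();
    var jsNode = context.createJavaScriptNode(1024, 1, 1);  // buffer size, inputs, outputs
    var worker = new Worker('process-audio.js');
    jsNode.setWorker(worker);                    // assumed API: process in the worker
    jsNode.connect(context.destination);

    // process-audio.js -- hypothetical worker side: the node would post input
    // samples as Float32Arrays and use whatever the worker posts back.
    onmessage = function (event) {
      var input = event.data.input;              // Float32Array of input samples
      var output = new Float32Array(input.length);
      for (var i = 0; i < input.length; i++)
        output[i] = input[i] * 0.5;              // trivial gain, just as an example
      postMessage({ output: output });
    };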

I think there will likely be some usability issues because most of the DOM
and other JS objects will be inaccessible from within the worker thread.
So I'm guessing it might be a bit clunky to pass relevant information (for
example, current game state) between the drawing/physics code and the
worker thread.  But it's certainly an idea worth checking out.


>
> I have put a sketch of an API proposal here that attempts to address those
> requirements:
> https://wiki.mozilla.org/MediaStreamAPI
> It's only a week old and I don't think it's ready to formally propose to a
> Working Group. I feel it needs at least a prototype implementation, both to
> flesh out the parts that are unclear and to ensure that it's actually
> implementable. I plan to do that ASAP. However, given the interest in this
> area I want to let people know what we are thinking about, so that there are
> no surprises later.
>
> BTW, constructive feedback welcome, but I'm more interested in getting the
> concepts right than picking over details.
>
> Thanks,
> Rob
> --
> "Now the Bereans were of more noble character than the Thessalonians, for
> they received the message with great eagerness and examined the Scriptures
> every day to see if what Paul said was true." [Acts 17:11]
>

Rob, thanks for thinking about these ideas.  I've only had a quick look at
your proposal, but I'll get back to you when I've had more time to read
through it.  From my own perspective, I hope we can also spend a bit of
time looking carefully at my current API proposal to see how it might be
extended, as necessary, to address the use cases you've brought up.

Cheers,
Chris
