- From: Chris Rogers <crogers@google.com>
- Date: Wed, 13 Apr 2011 15:11:23 -0700
- To: robert@ocallahan.org
- Cc: public-audio@w3.org
- Message-ID: <BANLkTi=PQDXEFy3rO+bKWm6NXaMcj28Okg@mail.gmail.com>
Hi Robert,

Thanks for bringing up these points. I've also been thinking about some of these issues recently and agree that we should start some discussions, especially considering these recent developments.

On Wed, Apr 13, 2011 at 10:44 AM, Robert O'Callahan <robert@ocallahan.org> wrote:

> There are a few important requirements for a Web audio API that aren't
> satisfied by the current Mozilla and Chrome proposals.

This is just a minor nit, but I'd prefer not to call the audio API proposal I've been working on 'Chrome'. The specification was developed with significant input from Apple WebKit engineers, and the implementation lives in WebKit. Also, the proposal has received a fair bit of review over the past year from people contributing their ideas on the audio incubator mailing list.

> Some of these requirements have arisen very recently.
>
> 1) Integrate with media capture and peer-to-peer streaming APIs
> There's a lot of energy right now around APIs and protocols for real-time
> communication in Web browsers, in particular proposed WHATWG APIs for media
> capture and peer-to-peer streaming:
> http://www.whatwg.org/specs/web-apps/current-work/complete/video-conferencing-and-peer-to-peer-communication.html
> Ian Hickson's proposed API creates a "Stream" abstraction representing a
> stream of audio and video data. Many use-cases require integration of media
> capture and/or peer-to-peer streaming with audio effects processing.

To a small extent, I've been involved with some of the Google engineers working on this. I would like to make sure the API is coherent with an overall web audio architecture. I believe it should be possible to design the API in such a way that it scales to work with my graph-based proposal (AudioContext and AudioNodes).

> 2) Need to handle streams containing synchronized audio and video
> Many use-cases require effects to be applied to an audio stream which is
> then played back alongside a video track with synchronization. This can
> require the video to be delayed, so we need a framework that handles both
> audio and video. Also, the WHATWG Stream abstraction contains video as well
> as audio, so integrating with it will mean pulling in video.

I assume you mean dealing with latency compensation here? In other words, some audio processing may introduce a delay, which then needs to be compensated for by an equivalent delay in presenting the video stream. This is a topic which came up in my early discussions with Apple, as they were also interested in it. We talked about having a .latency attribute on every processing node (AudioNode) in the rendering graph. That way the graph can be queried and the appropriate delay can be factored into the video presentation. A .latency attribute is also useful for synchronizing two audio streams, each of which may have different latency characteristics. In modern digital audio workstation software, this kind of compensation is very important.
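To make the graph-based model and the .latency idea a little more concrete, here is a rough JavaScript sketch. The AudioContext/AudioNode construction loosely follows my draft proposal (the exact factory-method names are illustrative and may not match the current draft), and the .latency attribute plus the totalLatency() helper are purely hypothetical illustrations of the kind of query described above, not part of any existing spec or implementation:

// Rough sketch only -- factory-method names are illustrative, and the
// .latency attribute and totalLatency() helper are hypothetical.
var context = new AudioContext();

// A simple processing graph: source -> convolver (reverb) -> destination.
var source = context.createBufferSource();
var convolver = context.createConvolver(); // may introduce processing delay
source.connect(convolver);
convolver.connect(context.destination);

// Hypothetical: if every AudioNode exposed a .latency attribute (in seconds),
// a simple chain could be walked and the delays summed to determine how late
// the video presentation needs to be in order to stay in sync.
function totalLatency(chain) {
  var total = 0;
  for (var i = 0; i < chain.length; i++)
    total += chain[i].latency || 0; // hypothetical attribute
  return total;
}

var videoDelay = totalLatency([source, convolver, context.destination]);
// A synchronized <video> element would then be presented videoDelay seconds late.

For a more general graph with fan-in and fan-out, the query would need to consider the maximum latency over all paths rather than a simple sum along one chain, but the basic idea is the same.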
> 3) Need to handle synchronization of streams from multiple sources
> There's ongoing work to define APIs for playing multiple media resources
> with synchronization, including a WHATWG proposal:
> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#mediacontroller
> Many use-cases require audio effects to be applied to some of those streams
> while maintaining synchronization.

I admit that I haven't been closely following this particular proposal, but I'll try to present my understanding of the problem as it relates to <audio> and <video> right now.

Neither the play() nor the pause() method of HTMLMediaElement allows a way to specify the time at which the event should occur. Ideally, the web platform would have a high-resolution clock, similar to the Date class's getTime() method but with higher resolution, which could be used as a universal reference time. Then, for example, the play() method could be extended to something like play(time), where |time| is based on this clock. That way, multiple <audio> and <video> elements could be synchronized precisely. I have a somewhat similar concept in my Web Audio API proposal, where in-memory (non-streamed) audio resources can be scheduled to play back at precise times, based on the AudioContext .currentTime attribute.

I would go so far as to suggest that events such as mouse-down and key-down also carry time-stamps based on this high-resolution clock. This would be especially important for MIDI events, if we ever get as far as speccing that out. On Mac OS X, the CoreMIDI API timestamps all incoming MIDI events using a clock shared with the CoreAudio system, for precise synchronization, and I believe Mac OS X CoreVideo also deals in these timestamps for presentation time. I'm only mentioning these Mac OS X APIs as examples of a media architecture where careful consideration has been given to precise synchronization of different media events.

> 4) Worker-based Javascript audio processing
> Authors will always need custom audio effects or synthesis not supported
> directly in an audio spec. We need a way to produce such effects
> conveniently in Javascript with best possible performance, especially
> latency. Processing audio in Web Workers would insulate the effects code
> from latency caused by tasks on the HTML event loop. Workers have logically
> separate heaps so garbage-collection latency can also be minimized.

I agree that we need to support JavaScript-based audio processing, and I have the JavaScriptAudioNode in my proposal. In theory, this AudioNode could actually dispatch to an event listener running in a Web Worker. Admittedly, my knowledge of the current Web Workers specification is limited, and I know there are a number of restrictions on sharing objects between the main thread and the worker thread. But it might be possible to share Float32Arrays between the threads and get the API working correctly. I think there will likely be some usability issues, because most of the DOM and other JS objects will be inaccessible from within the worker thread, so I'm guessing it might be a bit clunky to pass relevant information (for example, current game state) between the drawing/physics code and the worker thread. But it's certainly an idea worth checking out.
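To give a feel for the JavaScript processing path, here is a rough sketch along the lines of the JavaScriptAudioNode idea; the names used (createJavaScriptNode, onaudioprocess) are illustrative and may not match the current draft exactly, and the worker-dispatch variant is only speculation at this point, so it appears here just as a comment:

// Rough sketch of a JavaScriptAudioNode-style processing callback.
// Names are illustrative and may differ from the current draft.
var context = new AudioContext();
var processor = context.createJavaScriptNode(4096); // buffer size in sample-frames

processor.onaudioprocess = function (event) {
  // event.outputBuffer exposes Float32Array channel data that we fill directly.
  var output = event.outputBuffer.getChannelData(0);
  for (var i = 0; i < output.length; i++) {
    output[i] = (Math.random() * 2 - 1) * 0.1; // simple white-noise synthesis
  }
  // In a worker-based variant, this function body would run inside a Web
  // Worker, and the Float32Array data would have to be shared with or copied
  // to that thread -- exactly the open question discussed above.
};

processor.connect(context.destination);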
> I have put a sketch of an API proposal here that attempts to address those
> requirements:
> https://wiki.mozilla.org/MediaStreamAPI
> It's only a week old and I don't think it's ready to formally propose to a
> Working Group. I feel it needs at least a prototype implementation, both to
> flesh out the parts that are unclear and to ensure that it's actually
> implementable. I plan to do that ASAP. However, given the interest in this
> area I want to let people know what we are thinking about, so that there are
> no surprises later.
>
> BTW, constructive feedback welcome, but I'm more interested in getting the
> concepts right than picking over details.
>
> Thanks,
> Rob
> --
> "Now the Bereans were of more noble character than the Thessalonians, for
> they received the message with great eagerness and examined the Scriptures
> every day to see if what Paul said was true." [Acts 17:11]

Rob, thanks for thinking about these ideas. I've only had a short look at your proposal, but will get back when I've had more time to read through it. From my own perspective, I hope we can also spend a bit of time looking carefully at my current API proposal to see how it might be extended, as necessary, to address the use cases you've brought up.

Cheers,
Chris
Received on Wednesday, 13 April 2011 22:11:49 UTC