- From: Chris Rogers <crogers@google.com>
- Date: Fri, 20 Apr 2012 15:26:30 -0700
- To: robert@ocallahan.org
- Cc: public-audio@w3.org
- Message-ID: <CA+EzO0kn7D9NZQDYZ+u1GaMoE6MD4pyf+rrn2Jh0f_VzUa_pPw@mail.gmail.com>
On Thu, Apr 19, 2012 at 4:39 AM, Robert O'Callahan <robert@ocallahan.org> wrote:

> On Thu, Apr 19, 2012 at 10:36 AM, Robert O'Callahan <robert@ocallahan.org> wrote:
>
>> On Wed, Apr 18, 2012 at 12:23 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
>>
>>> So it sounds like to modify audio in a MediaStream you'll need to:
>>>
>>> * Extract each track from a MediaStream
>>> * Turn each track into a source (might be combined with previous step)
>>> * Attach each source to a graph
>>> * Extract tracks from the destination of the graphs
>>> * Extract the video stream(s) from the MediaStream source
>>> * Combine all the tracks back into a new MediaStream
>>
>> And one of the downsides of doing it this way is that you lose sync between the audio and video streams. Usually not by much, but more for certain kinds of processing. Given there's a way to not lose sync at all, why not use it? Sorry to harp on this :-).
>
> Offline, Chris and I discussed some ways in which Web Audio could propagate latency information to solve this problem. It's probably better for him to explain his ideas since I'm not completely sure how they would work.
>
> It gets more complicated if we have the ability to pause media streams (which I think we will, e.g. for live broadcasts that aren't interactive, it makes sense to pause instead of just dropping data). Since Web Audio can't pause, the paused time has to be accounted for and essentially a time slice of the Web Audio output corresponding to the pause interval has to be clipped out. And that's going to be annoying if your filter is something like an echo ... some echo would be lost, I think.
>
> Maybe all these issues can be solved, or deemed not worth addressing, but supporting pause seems appealing to me.

Hi Rob, it was good talking with you the other day. It was nice to have a technical discussion.

In terms of pausing, we already have this notion with the HTMLMediaElement and MediaController APIs; they both have a pause() method. There's more than one way to think about the concept of "pause"; in other words, it's not just one specific behavior. For example, it might be desirable in some cases for a reverb tail to continue playing after someone pauses a particular <audio> element. Other times, that might not be the desired behavior. I talked about this in more detail in this thread:

http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0475.html

So it's probably worth creating use cases for very specific and distinct pause scenarios, each with a very specific desired behavior. Then we can talk about how we might approach each one.
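To make the reverb-tail case concrete, here is a rough, non-normative sketch. It assumes WebKit's prefixed constructor; impulseResponseBuffer and pauseButton are placeholders rather than anything defined by the API. Pausing the <audio> element stops feeding the graph, but the context itself keeps running, so the convolver's tail keeps sounding; whether that is the desired behavior is exactly what each pause use case needs to spell out.

    // Rough sketch only: webkitAudioContext as in WebKit's current builds;
    // impulseResponseBuffer is a placeholder for an already-decoded AudioBuffer,
    // and pauseButton is a placeholder button element.
    var context = new webkitAudioContext();
    var audioElement = document.querySelector('audio');

    var source = context.createMediaElementSource(audioElement);
    var convolver = context.createConvolver();
    convolver.buffer = impulseResponseBuffer;   // placeholder impulse response

    source.connect(convolver);
    convolver.connect(context.destination);
    audioElement.play();

    // pause() applies to the element, not to the graph: the element stops
    // feeding new samples, but the context keeps processing, so the
    // convolver's reverb tail continues to sound.
    pauseButton.onclick = function () {
      audioElement.pause();
    };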
Latency/synchronization is a very complex topic. I'll try my best to bring out some ideas about it.

Some audio processing algorithms incur a latency (delay) such that the processed audio stream can become mis-aligned with other audio streams or video streams. Depending on the particular case, there are different ways of handling this mis-alignment. In some cases it's best to *not* apply *any* kind of compensation. For example, when processing a live real-time audio stream (a person playing guitar through processing effects), it would not be desirable to add a compensating delay when mixing and sharing effects with other audio sources which have latency. Very similarly, it may not be desirable to compensate for latency on sound effects triggered by game play.

In your example of video synchronization, a compensation would be much more desirable, especially if the latency were large (as it is in your example). So we're faced with:

1. How can the system determine how much latency is caused by a particular processing node, or by a chain of processing nodes between a particular source and destination?

2. What strategies are available for compensating for latency (thus restoring relative synchronization)?

3. What scenarios face potential latency and sync problems, and in which scenarios do we want to apply a given compensation strategy from (2), or apply no compensation whatsoever?

1. LATENCY DETERMINATION

I'll start by explaining how this is usually accomplished in pro-audio applications. Although my example uses Apple's Logic Audio and AudioUnit plugins, this technique is widely used by other pro-audio applications on different platforms using different plugin formats. As a side note, the analogue of an AudioUnit plugin in the Web Audio API is an AudioNode.

First of all, an individual processing node (an AudioUnit plugin in this case) reports its latency via a property (please search for "kAudioUnitProperty_Latency"):

https://developer.apple.com/library/mac/#documentation/MusicAudio/Conceptual/AudioUnitProgrammingGuide/TheAudioUnit/TheAudioUnit.html

Then a host (digital audio workstation application) loads this plugin and may query its latency and make use of this information. For example, Logic Audio has a detailed tech page about this and the strategies available for dealing with latency/synchronization:

http://help.apple.com/logicpro/mac/9.1.6/en/logicpro/usermanual/index.html#chapter=41%26section=3%26tasks=true

So, how does this apply to the Web Audio API? Jer Noble has added some smarts in our implementation to determine the latency for each AudioNode using a virtual method called "latencyTime()":

https://bugs.webkit.org/attachment.cgi?id=131428&action=prettypatch

One thing to note is that we have not *yet* implemented full support for this in JavaScriptAudioNode. I anticipate it would be useful for a developer implementing custom JS code in a JavaScriptAudioNode to have a way to report latency by setting a .latency attribute. So with your ducking example the developer would set this value to 1 (for a 1-second latency).

2. COMPENSATION STRATEGIES

Based on this information it's possible to calculate the latency from any one point to another, so that a strategy (if any) may be chosen to compensate. First I'll consider your ducking example as case (a).

a. For synchronizing video, the strategy would be to delay the video frame presentation by an equivalent amount. This technique can be applied automatically by the implementation, since it knows the exact amount of latency (from the information about each node's latency).

The Logic Audio tech page goes into some detail about two particular strategies for aligning audio streams, and it's worth reading:

http://help.apple.com/logicpro/mac/9.1.6/en/logicpro/usermanual/index.html#chapter=41%26section=3%26tasks=true

We can summarize these two techniques as (b) and (c) below; a rough code sketch of both follows item (c).

b. Scheduling compensation: for sounds having a latency, compensate by scheduling sound events to happen earlier than normal by an amount equal to the total latency. Because the sounds are triggered earlier than normal, once they are processed through effects with latency they will sound at the correct time. This technique is not available to "live" sounds such as a guitar played live, live streams received from WebRTC, or a MIDI keyboard played live.

c. Delay compensation: for sounds having less latency than other sounds, insert a delay node with an equivalent delay to make up the difference.
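As a rough, non-normative sketch of (1), (b) and (c) together: it assumes the hypothetical per-node .latency attribute mentioned above (in seconds; no node actually exposes it today, hence the "|| 0" fallback), a placeholder duckingCallback, and the current WebKit method names (createJavaScriptNode, createDelayNode, noteOn).

    // Sketch only. The .latency attribute is hypothetical; duckingCallback
    // and buffer are placeholders.
    var context = new webkitAudioContext();

    // (1) A node with 1 second of look-ahead (your ducking example) reports it
    // via the hypothetical attribute.
    var ducker = context.createJavaScriptNode(4096);
    ducker.onaudioprocess = duckingCallback;   // placeholder for the ducking code
    ducker.latency = 1;                        // hypothetical attribute, in seconds
    ducker.connect(context.destination);

    // Total latency along a chain of nodes.
    function chainLatency(nodes) {
      var total = 0;
      for (var i = 0; i < nodes.length; i++)
        total += nodes[i].latency || 0;        // hypothetical per-node attribute
      return total;
    }
    var latency = chainLatency([ducker]);      // 1 second here

    // (b) Scheduling compensation: trigger the sound early by the chain
    // latency so that, after the effect, it is heard at the intended time.
    function playThroughEffect(buffer, when) {
      var source = context.createBufferSource();
      source.buffer = buffer;
      source.connect(ducker);
      source.noteOn(Math.max(0, when - latency));
    }

    // (c) Delay compensation: delay the "dry" path by the same amount so it
    // stays aligned with the "wet" (processed) path when the two are mixed.
    function dryWetBlend(source) {
      var dryDelay = context.createDelayNode();  // default maxDelayTime of 1 s
      dryDelay.delayTime.value = latency;        // just covers the 1 s here
      source.connect(ducker);                    // wet path (has latency)
      source.connect(dryDelay);                  // dry path, delayed to match
      dryDelay.connect(context.destination);
    }

As noted above, (b) only applies to sounds that can be scheduled ahead of time; live input has to fall back on (c), or on no compensation at all.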
3. SCENARIOS

a. Audio from a <video> element is processed with latency (as in your ducking example). In this case the strategy of delaying the video frame presentation would be used, and it could be applied automatically by the implementation because it has all the information necessary to apply this strategy.

b. Several synthesizer instruments (drums, bass, piano) are playing notes via a sequencer (a pre-determined sequence of notes). One or more of the instruments have in-line effects with latency. This is one of the exact cases in the Logic Audio tech page. The compensation strategy is to offset the scheduled times to play earlier than normal, so notes that will play on a synthesizer with 30 ms of delay can be scheduled exactly 30 ms early.

c. Consider case (b) above, but additionally the user is playing a MIDI keyboard along with the sequenced music, triggering notes on a synthesizer ("Space Synth") whose effects have no latency. In this case it would be highly undesirable to use *any* latency compensation for "Space Synth", because the user wishes to hear the notes played on the MIDI keyboard immediately, with no delay, and will be playing in time with the other synthesizers, which are already relatively synchronized with each other.

d. A music track is played and processed by an effect which has latency, but we wish to hear the effected ("wet") sound mixed with the original ("dry") signal to achieve an appropriate dry/wet blend. The strategy is to insert a delay on the dry/unprocessed/original signal equal to the latency of the effect; the two may then be mixed and blended in synchronization.

In the Web Audio API implementation in WebKit, each node (internally) reports its own latency; Jer Noble added this ability a little while ago.

Chris

> Rob
> --
> “You have heard that it was said, ‘Love your neighbor and hate your enemy.’ But I tell you, love your enemies and pray for those who persecute you, that you may be children of your Father in heaven. ... If you love those who love you, what reward will you get? Are not even the tax collectors doing that? And if you greet only your own people, what are you doing more than others?” [Matthew 5:43-47]
Received on Friday, 20 April 2012 22:26:59 UTC