Re: Web Audio API spec review

On Wed, 16 May 2012 21:41:19 +0200, Chris Rogers <crogers@google.com>  
wrote:

> On Wed, May 16, 2012 at 7:55 AM, Philip Jägenstedt
> <philipj@opera.com> wrote:
>
>> On Tue, 15 May 2012 19:59:07 +0200, Chris Rogers <crogers@google.com>
>> wrote:
>>
>>> On Tue, May 15, 2012 at 4:45 AM, Philip Jägenstedt
>>> <philipj@opera.com> wrote:
>>>
>>
>>>> There are a few aspects that make the Web Audio API fit poorly with
>>>> the rest of the Web platform. For example, the integration with
>>>> HTMLMediaElement is one-way; the audio stream of a <video> can be
>>>> passed into AudioContext but the result cannot leave AudioContext or
>>>> play in sync with the video channel. That an AudioContext cannot be
>>>> paused means that certain filtering effects on any stallable input
>>>> (<audio>, MediaStream) cannot be implemented, echo or reverb being
>>>> the most obvious examples.
>>>>
>>>>
>>> I don't believe there are any fundamental serious limitations here.
>>> For example, today it's possible to pause an <audio> element and have
>>> the reverb tail continue to play, to fade-out slowly/quickly, or stop
>>> right away.  We can discuss in more detail if you have some very
>>> specific use cases.
>>>
>>
>> The missing option is to simply play the echo when the audio element
>> continues playing, as would be the case for a pre-mixed audio track with
>> echo in it.
>>
>
> I don't understand exactly what you mean here.  It would certainly be
> possible to continue processing with a reverb effect if the audio element
> resumed playing from a paused state.

Yes, but it would not sound the same, since any data from before the pause
will have been "flushed", so any effect that involves a delay will take
some time to "ramp up" again.

>> Let's take the case of audio descriptions. A WebVTT file contains timed
>> text to be synthesized at a particular point in time and mixed with the
>> main audio track. Assume that the speech audio buffer is available; it
>> has been pre-generated either on a server or using a JavaScript speech
>> synth engine. That audio buffer must be mixed in at a particular time,
>> and slightly before and after that the main audio must be "ducked",
>> i.e. the volume should be ramped down and eventually back up again.
>
>
>> AFAICT the timestamps from the media resource are lost as soon as the
>> audio enters the Web Audio API, so the only way to know where to apply
>> the ramp is by polling video.currentTime. If the media element pauses
>> or stalls, you have to take great care to reschedule those ramps once
>> it starts playing again. (Failure to realize this edge case will result
>> in a poor experience when pausing and unpausing.)
>>
>> Finally, if the audio processing pipeline adds any delay to the
>> signal, there's no way to get it back in sync with the video.
>>
>
> We've already had extensive and very detailed discussions about latency
> and synchronization, including issue 56 below.  Timed text events could
> certainly be used to apply appropriate audio processing:
> http://lists.w3.org/Archives/Public/public-audio/2012AprJun/0066.html
> http://lists.w3.org/Archives/Public/public-audio/2012AprJun/0084.html
> http://lists.w3.org/Archives/Public/public-audio/2012JanMar/0475.html

I have read these messages and as far as I can tell it is as Robert says:
data will be lost. It seems worthwhile, to me, to have the ability to run
a filter graph against the media resource timeline, which is pausable.
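
For reference, the workaround available today looks roughly like the
sketch below. scheduleDescription, speechBuffer and cueTime are made-up
names for this example, and the currentTime mapping is exactly the
fragile part described above:

  var ctx = new AudioContext();
  var video = document.querySelector('video');
  var mainGain = ctx.createGain();
  ctx.createMediaElementSource(video).connect(mainGain);
  mainGain.connect(ctx.destination);

  // speechBuffer: pre-generated AudioBuffer with the synthesized speech.
  // cueTime: media resource time (in seconds) at which to mix it in.
  function scheduleDescription(speechBuffer, cueTime) {
    // Map media time to context time by sampling video.currentTime now.
    // If the element pauses or stalls, this mapping breaks and everything
    // scheduled below has to be cancelled and rescheduled by hand.
    var when = ctx.currentTime + (cueTime - video.currentTime);

    var speech = ctx.createBufferSource();
    speech.buffer = speechBuffer;
    speech.connect(ctx.destination);
    speech.start(when);

    // Duck the main track around the description.
    var end = when + speechBuffer.duration;
    mainGain.gain.setValueAtTime(1.0, when - 0.5);
    mainGain.gain.linearRampToValueAtTime(0.3, when);
    mainGain.gain.setValueAtTime(0.3, end);
    mainGain.gain.linearRampToValueAtTime(1.0, end + 0.5);
  }

The schedule lives on the AudioContext clock rather than on the media
resource's timeline, which is why it does not survive a pause or a stall.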

I'm not sure what kinds of solutions have been discussed before, but if an
offline AudioContext runs against another (non-realtime) clock, then
perhaps contexts that run against the HTMLMediaElement clock are also
feasible?
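
Purely as a sketch of the idea, with MediaElementAudioContext being an
entirely made-up name and not a proposal for concrete API:

  // Hypothetical: a context whose clock is the media element's playback
  // clock, so the graph pauses, seeks and stalls together with it.
  var mediaCtx = new MediaElementAudioContext(video);
  var source = mediaCtx.createMediaElementSource(video);
  var reverb = mediaCtx.createConvolver();
  source.connect(reverb);
  reverb.connect(mediaCtx.destination);
  // video.pause() would also stop mediaCtx, so the reverb tail would be
  // preserved and the effect would resume exactly where it left off.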

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Monday, 21 May 2012 15:15:28 UTC