- From: Ehsan Akhgari <ehsan.akhgari@gmail.com>
- Date: Mon, 6 May 2013 23:57:47 -0400
- To: Chris Rogers <crogers@google.com>
- Cc: Joseph Berkovitz <joe@noteflight.com>, "public-audio@w3.org WG" <public-audio@w3.org>
- Message-ID: <CANTur_5sNNvpJ2EVmy3x9SBMn9N02TXFfYAt+tHfqXGWEULnOg@mail.gmail.com>
Sorry if the below is a bit garbled; I've spent a lot of time thinking about OfflineAudioContext and have started to prototype an implementation in Gecko, and the below summarizes my big issues with the current spec. I have a number of smaller issues in mind as well, but I won't bring them up at this point, since I believe that once the big issues are addressed, those small ones can be dealt with quite easily.

On Fri, May 3, 2013 at 6:42 PM, Chris Rogers <crogers@google.com> wrote:

> On Sat, Mar 30, 2013 at 11:36 AM, Joseph Berkovitz <joe@noteflight.com> wrote:
>
>> Hi all,
>>
>> I thought I would offer a few thoughts on OfflineAudioContext (OAC) to help identify areas that are not fully specified. My approach here is to ask: what are the observable aspects of a regular AudioContext (AC)'s state with respect to time, and how are those manifested in the offline case?
>>
>> I've offered some initial suggestions on how to approach these issues to stimulate discussion. They are not exact enough to use in the spec and I apologize in advance for their lack of clarity. I hope these are helpful in advancing our understanding.
>>
>> ----
>>
>> Issue: What is the overall algorithmic description of an OAC's rendering process?
>>
>> Suggestion: Prior to the start of OAC rendering, AudioNodes connected to it "do nothing" (that is, they experience no forward motion of performance time). Once an OAC begins rendering, the AudioNode graph upstream processes audio exactly as if a regular AC's performance time was moving forward monotonically, starting from a value of zero. The performance time value of zero (with respect to AudioProcessingEvent.playbackTime, source start/stop times and AudioParam value-curve times) is mapped to the first sample frame in the audio output emitted by the OAC. Upon reaching the limit of the supplied length argument in the constructor, the rendering process ends and performance time does not move forward any more.
>
> Yes, this is more or less my understanding.

"Do nothing" needs to be clearly spec'ed. Consider the following pseudo-code:

    var oc = new OfflineAudioContext(...);
    oc.oncomplete = function ocOnComplete() {};
    var sp = oc.createScriptProcessor();
    sp.onaudioprocess = function spOnAudioProcess() {
      // startRendering() is only ever called from inside an audioprocess
      // event handler, i.e. only once audio processing has already begun.
      oc.startRendering();
      sp.onaudioprocess = null;
    };
    sp.connect(oc.destination);

Would ocOnComplete ever be called? If yes, that implies that the audio processing needs to start happening at some point, but that can't happen if nodes will not do anything. I'm not convinced that having nodes do nothing is compatible at all with the audio processing defined elsewhere in the spec -- I don't see us mention anywhere that no processing must occur before startRendering has been called. This intertwines with things such as AudioBufferSourceNode.start() being called, MediaElementAudioSourceNode being used with an audio element which doesn't start its output immediately because data is being downloaded from the network, and perhaps other things.

The question that I would like to raise is: what does startRendering buy us that existing ways of starting playback on regular AudioContexts don't, and why do we need multiple ways of starting rendering of audio samples (either to hardware or to a buffer) depending on the type of the AudioContext?
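For reference, here is roughly the straightforward flow that I assume the spec has in mind (a sketch only; I'm assuming the numberOfChannels/length/sampleRate constructor arguments from the current draft, and someDecodedBuffer / consumeRenderedAudio are made-up placeholders):

    // Sketch of the intended "simple" usage, as I read the current draft.
    var oc = new OfflineAudioContext(2, 44100 * 10, 44100); // stereo, 10 seconds
    var source = oc.createBufferSource();
    source.buffer = someDecodedBuffer;   // assumed to have been decoded earlier
    source.connect(oc.destination);
    source.start(0);
    oc.oncomplete = function (e) {
      // e.renderedBuffer holds the first |length| frames of the graph's output.
      consumeRenderedAudio(e.renderedBuffer); // made-up consumer
    };
    oc.startRendering();

In this simple case it is not obvious what startRendering() adds over the sources simply starting to produce output at performance time zero, which is what prompts the question above.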
>> ----
>>
>> Issue: Can an OAC be used to render more than one audio result?
>>
>> Suggestion: No, it is a one-shot-use object (although it could render and deliver a single audio result in discrete chunks).
>
> Agreed

So I take it that this means that once the OfflineAudioContext has length samples of audio (length being the argument passed to its constructor), an OfflineAudioCompletionEvent is queued and any future audio frames will be discarded? If my understanding is correct, I think this should be explicitly mentioned in the spec. We should also clarify what would happen to the nodes in the graph from that point forward. Would ScriptProcessorNodes continue to receive audioprocess events whose results would be discarded, or would event dispatch on those nodes stop? What would AnalyserNodes in the graph see after that point?

From the implementation's point of view, ideally I would like to be able to disconnect all of the connections in the OfflineAudioContext's graph and put all of the nodes in a special state in which any future AudioNode.connect() call would be a no-op (and the same for other operations which have audio processing implications, such as AudioBufferSourceNode.start(), etc.). But obviously whether or not an implementation can do this kind of optimization depends on how this ends up being specified exactly, because this will have effects that are observable by content.

>> ----
>>
>> Issue: A regular AC's currentTime attribute progresses monotonically in lock-step with real time. What value does an OAC's currentTime present during the asynchronous rendering process?
>>
>> Suggestion: Upon calling startRendering(), the currentTime value becomes zero.
>
> Actually, it should initially be zero even before startRendering() is called, but will progress forward in time from zero when startRendering() is called.

I assume that you don't mean that the implementation needs to update the currentTime value before hitting the event loop, is that correct? (I believe this needs to be clarified in the spec for AudioContext too.) We should also clarify the existing prose to mention that currentTime could potentially be non-realtime for OfflineAudioContext.

>> During rendering the currentTime attribute of an OAC MAY increase monotonically to approximately reflect the progress of the rendering process, whose rate may be faster or slower than real time. But whenever any rendering-related event is dispatched (e.g. oncomplete or any future incremental rendering event), the currentTime value MUST reflect the exact duration of all rendered audio up to that point.
>
> Sounds good to me. It's actually a useful feature to be able to read the .currentTime attribute in this way because a progress UI can be displayed...

This sounds good to me as well.
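As a concrete illustration of the progress-UI point (just a sketch: updateProgressBar is a hypothetical UI hook, and totalSeconds is simply the length divided by the sampleRate that were passed to the OAC constructor):

    // Sketch: poll the possibly faster- or slower-than-realtime rendering progress.
    var totalSeconds = 10; // i.e. length / sampleRate from the constructor
    var timer = setInterval(function () {
      updateProgressBar(oc.currentTime / totalSeconds); // hypothetical UI hook
    }, 250);
    oc.oncomplete = function () {
      clearInterval(timer);
      updateProgressBar(1);
    };

The requirement that currentTime be exact whenever a rendering-related event is dispatched is what makes the final update in oncomplete trustworthy.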
>> ----
>>
>> Issue: It is not clear whether one can modify the node graph feeding an OAC. However, synthesis graphs feeding a real-time AC's destination are typically constructed in a just-in-time fashion driven by window.setInterval(), including only source nodes which are scheduled in a reasonably short time window into the future (e.g. 5-10 seconds). Thus, graphs feeding a real-time AC need never become all that large, and the work of constructing these graphs can be broken into nice processing slices.

I think from the viewpoint of web authors it is quite unreasonable to disallow modifications to the graph after OfflineAudioContext.startRendering is called, as that breaks one of the fundamental pieces of the processing model as defined elsewhere in the spec. However, with my implementer's hat on, allowing such modifications would create all sorts of racy behavior whose effects are clearly visible to web content. Consider the following pseudo-code:

    var oc = new OfflineAudioContext(...);
    var source = oc.createBufferSource();
    source.buffer = ...;
    source.start(0);
    oc.startRendering();
    source.connect(oc.destination);      // connected only after rendering has started
    setTimeout(function() {
      source.disconnect(oc.destination); // disconnected ~100ms of wall-clock time later
    }, 100);

On a regular AudioContext, this will let the source node play back for about 100ms. If the implementation does support faster-than-realtime processing for OfflineAudioContext, what would the renderedBuffer contain? It seems like the results of such processing would be different in two runs of this pseudo-code, which makes for an inherently unreliable API without deterministic behavior from the viewpoint of the web content. (Which reminds me of another point -- according to the current spec, an implementation which treats OfflineAudioContext as a realtime AudioContext which just doesn't render the input of its destination node to audio hardware is conforming, which would mean that such behavior could potentially vary even more between implementations than between multiple runs on a single implementation.)

>> Another way of saying this is that in an OAC there is no way to "granulate" the rendering process (at least, as long as we keep the approach that a single chunk of data is to be produced at the end). Thus, it seems that developers must assemble a single huge graph for the entire timespan to be rendered, at once. This is likely to tie up the main thread while application JS code constructs this huge graph.
>
> I'm not too worried that the graph construction will take very long, even for large graphs.

>> Suggestion: Dispatch periodic "rendering partially complete" events from an OAC for reasonably sized chunks of data. Typically these would be much larger than 128-frame blocks; they would be in a multi-second timeframe. During handling of these events (but at no other times), AudioNodes may be removed from or added to the OAC's graph. This not only solves the issue detailed above, but also handles arbitrarily long audio output streams. Note that one cannot easily use a sequence of multiple OACs on successive time ranges to simulate this outcome because of effect tail times.
>
> Especially in the case of rendering very long time periods (for example >10mins) I think it's very interesting to have these "partial render" events. I'd like to make sure we can have a good way to add such an event, without necessarily requiring it in a V1 of the spec.

One potential way to handle these would be to dispatch "progress" events to the OfflineAudioContext, I guess...

>> Corollary: The consequences of modifying the graph of AudioNodes feeding the OAC during rendering are not defined EXCEPT when these modifications take place during these proposed events.
>
> Yes

Given the above testcase, I don't think that would be nearly enough. Note that these kinds of interactions can happen in integration with all other parts of the Web platform which have implicit or explicit notions of time -- window.setTimeout is just one example.
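To be concrete about what I mean by dispatching "progress" events, something along these lines is what I have in mind. This is entirely hypothetical: neither the event name nor the ability to modify the graph mid-render is in the current draft, and scheduleNextChunk is a made-up application helper:

    // Hypothetical sketch of Joe's "partial render" idea expressed as
    // "progress" events; none of this exists in the current draft.
    oc.addEventListener("progress", function () {
      // Graph modifications would only be permitted inside this handler,
      // scheduling just the sources needed for the next few seconds of
      // offline performance time (scheduleNextChunk is a made-up helper).
      scheduleNextChunk(oc, oc.currentTime);
    });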
>> ----
>>
>> Issue: The spatialization attributes (location, orientation, etc.) of AudioListener and PannerNode cannot be scheduled. In a regular AC these can be modified in real time during rendering (I think). However, there is no way in an OAC to perform the same modifications at various moments in offline performance time.
>>
>> Suggestion: Introduce events that trigger at given performance time offsets in an OAC? Replace these spatialization attributes with AudioParams? Simply stipulate that this can't be done?
>
> That's a good point, and this is a limitation even in the normal AudioContext case if very precise scheduling is desired for the spatialization attributes. I think we can consider making these be controllable via AudioParams, but hopefully that's something we can consider as separate from just getting basic OfflineAudioContext defined.

While I worry about the precise scheduling scenario, I worry a lot more about the race condition issues, which would exist even if these attributes became AudioParams.

Another big question that I have is how reasonable it is to expect the web author to know the desired length of the OfflineAudioContext's rendered buffer in advance. This is especially worrying if, for example, you want to render the output of an HTMLMediaElement. The way that the spec is currently defined, if the media element doesn't have output immediately available for any reason, then you'll just get a number of silent frames at the beginning of the rendered buffer, and not all of the expected rendered frames. This is an issue with all sorts of potential sources of delay in the graph that are not under the control of the web author. That said, even for graphs which do not have any such sources of delay, it could still be quite tricky to compute the precise buffer length in advance, for example in the presence of loops in the graph containing DelayNodes that change their delays dynamically according to automation events scheduled on the delay AudioParam, etc.

Cheers,

--
Ehsan
<http://ehsanakhgari.org/>
Received on Tuesday, 7 May 2013 03:58:55 UTC