- From: Ehsan Akhgari <ehsan.akhgari@gmail.com>
- Date: Mon, 6 May 2013 23:57:47 -0400
- To: Chris Rogers <crogers@google.com>
- Cc: Joseph Berkovitz <joe@noteflight.com>, "public-audio@w3.org WG" <public-audio@w3.org>
- Message-ID: <CANTur_5sNNvpJ2EVmy3x9SBMn9N02TXFfYAt+tHfqXGWEULnOg@mail.gmail.com>
Sorry if the below is a bit garbled; I've spent a lot of time thinking about OfflineAudioContext and have started to prototype an implementation in Gecko, and the below summarizes my big issues with the current spec. I have a number of smaller issues in mind as well, but I won't bring them up at this point, since I believe that once the big issues are addressed, those small ones can be dealt with quite easily.

On Fri, May 3, 2013 at 6:42 PM, Chris Rogers <crogers@google.com> wrote:

> On Sat, Mar 30, 2013 at 11:36 AM, Joseph Berkovitz <joe@noteflight.com> wrote:
>
>> Hi all,
>>
>> I thought I would offer a few thoughts on OfflineAudioContext (OAC) to help identify areas that are not fully specified. My approach here is to ask: what are the observable aspects of a regular AudioContext (AC)'s state with respect to time, and how are those manifested in the offline case?
>>
>> I've offered some initial suggestions on how to approach these issues to stimulate discussion. They are not exact enough to use in the spec and I apologize in advance for their lack of clarity. I hope these are helpful in advancing our understanding.
>>
>> ----
>>
>> Issue: What is the overall algorithmic description of an OAC's rendering process?
>>
>> Suggestion: Prior to the start of OAC rendering, AudioNodes connected to it "do nothing" (that is, they experience no forward motion of performance time). Once an OAC begins rendering, the AudioNode graph upstream processes audio exactly as if a regular AC's performance time was moving forward monotonically, starting from a value of zero. The performance time value of zero (with respect to AudioProcessingEvent.playbackTime, source start/stop times and AudioParam value-curve times) is mapped to the first sample frame in the audio output emitted by the OAC. Upon reaching the limit of the supplied length argument in the constructor, the rendering process ends and performance time does not move forward any more.
>
> Yes, this is more or less my understanding.

"Do nothing" needs to be clearly spec'ed. Consider the following pseudo-code:

    var oc = new OfflineAudioContext(...);
    oc.oncomplete = function ocOnComplete() {};
    var sp = oc.createScriptProcessor();
    sp.onaudioprocess = function spOnAudioProcess() {
      // startRendering() is only ever called from inside an audioprocess
      // event handler, i.e. only once audio processing has already begun.
      oc.startRendering();
      sp.onaudioprocess = null;
    };
    sp.connect(oc.destination);

Would ocOnComplete ever be called? If yes, that implies that the audio processing needs to start happening at some point, but that can't happen if nodes will not do anything. I'm not convinced that having nodes do nothing is compatible at all with the audio processing defined elsewhere in the spec -- I don't see us mention anywhere that no processing must occur before startRendering has been called. This intertwines with things such as AudioBufferSourceNode.start() being called, MediaElementAudioSourceNode being used with an audio element which doesn't start its output immediately because data is being downloaded from the network, and perhaps other things.

The question that I would like to raise is: what does startRendering buy us that existing ways of starting playback on regular AudioContexts don't, and why do we need multiple ways of starting rendering of audio samples (either to hardware or to a buffer) depending on the type of the AudioContext?
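For reference, here is roughly the straightforward flow that I assume the spec has in mind (a sketch only; I'm assuming the numberOfChannels/length/sampleRate constructor arguments from the current draft, and someDecodedBuffer / consumeRenderedAudio are made-up placeholders):

    // Sketch of the intended "simple" usage, as I read the current draft.
    var oc = new OfflineAudioContext(2, 44100 * 10, 44100); // stereo, 10 seconds
    var source = oc.createBufferSource();
    source.buffer = someDecodedBuffer;   // assumed to have been decoded earlier
    source.connect(oc.destination);
    source.start(0);
    oc.oncomplete = function (e) {
      // e.renderedBuffer holds the first |length| frames of the graph's output.
      consumeRenderedAudio(e.renderedBuffer); // made-up consumer
    };
    oc.startRendering();

In this simple case it is not obvious what startRendering() adds over the sources simply starting to produce output at performance time zero, which is what prompts the question above.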
>> ----
>>
>> Issue: Can an OAC be used to render more than one audio result?
>>
>> Suggestion: No, it is a one-shot-use object (although it could render and deliver a single audio result in discrete chunks).
>
> Agreed

So I take it that this means that once the OfflineAudioContext has length samples of audio (length being the argument passed to its constructor), an OfflineAudioCompletionEvent is queued and any future audio frames will be discarded? If my understanding is correct, I think this should be explicitly mentioned in the spec. We should also clarify what would happen to the nodes in the graph from that point forward. Would ScriptProcessorNodes continue to receive audioprocess events whose results would be discarded, or would event dispatch on those nodes stop? What would AnalyserNodes in the graph see after that point?

From the implementation's point of view, ideally I would like to be able to disconnect all of the connections in the OfflineAudioContext's graph and put all of the nodes in a special state in which any future AudioNode.connect() call would be a no-op (and the same for other operations which have audio processing implications, such as AudioBufferSourceNode.start(), etc.). But obviously whether or not an implementation can do this kind of optimization depends on how this ends up being specified exactly, because this will have effects that are observable by content.

>> ----
>>
>> Issue: A regular AC's currentTime attribute progresses monotonically in lock-step with real time. What value does an OAC's currentTime present during the asynchronous rendering process?
>>
>> Suggestion: Upon calling startRendering(), the currentTime value becomes zero.
>
> Actually, it should initially be zero even before startRendering() is called, but will progress forward in time from zero when startRendering() is called.

I assume that you don't mean that the implementation needs to update the currentTime value before hitting the event loop, is that correct? (I believe this needs to be clarified in the spec for AudioContext too.) We should also clarify the existing prose to mention that currentTime could potentially be non-realtime for OfflineAudioContext.

>> During rendering the currentTime attribute of an OAC MAY increase monotonically to approximately reflect the progress of the rendering process, whose rate may be faster or slower than real time. But whenever any rendering-related event is dispatched (e.g. oncomplete or any future incremental rendering event), the currentTime value MUST reflect the exact duration of all rendered audio up to that point.
>
> Sounds good to me. It's actually a useful feature to be able to read the .currentTime attribute in this way because a progress UI can be displayed...

This sounds good to me as well.
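As a concrete illustration of the progress-UI point (just a sketch: updateProgressBar is a hypothetical UI hook, and totalSeconds is simply the length divided by the sampleRate that were passed to the OAC constructor):

    // Sketch: poll the possibly faster- or slower-than-realtime rendering progress.
    var totalSeconds = 10; // i.e. length / sampleRate from the constructor
    var timer = setInterval(function () {
      updateProgressBar(oc.currentTime / totalSeconds); // hypothetical UI hook
    }, 250);
    oc.oncomplete = function () {
      clearInterval(timer);
      updateProgressBar(1);
    };

The requirement that currentTime be exact whenever a rendering-related event is dispatched is what makes the final update in oncomplete trustworthy.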
>> ----
>>
>> Issue: It is not clear whether one can modify the node graph feeding an OAC. However, synthesis graphs feeding a real-time AC's destination are typically constructed in a just-in-time fashion driven by window.setInterval(), including only source nodes which are scheduled in a reasonably short time window into the future (e.g. 5-10 seconds). Thus, graphs feeding a real-time AC need never become all that large, and the work of constructing these graphs can be broken into nice processing slices.

I think from the viewpoint of web authors it is quite unreasonable to disallow modifications to the graph after OfflineAudioContext.startRendering is called, as that breaks one of the fundamental pieces of the processing model as defined elsewhere in the spec. However, with my implementer's hat on, allowing such modifications would create all sorts of racy behavior whose effects are clearly visible to web content. Consider the following pseudo-code:

    var oc = new OfflineAudioContext(...);
    var source = oc.createBufferSource();
    source.buffer = ...;
    source.start(0);
    oc.startRendering();
    source.connect(oc.destination);      // connected only after rendering has started
    setTimeout(function() {
      source.disconnect(oc.destination); // disconnected ~100ms of wall-clock time later
    }, 100);

On a regular AudioContext, this will let the source node play back for about 100ms. If the implementation does support faster-than-realtime processing for OfflineAudioContext, what would the renderedBuffer contain? It seems like the results of such processing would be different in two runs of this pseudo-code, which makes for an inherently unreliable API without deterministic behavior from the viewpoint of the web content. (Which reminds me of another point -- according to the current spec, an implementation which treats OfflineAudioContext as a realtime AudioContext which just doesn't render the input of its destination node to audio hardware is conforming, which would mean that such behavior could potentially vary even more between implementations than between multiple runs on a single implementation.)

>> Another way of saying this is that in an OAC there is no way to "granulate" the rendering process (at least, as long as we keep the approach that a single chunk of data is to be produced at the end). Thus, it seems that developers must assemble a single huge graph for the entire timespan to be rendered, at once. This is likely to tie up the main thread while application JS code constructs this huge graph.
>
> I'm not too worried that the graph construction will take very long, even for large graphs.

>> Suggestion: Dispatch periodic "rendering partially complete" events from an OAC for reasonably sized chunks of data. Typically these would be much larger than 128-frame blocks; they would be in a multi-second timeframe. During handling of these events (but at no other times), AudioNodes may be removed from or added to the OAC's graph. This not only solves the issue detailed above, but also handles arbitrarily long audio output streams. Note that one cannot easily use a sequence of multiple OACs on successive time ranges to simulate this outcome because of effect tail times.
>
> Especially in the case of rendering very long time periods (for example >10mins) I think it's very interesting to have these "partial render" events. I'd like to make sure we can have a good way to add such an event, without necessarily requiring it in a V1 of the spec.

One potential way to handle these would be to dispatch "progress" events to the OfflineAudioContext, I guess...

>> Corollary: The consequences of modifying the graph of AudioNodes feeding the OAC during rendering are not defined EXCEPT when these modifications take place during these proposed events.
>
> Yes

Given the above testcase, I don't think that would be nearly enough. Note that these kinds of interactions can happen in integration with all other parts of the Web platform which have implicit or explicit notions of time -- window.setTimeout is just one example.
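To be concrete about what I mean by dispatching "progress" events, something along these lines is what I have in mind. This is entirely hypothetical: neither the event name nor the ability to modify the graph mid-render is in the current draft, and scheduleNextChunk is a made-up application helper:

    // Hypothetical sketch of Joe's "partial render" idea expressed as
    // "progress" events; none of this exists in the current draft.
    oc.addEventListener("progress", function () {
      // Graph modifications would only be permitted inside this handler,
      // scheduling just the sources needed for the next few seconds of
      // offline performance time (scheduleNextChunk is a made-up helper).
      scheduleNextChunk(oc, oc.currentTime);
    });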
>> ----
>>
>> Issue: The spatialization attributes (location, orientation, etc.) of AudioListener and PannerNode cannot be scheduled. In a regular AC these can be modified in real time during rendering (I think). However, there is no way in an OAC to perform the same modifications at various moments in offline performance time.
>>
>> Suggestion: Introduce events that trigger at given performance time offsets in an OAC? Replace these spatialization attributes with AudioParams? Simply stipulate that this can't be done?
>
> That's a good point, and this is a limitation even in the normal AudioContext case if very precise scheduling is desired for the spatialization attributes. I think we can consider making these be controllable via AudioParams, but hopefully that's something we can consider as separate from just getting basic OfflineAudioContext defined.

While I worry about the precise scheduling scenario, I worry a lot more about the race condition issues, which would exist even if these attributes became AudioParams.

Another big question that I have is how reasonable it is to expect the web author to know the desired length of the OfflineAudioContext's rendered buffer in advance. This is especially worrying if, for example, you want to render the output of an HTMLMediaElement. The way that the spec is currently defined, if the media element doesn't have output immediately available for any reason, then you'll just get a number of silent frames at the beginning of the rendered buffer, and not all of the expected rendered frames. This is an issue with all sorts of potential sources of delay in the graph that are not under the control of the web author. That said, even for graphs which do not have any such sources of delay, it could still be quite tricky to compute the precise buffer length in advance, for example in the presence of loops in the graph containing DelayNodes that change their delays dynamically according to automation events scheduled on the delay AudioParam, etc.

Cheers,

--
Ehsan
<http://ehsanakhgari.org/>
Received on Tuesday, 7 May 2013 03:58:55 UTC