Re: Web Audio API Proposal from Ricard Marxer Piñón on 2010-07-14 (public-xg-audio@w3.org from July 2010)

From: Ricard Marxer Piñón <ricardmp@gmail.com>
Date: Wed, 14 Jul 2010 12:06:46 +0200
To: Chris Rogers <crogers@google.com>
Cc: Corban Brook <corbanbrook@gmail.com>, public-xg-audio@w3.org
Message-ID: <AANLkTinmnaI0tkcUdW6QT0IEwTJH1_-VIRaZiM8khO4-@mail.gmail.com>
On Wed, Jul 14, 2010 at 12:33 AM, Chris Rogers <crogers@google.com> wrote:
> Hi Ricard - good questions!  I'll try my best to answer:
>
> On Tue, Jul 13, 2010 at 2:10 PM, Ricard Marxer Piñón <ricardmp@gmail.com>
> wrote:
>>
>> Hi,
>>
>> Yes, making that JavaScriptProcessor node makes sense.
>>
>> I still have a few questions left.
>>
>> Question 1
>> -----
>> If you connect the audioSource of an <audio> element does that
>> audioSource disconnect itself from the default AudioDestination?
>> From my point of view there are two clear use cases to use the
>> audioSource from an <audio> element.
>> 1) to filter or apply some effect to that audio and directly output it
>> and therefore muting the original audio
>> 2) to analyze it and create some visualization therefore we still want
>> to play the original audio
>
> I was thinking that the implicit connection to the "default"
> AudioDestination would be broken as soon as it's connected into a "true"
> processing graph.  This way we could avoid having to explicitly disconnect
> it as you suggest.  Then, I think both cases (1) and (2) can be handled
> identically.  For example:
> In case (1) the JavaScriptProcessor applies some effect and the output is
> connected to context.destination.  We hear the processed output.
> In case (2) the JavaScriptProcessor does an FFT analysis to display
> graphically, but also copies the input samples to the output samples, acting
> as a "pass-through" processor as far as the audio stream is concerned.  We
> then hear the original audio, but are now showing some cool graphics.  This
> is how the native RealtimeAnalyserNode works.

Ok, this makes sense.  I can't currently think of any use cases for
which this is a problem.  If I run into some, I will let you know.
I do think it should be made quite explicit that nodes like the
RealtimeAnalyserNode or the JavaScriptProcessor (the one that would do
the FFT) are, or should be made, pass-through, because my first guess
(just intuitively, without looking at the docs) was that they were
sinks.  In the code I wrote earlier, I made a copy-and-paste mistake.
If I had written it from scratch, I would have done:

function setupAnalysis() {
   myAudioAnalysis = context.createJavaScriptProcessor();
   myAudioAnalysis.onprocess = process;
   var audio = document.getElementById('audioElement');
   audio.audioSource.connect(myAudioAnalysis);
}

This would no longer do the pass-through, and the sound would not
play.  I'm not sure it is a good idea to silently disconnect an
audioSource from the default destination when connecting it to another
processing node.  Maybe making the person explicitly call a disconnect
method is not that bad: it makes it clearer what they are doing,
although I realize it is more verbose.
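
To make the pass-through idea concrete, here is a minimal sketch of a
processing callback that analyses a block and also copies it to the
output.  The function name, its arguments and the use of an RMS level
in place of a real FFT are my own assumptions for illustration; none
of them come from the specification document.

```javascript
// Hypothetical callback shape: `input` and `output` are Float32Array blocks
// of equal length. These names are assumptions, not part of the proposal.
function passThroughWithAnalysis(input, output, onAnalysis) {
  var sumOfSquares = 0;
  for (var i = 0; i < input.length; i++) {
    output[i] = input[i];                 // copy unchanged so the audio is still heard
    sumOfSquares += input[i] * input[i];  // accumulate energy for the analysis
  }
  // An RMS level stands in for the FFT here to keep the sketch short.
  onAnalysis(Math.sqrt(sumOfSquares / input.length));
}

// Usage: one block in, the same block out, plus one analysis value.
var input = new Float32Array([0.5, -0.5, 0.5, -0.5]);
var output = new Float32Array(input.length);
var level = null;
passThroughWithAnalysis(input, output, function (rms) { level = rms; });
```

A node that skipped the copy loop would behave like a sink, which is
the case I stumbled into above.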

>>
>> One question related to this is whether the volume
>> control that the <audio> elements have by default would modify the
>> audioSource gain or the default audioDestination.  I think it would
>> make more sense to modify the audioSource gain, because that way, if
>> the user modifies the volume control in the filter use case, this
>> would work as expected (modifying the volume of the audio that we are
>> listening to).
>
> I was thinking that the volume control on the <audio> element would simply
> be an "alias" for the audioSource gain.  Changing either one changes the
> other.

Ok, this is what I thought, as well.
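
As a sketch of what such an alias could look like, assuming plain
objects standing in for the <audio> element and its audioSource (both
are hypothetical stand-ins, not the real DOM objects):

```javascript
// Hypothetical stand-ins for an <audio> element and its audioSource.
function makeAudioElementSketch() {
  var audioSource = { gain: 1.0 };
  var element = { audioSource: audioSource };
  // `volume` has no storage of its own: it reads and writes audioSource.gain.
  Object.defineProperty(element, 'volume', {
    get: function () { return audioSource.gain; },
    set: function (value) { audioSource.gain = value; },
    enumerable: true
  });
  return element;
}

var audio = makeAudioElementSketch();
audio.volume = 0.25;            // also changes audio.audioSource.gain
audio.audioSource.gain = 0.75;  // also changes audio.volume
```

With a single stored value there is no way for the two to drift apart.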

>>
>> Question 2
>> -----
>> Does the audioSource element have a sampleRate, bufferLength,
>> channelCount?  This way we could set up our filter once before
>> connecting the audioSource to it and then let it run.
>
> Just a minor nitpick - the audioSource is not actually an element
> (HTMLElement) but is just a regular JavaScript object.

Yes, I misused the "element" word.  It was clear from the specs that
the audioSource is a JavaScript object.
In any case thanks for the nitpicking, I like it!

> Sample Rate
> Right now in my specification document, all AudioNode objects have a
> sampleRate, so in that case so would AudioElementSourceNode.  But I think we
> should change this and consider another alternative which I think is
> reasonable (and would highly recommend).  This is to consider that every
> single node in the AudioContext is running at the same sample-rate.  This is
> currently the case, covers almost all use cases I can think of, and avoids
> trying to connect together nodes that are running at different rates (where
> very bad things will happen!)  If we can make this assumption, then only the
> AudioContext needs to have a sampleRate attribute.  Even though individual
> audio elements may reference files which are at different sample rates, they
> would always be converted (behind the scenes) to the AudioContext sample rate
> before we ever touch them.
> In this case, the sampleRate would never change as far as the AudioNodes are
> concerned since the stream always gets converted to the AudioContext
> sampleRate.

Good.  I think this is a good idea.  The API has to be quite clear
about when we actually select the desired sampleRate.  For me the
confusion arises when we already have <audio> and <video> elements in
the DOM and we suddenly want to create a context in which we will use
their audioSource objects.  Maybe, as you say, when the audioSource
gets connected to a node in a given context, its output is
automatically (behind the scenes) resampled to the context's
sampleRate.  But then we must be careful if we allow multiple contexts
per DOM (maybe this is another reason for allowing only one context).
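
For illustration, the behind-the-scenes conversion could be as simple
as the following sketch.  It uses plain linear interpolation, which is
my own choice to keep the example short; a real implementation would
most likely use a better filter, and this version does no
anti-aliasing when going to a lower rate.  The function name is
hypothetical.

```javascript
// Linear-interpolation resampler (a simplification chosen for this sketch).
function resampleLinear(samples, fromRate, toRate) {
  var outLength = Math.floor(samples.length * toRate / fromRate);
  var out = new Float32Array(outLength);
  var step = fromRate / toRate;  // input samples advanced per output sample
  for (var i = 0; i < outLength; i++) {
    var pos = i * step;
    var left = Math.floor(pos);
    var right = Math.min(left + 1, samples.length - 1);  // clamp at the last sample
    var frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

// A 22050 Hz source converted to a 44100 Hz context: twice as many samples.
var upsampled = resampleLinear(new Float32Array([0, 1, 0, -1]), 22050, 44100);
```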

I guess then it all comes down to: is there any use case that needs
more than one context at different sample rates?

> bufferLength
> In my specification document, there is no such thing as "bufferLength" for
> an individual AudioNode.  The "event" passed back to the process() method
> has a "numberOfSampleFrames" attribute which is the same thing as what
> you're talking about I think.  This value could be an argument
> to createJavaScriptProcessor, so we'll know it ahead of time.  From then on,
> it could be guaranteed to never change, so we don't need a notification.


This sounds good to me.  I think having it as an argument is good.  In
some frameworks however you ask for a bufferSize, and the system finds
the closest possible size available.  Maybe this is applicable here as
well.  So maybe it would look something like this:

function setupAnalysis() {
   var preferredBufferSize = 64;
   myAudioAnalysis = context.createJavaScriptProcessor(preferredBufferSize);

   // Maybe on this platform/system/etc. and under current conditions
   // the minimum bufferSize allowed is 128, so use the size we were given.
   reconfigureAnalysis(myAudioAnalysis.bufferSize);

   myAudioAnalysis.onprocess = process;
   var audio = document.getElementById('audioElement');
   audio.audioSource.connect(myAudioAnalysis);
}

Another possibility is just to throw an exception when the bufferSize
we asked for is not possible.  But then it would be nice to be able to
ask for a list of available bufferSizes.

  context.availableBufferSizes();
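
Either way, picking the closest size from such a list is simple.  A
minimal sketch, assuming the list of sizes is a plain array (the
helper name and the example sizes are made up for illustration):

```javascript
// Returns the entry of `available` nearest to `preferred`.
// On a tie, the entry that appears first in the list wins.
function closestBufferSize(preferred, available) {
  if (available.length === 0) {
    throw new Error('no buffer sizes available');
  }
  var best = available[0];
  for (var i = 1; i < available.length; i++) {
    if (Math.abs(available[i] - preferred) < Math.abs(best - preferred)) {
      best = available[i];
    }
  }
  return best;
}

var sizes = [128, 256, 512, 1024];        // hypothetical platform limits
var granted = closestBufferSize(64, sizes);
```

The same helper could run inside the system (the "closest size"
behaviour) or in user code after an exception (the "ask for the list"
behaviour).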


> channelCount
> The number of input channels could change (for example, from mono to
> stereo).
> So only the channelCount will make a difference to the JavaScriptProcessor,
> but your question still remains.  Should we have an event notification for
> such a change, or simply require the processor to deal with it on-the-fly?
> I'm open to either possibility.
>

Well, as I said before, my personal vote goes for an event
notification.  That way the process method has no logic for setup or
reconfiguration of the nodes, and no check is needed for each block of
data processed.
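
Roughly what I have in mind, as a sketch only: the
onchannelcountchange name and the shape of both callbacks are
hypothetical and not taken from the specification document.

```javascript
// Tracks one running peak per channel. All setup happens in the
// notification handler, so the per-block callback has no format checks.
function createAnalysisSketch() {
  var peaks = [];
  return {
    // Hypothetical notification: called once whenever the channel count changes.
    onchannelcountchange: function (channelCount) {
      peaks = [];
      for (var c = 0; c < channelCount; c++) {
        peaks.push(0);
      }
    },
    // Per-block callback: `channels` is an array of Float32Array, one per channel.
    onprocess: function (channels) {
      for (var c = 0; c < peaks.length; c++) {
        for (var i = 0; i < channels[c].length; i++) {
          peaks[c] = Math.max(peaks[c], Math.abs(channels[c][i]));
        }
      }
    },
    getPeaks: function () { return peaks.slice(); }
  };
}

var analysis = createAnalysisSketch();
analysis.onchannelcountchange(1);                       // stream starts as mono
analysis.onprocess([new Float32Array([0.25, -0.5])]);
analysis.onchannelcountchange(2);                       // stream switches to stereo
analysis.onprocess([new Float32Array([0.25]), new Float32Array([-0.75])]);
```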

>
>
>> Question 3
>> -----
>> How many AudioDestinationNode instances can exist per page (DOM)?  One
>> per context? How many contexts can exist?  Can we connect audio
>> streams with different properties (sampleRate, bufferLength,
>> channelCount) to the same AudioDestinationNode instance?
>>
>> For this one I don't have any opinions yet, just the question.
>
> I'm considering a single AudioDestinationNode per AudioContext.  This is the
> "destination" attribute.  It's probably unnecessary to have more than one
> AudioContext per document since everything can be routed and mixed using
> just one.  But we could consider allowing more than one.  If we only allow
> one, then I suppose we'd have to throw an exception (or something) if more
> than one were created...

Ok, the one destination per context is good.  About the number of
contexts allowed, I think we should answer the question of whether we
can find use cases where more than one is needed.  I can think of a
few, but they are very specific:

- Maybe we want a web site that outputs one sound through the
headphones and a different sound through another output device.  Kind
of like a DJ web app.  In this case it may also be interesting to have
the two contexts running at different sample rates (low sample rate ==
low quality for the headphone path and high for the other).

- Or we might want to create a game where the background music plays
with high quality (high sample rate) while the triggered samples
have a low sample rate in order to allow lower latencies, more
effects, etc.

However, I must say these use cases are quite advanced and maybe very
specific, so we could allow a single context for now and think about
this for later versions.

Thanks for the answers Chris,
ricard

> Best Regards,
> Chris
>
>
>
>



-- 
ricard
http://twitter.com/ricardmp
http://www.ricardmarxer.com
http://www.caligraft.com
Received on Wednesday, 14 July 2010 10:07:36 UTC