
Re: About AudioPannerNode

From: Marcus Geelnard <mage@opera.com>
Date: Thu, 21 Jun 2012 09:06:27 +0200
To: "Chris Rogers" <crogers@google.com>
Cc: public-audio@w3.org
Message-ID: <op.wf8rc1grm77heq@mage-desktop>
Den 2012-06-19 19:39:53 skrev Chris Rogers <crogers@google.com>:

> On Tue, Jun 19, 2012 at 2:50 AM, Marcus Geelnard <mage@opera.com> wrote:
>
>> Here's another subject... :)
>>
>> I've looked a bit at the AudioPannerNode/AudioListenerNode pair, and I
>> think there are a few things that need to be discussed.
>>
>>
>> First of all, the AudioPannerNode is a bit of an odd-one-out among the
>> audio nodes, since it has an implicit dependency on the  
>> AudioListenerNode
>> of the context to which the panner node belongs. If we wanted to  
>> decouple
>> audio nodes from the audio context (e.g. to make it possible to connect
>> nodes from different audio contexts, as have been suggested), the
>> AudioPannerNode becomes a special case. Not sure how an alternate  
>> solution
>> should be designed (discussions welcome), but one idea could be to  
>> decouple
>> the AudioListenerNode from the AudioContext, and manually construct
>> listener nodes and set the listener node for each panner node. This  
>> would
>> also make it possible to have any number of listeners (e.g. for doing
>> background sounds or other listener-aligned sound sources).
>>
>
> I think it's worth considering allowing AudioListeners to be constructed
> and assigned to a .listener attribute of AudioPannerNode.  This attribute
> would simply default to the AudioContext "default" listener.

Yes, that should probably solve several problems: The AudioPannerNode will  
not have a dependency on the AudioContext, but rather on an AudioListener  
(which could potentially be shared between contexts, if desired). It would  
also make it possible to have multiple listeners, and I believe that it  
would be simpler to implement the AudioPannerNode with a  
JavaScriptAudioNode this way.

An even cleaner solution, I think, would be to allow the .listener  
property of the AudioPannerNode to be NULL/undefined (and remove the  
default listener from the AudioContext), giving you a pattern such as:

   var listener = context.createListener();
   var panner1 = context.createPannerNode();
   panner1.listener = listener;
   var panner2 = context.createPannerNode();
   panner2.listener = listener;

We just have to define what .listener === NULL means (e.g. pass-through or  
silence). That would be less "magic". I tend to dislike solutions where I  
have magic global variables that do important stuff - you easily end up  
with hard-to-understand bugs, e.g. if you did:

   var panner1 = context.createPannerNode();  // .listener intentionally not
                                              // assigned, to follow the
                                              // default listener
   var listener = context.createListener();
   var panner2 = context.createPannerNode();
   panner2.lisetner = listener;               // misspelled property ->
                                              // .listener is never assigned

...here, both panners would unintentionally be connected to the default  
listener (which could easily go unnoticed).


>> Another thing is that I don't really see how a non-mono input signal  
>> would
>> make sense for a panner node, at least not if we think of it as a 3D
>> spatialization tool. For instance, in an HRTF model, I think an audio
>> source should be in mono to make sense. Would it be a limitation if all
>> inputs are down-mixed to mono?
>>
>
> Sorry, I need to add that into the spec.  I've just added details about  
> how
> to do mono->stereo and stereo->stereo equal-power-panning.  But, it's  
> also
> possible to pan stereo sources with HRTF too.  Basically, the idea is to
> process the left input channel with the left impulse-response and the  
> right
> input channel with the right impulse-response.

Not sure what that would sound like (e.g. if you're spinning a stereo  
source in circles around your head), but it would be equivalent to a mono  
source if both channels are equal (e.g. up-mixed from mono to stereo), and  
if your source & listener are aligned you'd get (almost) pass-through, so  
in that respect it makes sense.

As long as it is well defined, that's fine.
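For concreteness, the per-channel idea can be sketched in plain JavaScript,  
outside any audio graph (the direct-form convolution and the tiny impulse  
responses in the usage below are illustrative stand-ins, not real HRTF data):

```javascript
// Direct-form FIR convolution of a signal with an impulse response.
function convolve(signal, ir) {
  const out = new Float32Array(signal.length + ir.length - 1);
  for (let n = 0; n < signal.length; n++) {
    for (let k = 0; k < ir.length; k++) {
      out[n + k] += signal[n] * ir[k];
    }
  }
  return out;
}

// Stereo HRTF panning as described above: the left input channel is
// processed with the left-ear impulse response, the right input channel
// with the right-ear one.
function panStereoHRTF(left, right, irLeft, irRight) {
  return { left: convolve(left, irLeft), right: convolve(right, irRight) };
}
```

With both input channels equal and identical impulse responses this  
degenerates to the mono case, which matches the equivalence noted above.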


>> On the other hand, in music applications you may want to do left-right
>> panning of stereo signals. Should that be treated as a special case
>> (configuration) of the AudioPannerNode, or would it be better to split  
>> the
>> current interface into two (e.g. StereoPannerNode and  
>> SpatializationNode)?
>>
>
> Right now the AudioPannerNode, in both the equal-power and HRTF modes,  
> will
> automatically do the right thing for both mono and stereo sources without
> any special configuration.  I think it's better to keep that part simple
> for developers and just make sure the "right thing" happens.

Again, as long as it is well defined it's fine. I think this is a case  
where the spec could have an informative example of how to achieve simple  
stereo-panning w/o any 3D effects applied.
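Such an informative example could be as short as computing equal-power gains  
from a pan position; the cosine/sine law below is the common equal-power  
choice, shown here as plain JavaScript rather than spec text:

```javascript
// Equal-power gains for pan in [-1, 1] (-1 = hard left, +1 = hard right).
// The gains always satisfy left^2 + right^2 === 1, keeping power constant.
function equalPowerGains(pan) {
  const x = (pan + 1) * 0.5 * (Math.PI / 2); // map pan to [0, pi/2]
  return { left: Math.cos(x), right: Math.sin(x) };
}

// Apply the gains to a stereo sample pair, with no 3D processing at all.
function panStereo(left, right, pan) {
  const g = equalPowerGains(pan);
  return {
    left: left.map(s => s * g.left),
    right: right.map(s => s * g.right),
  };
}
```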


>> Lastly, how should complex spatialization models (thinking about HRTF
>> here) be handled (should they even be supported)? I fear that a fair
>> amount of spec'ing and testing must be done to support this, not to
>> mention that HRTF in general relies on data files from real-world
>> measurements (should these be shared among implementations or not?)
>
>
> I'm happy to share the measured HRTF files that we use in WebKit.  I'm  
> not
> sure if they should be normative or not...

I think we need to decide how strict the spec should be here. I generally  
prefer a model where the spec describes an optimal algorithm, and if  
required it can allow for deviations from that optimal solution to a  
certain, well-defined degree (in whatever terms are suitable for the  
algorithm at hand).

I also like to think in terms of testing. For instance, it would be  
impossible to write a useful test that verifies a loose statement such as  
"must create the impression of positioning the input signal at the given  
3D position, relative to the listener". On the other hand, it would be  
much easier to test against a normative HRTF data set, since the resulting  
signal should be accurate to (almost) floating point precision.
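As a sketch of what such a test could look like (the reference array in the  
assertions is a stand-in, not actual HRTF output):

```javascript
// Maximum absolute error between an implementation's output and
// normative reference samples.
function maxAbsError(actual, reference) {
  let max = 0;
  for (let i = 0; i < reference.length; i++) {
    max = Math.max(max, Math.abs(actual[i] - reference[i]));
  }
  return max;
}

// A conformance test would assert the error stays within a small
// epsilon, e.g. a few ULPs of single-precision float.
function withinTolerance(actual, reference, eps = 1e-6) {
  return actual.length === reference.length &&
         maxAbsError(actual, reference) <= eps;
}
```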

Here's where I can't really decide which way is better. I'm not sure if  
there are any strong use-cases for allowing implementations to use  
different data sets.

Related questions:

How big is the data set?

To what degree can you modify it (e.g. crop impulse responses or reduce  
the angular resolution) without negatively affecting the 3D effect too  
much? (has this been experimented with?)

Have you considered/compared against other available HRTF measurements,  
and if so, why were they not chosen?

/Marcus



-- 
Marcus Geelnard
Core Graphics Developer
Opera Software ASA
Received on Thursday, 21 June 2012 07:06:58 GMT
