Re: Some general feedback on the Web Audio API spec and suggestions for improvements

Your revision re: 1 is perfect. I think anyone who reads those revisions
will know what's going on very quickly.

Re 2 (playbackRate): what I specifically mean is that the spec never
states what that value *means*, and I've seen several different
conventions for expressing it across APIs. I was only able to infer that
it's a constant floating-point multiplier (and not, say, a percentage or
a raw sampling rate) from the default value of '1'. XAudio, for example,
takes a floating-point sample rate multiplier (it states this explicitly,
along with the min/max values), while XNA takes a floating-point *delta*
expressed in *octaves*. Maybe every other API in the universe uses the
same semantics as Web Audio, though, and I'm just unlucky. If this is
inspired by OpenAL, that makes more sense; maybe you should call that out
directly so people know what to use as a reference.
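For what it's worth, here is how the three conventions I've run into
would map onto a plain sample-rate multiplier; the helper names and the
clamp bounds are mine, purely for illustration:

```javascript
// Three common ways APIs express playback rate, and how each maps onto
// a plain sample-rate multiplier (what Web Audio's default of 1 implies).
// These helper names and bounds are hypothetical, for illustration only.

// XAudio-style: already a multiplier; clamp to some stated min/max range.
function rateFromMultiplier(multiplier, min = 1 / 1024, max = 1024) {
  return Math.min(max, Math.max(min, multiplier));
}

// XNA-style: a delta expressed in octaves; +1 octave doubles the rate.
function rateFromOctaves(octaves) {
  return Math.pow(2, octaves);
}

// Percentage-style: 100 means normal speed.
function rateFromPercent(percent) {
  return percent / 100;
}

// e.g. source.playbackRate.value = rateFromOctaves(1); // one octave up -> 2.0
```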

"equalpower" sounds great; I will look into it. When was that added to the
spec? If you search for 'web audio panning' right now, you mostly find
stuff like this:
http://stackoverflow.com/questions/11235068/web-audio-api-how-to-use-audiopannernode-for-regular-lr-panning
http://stackoverflow.com/questions/14915951/web-audio-no-sound-in-right-channel
Both of those answers naturally lead one down the splitter/merger path,
especially since it is still non-obvious how to achieve arbitrary L/R
volumes (say, 1.0 volume left and 0.1 volume right) from the 3D position
inputs of a PannerNode and the listener.
I should also note that "equalpower" does not appear to be specified
thoroughly enough for me to even infer that it works by changing the L/R
volumes. What does it do in 5.1 or mono scenarios?
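To illustrate what I am guessing at: the textbook equal-power law for a
mono source is the sin/cos one sketched below, but nothing in the spec
confirms that this is what "equalpower" actually does (panGains is my
name for illustration, not a spec API):

```javascript
// Hypothetical sketch of an equal-power stereo pan law. The spec does NOT
// state that "equalpower" works this way; this is just the textbook
// sin/cos law. pan runs from -1 (hard left) to +1 (hard right).
function panGains(pan) {
  const theta = ((pan + 1) * Math.PI) / 4; // 0 .. pi/2
  return { left: Math.cos(theta), right: Math.sin(theta) };
}

// panGains(0) gives both channels ~0.707, so total power stays constant.
// Note: an arbitrary pair like (1.0, 0.1) is not on this curve (the law
// constrains left^2 + right^2 === 1), which is one reason people fall
// back to the splitter/gain/merger graph for per-channel volumes.
```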

-kg


On Fri, May 3, 2013 at 3:31 PM, Chris Rogers <crogers@google.com> wrote:

>
>
>
> On Wed, May 1, 2013 at 12:43 PM, Kevin Gadd <kevin.gadd@gmail.com> wrote:
>
>> Hello,
>> I've been trying to use the Web Audio API for over a year now to support
>> end users' attempts to port games that make use of native audio APIs. The
>> following are spec deficiencies/bugs that I think should be addressed,
>> based on problems I've encountered and that my users have encountered.
>>
>> 1. channelCount &c on AudioNodes
>> AudioNode is specced as having these properties and they are described as
>> applying to all nodes. They do not.
>> StackOverflow answers by cwilson (and some manual testing on my end)
>> indicate that AudioBufferSourceNode ignores these properties, and that it
>> should because it has no 'input' and they only affect 'inputs'. It also
>> appears that channel splitters/mergers ignore these properties, and
>> I find it hard to justify this particular behavior.
>>
>
> These attributes have always been about how channel mixing works with
> *inputs*, but that may not have been so clear when reading quickly through
> the basic description of these attributes in the AudioNode section.  I've
> tried to clarify that:
> https://dvcs.w3.org/hg/audio/rev/21562b34bf0f
>
>
>>
>> 1a. If a given AudioNode does not implement these properties, attempts to
>> set them should throw so that end users are able to easily identify which
>> particular nodes are 'special' and lack support for channel count control.
>> This is an important enough feature that having to try and blindly debug it
>> by listening to your speakers is not an acceptable scenario.
>>
>> 1b. I also suggest that the spec be updated to explicitly state for each
>> given node that it does not support channelCount and kin if the node does
>> not support them.
>>
>
> That's maybe a good idea.
>
>
>>
>
>> 1c. I also believe that the AudioBufferSourceNode behavior in this case
>> is kind of irrational: even if it doesn't have an input node, it has an
>> 'input' in semantic terms, in that it's reading samples from a buffer. But
>> I understand if it is too complicated or weird to implement channelCount on
>> source nodes, and it's not the end of the world to have to put in a gain
>> node in order to convert mono up to stereo.
>>
>>
>> 2. playbackRate on AudioBufferSourceNode
>> This property's behavior is effectively unspecified.
>>
>> 2a. Please specify the behavior. Without knowing what it does, it's not
>> possible to use it to achieve particular audio goals.
>> 2b. The spec should also be updated to make it clear that you can use
>> playbackRate to adjust the pitch of audio being played back. All mentions
>> of 'pitch' in the spec merely refer to the panner node's doppler effect
>> support, which makes it appear as if that is the only way to accomplish
>> pitch shifting.  (I understand that 'pitch shifting' is not what this
>> property actually does, and that it instead adjusts the sampling rate of
>> playback in some fashion, either through an FFT or something else.)
>>
>
> The .playbackRate is basically just how fast the sample-data is rendered,
> and is similar to speeding up or slowing down a turntable.  It's a very
> common feature in basic sample-based playback, and has been around since
> the earliest digital samplers.  Also, this is similar to OpenAL AL_PITCH.
>
> It's not using an FFT, but is just resampling the audio data to a higher
> or lower rate.  We could be more clear in the spec what type of
> interpolation algorithm is used.  I think we've discussed different
> techniques like linear interpolation, 4th order polynomial, etc.  We could
> consider a quality or algorithm attribute to control this, trading off
> quality against performance...
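(For concreteness: "resampling to a higher or lower rate" with linear
interpolation amounts to something like the sketch below; this is not
the actual implementation of any browser, just the general technique:)

```javascript
// Sketch of linear-interpolation resampling: render the buffer at `rate`
// times its normal speed by reading fractional sample positions.
// Illustrative only; not how any particular browser implements it.
function resampleLinear(input, rate) {
  const outLength = Math.floor(input.length / rate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * rate;           // fractional read position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Blend the two nearest input samples.
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```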
>
>
>>
>> 3. Stereo panning is incredibly complicated and error-prone
>> At present, the only way to do stereo panning in the Web Audio API
>> involves 3 gain nodes, a channel splitter and a channel merger. This is
>> easy to get wrong, in particular because issue #1 makes the most obvious
>> implementation not work correctly for mono sources but work correctly for
>> stereo sources, so you can end up with broken code out in the wild. I also
>> consider it a problem if playing individual samples with panning (say, in
>> an Impulse Tracker player) requires the creation of 5 nodes for every
>> single active sound instance. This seems like it would implicitly create a
>> lot of mixing/filtering overhead, use a lot more memory, and increase GC
>> pressure.
>>
>> 3a. If possible, a simple mechanism for stereo panning should be
>> introduced. Ideally this could be exposed by PannerNode, or by a new
>> 2DPannerNode type. Another option would be a variant of GainNode that
>> allows per-channel gain (but I dislike this option since it overlaps
>> ChannelSplitter/ChannelMerger too much).
>> 3b. If a new node is not possible, the correct way to do this should be
>> clearly specified, in particular because channelsplitter/channelmerger
>> explicitly avoid specifying which channel is 'left' and which channel is
>> 'right' in a stereo source.
>> 3c. One other option is to clearly specify the behavior of the existing
>> PannerNode so that it is possible to use it to achieve 2D panning. I don't
>> know anyone who has done this successfully (a couple of my users tried and
>> failed; they claim that the PannerNode never does channel volume
>> attenuation.)
>>
>
> The PannerNode has a .modelType of "equalpower" for musical types of
> panning and it's in the spec.
>
>
>

Received on Friday, 3 May 2013 23:11:45 UTC