Re: Some general feedback on the Web Audio API spec and suggestions for improvements from Chris Rogers on 2013-05-03 (public-audio@w3.org from April to June 2013)

From: Chris Rogers <crogers@google.com>
Date: Fri, 3 May 2013 15:31:06 -0700
To: Kevin Gadd <kevin.gadd@gmail.com>
Cc: "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CA+EzO0nAuFR+duSS93gT5j1LSrG1YDG4qVw+re7pak2uMZWotw@mail.gmail.com>
On Wed, May 1, 2013 at 12:43 PM, Kevin Gadd <kevin.gadd@gmail.com> wrote:

> Hello,
> I've been trying to use the Web Audio API for over a year now to support
> end users' attempts to port games that make use of native audio APIs. The
> following are spec deficiencies/bugs that I think should be addressed,
> based on problems I've encountered and that my users have encountered.
>
> 1. channelCount &c on AudioNodes
> AudioNode is specced as having these properties and they are described as
> applying to all nodes. They do not.
> StackOverflow answers by cwilson (and some manual testing on my end)
> indicate that AudioBufferSourceNode ignores these properties, and that it
> should because it has no 'input' and they only affect 'inputs'. It also
> appears that channel splitters/mergers ignore these properties as well, and
> I find it hard to justify this particular behavior.
>

These attributes have always been about how channel mixing works with
*inputs*, but that may not have been so clear when reading quickly through
the basic description of these attributes in the AudioNode section.  I've
tried to clarify that:
https://dvcs.w3.org/hg/audio/rev/21562b34bf0f
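
To make the "inputs only" point concrete, here is a minimal TypeScript sketch of how an input's channel count might be resolved from channelCount and channelCountMode. It also shows the "speakers" up-mix and down-mix for mono and stereo. It is modelled on the mixing rules described in later drafts of the spec, not on any browser's implementation, and all helper names are hypothetical. A source node such as AudioBufferSourceNode has no inputs, so none of this logic has anything to act on.

```typescript
// Hypothetical model of input channel mixing; names are illustrative only.
type ChannelCountMode = "max" | "clamped-max" | "explicit";

// Resolve how many channels a node's input is mixed to before processing.
function computedNumberOfChannels(
  inputChannelCounts: number[],
  mode: ChannelCountMode,
  channelCount: number,
): number {
  // Assumption: with no connected inputs, fall back to a single channel.
  const maxInput =
    inputChannelCounts.length > 0 ? Math.max(...inputChannelCounts) : 1;
  if (mode === "explicit") return channelCount; // ignore the inputs entirely
  if (mode === "clamped-max") return Math.min(maxInput, channelCount);
  return maxInput; // "max": follow the widest connected input
}

// "speakers" interpretation, mono -> stereo: copy the signal to both channels.
function upMixMonoToStereo(mono: Float32Array): [Float32Array, Float32Array] {
  return [Float32Array.from(mono), Float32Array.from(mono)];
}

// "speakers" interpretation, stereo -> mono: average left and right.
function downMixStereoToMono(left: Float32Array, right: Float32Array): Float32Array {
  const out = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    out[i] = 0.5 * (left[i] + right[i]);
  }
  return out;
}
```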


>
> 1a. If a given AudioNode does not implement these properties, attempts to
> set them should throw so that end users are able to easily identify which
> particular nodes are 'special' and lack support for channel count control.
> This is an important enough feature that having to try and blindly debug it
> by listening to your speakers is not an acceptable scenario.
>
> 1b. I also suggest that the spec be updated to explicitly state for each
> given node that it does not support channelCount and kin if the node does
> not support them.
>

That's maybe a good idea.

>
> 1c. I also believe that the AudioBufferSourceNode behavior in this case is
> kind of irrational: even if it doesn't have an input node, it has an
> 'input' in semantic terms, in that it's reading samples from a buffer. But
> I understand if it is too complicated or weird to implement channelCount on
> source nodes, and it's not the end of the world to have to put in a gain
> node in order to convert mono up to stereo.
>
>
> 2. playbackRate on AudioBufferSourceNode
> This property's behavior is effectively unspecified.
>
> 2a. Please specify the behavior. Without knowing what it does, it's not
> possible to use it to achieve particular audio goals.
> 2b. The spec should also be updated to make it clear that you can use
> playbackRate to adjust the pitch of audio being played back. All mentions
> of 'pitch' in the spec merely refer to the panner node's Doppler effect
> support, which makes it appear as if that is the only way to accomplish
> pitch shifting.  (I understand that 'pitch shifting' is not what this
> property actually does, and that it instead adjusts the sampling rate of
> playback in some fashion, either through an FFT or something else.)
>

The .playbackRate is basically just how fast the sample data is rendered,
and is similar to speeding up or slowing down a turntable.  It's a very
common feature in basic sample-based playback, and has been around since
the earliest digital samplers.  This is also similar to OpenAL's AL_PITCH.
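
Because playbackRate resamples rather than pitch-shifts, pitch and speed move together, just as on a turntable. A rate of r raises pitch by 12 * log2(r) semitones and shortens playback by a factor of r. The small sketch below shows that arithmetic. The helper names are hypothetical and are not part of the API.

```typescript
// Hypothetical helpers for the turntable-style relationship between
// playbackRate, pitch (in equal-tempered semitones) and duration.

// Semitone offset -> playbackRate (12 semitones = one octave = double rate).
function semitonesToPlaybackRate(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

// playbackRate -> semitone offset.
function playbackRateToSemitones(rate: number): number {
  return 12 * Math.log2(rate);
}

// Resampling also changes length: a faster rate finishes sooner.
function playedDurationSeconds(bufferDurationSeconds: number, rate: number): number {
  return bufferDurationSeconds / rate;
}
```

For example, playing a sample a perfect fifth higher (7 semitones) needs a playbackRate of about 1.498, and the sound also finishes about a third sooner.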

It's not using an FFT; it's just resampling the audio data to a higher or
lower rate.  We could be clearer in the spec about which interpolation
algorithm is used.  I think we've discussed different techniques like
linear interpolation, 4th-order polynomial, etc.  We could consider a
quality or algorithm attribute to control this, to trade off quality versus
performance...
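
Here is an illustration of the simplest of those options: a minimal linear-interpolation resampler for a single channel at a constant rate. It is a sketch under those assumptions, not the algorithm the spec or any browser mandates. resampleLinear is a hypothetical name. At the end of the buffer, the last input sample is held rather than reading past it.

```typescript
// Minimal linear-interpolation resampler (hypothetical helper).
// Assumptions: one channel, constant playbackRate > 0, no looping.
function resampleLinear(input: Float32Array, playbackRate: number): Float32Array {
  if (playbackRate <= 0) {
    throw new RangeError("playbackRate must be positive in this sketch");
  }
  if (input.length === 0) return new Float32Array(0);

  // Number of output samples whose read position stays inside the input.
  const outLength = Math.floor((input.length - 1) / playbackRate) + 1;
  const output = new Float32Array(outLength);

  for (let n = 0; n < outLength; n++) {
    const pos = n * playbackRate; // fractional read position in the input
    const i = Math.floor(pos);
    const frac = pos - i;
    const a = input[i];
    // Hold the last sample instead of reading past the end of the buffer.
    const b = i + 1 < input.length ? input[i + 1] : input[i];
    output[n] = a + (b - a) * frac; // straight line between neighbours
  }
  return output;
}
```

Higher-order polynomial interpolators read more neighbouring samples per output sample than the two used here. That extra work is the "performance" side of the trade-off mentioned above.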


>
> 3. Stereo panning is incredibly complicated and error-prone
> At present, the only way to do stereo panning in the Web Audio API
> involves 3 gain nodes, a channel splitter and a channel merger. This is
> easy to get wrong, in particular because issue #1 makes the most obvious
> implementation work correctly for stereo sources but not for mono sources,
> so you can end up with broken code out in the wild. I also
> consider it a problem if playing individual samples with panning (say, in
> an Impulse Tracker player) requires the creation of 5 nodes for every
> single active sound instance. This seems like it would implicitly create a
> lot of mixing/filtering overhead, use a lot more memory, and increase GC
> pressure.
>
> 3a. If possible, a simple mechanism for stereo panning should be
> introduced. Ideally this could be exposed by PannerNode, or by a new
> 2DPannerNode type. Another option would be a variant of GainNode that
> allows per-channel gain (but I dislike this option since it overlaps
> ChannelSplitter/ChannelMerger too much).
> 3b. If a new node is not possible, the correct way to do this should be
> clearly specified, in particular because ChannelSplitter/ChannelMerger
> explicitly avoid specifying which channel is 'left' and which channel is
> 'right' in a stereo source.
> 3c. One other option is to clearly specify the behavior of the existing
> PannerNode so that it is possible to use it to achieve 2D panning. I don't
> know anyone who has done this successfully (a couple of my users tried and
> failed; they claim that the PannerNode never does channel volume
> attenuation.)
>

The PannerNode has a .panningModel of "equalpower" for musical types of
panning, and it's in the spec.
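
For reference, the classic equal-power pan law for a mono source keeps L^2 + R^2 constant, so perceived loudness stays roughly level as a sound moves across the stereo field. The sketch below computes left and right gains from a pan value in [-1, 1]. The pan parameter and the equalPowerGains name are assumptions for illustration. PannerNode's "equalpower" model derives its pan position from the source and listener positions, and this sketch skips that step.

```typescript
// Hypothetical helper: equal-power (constant-power) pan law for mono input.
// pan = -1 is hard left, 0 is centre, 1 is hard right.
function equalPowerGains(pan: number): { left: number; right: number } {
  const p = Math.min(1, Math.max(-1, pan)); // clamp out-of-range values
  const x = (p + 1) / 2; // map [-1, 1] onto [0, 1]
  return {
    left: Math.cos((x * Math.PI) / 2), // 1 at hard left, 0 at hard right
    right: Math.sin((x * Math.PI) / 2), // 0 at hard left, 1 at hard right
  };
}
```

At centre, both gains are about 0.707 rather than 0.5. This is what avoids the loudness dip a simple linear crossfade produces. The two values could drive the left and right GainNodes in the splitter/merger graph described above.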
Received on Friday, 3 May 2013 22:31:34 UTC