- From: Kevin Gadd <kevin.gadd@gmail.com>
- Date: Wed, 1 May 2013 12:43:37 -0700
- To: public-audio@w3.org
- Message-ID: <CAPJwq3UHv+PjKZwVrkLcLMYH2QSzCajaRBDtO9DHjJLujVZ0pQ@mail.gmail.com>
Hello, I've been trying to use the Web Audio API for over a year now to support end users' attempts to port games that make use of native audio APIs. The following are spec deficiencies/bugs that I think should be addressed, based on problems that I and my users have encountered.

1. channelCount &c on AudioNodes

AudioNode is specced as having these properties, and they are described as applying to all nodes. They do not. StackOverflow answers by cwilson (and some manual testing on my end) indicate that AudioBufferSourceNode ignores these properties, and that it should, because it has no 'input' and they only affect 'inputs'. It also appears that channel splitters/mergers ignore these properties as well, and I find that particular behavior hard to justify.

1a. If a given AudioNode does not implement these properties, attempts to set them should throw, so that end users can easily identify which particular nodes are 'special' and lack support for channel count control. This feature is important enough that having to blindly debug it by listening to your speakers is not an acceptable scenario.

1b. I also suggest that the spec be updated to explicitly state, for each node that does not support channelCount and kin, that it does not support them.

1c. I also believe that the AudioBufferSourceNode behavior in this case is somewhat irrational: even if it doesn't have an input node, it has an 'input' in semantic terms, in that it's reading samples from a buffer. But I understand if it is too complicated or weird to implement channelCount on source nodes, and it's not the end of the world to have to put in a gain node in order to convert mono up to stereo.

2. playbackRate on AudioBufferSourceNode

This property's behavior is effectively unspecified.

2a. Please specify the behavior. Without knowing what it does, it's not possible to use it to achieve particular audio goals.

2b. The spec should also be updated to make it clear that you can use playbackRate to adjust the pitch of audio being played back. All mentions of 'pitch' in the spec merely refer to the panner node's doppler effect support, which makes it appear as if that is the only way to accomplish pitch shifting. (I understand that 'pitch shifting' is not what this property actually does, and that it instead adjusts the sampling rate of playback in some fashion, either through an FFT or something else.)

3. Stereo panning is incredibly complicated and error-prone

At present, the only way to do stereo panning in the Web Audio API involves 3 gain nodes, a channel splitter and a channel merger. This is easy to get wrong, in particular because issue #1 makes the most obvious implementation work correctly for stereo sources but not for mono sources, so broken code can end up in the wild. I also consider it a problem that playing individual samples with panning (say, in an Impulse Tracker player) requires the creation of 5 nodes for every single active sound instance. This seems like it would implicitly create a lot of mixing/filtering overhead, use a lot more memory, and increase GC pressure. (A sketch of this workaround appears after 3c.)

3a. If possible, a simple mechanism for stereo panning should be introduced. Ideally this could be exposed by PannerNode, or by a new 2DPannerNode type. Another option would be a variant of GainNode that allows per-channel gain (but I dislike this option since it overlaps ChannelSplitter/ChannelMerger too much).

3b. If a new node is not possible, the correct way to do this should be clearly specified, in particular because ChannelSplitter/ChannelMerger explicitly avoid specifying which channel is 'left' and which is 'right' in a stereo source.

3c. One other option is to clearly specify the behavior of the existing PannerNode so that it is possible to use it to achieve 2D panning. I don't know anyone who has done this successfully (a couple of my users tried and failed; they claim that the PannerNode never does channel volume attenuation).
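To make the above concrete, here is roughly what that 5-node graph looks like. The function name and the equal-power curve are my own choices, the leading gain node is my reading of why a third gain is needed (to force mono up to stereo, per issue 1), and treating splitter output 0 as 'left' is exactly the kind of assumption 3b complains about:

    // Sketch of the 5-node pan workaround. pan runs from -1 (full
    // left) to +1 (full right).
    function connectWithPan(context, source, pan) {
      // Force mono sources up to stereo first; without this, the
      // splitter only sees one channel and the pan silently breaks
      // for mono input (see issue 1).
      var upmix = context.createGain();
      upmix.channelCount = 2;
      upmix.channelCountMode = "explicit";

      var splitter = context.createChannelSplitter(2);
      var left = context.createGain();
      var right = context.createGain();
      var merger = context.createChannelMerger(2);

      // Equal-power curve: map pan from [-1, 1] onto [0, pi/2].
      var angle = (pan + 1) * Math.PI / 4;
      left.gain.value = Math.cos(angle);
      right.gain.value = Math.sin(angle);

      source.connect(upmix);
      upmix.connect(splitter);
      splitter.connect(left, 0);  // assumes output 0 is 'left' (3b)
      splitter.connect(right, 1); // assumes output 1 is 'right'
      left.connect(merger, 0, 0);
      right.connect(merger, 0, 1);
      merger.connect(context.destination);
    }

All of this, per sound instance, to do what most native audio APIs expose as a single pan parameter.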
4. createBuffer is synchronous

The spec still does not clearly communicate anywhere to end users that one of createBuffer's overloads does a synchronous audio decode. Current implementations in the wild thus cause the browser to hang, unresponsive, for multiple seconds when you call that overload. Worse still, the profiler in Chrome does not record samples for this operation, so it is very difficult to identify the problem. If an end user simply looks over the spec's list of methods, they will almost always choose createBuffer over decodeAudioData (it's simpler, and it has the mixToMono parameter, so it's more powerful), and end up with an app that is subtly broken. (See the comparison after 4c.)

4a. The steps in the spec should explicitly require a synchronous decode. As currently written, the described steps could easily be performed asynchronously on a mixer thread and still produce a valid result (as long as the decoding finished before the first time the sound was actually played).

4b. The spec should be painfully, obviously clear that using this overload of createBuffer will hang your browser.

4c. If possible, this overload should be disabled unless running in a web worker. But I can imagine that there may be particular use cases where a synchronous decode on the browser's UI thread is desired.
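To illustrate how easy this trap is to fall into, compare the two paths side by side. (playBuffer and the 'encoded' ArrayBuffer are hypothetical; assume 'encoded' was already fetched via XHR with responseType 'arraybuffer'.)

    // Path 1: the createBuffer overload. The entire decode happens
    // right here, on the calling (UI) thread; with a few MB of
    // compressed audio this hangs the page for several seconds.
    var audioBuffer = context.createBuffer(encoded, false /* mixToMono */);
    playBuffer(audioBuffer);

    // Path 2: decodeAudioData. Same result, but the decode happens
    // asynchronously. Nothing in the two signatures hints that this
    // is the only one that is safe to call on the UI thread.
    context.decodeAudioData(encoded, function (audioBuffer) {
      playBuffer(audioBuffer);
    }, function () {
      // Decode failed - and per issue 5 below, there was no way to
      // know in advance whether this format was decodable at all.
    });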
5. It is unclear which audio formats can be decoded by createBuffer/decodeAudioData

At present the spec appears to have no opinion about what an implementation can decode, or about how you should detect the correct audio format to use. This has already led to subtle bugs in implementations that were not caught until I ran end users' games in browsers whose implementations defied expectations.

5a. Update the spec to state that Audio.canPlayType should return information that matches the behavior of the Web Audio API.

5b. Or, expose a way to query the Web Audio API about which MIME types it can decode.

5c. Or, explicitly state that the way you are supposed to detect formats is by downloading the entire mp3/ogg/etc versions of your sounds and trying to decode them one at a time. I consider this an unacceptable solution, but it would be better than the current unspecified state.

6. Pausing playback is not built into the API and workarounds have issues

At present the API exposes no way to pause playback of an AudioBufferSourceNode. Workarounds have been proposed on StackOverflow and in other forums, but these workarounds have issues: primarily that they involve a race condition between JS and the mixer, but also that they are needlessly complex and difficult to implement. Pausing is also near nightmare status when looping is involved, and the interaction between the current workaround and playbackRate is unspecified.

6a. Add pause(optional double when) and resume(optional double when) methods to AudioBufferSourceNode.

6b. If not 6a, clearly specify the intended workaround and describe a solution to the race condition between JS and the mixer.

6c. If not 6a, clearly describe how to implement pausing correctly with looping active. This has never been stated and seems incredibly dependent on the exact implementation of the mixer (i.e. is looping gapless, etc.).

6d. Clearly specify the interaction between the offset/duration arguments to AudioBufferSourceNode.start and AudioBufferSourceNode.playbackRate, so that it is possible to correctly implement the pause workaround when playbackRate is used.

To clearly state the race, the current workaround (advocated by cwilson, iirc, and sketched below) is this: when calling start(), record AudioContext.currentTime as the 'playback start time'. To pause, call stop(), record AudioContext.currentTime as the 'playback stop time', and throw away your current AudioBufferSourceNode. To resume, create a new AudioBufferSourceNode and call start with an offset equal to ('playback stop time' - 'playback start time').

The problem is that AudioContext.currentTime is specified as 'always moving forward' and increasing in real time. It cannot be paused or re-positioned. This means that currentTime can change between the call to stop() and the retrieval of the currentTime attribute; furthermore, an unknown amount of time can elapse between the call to start() and the actual beginning of audio playback. So your recorded start/stop times can end up off by some unknown number of milliseconds. As noted above, this workaround has other deficiencies as well. And even if it did not, I believe it is unacceptably complex for such a simple, common audio operation. Pausing and resuming playback happens all the time; it should not be this complex and it should not produce GC pressure.
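For concreteness, a minimal sketch of that workaround (the names are mine; it deliberately ignores looping and playbackRate, since per 6c/6d I don't know how to handle those correctly):

    // Pause/resume via stop() plus a brand-new node each time.
    // 'context' is an AudioContext, 'buffer' a decoded AudioBuffer,
    // 'destination' the node to connect to.
    function PausableSound(context, buffer, destination) {
      this.context = context;
      this.buffer = buffer;
      this.destination = destination;
      this.offset = 0;     // seconds into the buffer
      this.startTime = 0;  // context.currentTime when playback began
      this.node = null;
    }

    PausableSound.prototype.play = function () {
      this.node = this.context.createBufferSource();
      this.node.buffer = this.buffer;
      this.node.connect(this.destination);
      // Race #1: playback does not necessarily begin at exactly
      // this moment, but currentTime is the best estimate we get.
      this.startTime = this.context.currentTime;
      this.node.start(0, this.offset);
    };

    PausableSound.prototype.pause = function () {
      this.node.stop(0);
      // Race #2: currentTime may have moved between stop() taking
      // effect and this read, so the saved offset drifts.
      this.offset += this.context.currentTime - this.startTime;
      this.node = null; // the old node becomes garbage: GC pressure
    };

    // Resuming is just starting a brand-new node at the saved offset.
    PausableSound.prototype.resume = PausableSound.prototype.play;

Note that the bookkeeping fields here (startTime, offset) are exactly the state that issue 7 below asks the API to expose directly.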
7. Playback state of AudioBufferSourceNodes is needlessly difficult to access

Related to 6 and 2: I have a ton of code written to perform simple operations like figuring out whether a given AudioBufferSourceNode is currently playing audio. No sane audio API I have ever used makes something this simple this hard to do. Features like playbackRate and loop make it non-trivial to compute in JS and easy to get wrong.

7a. Add an attribute to AudioBufferSourceNode, hypothetically called isPlaying, which returns true if the node is currently playing and false if it is not.

7b. Add an attribute to AudioBufferSourceNode, hypothetically called playbackOffset, which returns the current playback offset of the node if it is playing (and, given the presence of a pausing mechanism from 6a, returns the most recent playback offset if it is paused).

7c. If pausing is added as a mechanism, expose an attribute that returns the paused state (hypothetically called isPaused).

7d. If polling is not preferable, expose some sort of event handler or callback, like the Audio element's 'ended' event, that can be used to get notifications about the state of an AudioBufferSourceNode in order to support these use cases.

If it is helpful, you can see my current Web Audio API backend implementation here:
https://github.com/kevingadd/JSIL/blob/master/Libraries/JSIL.Browser.Audio.js#L219
Some of this feedback is based on older versions of the backend or feedback from users, though.

Thanks
-kg

Received on Wednesday, 1 May 2013 19:44:45 UTC