[web-audio-api] Layering considerations (#257)

The following point was raised by the W3C TAG as part of their [review of the Web Audio API](https://github.com/w3ctag/spec-reviews/blob/master/2013/07/WebAudio.md). It raises a number of issues, which we can split into separate issues if required. For now, let's capture our response in this issue.

## Layering Considerations

Web Audio is very low-level and this is a virtue. By describing a graph that operates in terms of samples of bytes, it enables developers to tightly control the behavior of processing and ensure low-latency delivery of results.

Today's Web Audio spec is an island: connected to its surroundings via loose ties, not integrated into the fabric of the platform as the natural basis and explanation of all audio processing -- despite being incredibly fit for that purpose.

Perhaps the most striking example of this comes from the presence in the platform of both Web Audio and the `<audio>` element. Given that the `<audio>` element is incredibly high-level (providing automation for loading, decoding, and playback, as well as UI to control those processes), it would appear that Web Audio lives at an altogether lower place in the conceptual stack. A natural consequence of this might be to re-interpret the `<audio>` element's playback functions _in terms of_ Web Audio. The UI could similarly be described _in terms of_ Shadow DOM, and the loading of audio data _in terms of_ XHR or the upcoming `fetch()` API. It's not necessary to re-interpret everything all at once, however.

Web Audio acknowledges that the `<audio>` element performs valuable audio loading work today by allowing the creation of `SourceNode` instances from them:

  * 4.11 The MediaElementAudioSourceNode Interface

    ```js
    var mediaElement = document.getElementById('mediaElementID');
    var sourceNode = context.createMediaElementSource(mediaElement);
    ```
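Fleshing out the spec's fragment into something complete (a sketch, not normative: it assumes a page with an `<audio>` element whose id is `mediaElementID`, and `context` is an `AudioContext` the page itself constructed, since no "default" context exists to reach for):

```javascript
// Sketch: route an <audio> element's output through an AudioContext.
// Assumes the caller supplies the context and the document; nothing
// here is specced beyond createMediaElementSource() itself.
function routeElementThroughGraph(context, doc) {
  const mediaElement = doc.getElementById('mediaElementID');
  const sourceNode = context.createMediaElementSource(mediaElement);
  // Connecting to the context's destination is what makes it audible.
  sourceNode.connect(context.destination);
  return sourceNode;
}
```

Note that even this minimal flow relies on the element and the context being wired together imperatively; the platform offers no declarative link between the two.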

Lots of questions arise, particularly if we think of media element audio playback _as though_ its low-level aspects were described in terms of Web Audio:

 * Can a media element be connected to multiple `AudioContext`s at the same time?
 * Does `ctx.createMediaElementSource(n)` disconnect the output from the default context?
 * If a second context calls `ctx2.createMediaElementSource(n)` on the same media element, is it disconnected from the first?
 * Assuming it's possible to connect a media element to two contexts, effectively "wiring up" the output from one bit of processing to the other, is it possible to wire up the output of one context to another?
 * Why are there both `MediaStreamAudioSourceNode` and `MediaElementAudioSourceNode` in the spec? What makes them different, particularly given that neither appears to have properties or methods, and both do nothing but inherit from `AudioNode`?
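The first three questions can be made concrete with a sketch. Nothing below is defined by the spec: it is exactly the case whose behavior is underspecified, written out so the gap is visible.

```javascript
// Hypothetical sketch of the underspecified case: one media element,
// two AudioContexts. The spec does not say whether the second
// createMediaElementSource() call disconnects the element from the
// first context, throws, or lets both graphs receive the samples.
function connectToTwoContexts(mediaElement, ContextCtor) {
  const ctx1 = new ContextCtor();
  const ctx2 = new ContextCtor();

  const src1 = ctx1.createMediaElementSource(mediaElement);
  src1.connect(ctx1.destination);

  // Underspecified: what happens to src1 at this point?
  const src2 = ctx2.createMediaElementSource(mediaElement);
  src2.connect(ctx2.destination);

  return [src1, src2];
}
```

Whatever the intended answer, it is not derivable from the spec text today; an implementation could pick any of the three behaviors and claim conformance.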

All of this seems to indicate some confusion in, at a minimum, the types used in the design. For instance, we could answer a few of the questions if we:

 * Eliminate `MediaElementAudioSourceNode` and instead re-cast media elements as possessing `MediaStream audioStream` attributes which can be connected to `AudioContext`s
 * Remove `createMediaElementSource()` in favor of `createMediaStreamSource()`
 * Add constructors for all of these generated types; this would force explanation of how things are connected.
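Under that proposal, element audio would flow through the same single entry point as any other stream. A sketch of the shape this would take (entirely hypothetical: the `audioStream` attribute does not exist on media elements today, and `createMediaStreamSource()` is assumed to accept it unchanged):

```javascript
// Hypothetical shape of the proposal above: media elements expose a
// MediaStream via an 'audioStream' attribute (not specced anywhere),
// and createMediaStreamSource() is the one way into the graph.
function connectElementAudio(ctx, mediaElement) {
  const source = ctx.createMediaStreamSource(mediaElement.audioStream);
  source.connect(ctx.destination);
  return source;
}
```

The payoff is uniformity: one source type, one connection path, and no need for `MediaElementAudioSourceNode` as a distinct (and currently member-less) interface.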

That leaves a few open issues for which we don't currently have suggestions but believe the WG should address:

 * What `AudioContext` do media elements use by default?
 * Is that context available to script? Is there such a thing as a "default context"?
 * What does it mean to have multiple `AudioContext` instances for the same hardware device? Chris Wilson advises that they are simply summed, but how is _that_ described?
 * By what mechanism is an `AudioContext` attached to hardware? If I have multiple contexts corresponding to independent bits of hardware...how does that even happen? `AudioContext` doesn't seem to support any parameters and there aren't any statics defined for "default" audio contexts corresponding to attached hardware (or methods for getting them).
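The hardware question is visible in the API surface itself. Today's constructor gives script nothing to hold on to; the commented lines below are hypothetical shapes (invented here purely to show what is missing), not proposals the spec contains:

```javascript
// Sketch of the gap: AudioContext construction takes no parameters,
// so there is no way to name a device, and no static for a "default"
// context. The ctor is injected so the sketch is self-contained.
function makeContext(ContextCtor) {
  const ctx = new ContextCtor(); // which hardware? unspecified
  // Hypothetical shapes the questions above imply (neither exists):
  //   new ContextCtor({ deviceId: '...' })
  //   ContextCtor.default
  return ctx;
}
```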

Received on Thursday, 17 October 2013 12:33:58 UTC