Re: Draft WebAudio API review text from Alex Russell on 2013-07-25 (www-tag@w3.org from July 2013)

From: Alex Russell <slightlyoff@google.com>
Date: Wed, 24 Jul 2013 17:16:39 -0700
To: Sergey Konstantinov <twirl@yandex-team.ru>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <CANr5HFXU+rAfu-DRHon6TVxV+pCH_kLfa-gQH2F9BYC7-N3ThQ@mail.gmail.com>

Hi Sergey,

On Mon, Jul 22, 2013 at 2:30 AM, Sergey Konstantinov
<twirl@yandex-team.ru> wrote:

> I read both the Web Audio API specification and Alex's review, and I have
> some comments on both.
>
> 1. On review
>
> First of all, I strongly support Alex in two cases:
>
> (a) The relationship between the Web Audio API and the <audio> tag. In its
> current state, both Web Audio and <audio> would be just bindings to some
> internal low-level audio API. In my opinion that's a bad idea; the Web
> Audio API design should allow <audio> to be built on top of it as a pure
> high-level JavaScript component.
>
> (b) Audio nodes should be constructible. Let me express an even more
> extreme point here: the create* methods of the AudioContext interface are
> simply redundant and should be removed, since there is no use case for
> inheriting from the AudioContext class and redefining the factory methods.
> The create* methods make a mess of the AudioContext interface, as only one
> (of 18) class methods is meaningful while the others are just helpers.
> That makes it very hard to understand the real responsibility of the
> AudioContext object.
>
> As far as I understand, the whole point of the AudioContext is to provide
> the notion of time for its nodes (which could be real time for the base
> AudioContext and virtual time for OfflineAudioContext).
>

I chatted with Chris Wilson about this last week and he suggests that
AudioContext is designed to model some bit of underlying hardware. All of
the operations it defines do the impedance matching for things from one
format to something that the proximate hardware can consume directly. But
the API here is deeply confused. How do you enumerate the hardware? What
hardware is default? What if there is no audio hardware?


> But I failed to extract from the specification how exactly audio nodes
> interact with their context. There is no 'tick' event and no global timer,
> and it seems that all synchronisation and latency problems are somehow
> hidden in the AudioContext and audio nodes implementation. I'm strongly
> convinced that this is the wrong approach. In my opinion the low-level API
> should clearly reflect the internal principles of its organization.
>

I'm not so sold on that. Making it possible for you to send audio to the OS
is really the only job that such an API COULD do from the perspective of
browser implementers. Doing more than that won't be portable and therefore
isn't something we could do in a web API.


> 2. On specification
>
> Regretfully I'm no specialist in 3d gaming or garage-band-like
> applications, but I see some obvious problems in using the Web Audio API:
>
> (a) There is no simple way to convert AudioContext to OfflineAudioContext,
>

Interesting. I think the intuition is that you can use an
OfflineAudioContext for bulk processing and use an AudioContext for
real-time processing and playback.
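
To make that intuition concrete, here is a toy TypeScript model. It is not
the Web Audio API: every name in it is made up for illustration, and the
block size of 128 frames is an assumption. The idea is that a source which
only depends on the frame index can be pulled in blocks as fast as the CPU
allows for bulk processing, while a real-time context would pull the same
blocks paced by the audio hardware:

```typescript
// Toy model only: none of these names come from the Web Audio API.
type RenderBlock = (startFrame: number, frames: number) => Float32Array;

// A hypothetical source: a sine wave computed from the frame index alone,
// so it does not care whether time is real or virtual.
function sineSource(frequency: number, sampleRate: number): RenderBlock {
  return (startFrame, frames) => {
    const out = new Float32Array(frames);
    for (let i = 0; i < frames; i++) {
      const t = (startFrame + i) / sampleRate;
      out[i] = Math.sin(2 * Math.PI * frequency * t);
    }
    return out;
  };
}

// "Bulk" rendering: pull blocks back to back into one big buffer,
// without waiting for any clock.
function renderOffline(
  source: RenderBlock,
  totalFrames: number,
  blockSize = 128, // assumed block size
): Float32Array {
  const result = new Float32Array(totalFrames);
  for (let start = 0; start < totalFrames; start += blockSize) {
    const frames = Math.min(blockSize, totalFrames - start);
    result.set(source(start, frames), start);
  }
  return result;
}

// One second of a 440 Hz tone at an assumed 44100 Hz sample rate.
const rendered = renderOffline(sineSource(440, 44100), 44100);
console.log(rendered.length); // 44100
```

In this model the only difference between the two kinds of context is who
decides when the next block gets pulled.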


> so there is no simple way to upload the prepared composition. If you need
> to edit the composition in the browser and then save (upload) it, you have
> to create an OfflineAudioContext, clone every single audio node (which is,
> again, complicated as they have no "clone" method)
>

Right, good catch on the lack of serialization. Adding an issue for that in
the document.


> from the real-time context and then call startRendering(). My suggestion
> is (a) to move the startRendering method to the base AudioContext, and (b)
> to remove OfflineAudioContext entirely (it is a bad name in any case, since
> it is really NotRealTimeAudioContext, not Offline).
>

Yeah, calling it "BulkProcessingContext" might be better. The name really
is bad.


> (b) There is a very odd statement in the AudioBuffer interface:
>
> " This interface represents a memory-resident audio asset (for one-shot
> sounds and other short audio clips). Its format is non-interleaved IEEE
> 32-bit linear PCM with a nominal range of -1 -> +1. It can contain one or
> more channels. Typically, it would be expected that the length of the PCM
> data would be fairly short (usually somewhat less than a minute). For
> longer sounds, such as music soundtracks, streaming should be used with the
> audio element and MediaElementAudioSourceNode."
>
> First, what does "it would be expected that the length of the PCM data
> would be fairly short" mean? What will happen if the data is larger?
> There is no OutOfMemory exception and no explanation for that limit. How do
> we expect developers to deal with that unpredictable constraint? Since
> we are dealing with binary data without any overhead, why have any
> limit at all?
>

I think this is about buffer sizes for real-time processing. The incentive
to keep samples short is down to latency for processing them.
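
For a rough sense of scale, here is a back-of-the-envelope sketch (not
anything from the spec). The quoted text says the format is 32-bit linear
PCM, so each sample takes 4 bytes. The 44100 Hz sample rate and the stereo
channel count are assumptions, and audioBufferBytes is a hypothetical
helper name:

```typescript
// Rough size of an uncompressed, non-interleaved 32-bit PCM buffer.
// Hypothetical helper; sample rate and channel count are assumed values.
function audioBufferBytes(
  seconds: number,
  sampleRate: number,
  channels: number,
): number {
  const bytesPerSample = 4; // 32-bit samples
  return Math.ceil(seconds * sampleRate) * channels * bytesPerSample;
}

// One minute of stereo audio at 44100 Hz.
const oneMinuteStereo = audioBufferBytes(60, 44100, 2);
console.log(oneMinuteStereo); // 21168000 bytes, roughly 20 MiB
```

So under those assumptions "somewhat less than a minute" works out to a
couple of tens of megabytes held in memory per clip.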


> Second, the 1 minute limit is clearly insufficient for the declared
> purpose of making 3d games and audio editors in the browser.
>

Hrm. I think we should take this to their mailing list. I don't understand
the way people want to use this well enough to know if it's a real hazard.


> Third, falling back to the <audio> element isn't really an option, as we
> have agreed that the audio element should be implemented in terms of the
> Web Audio API, not vice versa.
>
> So, in my opinion, the Web Audio API in its current state doesn't provide
> an appropriate interface for games and audio editors.
>
Received on Thursday, 25 July 2013 00:17:36 UTC