Re: Requirements for Web audio APIs from Chris Rogers on 2011-05-23 (public-audio@w3.org from April to June 2011)

From: Chris Rogers <crogers@google.com>
Date: Sun, 22 May 2011 18:55:29 -0700
To: robert@ocallahan.org
Cc: public-audio@w3.org
Message-ID: <BANLkTinTxhUkvG+xKKhPqaeexNRg=S1RTA@mail.gmail.com>
On Sun, May 22, 2011 at 6:11 PM, Robert O'Callahan <robert@ocallahan.org> wrote:

> On Mon, May 23, 2011 at 12:11 PM, Chris Rogers <crogers@google.com> wrote:
>
>> On Sun, May 22, 2011 at 2:39 PM, Robert O'Callahan <robert@ocallahan.org> wrote:
>>
>>> On Sat, May 21, 2011 at 7:17 AM, Chris Rogers <crogers@google.com> wrote:
>>>
>>>>  On Thu, May 19, 2011 at 2:58 AM, Robert O'Callahan <
>>>> robert@ocallahan.org> wrote:
>>>>
>>>>> My concern is that having multiple abstractions representing streams of
>>>>> media data --- AudioNodes and Streams --- would be redundant.
>>>>>
>>>>
>>>> Agreed, there's a need to look at this carefully.  It might be workable
>>>> if there were appropriate ways to easily use them together even if they
>>>> remain separate types of objects.  In graphics, for example, there are
>>>> different objects such as Image, ImageData, and WebGL textures which have
>>>> different relationships with each other.  I don't know what the right answer
>>>> is, but there are probably various reasonable ways to approach the problem.
>>>>
>>>
>>> There are reasons why we need to have different kinds of image objects.
>>> For example, a WebGL texture has to live in VRAM so couldn't have its pixel
>>> data manipulated by JS the way an ImageData object can. Are there
>>> fundamental reasons why AudioNodes and Streams have to be different ... why
>>> we couldn't express the functionality of AudioNodes using Streams?
>>>
>>
>> I didn't say they *have* to be different.  I'm just saying that there
>> might be reasonable ways to have AudioNodes and Streams work together. I
>> could also turn the question around and ask if we could express the
>> functionality of Streams using AudioNodes?
>>
>
> Indeed! One answer to that would be that Streams contain video so
> "AudioNode" isn't a great name for them :-).
>
> If they don't have to be different, then they should be unified into a
> single abstraction. Otherwise APIs that work on media streams would have to
> come in an AudioNode version and a Stream version, or authors would have to
> create explicit bridges.
>

For connecting an audio source from an HTMLMediaElement into an audio
processing graph using the Web Audio API, I've suggested adding an
.audioSource attribute.  A code example with diagram is here in my proposal:
http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html#DynamicLifetime-section

I'm fairly confident that this type of approach will work well for
HTMLMediaElement.  Basically, it's a "has-a" design instead of "is-a".

Similarly, for Streams I think the same type of approach could be
considered.  I haven't looked very closely at the proposed media stream API
yet, but would like to explore that in more detail.  If we adopt the "has-a"
(instead of "is-a") design then the problem of AudioNode not being a good
name for Stream disappears.
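
As a rough illustration of the "has-a" idea, here is a minimal TypeScript
sketch.  Every class and function name in it (AudioNodeSketch,
MediaElementSketch, StreamSketch, describeGraph) is a hypothetical stand-in
invented for this example; none of them is the API from the proposal or from
the media stream draft.  Only the .audioSource attribute name comes from the
suggestion above.

```typescript
// Hypothetical stand-in for a node in an audio processing graph.
class AudioNodeSketch {
  readonly outputs: AudioNodeSketch[] = [];
  constructor(readonly name: string) {}

  // Route this node's output into `destination`; returns it to allow chaining.
  connect(destination: AudioNodeSketch): AudioNodeSketch {
    this.outputs.push(destination);
    return destination;
  }
}

// "has-a": the media element owns an audio source node instead of being one.
class MediaElementSketch {
  readonly audioSource: AudioNodeSketch;
  constructor(readonly src: string) {
    this.audioSource = new AudioNodeSketch(`source(${src})`);
  }
}

// A stream may carry video too, yet it can still expose an audio source
// in the same way, so it never needs to be named or typed as an audio node.
class StreamSketch {
  readonly hasVideo = true;
  readonly audioSource = new AudioNodeSketch("source(stream)");
}

// Render the path(s) from a node to the end of the graph as text.
function describeGraph(node: AudioNodeSketch): string {
  if (node.outputs.length === 0) {
    return node.name;
  }
  return `${node.name} -> ${node.outputs.map(describeGraph).join(" | ")}`;
}

const destination = new AudioNodeSketch("destination");
const gain = new AudioNodeSketch("gain");
gain.connect(destination);

const video = new MediaElementSketch("movie.webm");
video.audioSource.connect(gain);

const stream = new StreamSketch();
stream.audioSource.connect(gain);

console.log(describeGraph(video.audioSource));
// prints "source(movie.webm) -> gain -> destination"
```

In this sketch both the media element and the stream feed the same gain node
through their own audioSource, so graph-level code only ever deals with one
node type and no separate Stream version of the graph API is needed.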



>
>
>>
>>>
>>>>> That sounds good, but I was thinking of other sorts of problems.
>>>>> Consider for example the use-case of a <video> movie with a regular audio
>>>>> track, and an auxiliary <audio> element referencing a commentary track,
>>>>> where we apply an audio ducking effect to overlay the commentary over the
>>>>> regular audio. How would you combine audio from both streams and keep
>>>>> everything in sync (including the video), especially in the face of issues
>>>>> such as one of the streams temporarily pausing to buffer due to a network
>>>>> glitch?
>>>>>
>>>>
>>>> In general this sounds like a very difficult problem to solve.  Because
>>>> if you had two <video> streams playing together, then either one of them
>>>> could pause momentarily due to buffer underrun, so each one would have to
>>>> adjust to the other.  Then if you had more than two, any of them could
>>>> require adjustments in all of the others.
>>>>
>>>
>>> That's right. I agree it's hard, but I think we need to solve it, or at
>>> least have a plausible plan to solve it. This is not a far-fetched use-case.
>>>
>>
>> Sure, I don't disagree that it might be useful.  I'm just suggesting that
>> solving this problem is something that can be done at the HTMLMediaElement
>> streaming level.  Once these media elements are synced up (assuming somebody
>> specs that out and implements that for HTMLMediaElement) then these elements
>> are ready to be inserted into a processing graph using the Web Audio API.
>>
>
> That might work, I'm not sure yet. But it would require the author to
> figure out the synchronization requirements of the audio graph and restate
> those requirements to the media elements.
>

If there's an explicit API created for HTMLMediaElement allowing for
synchronization between multiple elements as you describe, then the author
would have to use this API on <audio> and <video> elements to set up the
synchronization.  But I think that could be separated from the Web Audio
API / audio-graph aspects.  The audio graph latency compensation (if any)
could be handled "under-the-hood" without the author's intervention.

Chris
Received on Monday, 23 May 2011 01:55:57 UTC