Re: "Layering Considerations" issue from Alex Russell on 2013-07-26 (public-audio@w3.org from July to September 2013)

From: Alex Russell <slightlyoff@google.com>
Date: Thu, 25 Jul 2013 18:35:16 -0700
To: "robert@ocallahan.org" <robert@ocallahan.org>
Cc: "public-audio@w3.org" <public-audio@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>
Message-ID: <CANr5HFVyX2YPLPsL0FO5c3d9impqw9CcUvWovJF+W7Uq-=7+2g@mail.gmail.com>
Thanks for taking the time to respond. Inline.

On Friday, July 26, 2013, Robert O'Callahan wrote:

> Let me propose some answers to your questions in this section --- partly
> because I just implemented these features in Gecko!
>
> Can a media element be connected to multiple AudioContexts at the same
>> time?
>>
>
> Yes.
>

The reason for asking this question (and many others that you respond to
below) wasn't to simply understand the spec -- I know now (having tried it
out) that such connections are possible. What the question was meant to
uncover was *by which mechanism *that has become true. It's not exposed, so
the question presents itself without an obvious answer. *That* is the
issue. Not connection (or not) to multiple contexts.


>
>
>> Does ctx.createMediaElementSource(n) disconnect the output from the
>> default context?
>>
>
> In Gecko, it actually does, because that seems to be what people will
> usually want. However nothing in the spec says it should, so I think that
> should be a spec change. It's easy to fix our behavior if we decide not to
> take that spec change.
>

Yes, and that's the case in Blink as well.

What this discussion was about isn't "does this work?" but "how does that
happen?".

I'm asking for this WG to explain much more clearly to the rest of the
world *how* the interactions it creates are plumbed through the platform.

In many cases those explanations might be "v2" things. That's an answer of
a sort, but the kind that should only be accepted when it becomes clear
that the discussion about how/what/when has taken place. In most parts of
the design today, it appears the discussion has never happened.


> If a second context calls ctx2.createMediaElementSource(n) on the same
>> media element, is it disconnected from the first?
>>
>
> No.
>

Again, that wasn't posed as a functional issue. It was a rhetorical
question about the lack of visible mechanism/explanation.


> Assuming it's possible to connect a media element to two contexts,
>> effectively "wiring up" the output from one bit of processing to the other,
>> is it possible to wire up the output of one context to another?
>>
>
> Yes, by connecting a MediaStreamAudioDestinationNode from one context to a
> MediaStreamAudioSourceNode in another context.
>

Again, I accept that answer (and, of course, discovered it myself before
writing it down here).

It is meant to get _you_ asking "how does that work?". That it does is no
mean feat.
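
For readers who have not tried it, the wiring Rob describes can be sketched
roughly as below. The `*Like` interfaces and `bridgeContexts` are hypothetical
names introduced for illustration; they mirror only the subset of AudioContext
used here. Note that nothing in the sketch says what actually carries the
samples between the two contexts, which is the point of the question.

```typescript
// Hypothetical structural types covering only what this sketch touches.
// In a browser, real AudioContext and AudioNode objects are expected to fit.
type StreamLike = object;

interface NodeLike {
  connect(destination: NodeLike): unknown;
}

interface StreamDestinationLike extends NodeLike {
  readonly stream: StreamLike;
}

interface ContextLike {
  readonly destination: NodeLike;
  createMediaStreamDestination(): StreamDestinationLike;
  createMediaStreamSource(stream: StreamLike): NodeLike;
}

// Route whatever `tail` produces in context `a` into context `b`.
function bridgeContexts(a: ContextLike, tail: NodeLike, b: ContextLike): NodeLike {
  // MediaStreamAudioDestinationNode: the exit point from context `a`.
  const exit = a.createMediaStreamDestination();
  tail.connect(exit);

  // MediaStreamAudioSourceNode: the entry point into context `b`.
  const entry = b.createMediaStreamSource(exit.stream);
  entry.connect(b.destination);
  return entry;
}
```

In a page this would be called as `bridgeContexts(ctx1, someNode, ctx2)` with
two ordinary AudioContext instances.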


> Why are there both MediaStreamAudioSourceNode and
>> MediaElementAudioSourceNode in the spec? What makes them different,
>> particularly given that neither appear to have properties or methods and do
>> nothing but inherit from AudioNode?
>>
>
>> All of this seems to indicate some confusion in, at a minimum, the types
>> used in the design. For instance, we could answer a few of the questions if
>> we:
>>
>> Eliminate MediaElementAudioSourceNode and instead re-cast media elements
>> as possessing MediaStream audioStream attributes which can be connected to
>> AudioContexts
>>
>
> This is close to what we implement in Gecko.
>

Good to hear!


> We have extended media elements with mozCaptureStream and
> mozCaptureStreamUntilEnded methods (sorry about the prefixes, they predate
> the new policy) which return new MediaStreams. (I think returning new
> streams is a more robust design than having one intrinsic stream that all
> users of an element must share; also, as discussed above obtaining a
> MediaStream disables regular audio output, which as a side-effecting
> operation is not suitable for an attribute getter.)
> MediaElementAudioSourceNode is just a thin wrapper around
> MediaStreamAudioSourceNode taking the result of mozCaptureStream.
>

From a spec perspective, it's hard to see why both persist then.
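
As a rough illustration of the "thin wrapper" described above, assuming an
element that exposes Gecko's prefixed mozCaptureStream(): the helper name
`mediaElementSourceViaCapture` and the structural parameter types are
hypothetical, and this is a sketch of the relationship, not Gecko's actual
implementation.

```typescript
// Sketch: a media-element source expressed purely in terms of a
// stream source. S is the stream type, N the resulting node type.
function mediaElementSourceViaCapture<S, N>(
  ctx: { createMediaStreamSource(stream: S): N },
  element: { mozCaptureStream(): S },
): N {
  // Capturing yields a new stream on each call (per the description above),
  // so every caller gets its own stream instead of sharing one.
  const stream = element.mozCaptureStream();
  return ctx.createMediaStreamSource(stream);
}
```

If the two node types really reduce to this, the element-specific node adds
no capability of its own beyond the capture step.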


> That leaves a few open issues for which we don't currently have
>> suggestions but believe the WG should address:
>>
>> What AudioContext do media elements use by default?
>>
>
> I don't think defining a "default AudioContext" buys us much, since it has
> no real effect.
>

It'd have raised the issue of "what does it mean to attach to hardware?"
and "what *is* the connection between AudioContext and underlying
hardware?" had it been considered at more length.


> Is that context available to script? Is there such a thing as a "default
>> context"?
>>
>
> No, and no.
>

I agree with the first, but not the second. It's observable in
implementations that there is a default context.


>
>
>> What does it mean to have multiple AudioContext instances for the same
>> hardware device?
>>
>> Chris Wilson advises that they are simply sum'd, but how is that described?
>>
>> By what mechanism is an AudioContext attached to hardware? If I have
>> multiple contexts corresponding to independent bits of hardware...how does
>> that even happen? AudioContext doesn't seem to support any parameters and
>> there aren't any statics defined for "default" audio contexts corresponding
>> to attached hardware (or methods for getting them).
>>
>
> I think the manner in which media element output and AudioContexts' output
> are mixed and played is outside the scope of Web Audio.
>

That's a fine answer, but it's not OK to then punt forever. At a minimum,
the "default context" and the mixing should be defined IN TERMS of Web
Audio, if only at a non-observable, spec-text level.
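
To make "defined in terms of Web Audio" concrete, here is one possible
reading of "simply sum'd", written as a plain function over sample blocks.
`mixOutputs` is a hypothetical name, and the choices made here (no clamping,
no gain, shorter blocks padded with silence) are assumptions, not anything
the spec or any implementation states.

```typescript
// Sum the output blocks of several sources (contexts, media elements)
// sample by sample into one block of `frames` samples.
function mixOutputs(outputs: Float32Array[], frames: number): Float32Array {
  const mixed = new Float32Array(frames); // starts as silence (all zeros)
  for (const block of outputs) {
    // Blocks shorter than `frames` contribute silence for the remainder.
    const n = Math.min(frames, block.length);
    for (let i = 0; i < n; i++) {
      mixed[i] += block[i];
    }
  }
  return mixed;
}
```

Even a definition this small forces answers to the open questions: what
counts as a source, and what happens when the sum leaves the nominal range.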


> Some of that should probably be implementation dependent (e.g. a UA could
> allow the user to direct output of particular pages, elements, or
> AudioContexts to particular output devices, or let users control the volume
> of those outputs independently). Some of that will be affected by proposed
> specs for "output channels" (e.g.
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2013-March/039202.html
> ).
>
> Rob
> --
> Jtehsauts  tshaei dS,o n" Wohfy  Mdaon  yhoaus  eanuttehrotraiitny  eovni
> le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o  Whhei csha iids  teoa
> stiheer :p atroa lsyazye,d  'mYaonu,r  "sGients  uapr,e  tfaokreg iyvoeunr,
> 'm aotr  atnod  sgaoy ,h o'mGee.t"  uTph eann dt hwea lmka'n?  gBoutt  uIp
> waanndt  wyeonut  thoo mken.o w
>
Received on Friday, 26 July 2013 01:35:44 UTC